Write-ups for Flare-On 10

2023-11-13 — last updated 2024-03-01

With the official write-ups and countless community blog posts out there, much has already been said. Still, this blog post serves as a way for me to document my own process and share some notes. For brevity, a bunch of hitting-my-head-against-the-wall was omitted.

As I did not take detailed notes while Flare-On 10 was running, I will try to reconstruct my paths through the challenges as time and memory permit. The challenges are covered in order below.

1: X

TLDR: click the button 42 times.

Welcome to the 10th Annual Flare-On Challenge!

Statistically, you probably won’t finish every challenge. Every journey toward excellence starts somewhere though, and yours starts here. Maybe it ends here too.

This package contains many files and, I can’t believe i’m saying this, click the one with the “.exe” file extension to launch the program. Maybe focus your “reverse engineering” efforts on that one too.

The first challenge is the classic liveness check. Starting the executable shows a big cross, and a bunch of buttons that we can click to interact with it; presumably we need to find the right 2-digit code and press the unlock button.

X.exe appears to be a plain MSVC/C++ executable, but all bundled DLLs suggest there’s some .NET going on here. In addition to X.exe, there’s also an X.dll that seems interesting. The naming suggests it’s not library code, and Detect It Easy identifies it as .NET.

After loading the .dll in dnSpy and clicking around a bit, I quickly succumbed to the frustration of not finding the relevant code (it seems I overlooked monogame.Game1 entirely) and just decided to click the buttons a bunch of times. I should’ve known: the number 42 leads to the flag.

2: ItsOnFire

TLDR: find string AES-CBC and IV. Build the AES key using string slices. Decrypt iv.png.

The FLARE team is now enthusiastic about Google products and services but we suspect there is more to this Android game than meets the eye.

This challenge consists of a single APK file called ItsOnFire.apk. Straight into Android on the second challenge; might as well get it over with.

Without a ready-to-go setup for Android analysis, I scramble a bit and install Android Studio to have a look at what we’re dealing with. I play a bit of Space Invaders.

For static analysis, I open the APK using jadx-gui. It appears the APK is obfuscated; all names were replaced with meaningless placeholders. I start at the top and stop when I find something of interest; in f.b, I identify code that performs CRC32, and there’s symmetric crypto going on. Two of the functions retrieve various values from the APK’s resources:

private final File c(int i2, Context context) {  
	Resources resources = context.getResources();  
	Intrinsics.checkNotNullExpressionValue(resources, "context.resources");  
	byte[] e2 = e(resources, i2);  
	String d2 = d(context);  
	Charset charset = Charsets.UTF_8;  
	byte[] bytes = d2.getBytes(charset);  
	Intrinsics.checkNotNullExpressionValue(bytes, "this as java.lang.String).getBytes(charset)");  
	SecretKeySpec secretKeySpec = new SecretKeySpec(bytes, context.getString(R.string.ag));  
	String string = context.getString(R.string.alg);  
	Intrinsics.checkNotNullExpressionValue(string, "context.getString(R.string.alg)");  
	String string2 = context.getString(R.string.iv);  
	Intrinsics.checkNotNullExpressionValue(string2, "context.getString(\n     …             R.string.iv)");  
	byte[] bytes2 = string2.getBytes(charset);  
	Intrinsics.checkNotNullExpressionValue(bytes2, "this as java.lang.String).getBytes(charset)");  
	byte[] b2 = b(string, e2, secretKeySpec, new IvParameterSpec(bytes2));  
	File file = new File(context.getCacheDir(), context.getString(R.string.playerdata));  
	FilesKt__FileReadWriteKt.writeBytes(file, b2);  
	return file;  
}  

This appears to build some sort of crypto object using R.string.ag and R.string.alg to identify the algorithm, and R.string.iv as an IV. I find out that Android apps store their strings in res/values/strings.xml, and identify the following strings:

<string name="alg">AES/CBC/PKCS5Padding</string>
<string name="ag">AES</string>
<string name="iv">abcdefghijklmnop</string>

So, AES it is, then! There’s also a bunch of strings regarding days of the week and a couple of URLs, and I resist the urge to browse to them (.. briefly – there’s a YouTube URL that’s not clearly Rick Astley, and I find out it’s Working for the Weekend by Loverboy). More carefully reading through the function, it appears that the function d is responsible for generating the AES key.

private final String d(Context context) {  
	String slice;  
	String string = context.getString(R.string.c2);  
	Intrinsics.checkNotNullExpressionValue(string, "context.getString(R.string.c2)");  
	String string2 = context.getString(R.string.w1);  
	Intrinsics.checkNotNullExpressionValue(string2, "context.getString(R.string.w1)");  
	StringBuilder sb = new StringBuilder();  
	sb.append(string.subSequence(4, 10));  
	sb.append(string2.subSequence(2, 5));  
	String sb2 = sb.toString();  
	Intrinsics.checkNotNullExpressionValue(sb2, "StringBuilder().apply(builderAction).toString()");  
	byte[] bytes = sb2.getBytes(Charsets.UTF_8);  
	Intrinsics.checkNotNullExpressionValue(bytes, "this as java.lang.String).getBytes(charset)");  
	long a2 = a(bytes);  
	StringBuilder sb3 = new StringBuilder();  
	sb3.append(a2);  
	sb3.append(a2);  
	String sb4 = sb3.toString();  
	Intrinsics.checkNotNullExpressionValue(sb4, "StringBuilder().apply(builderAction).toString()");  
	slice = StringsKt___StringsKt.slice(sb4, new IntRange(0, 15));  
	return slice;  
}

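The function loads two strings, slices and concatenates them, feeds the result to function a, writes the returned number twice in a row as a decimal string, and keeps the first 16 characters. Let’s grab the relevant resources.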

<string name="c2">https://flare-on.com/evilc2server/report_token/report_token.php?token=</string>
<string name="w1">wednesday</string>

Ah, so this is where the URL comes into play! I replicate the functionality in Python, armed with the knowledge that function a is just a wrapper around CRC32.

from binascii import crc32

string = "https://flare-on.com/evilc2server/report_token/report_token.php?token="
string2 = "wednesday"
sb = string[4:10] + string2[2:5]

a2 = crc32(sb.encode('utf-8'))
s = str(a2) + str(a2)
print(s[:16])

This outputs 4508305374508305 (and it has me double-checking whether that’s a 16-byte character string, or an integer).
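
To settle the string-versus-integer doubt: d returns a String, and c turns it into key material with getBytes(UTF_8), so the key is the 16 ASCII digits rather than a number. A minimal sketch of that reading, reusing the derivation above (the helper name derive_key is mine, not something from the APK):

```python
from binascii import crc32

C2 = "https://flare-on.com/evilc2server/report_token/report_token.php?token="
W1 = "wednesday"


def derive_key(c2: str, w1: str) -> bytes:
    """Mirror d(): slice, CRC32, double the decimal string, keep 16 chars."""
    seed = c2[4:10] + w1[2:5]
    digits = str(crc32(seed.encode("utf-8"))) * 2
    # Kotlin's slice(IntRange(0, 15)) is inclusive, hence 16 characters.
    return digits[:16].encode("utf-8")


key = derive_key(C2, W1)
print(key, len(key))
```

Sixteen bytes means AES-128, which is also why the value has to be entered as UTF-8 text (not hex) in tools such as CyberChef.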

Okay, now we have AES with a key (4508305374508305) and IV (abcdefghijklmnop), but no ciphertext to decrypt yet. Clicking through the resources, there’s an odd iv.png and ps.png in res/raw; these are not valid PNG files. Could it be..?
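
A quick way to confirm the "not valid PNG" suspicion is to compare the first eight bytes against the fixed PNG signature. The sketch below uses made-up byte strings as stand-ins for the file contents; in practice the bytes would come from res/raw/iv.png and res/raw/ps.png.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(blob: bytes) -> bool:
    """Return True if the blob starts with the 8-byte PNG signature."""
    return blob.startswith(PNG_SIGNATURE)


# Hypothetical stand-ins for file contents, for illustration only.
print(looks_like_png(PNG_SIGNATURE + b"\x00\x00\x00\rIHDR"))  # True
print(looks_like_png(b"\x13\x37 some encrypted bytes"))       # False
```

The same check is handy after decryption: if the output starts with the signature, key, IV and mode are most likely right.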

Exporting a resource is done by clicking Save as gradle project in the JADX interface. Turning to CyberChef, we can decrypt the images using AES in CBC/NoPadding mode.

ps.png

iv.png

3: Mypassion

TLDR: decompile, step along in a debugger. Gradually work through the stages, tweak input until all checks are satisfied.

This is one of those ones that will work under the right circumstances, Steve. May I call you Steve? Before you take to twitter complaining about a broken challenge, think about how you can be the change you want to see in the world, Steve.

4: Aimbot

I hope this is the only aimbot on your system. Twitch streaming probably pays better than being a mediocre reverse engineer though.

TODO

5: Where_am_i

I wish we had more easy challenges for you this year for you to rack up your personal high score. It wouldn’t help you improve but it would feel great and that is what is most important.

TODO

6: FlareSays

TLDR: run the game in DOSBox-X. Fiddle a bit with the built-in debugger. Give up. Analyze the 16-bit assembly statically. Decrypt strings. Find keyboard input check. Note that file is self-modifying after 128 levels. Patch binary to always set correct input, run until level 128. Run modified file; no success. Note Konami code input check on start screen and that it affects RNG setup. Press Konami code, see blinking screen. Run again. Run modified file.

You’re doing great champ! This challenge is a modern (and retro) take on the classic game Simon Says. The cool thing about this one though is it will make you give up, and maybe take that sales job after all. It’s better for everybody that way.

7: Flake

TLDR: play snake for a bit. Run strings, find d3m0_c0nf.txt file name and XOR key. Decode config file. Set starting score to 10000, run into errors. Try to set up Nuitka to understand how to do injection. Fail miserably. Use Cheat Engine (https://www.cheatengine.org) to set Oliver’s score to 1. Overtake Oliver.

Subject: need your help... Oliver Shrub, Gardens Department Head, keeps bragging about his high score for this rip off Snake game called Flake. I'm pretty sure he found a way to cheat the game because there is no way it's possible to score 10,000 points...I mean the game ships with a sketchy TXT file so there must be something fishy going on here. Can you look at the game and let me know how I can beat Oliver's high score? Best, Nox Shadows Department Head

8: AmongRust

Our customer recently found the following malware executing on one of their machines. The system was left with a very silly looking wallpaper, and a lot of executables on the machine had stopped working. The customer had successfully restored from backup, but we are still interested in understanding all capabilities of the malware. A packet capture file has been provided to aid your analysis.

TODO

9: Mbransom

TLDR: use bochs debugger to start disk image. Identify deobfuscation, set breakpoints. Figure out keyboard input loop. Find key validation. Compute first 12 characters (XOR constant with 0x55). Use Unicorn to emulate the code and brute force the remaining 2 bytes.

You’re doing so great! Go out and celebrate. Take a day off kid, you’ve earned it. Watch your scoreboard position fall like the sand through the hourglass. Avoid this VM and feel the joy the outside world has to offer. Or crush this one and earn even more internet points, those will come in handy.

10: Kupo

TLDR: boot pdp11 using SIMH. Use rawtap to extract binary from tape. Learn some Forth. Try to debug using built-in adb debugger. Parse the PDP11 symbol table to label functions. Distinguish between native functions and Forth code. Statically figure out that decrypt is a xor-loop (use p/q2-q4! as key) and decode is some accumulator loop. Re-implement in Python.

Did you know the PDP-11 and the FOURTH programming language were both released 53 years ago, in 1970? We like to keep our challenges fresh here at Flare-On, in-line with the latest malware trends.

11: Over The Rainbow

TLDR: identify XOR and ChaCha20. Decrypt RSA public key. Attack given ciphertext using Coppersmith and stereotyped message.

I’m told this one is easy if you are really good. Based on your solve times so far Google Bard predicts your performance will be: “1 out of 5 stars. I’d give it 0 stars if I could. Food arrived over an hour late, covered in oil. I wouldn’t feed it to my dog”

12: HVM

TLDR: write code to decrypt RC4-encrypted functions. Note two input parameters. XOR hardcoded strings to get FLARE2023FLARE2023FLARE2023FLARE2023 as arg1. Figure out Salsa20 keystream; re-implement using Unicorn. Invert the XOR-loop function through tedious manual labor. Input the result as arg2.

This is the second smallest challenge this year! If only that mattered.

When we simply run the binary in a virtual machine, we’re presented with an error:

[-] OS/CPU feature not enabled

Opening the binary in IDA, we quickly find out where that error comes from. The first thing the main function does is call WHvGetCapability(WHvCapabilityCodeHypervisorPresent, ...), which I learn to be a way to check whether Hyper-V is available. It.. is not. After a brief foray into VMware configuration, I set out to approach this challenge statically.

Some clean-up later, the following snippets suggest we’re supposed to supply two arguments with lengths of 9 to 47 and 25 to 64 characters, respectively …

// ...
len_arg1 = strlen(argv[1]);
len_arg2 = strlen(argv[2]);
if ( len_arg1 > 8 && len_arg1 < 48 )
{
  if ( len_arg2 > 24 && len_arg2 < 65 )
// ...

… the latter of which will be used to decrypt the flag!

// ...
if ( should_be_x1337 == 0x1337 )
{
  qmemcpy(xored_flag, &xored_flag_res, 42ui64);
  for ( i = 0; i < 41; ++i )
    printf("%c", argv[2][i] ^ (unsigned int)xored_flag[i]);
  printf("@flare-on.com\n");
}

The majority of the main function seems to be concerned with setting up and interacting with the Hyper-V environment. We can identify a function that initializes registers (sub_140001070), a function that reads RIP, R8 and R9 (sub_140001310) and uses them to derive arguments for RC4 (sub_140001730), and a function that reads RAX (sub_1400013E0). From the above snippet, we’ve figured out that RAX will need to be 0x1337 for us to get the flag.

All of this happens in a loop. The virtual processor is started (WHvRunVirtualProcessor), and runs until the guest triggers an exit. At that point the exit reason is checked. Depending on the specific exit reason, RC4 is called to decrypt data located relative to RIP, using the eight-byte key in R8 and the data length in R9.

Setting the appropriate type for the ExitContext variable (WHV_RUN_VP_EXIT_CONTEXT) provides some more insight into what these exit reasons relate to.

// ...
loop_flag = 1;
should_be_x1337 = 0;
while ( loop_flag )
{
  if ( WHvRunVirtualProcessor(Partition, 0, &ExitContext, 0xE0u) >= 0 )
  {
    ExitReason = ExitContext.ExitReason;
    if ( ExitContext.ExitReason == WHvRunVpExitReasonX64IoPortAccess )
    {
      get_RIP_R8_R9(Partition, RIP_R8_R9);
      if ( (*(_BYTE *)(&ExitContext.ApicEoi + 5) & 1) != 0 )
        RC4(buf, RIP_R8_R9[0] - 16 - RIP_R8_R9[2], RIP_R8_R9[2], (_BYTE *)RIP_R8_R9[1]);
      else
        RC4(buf, RIP_R8_R9[0] + 2, RIP_R8_R9[2], (_BYTE *)RIP_R8_R9[1]);
      increment_RIP_with_2(Partition);
    }
    else if ( ExitReason == WHvRunVpExitReasonX64Halt )
    {
      should_be_x1337 = get_RAX(Partition);
      loop_flag = 0;
    }
    else
    {
      loop_flag = 0;
    }
  }
}
// ...
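
The address arithmetic in the two RC4 calls can be sketched as follows. The example values are taken from the stub that is disassembled further down in this write-up; the function and parameter names are mine, and I am assuming that the tested bit tells an out apart from an in.

```python
from typing import Tuple


def encrypted_region(rip: int, length: int, body_precedes_rip: bool) -> Tuple[int, int]:
    """Return (start, end) of the region RC4 is applied to for one exit."""
    if body_precedes_rip:
        # First branch: RC4(buf, RIP - 16 - R9, R9, key)
        start = rip - 16 - length
    else:
        # Second branch: RC4(buf, RIP + 2, R9, key)
        start = rip + 2
    return start, start + length


# `in al, 3` sits at 0x851, `out 3, al` at 0x890, and R9 is 0x2D in both stubs.
print([hex(v) for v in encrypted_region(0x851, 0x2D, False)])
print([hex(v) for v in encrypted_region(0x890, 0x2D, True)])
```

Both calls yield the same range, 0x853 up to 0x880, which fits the idea that one branch decrypts a function body on entry and the other processes the very same bytes again on exit.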

But where does the code live? And what’s in buf? It appears that’s what sub_140001440 takes care of.

__int64 __fastcall sub_140001440(_QWORD *a1)
{
  DWORD Size; // [rsp+20h] [rbp-38h]
  HMODULE hModule; // [rsp+28h] [rbp-30h]
  HRSRC hResInfo; // [rsp+30h] [rbp-28h]
  HGLOBAL hResData; // [rsp+38h] [rbp-20h]
  const __m128i *Src; // [rsp+40h] [rbp-18h]

  hModule = GetModuleHandleA(0i64);
  hResInfo = FindResourceA(hModule, (LPCSTR)0x85, (LPCSTR)0x100);
  hResData = LoadResource(hModule, hResInfo);
  Size = SizeofResource(hModule, hResInfo);
  Src = (const __m128i *)LockResource(hResData);
  return memcpy(*a1, Src, Size);
}

The above function calls FindResourceA(hModule, 0x85, 0x100) to get the resource with ID 0x85 (133) and resource type 0x100 (256). Opening the binary in CFF Explorer shows that indeed, there is exactly one resource, and its ID is 133. We can export the data to a file for closer inspection.

BC 00 80 FA 0F 01 16 26 0D 0F 20 C0 66 83 C8 01
0F 22 C0 EA 18 00 08 00 66 B8 10 00 8E D8 8E E0
8E E8 8E D0 E8 0E 00 00 00 0F 01 15 44 0D 00 00
EA F2 0C 00 00 08 00 BF 00 30 00 00 0F 22 DF 31
C0 B9 00 10 00 00 F3 AB 0F 20 D8 C7 00 03 40 00
...

There’s one more interesting snippet to consider before we dive into the resource code: we observe that the arguments that are passed to the original executable are inserted into the code before it is started, at 0x400 and 0x200 from the end of the allocated buffer. As the buffer is 0x10000 bytes, this puts the arguments at 0xFC00 and 0xFE00, respectively.

// ...
arg1 = &buf[buf_size - 0x400];
memcpy(arg1, argv[1], len_arg1);
arg2 = &buf[buf_size - 0x200];
memcpy(arg2, argv[2], len_arg2);
// ...
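
As a small sanity check of those offsets, here is the same placement in Python. The buffer size comes from the text above; the argument values are placeholders.

```python
BUF_SIZE = 0x10000  # size of the guest buffer, as stated above


def place_arguments(arg1: bytes, arg2: bytes) -> bytearray:
    """Copy both arguments to their fixed offsets from the end of the buffer."""
    buf = bytearray(BUF_SIZE)
    off1 = BUF_SIZE - 0x400
    off2 = BUF_SIZE - 0x200
    buf[off1:off1 + len(arg1)] = arg1
    buf[off2:off2 + len(arg2)] = arg2
    return buf


buf = place_arguments(b"first-arg", b"second-arg")  # placeholder arguments
print(hex(buf.index(b"first-arg")), hex(buf.index(b"second-arg")))
```

This prints 0xfc00 0xfe00, the two addresses to look for in the guest code.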

Decrypting the code

We open the resource file in IDA and scroll through the disassembly. Note that it is useful to open a new instance here, rather than load it into the existing database: we will want to set the compiler options to a compiler that uses the System V calling convention (such as GNU C++).

While most bytes do not represent actual x64 instructions, we notice several snippets that stand out:

...
000000000000082E                 mov     r8, 0D3A5541BC79F6DF3h
0000000000000838                 mov     r9d, 23Bh
000000000000083E                 out     3, al
0000000000000840                 retn
0000000000000840 ; ---------------------------------------------------------------------------
0000000000000841                 db  49h ; I
0000000000000842                 db 0B8h
0000000000000843                 db 73h, 0EAh, 87h, 80h, 0AAh
0000000000000848                 db 0EFh
0000000000000849                 db 29h, 53h, 41h, 0B9h, 2Dh, 2 dup(0)
0000000000000850                 db    0
0000000000000851                 db 0E4h
0000000000000852                 db    3
0000000000000853                 db 0A0h
0000000000000854                 db 0BCh
0000000000000855                 db  53h ; S
0000000000000856                 db 0A6h
0000000000000857                 db 0AFh
0000000000000858                 db 0F5h
0000000000000859                 db 25h, 0D8h, 1Fh, 6Fh, 32h, 0B5h, 30h
0000000000000860                 db  2Eh ; .
0000000000000861                 db 0AEh
0000000000000862                 dw 34C0h, 0AEE6h, 3D1Dh
0000000000000868                 db  8Bh
0000000000000869                 db  72h ; r
000000000000086A                 db  4Ah ; J
000000000000086B                 db 9Bh, 84h, 80h, 62h, 2
0000000000000870                 db  9Ah
0000000000000871                 db  1Dh
0000000000000872                 dw 5903h, 0E382h, 0F3D2h
0000000000000878                 db 0D6h
0000000000000879                 db  85h
000000000000087A                 db 0ADh
000000000000087B                 db 0C4h, 27h, 0CFh, 41h, 65h
0000000000000880 ; ---------------------------------------------------------------------------
0000000000000880                 mov     r8, 5329EFAA8087EA73h
000000000000088A                 mov     r9d, 2Dh ; '-'
0000000000000890                 out     3, al
0000000000000892                 retn
...

Note that earlier we concluded that R8 and R9 would contain the RC4 key and data length. Perhaps these out instructions are what triggers the exits we’re looking for. The length fields do not make sense, though; they appear to relate to the size of the chunks before the out call rather than after.

If we look at the above example, we can subtract 0x2D from 0x880 to arrive at 0x853. That still leaves us with several bytes preceding the supposedly encrypted data..

Indeed, converting the bytes at 0x841 to code results in the following snippet:

0000000000000841                 mov     r8, 5329EFAA8087EA73h
000000000000084B                 mov     r9d, 2Dh ; '-'
0000000000000851                 in      al, 3

Same length, same key, but an in instruction. It would appear this marks the start of a function, and out marks its end – perhaps to re-encrypt the function after it is executed. Using the Python code below, we decrypt all fragments.

import re

# The exported resource; "resource_133.bin" is a placeholder file name.
with open("resource_133.bin", "rb") as f:
    data = bytearray(f.read())


def RC4(inp, offset, length, key):
    S = list(range(0, 256))
    i = 0
    k = 0
    # KSA (the key is always eight bytes long)
    for j in range(256):
        i = (key[k] + i + S[j]) % 256
        S[j], S[i] = S[i], S[j]
        k = (k + 1) % 8
    # PRGA
    s = 0
    j = 0
    out = []
    for i in range(length):
        s = (s + 1) % 256
        j = (j + S[s]) % 256
        S[s], S[j] = S[j], S[s]
        out.append(inp[offset + i] ^ S[(S[s] + S[j]) % 256])
    return bytes(out)


def decrypt(ea, length, key):
    data[ea : ea + length] = RC4(data[ea : ea + length], 0, length, key)
    print(f"Patched {length} bytes at {hex(ea)}")


# Pattern: mov r8, <key>; mov r9d, <length>; in al, 3
# re.DOTALL lets "." also match 0x0A bytes inside the key or length.
for match in re.finditer(b"\x49\xB8.{8}\x41\xB9.{4}\xE4\x03", data, re.DOTALL):
    ea = match.start()
    key = data[ea + 2 : ea + 10]
    length = int.from_bytes(data[ea + 12 : ea + 16], byteorder="little")
    out = ea + 0x12
    decrypt(out, length, key)
    # convert both stubs to NOPs to clean up decompilation; the out stub
    # starts right after the decrypted body, at out + length
    data[ea : ea + 0x12] = bytes([0x90] * 0x12)
    data[out + length : out + length + 0x12] = bytes([0x90] * 0x12)
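
As a sanity check on the stub layout that the regular expression relies on, the 18 bytes at 0x841 from the listing above can be unpacked by hand. This is only a sketch; parse_stub is a helper name I made up.

```python
import struct

# The 18 bytes at 0x841, copied from the disassembly listing.
STUB = bytes.fromhex("49b8" "73ea8780aaef2953" "41b9" "2d000000" "e403")


def parse_stub(stub: bytes):
    """Split an `in` stub into its RC4 key, body length and port instruction."""
    # mov r8 opcode (2), imm64 (8), mov r9d opcode (2), imm32 (4), in al, 3 (2)
    mov_r8, key, mov_r9d, length, port_io = struct.unpack("<2s8s2sI2s", stub)
    if mov_r8 != b"\x49\xb8" or mov_r9d != b"\x41\xb9":
        raise ValueError("not a key/length stub")
    return key, length, port_io


key, length, port_io = parse_stub(STUB)
print(key.hex(), hex(length), port_io.hex())
print(hex(int.from_bytes(key, "little")))
```

The key bytes are simply the little-endian encoding of the immediate that IDA shows as 5329EFAA8087EA73h, and the stub is 0x12 bytes long, so the encrypted body starts at 0x841 + 0x12 = 0x853.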

Running the decryption script immediately allows IDA to identify 11 distinct functions, and a bit of code at the end that appears to be the entrypoint: this is where the arguments at 0xFE00 and 0xFC00 come into play!

0000000000000BC4                 push    rbp
0000000000000BC5                 mov     rbp, rsp
0000000000000BC8                 sub     rsp, 90h
0000000000000BCF                 mov     esi, 0FE00h
0000000000000BD4                 mov     edi, 0FC00h
0000000000000BD9                 call    sub_B3F
0000000000000BDE                 leave

Diving into sub_B3F, we see it uses sub_918 to check the first argument, and sub_A62 to check the second argument. On success, it returns the 0x1337 value the wrapper code is looking for.

  HIDWORD(v3) = sub_918(a1);
  LODWORD(v3) = sub_A62((int *)a1, (__int64)a2);
  if ( v3 == 0x2400000001 )
    return 0x1337;
  else
    return 0;

Argument 1

Inside sub_918 we find a simple XOR check:

  strcpy(key, "*#37([@AF+ .  _YB@3!-=7W][C59,>*@U_Zpsumloremips");
  strcpy(lipsum, "loremipsumloremipsumloremipsumloremipsumloremips");
  a1_len = strlen(a1);
  v6 = 0;
  for ( i = 0; i < a1_len; ++i )
  {
    if ( ((unsigned __int8)lipsum[i] ^ (unsigned __int8)a1[i]) == key[i] )
      ++v6;
  }
  return v6;

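Computing the result of key ^ lipsum, we find the string FLARE2023FLARE2023FLARE2023FLARE2023. With a length of 36 characters, this fits the > 8 and < 48 constraints. It also explains the constant in sub_B3F: 36 matching characters is 0x24, the high DWORD of 0x2400000001.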
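
The XOR itself is a one-liner. In the sketch below the two constants are copied from the decompiled snippet; the spaces in the key are written as \x20 so that they survive copy and paste.

```python
KEY = b"*#37([@AF+\x20.\x20\x20_YB@3!-=7W][C59,>*@U_Zpsumloremips"
LIPSUM = b"loremipsumloremipsumloremipsumloremipsumloremips"

# Positions where both strings agree XOR to zero; strip that tail.
recovered = bytes(k ^ l for k, l in zip(KEY, LIPSUM)).rstrip(b"\x00")
print(recovered.decode("ascii"))
```

The last twelve bytes of both strings are identical, which is why the recovered text stops after 36 characters.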

Argument 2

Now to figure out the second argument. Looking into sub_A62, we can break it down further.

__int32 __fastcall sub_A62(char *a1, char *a2)
{
  int v2; // ecx
  char v4[49]; // [rsp+10h] [rbp-40h] BYREF
  int v5; // [rsp+4Ch] [rbp-4h]

  memset(v4, 0, sizeof(v4));
  v2 = strlen(a2);
  v5 = sub_5E1(a2, v2, v4);
  if ( (v5 & 7) != 0 )
    return 0;
  sub_4AF(v4, v5, *(_DWORD *)a1);
  return sub_893(a1, v4, 48);
}

We can readily identify sub_5E1 as base64_decode; it starts off with a tell-tale v24 = 3 * (a2 / 4) output length computation (i.e., each input byte leads to 6 output bits) and proceeds to strip off the padding characters (=).
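
The length formula, combined with the (v5 & 7) check in sub_A62, already narrows down what arg2 can look like. A rough sketch, assuming unpadded input whose length is a multiple of four (decoded_capacity is my own name for the computation):

```python
def decoded_capacity(encoded_len: int) -> int:
    """Output size used by the decoder: three bytes per four input characters."""
    return 3 * (encoded_len // 4)


# arg2 must be 25 to 64 characters long; keep lengths that decode to a multiple of 8.
candidates = [n for n in range(28, 65, 4) if decoded_capacity(n) % 8 == 0]
print(candidates)  # [32, 64]
```

Only the 64-character case decodes to 48 bytes, which is the amount sub_893 compares, so a full-length base64 string looks like the intended input.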

The final sub_893 is also trivial: it compares 48 bytes in v4 to a1. Recall that a1 is the string FLARE2023.... In the main function of hvm.exe, we can verify that the bytes at 0xFC00 following the user input are initialized to zero.

The function sub_4AF is where the magic happens, though. After a little cleaning, it looks as follows:

void __fastcall sub_4AF(char *arg2, int arg2_len, int arg1)
{
  int v3[16]; // [rsp+10h] [rbp-A0h] BYREF
  int v4[16]; // [rsp+50h] [rbp-60h] BYREF
  __int64 *state; // [rsp+98h] [rbp-18h]
  int v6; // [rsp+A4h] [rbp-Ch]
  int j; // [rsp+A8h] [rbp-8h]
  int i; // [rsp+ACh] [rbp-4h]

  memset(v4, 0, sizeof(v4));
  for ( i = 0; i <= 15; ++i )
    v3[i] = arg1;
  sub_A7(v4, v3);
  v6 = arg2_len / 8;
  state = (__int64 *)arg2;
  for ( j = 0; j < v6; j += 2 )
    sub_421(&state[j], &state[j + 1], (__int64)v4);
}

The local variable v3 is initialized based on the first user-supplied argument, and then passed into sub_A7, which we can identify to be the Salsa20 permutation. Variable v4 contains the output of Salsa20.

It took me longer than I care to admit to realize that v3 is not initialized with the entire first argument, but simply repeats the first four bytes (FLAR). I ended up re-implementing the Salsa20 permutation in Python and using Unicorn to verify that it resulted in the same output before realizing what was going on. So far so good, though: we can compute this value based on arg1.

The resulting byte sequence is:

026124f56d840c78fafa18a3b91c245fb91c245f026124f56d840c78fafa18a3
fafa18a3b91c245f026124f56d840c786d840c78fafa18a3b91c245f026124f5
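
For reference, the input block that goes into sub_A7 can be rebuilt like this. It is only a sketch of the initialization loop in sub_4AF; the permutation itself is not reproduced here.

```python
import struct

arg1 = b"FLARE2023FLARE2023FLARE2023FLARE2023"

# *(_DWORD *)a1 reads the first four bytes as a little-endian 32-bit integer.
(first_dword,) = struct.unpack_from("<I", arg1, 0)

# v3[i] = arg1 for i in 0..15: the same DWORD, sixteen times.
v3 = [first_dword] * 16
salsa_input = struct.pack("<16I", *v3)
print(hex(first_dword), salsa_input[:8])
```

Every input word being identical probably also explains why the output above consists of just four distinct 32-bit values in rotating order.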

The function sub_421 is where the second argument enters the mix.

void __fastcall sub_421(__int64 *a1, __int64 *a2, int *a3)
{
  __int64 v4; // [rsp+18h] [rbp-10h]
  int i; // [rsp+24h] [rbp-4h]

  for ( i = 7; i >= 0; --i )
  {
    v4 = *a1;
    *a1 ^= sub_3D1(*a2, i, (__int64 *)a3);
    *a2 = v4;
  }
}

This function appears to be some kind of XOR-loop; sub_3D1 computes *a2 ^ a3[i], with a3 indexed in 64-bit steps. Note that a3 is the output of the Salsa20 permutation, and a1 starts off pointing at arg2.

Recall that we want the resulting output of the loop in sub_4AF to be equal to arg1 in order to pass the check in sub_A62. That means we will have to invert the XOR-loop to work towards the value of arg2.

In an attempt to better understand what is going on here, I re-implement sub_421 in Python as follows. Using Unicorn, we can again verify that it matches the original code.

def xorloop(x, y, z):
    zz = [int.from_bytes(z[i : i + 8], byteorder="little") for i in range(0, 64, 8)]
    x = int.from_bytes(x[:8], byteorder="little")
    y = int.from_bytes(y[:8], byteorder="little")

    for i in range(7, -1, -1):
        t = x
        x ^= y ^ zz[i]
        y = t

    return (
        int.to_bytes(x, length=8, byteorder="little"),
        int.to_bytes(y, length=8, byteorder="little"),
        z,
    )

This is where it gets ugly. As I did not immediately see a way to cleanly invert the loop, I decided to unroll it. We can rely on the XOR operation to cancel terms along the way. The general expression is

\begin{align} x_{i} &= x_{i-1} \oplus y_{i-1} \oplus z_{8-i} \\ y_{i} &= x_{i-1} \\ \end{align}

… which unrolls to …

\begin{align} x_0 &= x\\ y_0 &= y\\ \\ x_1 &= x_0 \oplus y_0 \oplus z_7 \\ &= x \oplus y \oplus z_7 \\ y_1 &= x_0 = x\\ \\ x_2 &= x_1 \oplus y_1 \oplus z_6 \\ &= (x \oplus y \oplus z_7) \oplus x \oplus z_6\\ &= y \oplus z_7 \oplus z_6 \\ y_2 &= x_1 = x \oplus y \oplus z_7\\ \\ x_3 &= x_2 \oplus y_2 \oplus z_5 \\ &= (y \oplus z_7 \oplus z_6) \oplus (x \oplus y \oplus z_7) \oplus z_5\\ &= x \oplus z_6 \oplus z_5 \\ y_3 &= x_2 = y \oplus z_7 \oplus z_6 \\ \\ x_4 &= x_3 \oplus y_3 \oplus z_4 \\ &= (x \oplus z_6 \oplus z_5) \oplus (y \oplus z_7 \oplus z_6) \oplus z_4\\ &= x \oplus y \oplus z_7 \oplus z_5 \oplus z_4 \\ y_4 &= x_3 = x \oplus z_6 \oplus z_5\\ \\ x_5 &= x_4 \oplus y_4 \oplus z_3\\ &= (x \oplus y \oplus z_7 \oplus z_5 \oplus z_4) \oplus (x \oplus z_6 \oplus z_5) \oplus z_3\\ &= y \oplus z_7 \oplus z_6 \oplus z_4 \oplus z_3 \\ y_5 &= x_4 = x \oplus y \oplus z_7 \oplus z_5 \oplus z_4\\ \\ x_6 &= x_5 \oplus y_5 \oplus z_2\\ &= (y \oplus z_7 \oplus z_6 \oplus z_4 \oplus z_3) \oplus (x \oplus y \oplus z_7 \oplus z_5 \oplus z_4) \oplus z_2\\ &= x \oplus z_6 \oplus z_5 \oplus z_3 \oplus z_2 \\ y_6 &= x_5 = y \oplus z_7 \oplus z_6 \oplus z_4 \oplus z_3\\ \\ x_7 &= x_6 \oplus y_6 \oplus z_1\\ &= (x \oplus z_6 \oplus z_5 \oplus z_3 \oplus z_2) \oplus (y \oplus z_7 \oplus z_6 \oplus z_4 \oplus z_3) \oplus z_1\\ &= x \oplus y \oplus z_7 \oplus z_5 \oplus z_4 \oplus z_2 \oplus z_1 \\ y_7 &= x_6 = x \oplus z_6 \oplus z_5 \oplus z_3 \oplus z_2\\ \\ x_8 &= x_7 \oplus y_7 \oplus z_0\\ &= (x \oplus y \oplus z_7 \oplus z_5 \oplus z_4 \oplus z_2 \oplus z_1) \oplus (x \oplus z_6 \oplus z_5 \oplus z_3 \oplus z_2) \oplus z_0\\ &= y \oplus z_7 \oplus z_6 \oplus z_4 \oplus z_3 \oplus z_1 \oplus z_0\\ y_8 &= x_7 = x \oplus y \oplus z_7 \oplus z_5 \oplus z_4 \oplus z_2 \oplus z_1\\ \end{align}

Thus, we find that the xor-loop effectively computes:

\begin{align} x_8 = y \oplus z_7 \oplus z_6 \oplus z_4 \oplus z_3 \oplus z_1 \oplus z_0\\ y_8 = x \oplus y \oplus z_7 \oplus z_5 \oplus z_4 \oplus z_2 \oplus z_1\\ \end{align}

By shuffling the XOR operations around, we can formulate the inverse operation, as well:

\begin{align} y = x_8 \oplus z_7 \oplus z_6 \oplus z_4 \oplus z_3 \oplus z_1 \oplus z_0\\ x = y \oplus y_8 \oplus z_7 \oplus z_5 \oplus z_4 \oplus z_2 \oplus z_1\\ \end{align}
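
To convince myself that the closed form is right, it can be checked against the loop with arbitrary 64-bit values. In this sketch forward mirrors sub_421 and inverse is the pair of equations above; both names are mine.

```python
import random


def forward(x, y, z):
    """One call of sub_421 on 64-bit values; z holds eight 64-bit words."""
    for i in range(7, -1, -1):
        x, y = x ^ y ^ z[i], x
    return x, y


def inverse(x8, y8, z):
    """Closed-form inverse derived from the unrolled loop."""
    y = x8 ^ z[7] ^ z[6] ^ z[4] ^ z[3] ^ z[1] ^ z[0]
    x = y ^ y8 ^ z[7] ^ z[5] ^ z[4] ^ z[2] ^ z[1]
    return x, y


rng = random.Random(2023)  # arbitrary seed; any values work
z = [rng.getrandbits(64) for _ in range(8)]
x, y = rng.getrandbits(64), rng.getrandbits(64)
print(inverse(*forward(x, y, z), z) == (x, y))
```

Because everything is XOR, the check holds for any word size; 64 bits is used only to match the __int64 values in the decompilation.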

Taking a look at sub_4AF again, we see that the separate QWORDs in the state variable are touched in pairs, and the pairs do not interleave.

for ( j = 0; j < v6; j += 2 )
  sub_421(&state[j], &state[j + 1], (__int64)salsa_output);

This means we can simply invert the order of the loop to invert its effect. Initializing the state to arg1 and using the Salsa20 output computed before, we derive the desired state before the XOR operations:

cc16294c151626f7fd3141f82ad718bfbb1d51550f723382
894e46e62eb76dbfe74364096f0821eb894d32996fe55de0

Base64-encoding this bytestream leads to the required arg2 input:

zBYpTBUWJvf9MUH4KtcYv7sdUVUPcjOCiU5G5i63bb/nQ2QJbwgh64lNMplv5V3g
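
Since running the binary is not an option here, the result can at least be checked offline by pushing it through the forward direction again. The sketch below decodes the base64 string, applies the XOR-loop to each 16-byte pair with the Salsa20 output listed earlier, and looks at the first 36 bytes; run_xor_loops is my own name for the loop in sub_4AF.

```python
import base64

SALSA = bytes.fromhex(
    "026124f56d840c78fafa18a3b91c245fb91c245f026124f56d840c78fafa18a3"
    "fafa18a3b91c245f026124f56d840c786d840c78fafa18a3b91c245f026124f5"
)
ARG2 = "zBYpTBUWJvf9MUH4KtcYv7sdUVUPcjOCiU5G5i63bb/nQ2QJbwgh64lNMplv5V3g"


def run_xor_loops(state: bytes, salsa: bytes) -> bytes:
    """Apply the sub_421 loop to every pair of 64-bit words in state."""
    z = [int.from_bytes(salsa[i:i + 8], "little") for i in range(0, 64, 8)]
    out = bytearray()
    for off in range(0, len(state), 16):
        x = int.from_bytes(state[off:off + 8], "little")
        y = int.from_bytes(state[off + 8:off + 16], "little")
        for i in range(7, -1, -1):
            x, y = x ^ y ^ z[i], x
        out += x.to_bytes(8, "little") + y.to_bytes(8, "little")
    return bytes(out)


state = base64.b64decode(ARG2)
result = run_xor_loops(state, SALSA)
print(state.hex())
print(result[:36])
```

The decoded bytes match the hex dump above, and the first 36 bytes of the result spell out arg1 again.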

Getting the flag

We’re doing this statically, so we cannot simply run the executable and supply the arguments to obtain the flag. Instead, we get the flag by computing the XOR of arg2 and the hardcoded value in the data section at address 0x1400144B0. We find c4n_i_sh1p_a_vm_as_an_exe_ask1ng_4_a_frnd
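
The final step is the same byte-wise XOR that the printf loop in main performs. The hardcoded bytes are not reproduced in this post, so the sketch below uses a stand-in blob that I rebuilt from the two values shown here; it is not a dump from the binary and only illustrates the mechanics.

```python
FLAG = b"c4n_i_sh1p_a_vm_as_an_exe_ask1ng_4_a_frnd"
ARG2 = b"zBYpTBUWJvf9MUH4KtcYv7sdUVUPcjOCiU5G5i63bb/nQ2QJbwgh64lNMplv5V3g"


def xor_prefix(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, stopping at the end of the shorter one."""
    return bytes(p ^ q for p, q in zip(a, b))


# Stand-in for the 41 bytes at 0x1400144B0 (reconstructed, see the note above).
blob = xor_prefix(FLAG, ARG2)

# What main does: argv[2][i] ^ xored_flag[i] for i in 0..40, then the suffix.
print(xor_prefix(blob, ARG2).decode("ascii") + "@flare-on.com")
```

Note that the XOR uses the base64 text of arg2 as typed on the command line, not the decoded bytes.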

13: Y0da

TLDR: re-use the deobfuscator from challenge 5; point it at all thread entrypoints. Decompile. Use HashDB to identify API calls. Fix API calls by creating an enum. Find gimmie_advic3 and gimmie_s3cr3t commands. Identify MD5 hash aa321932ccdd8cce334a1c3354eed3b1. Google it. Find password patience_y0u_must_h4v3. Find fake flag. Cry. Obtain base32 strings. Fix string by making input to Mersenne Twister constant. Zero the output of the RNG entirely. Decode base32 using custom alphabet. .. ROP chain? What ROP chain?

So close to the end you can almost taste it. Keep going champ, complete this challenge and take your place as one of the Reverse Engineers of All-Time.