Reversed a custom cipher validation algorithm from a stripped RISC-V ELF binary to recover a lost activation code.
Initial triage with file:
With a stripped binary the first move is always strings to find program flow anchors. Grepping for auth/access related strings surfaced three interesting entries in .rodata:
Classic three-path validation: prompt → fail branch → success branch. Cross-referencing these addresses in the disassembly (riscv64-linux-gnu-objdump -d) pointed directly to the validation function at 0x10466.
The validation loop processes exactly 64 bytes of input. Each byte goes through three transforms before being compared to a hardcoded expected table at 0x55ed8. Let's walk through each layer.
i is computed as key[i] = (55 + 31·i) mod 256. This is a linear congruential keystream — starts at 0x37 and increments by 31 per byte. Since 31 and 256 are coprime the sequence has full period 256, cycling through every byte value before repeating. In the RISC-V assembly this looks like a repeated addiw a1, a1, 31 / zext.b a1, a1 pattern.
(i mod 7) + 1 bits — so the rotation cycles 1→2→3→4→5→6→7→1→... The mod 7 is computed using a mulhu magic constant trick (magic = 0x2492492492492493), the standard branchless division-by-7 on RISC-V. There's no divide instruction in the base ISA, so compilers emit this multiply-high pattern. The rotation itself is split into sllw / srlw / or with a negated shift amount.
a0) initialised to 0xAA. The accumulator value after each update is what gets compared to expected[i] — not the rotated byte directly. This chains all 64 bytes together: a single wrong input byte corrupts all subsequent comparisons.
Key disassembly — inner loop:
10532: lbu a4, 0(a0) # a4 = input_byte[i]
10536: xor a4, a4, a1 # a4 ^= key[i] (layer 1)
10538: mulhu a3, a2, t1 # mulhu trick for i/7
1053c: sub a5, a2, a3
10540: srli a5, a5, 1
10544: add a3, a3, a5
10548: srli a3, a3, 2 # a3 = i // 7
1054c: slli a5, a3, 3
10550: sub a5, a5, a3 # a5 = a3 * 7
10554: sub a5, a2, a5 # a5 = i mod 7
10558: addiw a5, a5, 1 # rot = (i mod 7) + 1
1055c: sllw a3, a4, a5 # left part of rotl8
10560: negw a5, a5
10564: andi a5, a5, 7 # (8 - rot) & 7
10568: srlw a4, a4, a5 # right part
1056c: or a5, a3, a4 # a5 = rotl8(byte ^ key, rot)
10570: xor a0, a0, a5 # acc ^= rotated (layer 3)
10574: lbu a5, 0(a6) # expected[i]
10578: bne a5, a0, fail # compare vs acc
Since we have the expected table (64 bytes extracted from file offset 0x45ed8), we can invert each layer in reverse order, working through the chain sequentially.
The forward relation per byte i is:
# Forward
acc[i] = acc[i-1] XOR rotl8(input[i] XOR key[i], rot[i])
# And: expected[i] == acc[i]
# Therefore:
expected[i] XOR acc[i-1] == rotl8(input[i] XOR key[i], rot[i])
Applying rotr8 to both sides, then XORing with the key, gives us input[i] directly:
def rotl8(val, n):
n = n & 7
return ((val << n) | (val >> (8 - n))) & 0xff
def rotr8(val, n):
return rotl8(val, 8 - n)
expected = bytes.fromhex("5e929ac5db7aa177...") # 64 bytes from 0x55ed8
acc = 0xaa
code = []
for i in range(64):
key_i = (55 + 31 * i) & 0xff
rot_i = (i % 7) + 1
delta = expected[i] ^ acc # undo accumulator XOR
inp_x = rotr8(delta, rot_i) # undo rotation
code.append(inp_x ^ key_i) # undo XOR key
acc = expected[i] # advance accumulator
print(bytes(code).decode()) # MetaCTF{...}
A few RISC-V specifics that made the analysis interesting:
| Pattern | RISC-V Idiom | x86 Equivalent |
|---|---|---|
| Integer division | mulhu + shift sequence (no div in base ISA) | idiv or compiler magic multiply |
| Byte rotation | sllw + negw + andi 7 + srlw + or | rol/ror (single instruction) |
| Zero-extend byte | zext.b (pseudo) = andi reg, reg, 0xff | movzx |
| Large immediates | lui + addi pairs (no 64-bit imm loads) | mov rax, imm64 |
| RVC compression | 16-bit encodings interleaved with 32-bit | n/a |
The lack of a hardware divide instruction is the biggest hurdle for people new to RISC-V RE. The mulhu magic constant for dividing by 7 (0x2492492492492493) follows the standard Hacker's Delight formula — once you recognise it, the mod 7 pattern becomes obvious. The RVC compressed instructions also mean instruction boundaries aren't always 4-byte aligned, which can trip up naive linear disassembly.