Binary Golf Grand Prix is a yearly mad dash to create a tiny binary, given some target parameters. This year, the theme was download: write a program to download and print the contents of a target URL.

Planning

I had literally never tried golfing a binary before, so I didn’t really know where to start. My bread and butter is Windows executables, so I figured I would attempt one of those. Naturally, since we’re golfing, I looked toward the Corkami corpus for guidance. But this didn’t satisfy: it was 32-bit. I know all about 32-bit binaries and they’re no longer modern! We needed to do better.

Corkami also has a 64-bit binary to work with, but it was pretty old, and for some reason it only worked on Windows 7. After essentially slamming my dick in a car door wondering why I kept generating invalid binaries with my stub ASM file, I actually tried reading this assembly file and realized: the file needs to be at least 268 bytes!

Imagine reading, couldn’t be me. Anyway, we padded our initial binary with the proper number of bytes after our test code and assembled our file with NASM: success! Now we could start golfing the program!
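
The padding above lives in the NASM source itself. The same idea can be sketched as a post-build step in Python. This is only an illustration: `pad_to_minimum` and the zero fill byte are assumptions of the sketch, and 268 is the minimum size quoted above.

```python
MIN_PE64_SIZE = 268  # minimum file size quoted in the text above


def pad_to_minimum(image: bytes, minimum: int = MIN_PE64_SIZE, fill: bytes = b"\x00") -> bytes:
    """Append fill bytes until the image is at least `minimum` bytes long.

    Images that are already long enough are returned unchanged.
    """
    missing = max(0, minimum - len(image))
    return image + fill * missing
```

Padding goes at the end, after the test code, so every offset computed earlier in the file stays valid.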

Golfing x86-64

Naturally you would want to lean on your shellcoding experience to do this, so I attempted to create some shellcode: find URLDownloadToFileA and use MSVCRT to import some functions for opening files, reading files and printing a message. To save space, we skipped importing by hash and borrowed a trick from netspooky of searching for a unique string offset in the imported function name. He did this in BGGP2!

x86-64 absolutely hates being golfed. It takes an absolutely unreasonable number of bytes to access the TEB, for example. Glancing at the code, it seemed like the direct addresses being used could be a good subject of golfing. Calling an address? Fuck you, that’s a 5-byte cost. Calling a register? Well that’s easy, it’s just two bytes! If you’re calling a function frequently, you can save some bytes by doing this:

lea esi, [your_frequent_function]       ; 8d34 2508 0000 00 (no REX.W prefix: 32-bit form, zero-extends into rsi)
call rsi                                ; ffd6

You will notice that direct addressing is your enemy. If we’re frequently accessing memory directly, we can actually save a byte per access by keeping the image base (0x400000 here) in a register and addressing everything relative to it. Compare:

lea rax, [label]                        ; 488d 0425 0800 4000 (8 bytes)

Versus:

lea rax, [rbx+(label-0x400000)]         ; 488d 8307 0000 00 (7 bytes)
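
Both tricks are the same trade: pay a setup cost once, then every use gets cheaper. Here is a quick sketch of the arithmetic. The function names are made up for this example. The call numbers come from the text above (5 bytes direct, 2 bytes through a register, 7 bytes for the one-time lea), and the lea numbers are the 8 and 7 bytes shown in the encodings. The cost of getting the image base into rbx is not stated above, so the 3 used below is purely a placeholder.

```python
def bytes_saved(uses, direct_cost, indirect_cost, setup_cost):
    """Net bytes saved by paying `setup_cost` once so that each use costs
    `indirect_cost` instead of `direct_cost`. Negative means a net loss."""
    return uses * (direct_cost - indirect_cost) - setup_cost


def break_even(direct_cost, indirect_cost, setup_cost):
    """Smallest number of uses at which the indirect form is strictly smaller,
    or None if it never pays off."""
    per_use = direct_cost - indirect_cost
    if per_use <= 0:
        return None
    return setup_cost // per_use + 1


# Register call: 7-byte lea once, then 2 bytes per call instead of 5.
calls_needed = break_even(5, 2, 7)

# Base-relative lea: 7 bytes per access instead of 8.
# setup_cost=3 is a hypothetical placeholder, not a measured value.
accesses_needed = break_even(8, 7, 3)
```

With those numbers the register call only wins from the third call onward, and a one-byte-per-access saving takes even longer to pay for itself.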

But these gains are garbage! The binary I was competing with at the time was 399 bytes in size on x86, and I was sitting at 520 on x64! Surely this was a skill issue. We looked for inspiration from the other competitors.

While reading a write-up, a lightbulb went off: Windows has curl? We could save tons of space on the import code for URLDownloadToFileA and the MSVCRT file functions by calling system("curl -L ...") instead! curl prints the response straight to stdout, so we wouldn’t have to open and read the file that URLDownloadToFileA leaves behind! So we stuffed our function-hunting DWORDs in the unused headers of the PE file and did just that. Now these are the gains I wanted! We golfed our way down to 345 bytes, smaller than the binary I was competing against!
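
To see why this is such a win, here is the whole "download and print" job as a tiny Python sketch. Python's `os.system` is a thin wrapper over the C runtime's `system()`, so it maps closely onto what the binary does. The function names are made up for this example, `url` stands in for the target elided above, and the double quotes around it are an addition of the sketch.

```python
import os


def curl_command(url: str) -> str:
    """Build the command line; -L tells curl to follow redirects."""
    return f'curl -L "{url}"'


def download_and_print(url: str) -> int:
    """Hand the command to system(); curl writes the response body to stdout,
    so no file has to be opened, read, or printed by our own code."""
    return os.system(curl_command(url))
```

One imported function and one string replace the download, open, read and print calls from the first version.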

As I learned from viewing the entry by Peter Ferrie, who utterly mogged me, there was tons more I could golf, but for the sake of not cheating more than I already had, I chose to stop here and submit my code. Nothing fancy, but a good first attempt!

Conclusions

This was a good first golf for me, but this is not the best you can do. Peter Ferrie, aka qkumba, absolutely swept the floor with his submissions on x86 and x64, with enough space to spare to sign his name in the binary! Take a look at his submissions! In the meantime, you can look at the asm that produced my file. I suggest reading Peter Ferrie’s writeups as well.