A Brief Comment

This post is being made because I’m rather annoyed. On top of search engines being ruined by the AI gold rush, Microsoft has wiped away easy access to some documentation on critical data structures in PE file loading. Only some critical data structures remain easily accessed in Microsoft’s documentation. What makes this especially annoying is that the critical data structures are still in the <windows.h> header, and I’m not particularly a fan of leaving a copy of windows.h open to find my data structures. It’s a big file! And the structs you need are scattered all over the place!

The restriction of this information is why I call this forbidden content. You just get the feeling Microsoft no longer wants you to know about this process unless you’re a compiler developer. Either that or it’s just the victim of an enormous megacorporation restructuring and neglecting its documentary content. I like the romanticized version. It’s more motivating.

As a result, this post will guide you through a forbidden process: loading a PE file without the use of CreateProcess or LoadLibrary. You don’t really need this outside of a few specific cases. But hackers love these specific cases, so let’s get to documenting! This article requires knowledge of the C programming language to make sense.

The PE Format

When starting out with the portable executable format, you may be overwhelmed. The data structures involved are very detailed and they have a lot of content. Luckily for you a lot of that content has been made irrelevant by the natural progression of technology. But Microsoft is a dinosaur, sticking to what they’ve developed in the late 80s and early 90s in the DOS operating system and never letting go. Like some sort of ancient text that won’t go away, DOS lives on in all of our Windows executables.

A PE file is split up into a few critical sections. They are:

  • The DOS header
  • The NT header
  • The section table
  • The section data

The DOS header did not survive The Great PE Wiping by Microsoft, so here is a forbidden reference link. To survive Internet bit rot, here is what the header looks like in <windows.h>:

typedef struct _IMAGE_DOS_HEADER
{
     WORD e_magic;
     WORD e_cblp;
     WORD e_cp;
     WORD e_crlc;
     WORD e_cparhdr;
     WORD e_minalloc;
     WORD e_maxalloc;
     WORD e_ss;
     WORD e_sp;
     WORD e_csum;
     WORD e_ip;
     WORD e_cs;
     WORD e_lfarlc;
     WORD e_ovno;
     WORD e_res[4];
     WORD e_oemid;
     WORD e_oeminfo;
     WORD e_res2[10];
     LONG e_lfanew;
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;

e_magic is why your PE file starts with the letters “MZ,” which stands for Mark Zbikowski, the developer of the DOS executable format. Most PE files that haven’t been tampered with come with a DOS executable stub to tell you not to run this in DOS. Coding creatives love to fuck with this section in particular.

Since we’re not currently concerned with ancient runes, the only other relevant data point in this structure is e_lfanew, which is an offset to our NT headers. The NT headers survived the brainwipe by Microsoft, and their 32-bit and 64-bit variants can be found here (for 32) and here (for 64). Because corporations can’t be trusted to preserve knowledge apparently, here they are:

typedef struct _IMAGE_NT_HEADERS {
  DWORD                   Signature;
  IMAGE_FILE_HEADER       FileHeader;
  IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;
typedef struct _IMAGE_NT_HEADERS64 {
  DWORD                   Signature;
  IMAGE_FILE_HEADER       FileHeader;
  IMAGE_OPTIONAL_HEADER64 OptionalHeader;
} IMAGE_NT_HEADERS64, *PIMAGE_NT_HEADERS64;
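
Before trusting e_lfanew, it’s worth checking that both signatures are where they should be. Here is a minimal sketch of that check. It reads the fields by raw offset so it builds without <windows.h>; the function name find_nt_headers and the macro names are made up for illustration, and it assumes a little-endian host (which every Windows target is).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DOS_MAGIC    0x5A4Du      /* "MZ", same value as IMAGE_DOS_SIGNATURE */
#define NT_MAGIC     0x00004550u  /* "PE\0\0", same value as IMAGE_NT_SIGNATURE */
#define E_LFANEW_OFF 0x3C         /* offset of e_lfanew inside IMAGE_DOS_HEADER */

/* Returns the file offset of the NT headers, or 0 if the buffer does not
 * look like a PE file. */
size_t find_nt_headers(const uint8_t *buf, size_t len) {
   uint16_t e_magic;
   int32_t e_lfanew;
   uint32_t signature;

   if (len < 0x40) return 0;   /* too small to hold a DOS header */

   memcpy(&e_magic, buf, sizeof e_magic);
   if (e_magic != DOS_MAGIC) return 0;

   memcpy(&e_lfanew, buf + E_LFANEW_OFF, sizeof e_lfanew);
   if (e_lfanew <= 0 || (size_t)e_lfanew > len - sizeof signature) return 0;

   memcpy(&signature, buf + e_lfanew, sizeof signature);
   return signature == NT_MAGIC ? (size_t)e_lfanew : 0;
}
```

The helper functions later in this article skip these checks for brevity, so something like this would run once, right after the file is read into memory.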

Like the DOS header, there’s still some data within this modern header that’s not really relevant. Let’s start with the file header:

typedef struct _IMAGE_FILE_HEADER {
  WORD  Machine;
  WORD  NumberOfSections;
  DWORD TimeDateStamp;
  DWORD PointerToSymbolTable;
  DWORD NumberOfSymbols;
  WORD  SizeOfOptionalHeader;
  WORD  Characteristics;
} IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;

The Machine field tells you which machine this is running on. The ones that are most often relevant to us are the constants IMAGE_FILE_MACHINE_I386 for x86 and IMAGE_FILE_MACHINE_AMD64 for x64. NumberOfSections controls how many data sections there are within the executable. TimeDateStamp is mostly for the compiler to document when this binary was compiled. PointerToSymbolTable and NumberOfSymbols are, I believe, only relevant if you’re dealing with old COFF objects. SizeOfOptionalHeader ultimately determines the offset to the section table if it is present. Characteristics is a series of bitflags that are mostly irrelevant except for a few entries (like IMAGE_FILE_DLL and IMAGE_FILE_EXECUTABLE_IMAGE), but thankfully our corporate overlords have blessed us with the knowledge of the ancient irrelevant information in case we ever encounter it. But you can’t know about the DOS header for some reason. Very inconsistent.
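
To make the two fields a loader actually cares about concrete, here is a small sketch that decodes Machine and Characteristics. The numeric values are the ones from <windows.h>; the PE_ prefixed names and both function names are made up so the snippet builds on its own.

```c
#include <stdint.h>

/* Values as defined in <windows.h>, renamed so this builds without it. */
#define PE_MACHINE_I386    0x014c  /* IMAGE_FILE_MACHINE_I386 */
#define PE_MACHINE_AMD64   0x8664  /* IMAGE_FILE_MACHINE_AMD64 */
#define PE_FILE_EXECUTABLE 0x0002  /* IMAGE_FILE_EXECUTABLE_IMAGE */
#define PE_FILE_DLL        0x2000  /* IMAGE_FILE_DLL */

/* Human-readable name for the Machine field. */
const char *describe_machine(uint16_t machine) {
   switch (machine) {
      case PE_MACHINE_I386:  return "x86";
      case PE_MACHINE_AMD64: return "x64";
      default:               return "something this loader does not handle";
   }
}

/* Returns 1 for a DLL, 0 for an EXE, -1 if the file is not flagged as a
 * runnable image at all (an object file, for example). */
int image_kind(uint16_t characteristics) {
   if ((characteristics & PE_FILE_EXECUTABLE) == 0) return -1;
   return (characteristics & PE_FILE_DLL) != 0;
}
```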

Let’s talk about the optional header, which isn’t “optional” in the sense of “you can choose to discard it,” but instead in the sense of “goddamn there are a lot of options here.” You can resize it according to SizeOfOptionalHeader, but you can’t get rid of it. Here it is:

typedef struct _IMAGE_OPTIONAL_HEADER {
  WORD                 Magic;
  BYTE                 MajorLinkerVersion;
  BYTE                 MinorLinkerVersion;
  DWORD                SizeOfCode;
  DWORD                SizeOfInitializedData;
  DWORD                SizeOfUninitializedData;
  DWORD                AddressOfEntryPoint;
  DWORD                BaseOfCode;
  DWORD                BaseOfData;
  DWORD                ImageBase;
  DWORD                SectionAlignment;
  DWORD                FileAlignment;
  WORD                 MajorOperatingSystemVersion;
  WORD                 MinorOperatingSystemVersion;
  WORD                 MajorImageVersion;
  WORD                 MinorImageVersion;
  WORD                 MajorSubsystemVersion;
  WORD                 MinorSubsystemVersion;
  DWORD                Win32VersionValue;
  DWORD                SizeOfImage;
  DWORD                SizeOfHeaders;
  DWORD                CheckSum;
  WORD                 Subsystem;
  WORD                 DllCharacteristics;
  DWORD                SizeOfStackReserve;
  DWORD                SizeOfStackCommit;
  DWORD                SizeOfHeapReserve;
  DWORD                SizeOfHeapCommit;
  DWORD                LoaderFlags;
  DWORD                NumberOfRvaAndSizes;
  IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER32, *PIMAGE_OPTIONAL_HEADER32;
typedef struct _IMAGE_OPTIONAL_HEADER64 {
  WORD                 Magic;
  BYTE                 MajorLinkerVersion;
  BYTE                 MinorLinkerVersion;
  DWORD                SizeOfCode;
  DWORD                SizeOfInitializedData;
  DWORD                SizeOfUninitializedData;
  DWORD                AddressOfEntryPoint;
  DWORD                BaseOfCode;
  ULONGLONG            ImageBase;
  DWORD                SectionAlignment;
  DWORD                FileAlignment;
  WORD                 MajorOperatingSystemVersion;
  WORD                 MinorOperatingSystemVersion;
  WORD                 MajorImageVersion;
  WORD                 MinorImageVersion;
  WORD                 MajorSubsystemVersion;
  WORD                 MinorSubsystemVersion;
  DWORD                Win32VersionValue;
  DWORD                SizeOfImage;
  DWORD                SizeOfHeaders;
  DWORD                CheckSum;
  WORD                 Subsystem;
  WORD                 DllCharacteristics;
  ULONGLONG            SizeOfStackReserve;
  ULONGLONG            SizeOfStackCommit;
  ULONGLONG            SizeOfHeapReserve;
  ULONGLONG            SizeOfHeapCommit;
  DWORD                LoaderFlags;
  DWORD                NumberOfRvaAndSizes;
  IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER64, *PIMAGE_OPTIONAL_HEADER64;

This struct is rather large, so I’ll let Microsoft’s documentation speak for most of the fields and focus on the bits relevant to loading:

  • the entrypoint of the binary (AddressOfEntryPoint)
  • the image base address (ImageBase)
  • the image size (SizeOfImage)
  • DllCharacteristics (specifically for the bitflag IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE)
  • the import directory (DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT])
  • the relocation directory (DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC])
  • the TLS directory (DataDirectory[IMAGE_DIRECTORY_ENTRY_TLS])
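
The 32-bit and 64-bit optional headers agree on some of these fields and disagree on others, and the Magic field at the very start tells you which layout you’re holding. The sketch below pulls out the loader-relevant fields by raw offset for either layout. The offsets follow from the two struct definitions above; the struct loader_fields and the function name are hypothetical, and a little-endian host is assumed.

```c
#include <stdint.h>
#include <string.h>

#define PE32_MAGIC     0x010b  /* IMAGE_NT_OPTIONAL_HDR32_MAGIC */
#define PE32PLUS_MAGIC 0x020b  /* IMAGE_NT_OPTIONAL_HDR64_MAGIC */

struct loader_fields {             /* hypothetical summary of what a loader needs */
   uint32_t entry_rva;             /* AddressOfEntryPoint */
   uint64_t image_base;            /* ImageBase, widened for the 32-bit case */
   uint32_t size_of_image;         /* SizeOfImage */
   uint16_t dll_characteristics;   /* DllCharacteristics */
   uint32_t data_directory_offset; /* DataDirectory, from the start of the header */
};

/* opt points at the first byte of the optional header, and the caller must
 * have checked that at least 72 bytes are readable. Returns 0 on success,
 * -1 on an unknown Magic. */
int read_loader_fields(const uint8_t *opt, struct loader_fields *out) {
   uint16_t magic;
   memcpy(&magic, opt, 2);

   /* these three sit at the same offset in both layouts */
   memcpy(&out->entry_rva, opt + 16, 4);
   memcpy(&out->size_of_image, opt + 56, 4);
   memcpy(&out->dll_characteristics, opt + 70, 2);

   if (magic == PE32_MAGIC) {
      uint32_t base32;
      memcpy(&base32, opt + 28, 4);       /* after BaseOfData */
      out->image_base = base32;
      out->data_directory_offset = 96;
   } else if (magic == PE32PLUS_MAGIC) {
      memcpy(&out->image_base, opt + 24, 8);  /* BaseOfData is gone in PE32+ */
      out->data_directory_offset = 112;
   } else {
      return -1;
   }
   return 0;
}
```

The rest of this article takes the simpler road and casts straight to the 64-bit structs, which is fine as long as you only feed it x64 binaries.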

In addition to these critical fields of our header, we have a section header, which defines the various data sections of the PE file. Microsoft documentation can be found here:

typedef struct _IMAGE_SECTION_HEADER {
  BYTE  Name[IMAGE_SIZEOF_SHORT_NAME];
  union {
    DWORD PhysicalAddress;
    DWORD VirtualSize;
  } Misc;
  DWORD VirtualAddress;
  DWORD SizeOfRawData;
  DWORD PointerToRawData;
  DWORD PointerToRelocations;
  DWORD PointerToLinenumbers;
  WORD  NumberOfRelocations;
  WORD  NumberOfLinenumbers;
  DWORD Characteristics;
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;

At this point it should be explained how Windows defines the memory layout of the PE file. As you may have noticed in your analysis of binaries, the image of the executable on disk does not have the same layout as the executable loaded into memory. As a result, you’re dealing with two offset types: disk offsets and memory offsets, the latter of which are referred to as RVAs (relative virtual addresses). A disk offset is exactly what it sounds like: a location in the PE file as it exists on disk. Similarly, an RVA is an offset from the image base once the file is loaded into memory. Thus, VirtualAddress is an RVA to the section location in memory, and PointerToRawData is an offset to the section location on disk. VirtualSize and SizeOfRawData represent the size of the section in memory and on disk respectively.
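
The relationship between the two offset types is easiest to see as a conversion. This is a minimal sketch of translating an RVA into a disk offset using the section table; struct section_span is a hypothetical cut-down stand-in for IMAGE_SECTION_HEADER so the snippet builds without <windows.h>.

```c
#include <stddef.h>
#include <stdint.h>

/* Only the section header fields the translation needs. */
struct section_span {
   uint32_t virtual_address;  /* VirtualAddress: RVA of the section in memory */
   uint32_t pointer_to_raw;   /* PointerToRawData: offset of the section on disk */
   uint32_t size_of_raw;      /* SizeOfRawData: bytes the section has on disk */
};

/* Translate an RVA into a disk offset. Returns 1 and fills *offset on
 * success, 0 if no section backs that RVA with bytes on disk. An RVA past
 * SizeOfRawData but inside VirtualSize is zero-filled memory, so it has no
 * disk counterpart and also returns 0. */
int rva_to_offset(const struct section_span *sections, size_t count,
                  uint32_t rva, uint32_t *offset) {
   for (size_t i = 0; i < count; ++i) {
      uint32_t start = sections[i].virtual_address;
      if (rva < start) continue;

      uint32_t delta = rva - start;
      if (delta < sections[i].size_of_raw) {
         *offset = sections[i].pointer_to_raw + delta;
         return 1;
      }
   }
   return 0;
}
```

The loader in this article never needs this conversion, since it copies every section to its RVA first and works on the memory image from then on. It’s mostly handy when you want to inspect a file without mapping it.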

The Characteristics of a section tell the Windows loader how it should allocate this section in memory. For example, a section can be marked executable (IMAGE_SCN_MEM_EXECUTE), writable (IMAGE_SCN_MEM_WRITE) or readable (IMAGE_SCN_MEM_READ), among many other traits. For our purposes, though, these protection characteristics aren’t really relevant. We’re going to allocate the whole image with read, write and execute privileges to cover our bases.
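
If you ever want to be stricter than read-write-execute everywhere, the three flags map onto page protection constants fairly mechanically. The table below is one common mapping, not the only possible one. The numeric values match <windows.h>, the PG_ and SCN_ names and the function name are made up, and the result is what you would hand to VirtualProtect for that section.

```c
#include <stdint.h>

#define SCN_MEM_EXECUTE 0x20000000u  /* IMAGE_SCN_MEM_EXECUTE */
#define SCN_MEM_READ    0x40000000u  /* IMAGE_SCN_MEM_READ */
#define SCN_MEM_WRITE   0x80000000u  /* IMAGE_SCN_MEM_WRITE */

#define PG_NOACCESS          0x01u   /* PAGE_NOACCESS */
#define PG_READONLY          0x02u   /* PAGE_READONLY */
#define PG_READWRITE         0x04u   /* PAGE_READWRITE */
#define PG_WRITECOPY         0x08u   /* PAGE_WRITECOPY */
#define PG_EXECUTE           0x10u   /* PAGE_EXECUTE */
#define PG_EXECUTE_READ      0x20u   /* PAGE_EXECUTE_READ */
#define PG_EXECUTE_READWRITE 0x40u   /* PAGE_EXECUTE_READWRITE */
#define PG_EXECUTE_WRITECOPY 0x80u   /* PAGE_EXECUTE_WRITECOPY */

/* Pick a page protection from a section's Characteristics.
 * The table is indexed as [executable][readable][writable]. */
uint32_t section_protection(uint32_t characteristics) {
   static const uint32_t table[2][2][2] = {
      { { PG_NOACCESS, PG_WRITECOPY },         { PG_READONLY, PG_READWRITE } },
      { { PG_EXECUTE,  PG_EXECUTE_WRITECOPY }, { PG_EXECUTE_READ, PG_EXECUTE_READWRITE } },
   };
   int x = (characteristics & SCN_MEM_EXECUTE) != 0;
   int r = (characteristics & SCN_MEM_READ) != 0;
   int w = (characteristics & SCN_MEM_WRITE) != 0;
   return table[x][r][w];
}
```

A typical .text section (0x60000020) comes out as PAGE_EXECUTE_READ and a typical .data section (0xC0000040) as PAGE_READWRITE. You would apply these only after relocations and imports are written, since both of those steps modify the image.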

With all these basics in mind, we’re ready to dig into the details of writing the loader.

Preparing the Image

Getting the loader prepared is incredibly simple. First, we need to create some functions to get us to some critical data. Follow along with the full code example here on GitHub, as all the following code snippets will come from this repository. Let’s get those helper functions:

PIMAGE_NT_HEADERS64 get_nt_headers(const uint8_t *image_base) {
   PIMAGE_DOS_HEADER dos_header = (PIMAGE_DOS_HEADER)image_base;
   return (PIMAGE_NT_HEADERS64)&image_base[dos_header->e_lfanew];
}

PIMAGE_SECTION_HEADER get_section_table(const uint8_t *image_base) {
   PIMAGE_DOS_HEADER dos_header = (PIMAGE_DOS_HEADER)image_base;
   PIMAGE_NT_HEADERS64 nt_headers = get_nt_headers(image_base);
   size_t section_offset = dos_header->e_lfanew + sizeof(DWORD) + sizeof(IMAGE_FILE_HEADER) + nt_headers->FileHeader.SizeOfOptionalHeader;
   return (PIMAGE_SECTION_HEADER)&image_base[section_offset];
}

You can easily do these inline if you like, but writing these out just makes things a little cleaner. In get_nt_headers, we use IMAGE_DOS_HEADER’s e_lfanew offset to determine the pointer to the NT headers. You might think getting the section table is as straightforward as getting the pointer after the optional header, but that’s not how it’s calculated. Instead, we calculate the offset to the optional header based on preceding headers (dos_header->e_lfanew + sizeof(DWORD) + sizeof(IMAGE_FILE_HEADER)) then add the size of the optional header from the file header (nt_headers->FileHeader.SizeOfOptionalHeader). You’re probably thinking “fuck off, that’s the exact same location.” Only for standard PE files! It might be important for some of us to note the edge cases of how our binaries can be parsed! This is forbidden knowledge, after all.

Do whatever magic incantations you need to do to get the disk image of the executable of your desires into memory. Once you’ve got a disk buffer– I prefer to store my buffer in a uint8_t pointer– we can prepare the virtual buffer.

   /* get the nt headers */
   PIMAGE_NT_HEADERS64 disk_headers = get_nt_headers(disk_buffer);
   uint8_t *valloc_buffer;
   
   /* valloc a buffer of OptionalHeader.SizeOfImage */
   /* if IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE is not set, attempt allocation with the image base */
   /* MEM_RESERVE goes with MEM_COMMIT: committing at a fixed address fails unless the range is already reserved */
   if ((disk_headers->OptionalHeader.DllCharacteristics & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE) == 0)
      valloc_buffer = (uint8_t *)VirtualAlloc((LPVOID)disk_headers->OptionalHeader.ImageBase,
                                              disk_headers->OptionalHeader.SizeOfImage,
                                              MEM_COMMIT | MEM_RESERVE,
                                              PAGE_EXECUTE_READWRITE);
   else
      valloc_buffer = (uint8_t *)VirtualAlloc(NULL,
                                              disk_headers->OptionalHeader.SizeOfImage,
                                              MEM_COMMIT | MEM_RESERVE,
                                              PAGE_EXECUTE_READWRITE);

   /* VirtualAlloc returns NULL on failure (for example when the preferred base is taken), so check valloc_buffer before using it */

Do not allocate an image with malloc. Do not even get clever and VirtualProtect the page it’s in after you allocate it with malloc. You’re just changing the heap’s execution privileges, and that’s incredibly silly. VirtualAlloc is the way to create an executable buffer. See the documentation for this function in particular.

Let’s talk about OptionalHeader.ImageBase. Back before everyone and their grandmother knew what a buffer overflow was, binaries were compiled with predetermined image bases, the most notorious one for Windows being 0x400000. But being the dinosaur that Microsoft is, you can still set your binary not to have a dynamic base, the switch which determines this being IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE. If this flag is not set, we tell the virtual allocator to use a predetermined image base. Otherwise, we take what RNGesus gives us.

Next we need to copy the data in. With a firm understanding of the offset types between the two types of images, we can easily transfer the data from the disk image to the memory image.

   /* copy the image into the valloc buffer */
   memcpy(valloc_buffer, disk_buffer, disk_headers->OptionalHeader.SizeOfHeaders);

   PIMAGE_SECTION_HEADER section_table = get_section_table(disk_buffer);

   for (size_t i=0; i<disk_headers->FileHeader.NumberOfSections; ++i)
      memcpy(&valloc_buffer[section_table[i].VirtualAddress],
             &disk_buffer[section_table[i].PointerToRawData],
             section_table[i].SizeOfRawData);

Unless you’re expecting to deal with malformed data, this will translate our disk binary to a memory image. We first copy in our header data, as some functions rely on parsing the headers of our binary to access certain information within the image. We then iterate over the section table, copying into the RVA offset of our virtually allocated buffer and sourcing from the disk offset of our allocated disk image.
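
If you are expecting malformed data, the fix is to bounds-check both ends of every copy. Here is a sketch of a defensive version of the per-section copy. struct section_copy is a hypothetical subset of IMAGE_SECTION_HEADER, and image_len would be SizeOfImage while disk_len is however many bytes you read from the file.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct section_copy {         /* hypothetical subset of IMAGE_SECTION_HEADER */
   uint32_t virtual_address;  /* VirtualAddress */
   uint32_t pointer_to_raw;   /* PointerToRawData */
   uint32_t size_of_raw;      /* SizeOfRawData */
};

/* Copy one section from the disk image into the memory image.
 * Returns 0 on success, -1 if either range falls outside its buffer. */
int copy_section(uint8_t *image, size_t image_len,
                 const uint8_t *disk, size_t disk_len,
                 const struct section_copy *s) {
   /* nothing on disk (a .bss-style section): the zeroed allocation is enough */
   if (s->size_of_raw == 0) return 0;

   /* source range must sit inside the file */
   if (s->pointer_to_raw > disk_len || s->size_of_raw > disk_len - s->pointer_to_raw)
      return -1;

   /* destination range must sit inside the allocated image */
   if (s->virtual_address > image_len || s->size_of_raw > image_len - s->virtual_address)
      return -1;

   memcpy(image + s->virtual_address, disk + s->pointer_to_raw, s->size_of_raw);
   return 0;
}
```

The comparisons are written as subtractions on purpose, so a huge SizeOfRawData can’t wrap around and sneak past the check.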

Recall the data directories from the optional header we discussed earlier:

  • DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT]
  • DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC]
  • DataDirectory[IMAGE_DIRECTORY_ENTRY_TLS]

We’ll cover these now in the order they need to be processed, as each directory has its own details to take care of.

The Relocation Directory

Relocatable binaries predate the practice of thwarting evil hackers. As a result, the tech to relocate a binary is a little confusing at first. You typically don’t encounter strong use of a 16-bit number unless you’re dealing with sockets. Let’s take a peek at what the relocation directory looks like.

typedef struct _IMAGE_BASE_RELOCATION {
  DWORD   VirtualAddress;
  DWORD   SizeOfBlock;
} IMAGE_BASE_RELOCATION, *PIMAGE_BASE_RELOCATION;

Data directories, if present, have an RVA which points to some particular section of the binary, and can be acquired like so:

   PIMAGE_NT_HEADERS64 valloc_headers = get_nt_headers(valloc_buffer);
   DWORD reloc_rva = valloc_headers->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress;

   /* if the image has a relocation directory, use it */
   if (reloc_rva != 0) {
      uintptr_t base_delta = (uintptr_t)valloc_buffer - valloc_headers->OptionalHeader.ImageBase;
      uint8_t *base_reloc = &valloc_buffer[reloc_rva];

You might be wondering why I didn’t cast base_reloc as an IMAGE_BASE_RELOCATION pointer. That’s because the relocation data structure is a little ridiculous. See, the header I showed you isn’t the full story. In reality, the relocation data structure looks something more like this:

/* pseudocode: C cannot declare an array sized by a sibling field */
struct base_relocation {
    DWORD VirtualAddress;
    DWORD SizeOfBlock;
    WORD BlockData[(SizeOfBlock-sizeof(DWORD)-sizeof(DWORD))/sizeof(WORD)];
};

Effectively, the relocation directory is an array of these particular data structures terminated by a null VirtualAddress value. The data directory’s Size field also tells you how many bytes of blocks there are, which is the safer bound if you don’t trust the file. The relocation data, represented by BlockData in this augmented structure, adds to the confusion by splitting each 16-bit entry into two bitfields: an offset added to the block’s VirtualAddress field, and the relocation type, because good lord were relocations originally complicated. For the most part, the only relocation types we’re concerned with these days are IMAGE_REL_BASED_DIR64 and IMAGE_REL_BASED_HIGHLOW. These are basically fancy ways of saying “add the delta between the two image bases to the pointed value.” The progression of technology eventually led to sanity but we still insanely cling to the past with the dinosaurs of DOS.

Let’s break down the relocation loop:

      while (((PIMAGE_BASE_RELOCATION)base_reloc)->VirtualAddress != 0) {
         PIMAGE_BASE_RELOCATION base_reloc_block = (PIMAGE_BASE_RELOCATION)base_reloc;
         /* sizeof the struct, not the pointer type: the two only happen to match on 64-bit */
         WORD *entry_table = (WORD *)&base_reloc[sizeof(IMAGE_BASE_RELOCATION)];
         size_t entries = (base_reloc_block->SizeOfBlock-sizeof(IMAGE_BASE_RELOCATION))/sizeof(WORD);

We start by looping on whether or not the VirtualAddress entry for the given block is 0. We then acquire a pointer to the relocation array for this block, calculating the size of the array by subtracting the size of the relocation header from the block size and dividing it by the size of a word value. Do you see how annoying this is already? You had to do ridiculous shit in C in the DOS days. Just imagine what it was like before then! Absolute madness!

Anyway, the word value is a pair of bitfields. The lower 12 bits, acquired by masking with 0xFFF, represent the additional value to add to the original block’s VirtualAddress value. The upper 4 bits, masked with 0xF000 and shifted 12 to the right, represent the relocation type. There are many ancient relics in this type value, but unless you have an autistic tic, you should only concern yourself with IMAGE_REL_BASED_HIGHLOW for 32-bit and IMAGE_REL_BASED_DIR64 for 64-bit.

         for (size_t i=0; i<entries; ++i) {
            DWORD reloc_rva = base_reloc_block->VirtualAddress + (entry_table[i] & 0xFFF);
            uintptr_t *reloc_ptr = (uintptr_t *)&valloc_buffer[reloc_rva];
               
            if ((entry_table[i] >> 12) == IMAGE_REL_BASED_DIR64)
               *reloc_ptr += base_delta;
         }
            
         base_reloc += base_reloc_block->SizeOfBlock;

We are ultimately calculating an RVA value to the target address which needs to be adjusted. We use this RVA value to acquire a pointer into the virtually allocated buffer and apply the delta between our target base and the prior image base in the binary. The last line is why I kept the initial block entry as a uint8_t pointer: it becomes much easier to iterate over the blocks by simply adding the size of the whole block to a byte pointer.
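
For comparison, here is a sketch of the same walk written defensively: it is bounded by the directory’s Size instead of relying on a terminator, handles both HIGHLOW and DIR64, and refuses fixups that land outside the image. It works on plain byte buffers so it builds without <windows.h>. The function name and the REL_ macros are made up (the values match the IMAGE_REL_BASED_ constants), and it assumes a little-endian host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REL_ABSOLUTE 0   /* IMAGE_REL_BASED_ABSOLUTE: padding entry, skipped */
#define REL_HIGHLOW  3   /* IMAGE_REL_BASED_HIGHLOW: 32-bit fixup */
#define REL_DIR64    10  /* IMAGE_REL_BASED_DIR64: 64-bit fixup */

/* Apply every block of a relocation directory that is dir_size bytes long and
 * lives at reloc_rva. Returns 0 on success, -1 on malformed data or a type
 * this sketch does not handle. */
int apply_relocations(uint8_t *image, size_t image_len,
                      uint32_t reloc_rva, uint32_t dir_size, uint64_t delta) {
   if (reloc_rva > image_len || dir_size > image_len - reloc_rva) return -1;

   const uint8_t *cursor = image + reloc_rva;
   const uint8_t *end = cursor + dir_size;

   while ((size_t)(end - cursor) >= 8) {
      uint32_t block_rva, block_size;
      memcpy(&block_rva, cursor, 4);       /* VirtualAddress */
      memcpy(&block_size, cursor + 4, 4);  /* SizeOfBlock, header included */

      if (block_size == 0) break;          /* treat an empty block as the end */
      if (block_size < 8 || block_size > (size_t)(end - cursor)) return -1;

      for (uint32_t off = 8; off + 2 <= block_size; off += 2) {
         uint16_t entry;
         memcpy(&entry, cursor + off, 2);

         unsigned type = entry >> 12;                          /* upper 4 bits */
         size_t target = (size_t)block_rva + (entry & 0xFFF);  /* lower 12 bits */
         size_t width = type == REL_DIR64 ? 8 : type == REL_HIGHLOW ? 4 : 0;

         if (type == REL_ABSOLUTE) continue;
         if (width == 0 || target > image_len || width > image_len - target) return -1;

         /* read, adjust and write back only `width` bytes of the pointer */
         uint64_t value = 0;
         memcpy(&value, image + target, width);
         value += delta;
         memcpy(image + target, &value, width);
      }
      cursor += block_size;
   }
   return 0;
}
```

Here delta is the same quantity as base_delta in the loop above: the address you actually got minus the ImageBase the binary was linked for.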

Ultimately, the only thing that makes the relocation directory hard is that its structure is awkward and its elderly beard is unwieldy. Let’s move on to the next data directory.

The Import Directory

The import directory is another null-terminated array with the following structure, banished from the MSDN because Satya Nadella doesn’t want you to know this black magic:

typedef struct _IMAGE_IMPORT_DESCRIPTOR {
    union {
        DWORD   Characteristics;
        DWORD   OriginalFirstThunk;
    } DUMMYUNIONNAME;
    DWORD   TimeDateStamp;
    DWORD   ForwarderChain;
    DWORD   Name;
    DWORD   FirstThunk;
} IMAGE_IMPORT_DESCRIPTOR;
typedef IMAGE_IMPORT_DESCRIPTOR UNALIGNED *PIMAGE_IMPORT_DESCRIPTOR;

What the fuck is a thunk? I still don’t get what that is as far as a word is concerned, and as I mentioned earlier, search engines are nothing but AI slop, so trying to hunt down what a “thunk” is leads to Not What I’m Looking For. Anyway, a “thunk” in this context refers to an import entry within the import row. Name is an RVA pointing to a string representing the DLL to load with LoadLibrary, and OriginalFirstThunk and FirstThunk are RVAs pointing to null-terminated arrays containing import information. This import information comes in two forms:

  • An index ordinal
  • An IMAGE_IMPORT_BY_NAME structure

If the top bit of the value in the thunk array is set, discovered by masking with 0x8000000000000000 (or 0x80000000 in a 32-bit image), it’s an import by ordinal. The ordinal is a 16-bit value identifying the target function in the DLL’s export table. Look at all those wasted bits. This is what dealing with dinosaurs gets you.

The other entry is an RVA value pointing to an IMAGE_IMPORT_BY_NAME structure, which is forbidden information:

typedef struct _IMAGE_IMPORT_BY_NAME {
    WORD    Hint;
    CHAR   Name[1];
} IMAGE_IMPORT_BY_NAME, *PIMAGE_IMPORT_BY_NAME;

The Hint value is a suggested index into the DLL’s export name table, a first guess the loader tries before searching by name, and the Name is what you pass to GetProcAddress to get the function pointer. When you resolve this function, you stick it in the FirstThunk array. This array is referred to as the “import address table,” or “IAT.”
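
The ordinal-or-name decision is compact enough to isolate. This sketch decodes a single thunk value for either bitness; struct thunk_info and decode_thunk are hypothetical names, while the flag values match IMAGE_ORDINAL_FLAG32 and IMAGE_ORDINAL_FLAG64 from <windows.h>.

```c
#include <stdint.h>

#define ORDINAL_FLAG64 0x8000000000000000ULL  /* IMAGE_ORDINAL_FLAG64 */
#define ORDINAL_FLAG32 0x80000000ULL          /* IMAGE_ORDINAL_FLAG32 */

struct thunk_info {    /* hypothetical decoded form of one thunk */
   int by_ordinal;     /* 1: import by ordinal, 0: import by name */
   uint16_t ordinal;   /* meaningful when by_ordinal is 1 */
   uint32_t name_rva;  /* RVA of IMAGE_IMPORT_BY_NAME when by_ordinal is 0 */
};

/* Decode one entry of the OriginalFirstThunk array. 32-bit thunks should be
 * zero-extended to 64 bits by the caller. */
struct thunk_info decode_thunk(uint64_t thunk, int is_64bit) {
   struct thunk_info info = {0, 0, 0};
   uint64_t flag = is_64bit ? ORDINAL_FLAG64 : ORDINAL_FLAG32;

   if (thunk & flag) {
      info.by_ordinal = 1;
      info.ordinal = (uint16_t)(thunk & 0xFFFF);
   } else {
      info.name_rva = (uint32_t)(thunk & 0x7FFFFFFF);  /* the RVA is 31 bits wide */
   }
   return info;
}
```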

With all that explained, the following code should make plenty of sense:

   /* resolve the import table */
   DWORD import_rva = valloc_headers->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress;

   if (import_rva != 0) {
      PIMAGE_IMPORT_DESCRIPTOR import_table = (PIMAGE_IMPORT_DESCRIPTOR)&valloc_buffer[import_rva];

      while (import_table->OriginalFirstThunk != 0) {
         HMODULE module = LoadLibraryA((const char *)&valloc_buffer[import_table->Name]);
         uintptr_t *original_thunks = (uintptr_t *)&valloc_buffer[import_table->OriginalFirstThunk];
         uintptr_t *import_addrs = (uintptr_t *)&valloc_buffer[import_table->FirstThunk];

         while (*original_thunks != 0) {
            if (*original_thunks & 0x8000000000000000)
               *import_addrs = (uintptr_t)GetProcAddress(module, MAKEINTRESOURCEA(*original_thunks & 0xFFFF));
            else {
               PIMAGE_IMPORT_BY_NAME import_by_name = (PIMAGE_IMPORT_BY_NAME)&valloc_buffer[*original_thunks];
               *import_addrs = (uintptr_t)GetProcAddress(module, import_by_name->Name);
            }

            ++import_addrs;
            ++original_thunks;
         }

         ++import_table;
      }
   }

Let’s move on to the ultimate reason why I went and wrote this article: the TLS directory.

The TLS Directory

I am very annoyed that the documentation for IMAGE_TLS_DIRECTORY64 is removed, because there are still some parts of it that don’t make sense to me without reference. Ultimately the bits I don’t understand don’t particularly matter, but I’m nowhere near neurotypical, so this lack of information drives me up the wall. Anyway, here is the forbidden data structure:

typedef VOID
(NTAPI *PIMAGE_TLS_CALLBACK) (
    PVOID DllHandle,
    DWORD Reason,
    PVOID Reserved
    );

typedef struct _IMAGE_TLS_DIRECTORY64 {
    ULONGLONG StartAddressOfRawData;
    ULONGLONG EndAddressOfRawData;
    ULONGLONG AddressOfIndex;         // PDWORD
    ULONGLONG AddressOfCallBacks;     // PIMAGE_TLS_CALLBACK *;
    DWORD SizeOfZeroFill;
    union {
        DWORD Characteristics;
        struct {
            DWORD Reserved0 : 20;
            DWORD Alignment : 4;
            DWORD Reserved1 : 8;
        } DUMMYSTRUCTNAME;
    } DUMMYUNIONNAME;

} IMAGE_TLS_DIRECTORY64;

typedef IMAGE_TLS_DIRECTORY64 * PIMAGE_TLS_DIRECTORY64;

typedef struct _IMAGE_TLS_DIRECTORY32 {
    DWORD   StartAddressOfRawData;
    DWORD   EndAddressOfRawData;
    DWORD   AddressOfIndex;             // PDWORD
    DWORD   AddressOfCallBacks;         // PIMAGE_TLS_CALLBACK *
    DWORD   SizeOfZeroFill;
    union {
        DWORD Characteristics;
        struct {
            DWORD Reserved0 : 20;
            DWORD Alignment : 4;
            DWORD Reserved1 : 8;
        } DUMMYSTRUCTNAME;
    } DUMMYUNIONNAME;

} IMAGE_TLS_DIRECTORY32;
typedef IMAGE_TLS_DIRECTORY32 * PIMAGE_TLS_DIRECTORY32;

I had to pull these directly out of <windows.h> because apparently no one really documents this little relic of the PE format. This is what’s called the Thread Local Storage directory. It creates local memory storage for threads, and as a result, is part of the loading process for the PE image. AddressOfCallBacks points to a null-terminated array of pointers to functions. If you’ve ever dealt with malware before, you are no doubt aware that the callbacks in this directory run before the main routine. And now you might be thinking “hey neat, a function that gets called before main!” It is neat, but inside the callback, your binary is sitting in an uninitialized state because it’s still loading. Things just Don’t Work because they’re not initialized, like C runtime functions.

Either way, dealing with this directory is pretty straightforward: iterate over the AddressOfCallBacks array until you hit a null pointer, calling each function pointer along the way.

   /* initialize the tls callbacks */
   DWORD tls_rva = valloc_headers->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_TLS].VirtualAddress;

   if (tls_rva != 0) {
      PIMAGE_TLS_DIRECTORY64 tls_dir = (PIMAGE_TLS_DIRECTORY64)&valloc_buffer[tls_rva];
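      /* AddressOfCallBacks is an absolute address, not an RVA, and it is 0 when the image has no callbacks */
      PIMAGE_TLS_CALLBACK *callbacks = (PIMAGE_TLS_CALLBACK *)tls_dir->AddressOfCallBacks;

      while (callbacks != NULL && *callbacks != NULL) {
         (*callbacks)(valloc_buffer, DLL_PROCESS_ATTACH, NULL);
         ++callbacks;
      }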
   }

The reasons you can pass to it on load can be found in the documentation for DllMain.
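
The callback walk itself has nothing Windows-specific in it, so here it is as a portable sketch you can poke at anywhere. tls_callback is a stand-in for PIMAGE_TLS_CALLBACK (the real one is NTAPI), count_attach is a toy callback, and every name here is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define REASON_PROCESS_ATTACH 1u  /* same value as DLL_PROCESS_ATTACH */

/* Portable stand-in for PIMAGE_TLS_CALLBACK. */
typedef void (*tls_callback)(void *image, uint32_t reason, void *reserved);

/* Call each entry of a null-terminated callback array and return how many
 * ran. A null array pointer just means the image has no callbacks. */
size_t run_tls_callbacks(tls_callback *callbacks, void *image, uint32_t reason) {
   size_t called = 0;
   if (callbacks == NULL) return 0;

   for (; *callbacks != NULL; ++callbacks, ++called)
      (*callbacks)(image, reason, NULL);
   return called;
}

/* Toy callback for demonstration: counts process-attach notifications. */
int tls_attach_count = 0;

void count_attach(void *image, uint32_t reason, void *reserved) {
   (void)image;
   (void)reserved;
   if (reason == REASON_PROCESS_ATTACH) ++tls_attach_count;
}
```

Feeding it an array like { count_attach, count_attach, NULL } runs the callback twice, and handing it a null array is a no-op instead of a crash.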

Detonating the Payload

Now that we have everything settled, we can call the entrypoint of our binary! Unfortunately for us, since we’re loading reflectively, we’ll be a little off from what we expect. For example, if your target binary is an executable and not a DLL, your entrypoint isn’t necessarily the same as main or even WinMain. For DLLs, however, it takes the same arguments as DllMain. So all you have to do is check if IMAGE_FILE_DLL is set in FileHeader.Characteristics, then call the appropriate entrypoint.

   /* call the entrypoint */
   if ((valloc_headers->FileHeader.Characteristics & IMAGE_FILE_DLL) != 0) {
      BOOL (WINAPI *dll_main)(HINSTANCE, DWORD, LPVOID) = (BOOL (WINAPI *)(HINSTANCE, DWORD, LPVOID))&valloc_buffer[valloc_headers->OptionalHeader.AddressOfEntryPoint];
      dll_main((HINSTANCE)valloc_buffer, DLL_PROCESS_ATTACH, NULL);
   }
   else {
      int (*main)(PVOID) = (int (*)(PVOID))&valloc_buffer[valloc_headers->OptionalHeader.AddressOfEntryPoint];
      main(valloc_buffer);
   }
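
One edge case worth guarding against: AddressOfEntryPoint is allowed to be zero (a resource-only DLL, for example), and calling through it would jump straight into the MZ header. A minimal sketch of a sanity check, with a made-up function name:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns a pointer to the entrypoint inside the mapped image, or NULL when
 * there is nothing sane to call. */
uint8_t *resolve_entrypoint(uint8_t *image, uint32_t size_of_image, uint32_t entry_rva) {
   if (entry_rva == 0) return NULL;              /* image has no entrypoint */
   if (entry_rva >= size_of_image) return NULL;  /* points outside the image */
   return image + entry_rva;
}
```

You would cast the result to the function pointer type you need, exactly as the snippet above does with &valloc_buffer[...], and skip the call when it comes back NULL.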

And that’s it! You’ve successfully bitten the forbidden fruit and loaded an executable in an unsanctioned way! May you use this wisdom however you see fit.

Corrections

In a previous article, I incorrectly said one of the import types is a forwarder string. It is not– this is actually part of the export directory, not the import directory. Do you see what Microsoft vaporizing their documentation does? Half the reason I’m making an effort to document things.