Forbidden Content: Crudely Loading a PE File
A Brief Comment
This post is being made because I’m rather annoyed. On top of search engines being ruined by the AI gold rush, Microsoft has wiped away easy access to some documentation on critical data structures in PE file loading. Only some of them remain easy to find in Microsoft’s documentation. What makes this especially annoying is that the missing structures are still in the <windows.h> header, and I’m not particularly a fan of keeping a copy of windows.h open to find my data structures. It’s a big file! And the structs you need are scattered all over the place!
The restriction of this information is why I call this forbidden content. You just get the feeling Microsoft no longer wants you to know about this process unless you’re a compiler developer. Either that or it’s just the victim of an enormous megacorporation restructuring and neglecting its documentary content. I like the romanticized version. It’s more motivating.
As a result, this post will guide you through a forbidden process: loading a PE file without the use of CreateProcess or LoadLibrary. You don’t really need this except in specific cases. But hackers love these specific cases, so let’s get to documenting! This article requires knowledge of the C programming language to make sense.
The PE Format
When starting out with the portable executable format, you may be overwhelmed. The data structures involved are very detailed and they have a lot of content. Luckily for you a lot of that content has been made irrelevant by the natural progression of technology. But Microsoft is a dinosaur, sticking to what they’ve developed in the late 80s and early 90s in the DOS operating system and never letting go. Like some sort of ancient text that won’t go away, DOS lives on in all of our Windows executables.
A PE file is split up into a few critical sections. They are:
- The DOS header
- The NT header
- The section table
- The section data
The DOS header did not survive The Great PE Wiping by Microsoft, so here is a forbidden reference link.
To survive Internet bit rot, here is what the header looks like in <windows.h>:
typedef struct _IMAGE_DOS_HEADER
{
WORD e_magic;
WORD e_cblp;
WORD e_cp;
WORD e_crlc;
WORD e_cparhdr;
WORD e_minalloc;
WORD e_maxalloc;
WORD e_ss;
WORD e_sp;
WORD e_csum;
WORD e_ip;
WORD e_cs;
WORD e_lfarlc;
WORD e_ovno;
WORD e_res[4];
WORD e_oemid;
WORD e_oeminfo;
WORD e_res2[10];
LONG e_lfanew;
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;
e_magic is why your PE file starts with the letters “MZ,” which stands for Mark Zbikowski, the developer of the DOS executable format. Most PE files that haven’t been tampered with come with a DOS executable stub that tells you not to run the program in DOS. Coding creatives love to fuck with this section in particular.
Since we’re not currently concerned with ancient runes, the only other relevant data point in this structure is e_lfanew, which is the file offset of our NT headers. The NT headers survived the brainwipe by Microsoft, and their 32-bit and 64-bit variants can be found here (for 32) and here (for 64). Because corporations apparently can’t be trusted to preserve knowledge, here they are:
typedef struct _IMAGE_NT_HEADERS {
DWORD Signature;
IMAGE_FILE_HEADER FileHeader;
IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;
typedef struct _IMAGE_NT_HEADERS64 {
DWORD Signature;
IMAGE_FILE_HEADER FileHeader;
IMAGE_OPTIONAL_HEADER64 OptionalHeader;
} IMAGE_NT_HEADERS64, *PIMAGE_NT_HEADERS64;
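To make the role of e_lfanew concrete, here is a minimal, portable sketch that finds the NT headers in a raw buffer and checks both signatures. It reads fields by byte offset instead of using the <windows.h> structs, so the helper names (read_u16, read_u32, find_nt_headers) are my own and not part of any Microsoft API. The offset 60 follows from the IMAGE_DOS_HEADER layout shown above, and the sketch assumes a little-endian host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Unaligned-safe reads; assumes the host is little-endian like the file. */
static uint16_t read_u16(const uint8_t *p) { uint16_t v; memcpy(&v, p, sizeof v); return v; }
static uint32_t read_u32(const uint8_t *p) { uint32_t v; memcpy(&v, p, sizeof v); return v; }

/* Hypothetical helper: returns the file offset of the NT headers,
   or -1 if the buffer does not look like a PE file. */
static long find_nt_headers(const uint8_t *buf, size_t len)
{
    if (len < 64 || read_u16(buf) != 0x5A4D)      /* e_magic, "MZ" */
        return -1;
    uint32_t e_lfanew = read_u32(buf + 60);       /* last field of the DOS header */
    if (e_lfanew > len || len - e_lfanew < 4)     /* Signature must fit in the buffer */
        return -1;
    if (read_u32(buf + e_lfanew) != 0x00004550)   /* Signature, "PE\0\0" */
        return -1;
    return (long)e_lfanew;
}
```

The loader later in this post skips these checks and trusts its input, which is fine for a binary you built yourself.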
Like the DOS header, there’s still some data within this modern header that’s not really relevant. Let’s start with the file header:
typedef struct _IMAGE_FILE_HEADER {
WORD Machine;
WORD NumberOfSections;
DWORD TimeDateStamp;
DWORD PointerToSymbolTable;
DWORD NumberOfSymbols;
WORD SizeOfOptionalHeader;
WORD Characteristics;
} IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;
The Machine field tells you which architecture the binary targets. The ones that are most often relevant to us are the constants IMAGE_FILE_MACHINE_I386 for x86 and IMAGE_FILE_MACHINE_AMD64 for x64. NumberOfSections is the number of entries in the section table. TimeDateStamp is mostly for the compiler to document when this binary was compiled. PointerToSymbolTable and NumberOfSymbols are, I believe, only relevant if you’re dealing with old COFF objects. SizeOfOptionalHeader ultimately determines the offset to the section table if it is present. Characteristics is a series of bitflags that are mostly irrelevant except for a few entries (like IMAGE_FILE_DLL and IMAGE_FILE_EXECUTABLE_IMAGE), but thankfully our corporate overlords have blessed us with the knowledge of the ancient irrelevant information in case we ever encounter it. But you can’t know about the DOS header for some reason. Very inconsistent.
Let’s talk about the optional header, which isn’t “optional” in the sense of “you can choose to discard it,” but instead in the sense of “goddamn there are a lot of options here.” You can resize it according to SizeOfOptionalHeader, but you can’t get rid of it. Here it is:
typedef struct _IMAGE_OPTIONAL_HEADER {
WORD Magic;
BYTE MajorLinkerVersion;
BYTE MinorLinkerVersion;
DWORD SizeOfCode;
DWORD SizeOfInitializedData;
DWORD SizeOfUninitializedData;
DWORD AddressOfEntryPoint;
DWORD BaseOfCode;
DWORD BaseOfData;
DWORD ImageBase;
DWORD SectionAlignment;
DWORD FileAlignment;
WORD MajorOperatingSystemVersion;
WORD MinorOperatingSystemVersion;
WORD MajorImageVersion;
WORD MinorImageVersion;
WORD MajorSubsystemVersion;
WORD MinorSubsystemVersion;
DWORD Win32VersionValue;
DWORD SizeOfImage;
DWORD SizeOfHeaders;
DWORD CheckSum;
WORD Subsystem;
WORD DllCharacteristics;
DWORD SizeOfStackReserve;
DWORD SizeOfStackCommit;
DWORD SizeOfHeapReserve;
DWORD SizeOfHeapCommit;
DWORD LoaderFlags;
DWORD NumberOfRvaAndSizes;
IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER32, *PIMAGE_OPTIONAL_HEADER32;
typedef struct _IMAGE_OPTIONAL_HEADER64 {
WORD Magic;
BYTE MajorLinkerVersion;
BYTE MinorLinkerVersion;
DWORD SizeOfCode;
DWORD SizeOfInitializedData;
DWORD SizeOfUninitializedData;
DWORD AddressOfEntryPoint;
DWORD BaseOfCode;
ULONGLONG ImageBase;
DWORD SectionAlignment;
DWORD FileAlignment;
WORD MajorOperatingSystemVersion;
WORD MinorOperatingSystemVersion;
WORD MajorImageVersion;
WORD MinorImageVersion;
WORD MajorSubsystemVersion;
WORD MinorSubsystemVersion;
DWORD Win32VersionValue;
DWORD SizeOfImage;
DWORD SizeOfHeaders;
DWORD CheckSum;
WORD Subsystem;
WORD DllCharacteristics;
ULONGLONG SizeOfStackReserve;
ULONGLONG SizeOfStackCommit;
ULONGLONG SizeOfHeapReserve;
ULONGLONG SizeOfHeapCommit;
DWORD LoaderFlags;
DWORD NumberOfRvaAndSizes;
IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER64, *PIMAGE_OPTIONAL_HEADER64;
This struct is rather large, so I’ll let Microsoft’s documentation speak for most of the fields and focus on the bits relevant to loading:
- the entrypoint of the binary (AddressOfEntryPoint)
- the image base address (ImageBase)
- the image size (SizeOfImage)
- DllCharacteristics (specifically for the bitflag IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE)
- the import directory (DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT])
- the relocation directory (DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC])
- the TLS directory (DataDirectory[IMAGE_DIRECTORY_ENTRY_TLS])
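The directory lookups in that list can be guarded with a small check. The sketch below is a portable approximation that uses a stand-in struct rather than IMAGE_DATA_DIRECTORY, and directory_present is a hypothetical helper. The index values (1, 5, 9) match the IMAGE_DIRECTORY_ENTRY_* constants in the Windows headers. The NumberOfRvaAndSizes comparison is an extra safety check of my own; the loader code in this post only tests the RVA against zero.

```c
#include <stdint.h>

/* Stand-in for IMAGE_DATA_DIRECTORY: an RVA plus a size in bytes. */
struct data_directory {
    uint32_t virtual_address;
    uint32_t size;
};

/* Same values as IMAGE_DIRECTORY_ENTRY_IMPORT, _BASERELOC and _TLS. */
enum { DIR_IMPORT = 1, DIR_BASERELOC = 5, DIR_TLS = 9 };

/* Returns 1 if the directory exists in this image, 0 otherwise. */
static int directory_present(const struct data_directory *dirs,
                             uint32_t number_of_rva_and_sizes,
                             uint32_t index)
{
    if (index >= number_of_rva_and_sizes)   /* header declares fewer entries */
        return 0;
    return dirs[index].virtual_address != 0;
}
```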
In addition to these critical fields of our header, we have a section header, which defines the various data sections of the PE file. Microsoft documentation can be found here:
typedef struct _IMAGE_SECTION_HEADER {
BYTE Name[IMAGE_SIZEOF_SHORT_NAME];
union {
DWORD PhysicalAddress;
DWORD VirtualSize;
} Misc;
DWORD VirtualAddress;
DWORD SizeOfRawData;
DWORD PointerToRawData;
DWORD PointerToRelocations;
DWORD PointerToLinenumbers;
WORD NumberOfRelocations;
WORD NumberOfLinenumbers;
DWORD Characteristics;
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;
At this point it should be explained how Windows defines the memory layout of the PE file. As you may have noticed in your analysis of binaries, the image of the executable on disk does not have the same layout as the executable loaded into memory. As a result, you’re dealing with two offset types: disk and memory, the latter of which is referred to as an RVA (relative virtual address). A disk offset is exactly how it sounds: a location in the PE file as it exists on disk. Similarly, an RVA is a memory offset, relative to the address the image is loaded at. Thus, VirtualAddress is an RVA to the section location in memory, and PointerToRawData is an offset to the section location on disk. VirtualSize and SizeOfRawData represent the size of the section in memory and on disk respectively.
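The relationship between the two offset types can be sketched as a lookup over the section table. This is an illustration only: struct section is a stand-in carrying just the four IMAGE_SECTION_HEADER fields discussed above, and rva_to_offset is a hypothetical helper, not something from <windows.h>.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the IMAGE_SECTION_HEADER fields that matter here. */
struct section {
    uint32_t virtual_address;      /* RVA of the section in memory */
    uint32_t virtual_size;         /* size of the section in memory */
    uint32_t pointer_to_raw_data;  /* offset of the section on disk */
    uint32_t size_of_raw_data;     /* size of the section on disk */
};

/* Returns the disk offset backing an RVA, or -1 if no section's raw data covers it. */
static int64_t rva_to_offset(const struct section *s, size_t count, uint32_t rva)
{
    for (size_t i = 0; i < count; ++i) {
        if (rva >= s[i].virtual_address &&
            rva - s[i].virtual_address < s[i].size_of_raw_data)
            return (int64_t)s[i].pointer_to_raw_data + (rva - s[i].virtual_address);
    }
    return -1;
}
```

An RVA can be valid in memory and still return -1 here: when VirtualSize is larger than SizeOfRawData, the tail of the section only exists as zero-filled memory.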
The Characteristics of a section tell the Windows loader how it should allocate this section in memory. For example, a section can be marked executable (IMAGE_SCN_MEM_EXECUTE), writable (IMAGE_SCN_MEM_WRITE) or readable (IMAGE_SCN_MEM_READ), among many other traits. For our purposes, though, these protection characteristics aren’t really relevant. We’re going to allocate the whole image with read, write and execute privileges to cover our bases.
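If you do want per-section protections instead of one big RWX allocation, the mapping from section flags to page protections could look roughly like this. The constants are copied by value from the Windows headers (IMAGE_SCN_MEM_* and PAGE_*) under shortened names so the sketch builds anywhere, and section_protection is a hypothetical helper. The result would be handed to VirtualProtect for each section once loading is done, a step this post’s loader does not perform.

```c
#include <stdint.h>

/* Same values as IMAGE_SCN_MEM_EXECUTE / _READ / _WRITE. */
#define SCN_MEM_EXECUTE 0x20000000u
#define SCN_MEM_READ    0x40000000u
#define SCN_MEM_WRITE   0x80000000u

/* Same values as the PAGE_* protection constants. */
#define PG_NOACCESS          0x01u
#define PG_READONLY          0x02u
#define PG_READWRITE         0x04u
#define PG_EXECUTE           0x10u
#define PG_EXECUTE_READ      0x20u
#define PG_EXECUTE_READWRITE 0x40u

/* Picks the page protection that matches a section's Characteristics. */
static uint32_t section_protection(uint32_t characteristics)
{
    int x = (characteristics & SCN_MEM_EXECUTE) != 0;
    int r = (characteristics & SCN_MEM_READ) != 0;
    int w = (characteristics & SCN_MEM_WRITE) != 0;

    if (x)
        return w ? PG_EXECUTE_READWRITE : (r ? PG_EXECUTE_READ : PG_EXECUTE);
    return w ? PG_READWRITE : (r ? PG_READONLY : PG_NOACCESS);
}
```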
With all these basics in mind, we’re ready to dig into the details of writing the loader.
Preparing the Image
Getting the loader prepared is incredibly simple. First, we need to create some functions to get us to some critical data. Follow along with the full code example here on GitHub, as all the following code snippets will come from this repository. Let’s get those helper functions:
PIMAGE_NT_HEADERS64 get_nt_headers(const uint8_t *image_base) {
PIMAGE_DOS_HEADER dos_header = (PIMAGE_DOS_HEADER)image_base;
return (PIMAGE_NT_HEADERS64)&image_base[dos_header->e_lfanew];
}
PIMAGE_SECTION_HEADER get_section_table(const uint8_t *image_base) {
PIMAGE_DOS_HEADER dos_header = (PIMAGE_DOS_HEADER)image_base;
PIMAGE_NT_HEADERS64 nt_headers = get_nt_headers(image_base);
size_t section_offset = dos_header->e_lfanew + sizeof(DWORD) + sizeof(IMAGE_FILE_HEADER) + nt_headers->FileHeader.SizeOfOptionalHeader;
return (PIMAGE_SECTION_HEADER)&image_base[section_offset];
}
You can easily do these inline if you like, but writing these out just makes things a little cleaner. In get_nt_headers, we use IMAGE_DOS_HEADER’s e_lfanew offset to determine the pointer to the NT headers. You might think getting the section table is as straightforward as taking the address just past the optional header struct, but that’s not how it’s calculated. Instead, we calculate the offset to the optional header based on the preceding headers (dos_header->e_lfanew + sizeof(DWORD) + sizeof(IMAGE_FILE_HEADER)), then add the size of the optional header from the file header (nt_headers->FileHeader.SizeOfOptionalHeader). You’re probably thinking “fuck off, that’s the exact same location.” Only for standard PE files! It might be important for some of us to note the edgecases of how our binaries can be parsed! This is forbidden knowledge, after all.
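The arithmetic is easier to see with the struct sizes written out. In this sketch the numbers 4 and 20 are sizeof(DWORD) and sizeof(IMAGE_FILE_HEADER); section_table_offset is a hypothetical stand-in for the calculation in get_section_table.

```c
#include <stdint.h>

enum {
    PE_SIGNATURE_SIZE = 4,   /* sizeof(DWORD), the "PE\0\0" signature */
    FILE_HEADER_SIZE  = 20   /* sizeof(IMAGE_FILE_HEADER) */
};

/* Offset of the section table, driven by SizeOfOptionalHeader rather than
   by the compile-time size of the optional header struct. */
static uint32_t section_table_offset(uint32_t e_lfanew, uint16_t size_of_optional_header)
{
    return e_lfanew + PE_SIGNATURE_SIZE + FILE_HEADER_SIZE + size_of_optional_header;
}
```

For a stock 64-bit binary, SizeOfOptionalHeader is 0xF0, which is exactly sizeof(IMAGE_OPTIONAL_HEADER64), so both approaches agree. A file that declares, say, 0x100 pushes the section table 16 bytes further than the struct-based guess.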
Do whatever magic incantations you need to do to get the disk image of the executable of your desires into memory. Once you’ve got a disk buffer (I prefer to store my buffer in a uint8_t pointer), we can prepare the virtual buffer.
/* get the nt headers */
PIMAGE_NT_HEADERS64 disk_headers = get_nt_headers(disk_buffer);
uint8_t *valloc_buffer;
/* valloc a buffer of OptionalHeader.SizeOfImage */
/* if IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE is not set, attempt allocation with the image base */
/* MEM_RESERVE is required as well: MEM_COMMIT alone fails at a specific address that isn't already reserved */
if ((disk_headers->OptionalHeader.DllCharacteristics & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE) == 0)
    valloc_buffer = (uint8_t *)VirtualAlloc((LPVOID)disk_headers->OptionalHeader.ImageBase,
                                            disk_headers->OptionalHeader.SizeOfImage,
                                            MEM_COMMIT | MEM_RESERVE,
                                            PAGE_EXECUTE_READWRITE);
else
    valloc_buffer = (uint8_t *)VirtualAlloc(NULL,
                                            disk_headers->OptionalHeader.SizeOfImage,
                                            MEM_COMMIT | MEM_RESERVE,
                                            PAGE_EXECUTE_READWRITE);
/* VirtualAlloc returns NULL on failure (e.g. the preferred base is taken); check valloc_buffer before using it */
Do not allocate an image with malloc. Do not even get clever and VirtualProtect the page it’s in after you allocate it with malloc: you’re just changing the heap’s execution privileges and that’s incredibly silly. VirtualAlloc is the way to create an executable buffer. See the documentation for this function in particular.
Let’s talk about OptionalHeader.ImageBase. Back before everyone and their grandmother knew what a buffer overflow was, binaries were compiled with predetermined image bases, the most notorious one for Windows being 0x400000. But Microsoft being the dinosaur that it is, you can still set your binary not to have a dynamic base, the switch which determines this being IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE. If this flag is not set, we ask the virtual allocator for the predetermined image base. Otherwise, we take what RNGesus gives us.
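The decision described above, plus the delta it produces for the relocation step later on, can be sketched in isolation. The flag value 0x0040 is IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE from the Windows headers; requested_base and base_delta are hypothetical names chosen for this illustration.

```c
#include <stdint.h>

#define DLLCHAR_DYNAMIC_BASE 0x0040u  /* IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE */

/* Address to request from the allocator; 0 means "put it anywhere". */
static uint64_t requested_base(uint16_t dll_characteristics, uint64_t image_base)
{
    return (dll_characteristics & DLLCHAR_DYNAMIC_BASE) ? 0 : image_base;
}

/* Distance between where the image landed and where it expected to be.
   Unsigned wraparound is intentional: adding the delta to an old absolute
   address still yields the right new address when the image moved down. */
static uint64_t base_delta(uint64_t actual_base, uint64_t image_base)
{
    return actual_base - image_base;
}
```

If the image lands exactly on ImageBase, the delta is zero and the relocation pass changes nothing.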
Next we need to copy the data in. With a firm understanding of the offset types between the two types of images, we can easily transfer the data from the disk image to the memory image.
/* copy the image into the valloc buffer */
memcpy(valloc_buffer, disk_buffer, disk_headers->OptionalHeader.SizeOfHeaders);
PIMAGE_SECTION_HEADER section_table = get_section_table(disk_buffer);
for (size_t i=0; i<disk_headers->FileHeader.NumberOfSections; ++i)
memcpy(&valloc_buffer[section_table[i].VirtualAddress],
&disk_buffer[section_table[i].PointerToRawData],
section_table[i].SizeOfRawData);
Unless you’re expecting to deal with malformed data, this will translate our disk binary to a memory image. We first copy in our header data, as some functions rely on parsing the headers of our binary to access certain information within the image. We then iterate over the section table, copying into the RVA offset of our virtually allocated buffer and sourcing from the disk offset of our allocated disk image.
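If you are expecting malformed data, each section copy needs bounds checks on both buffers. The following is a minimal sketch of one such check under my own assumptions: copy_section is a hypothetical helper, image_size would be SizeOfImage, and disk_size would be the length of the file you read in.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies one section's raw data to its RVA, refusing ranges that fall
   outside either buffer. Returns 0 on success, -1 on a bad range. */
static int copy_section(uint8_t *image, size_t image_size,
                        const uint8_t *disk, size_t disk_size,
                        uint32_t virtual_address,
                        uint32_t pointer_to_raw_data,
                        uint32_t size_of_raw_data)
{
    /* source range must lie inside the file */
    if (pointer_to_raw_data > disk_size ||
        size_of_raw_data > disk_size - pointer_to_raw_data)
        return -1;
    /* destination range must lie inside the allocated image */
    if (virtual_address > image_size ||
        size_of_raw_data > image_size - virtual_address)
        return -1;
    memcpy(image + virtual_address, disk + pointer_to_raw_data, size_of_raw_data);
    return 0;
}
```

The comparisons are written as subtractions on the right-hand side so that a huge SizeOfRawData cannot overflow the addition and sneak past the check.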
Recall the data directories from the optional header we discussed earlier:
DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT]
DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC]
DataDirectory[IMAGE_DIRECTORY_ENTRY_TLS]
We’ll cover these now in the order they need to be loaded, as each directory has its own detailed work to do.
The Relocation Directory
Relocatable binaries predate the existence of thwarting evil hackers. As a result, the tech to relocate a binary is a little confusing at first. You typically don’t encounter strong use of a 16-bit number unless you’re dealing with sockets. Let’s take a peek at what the relocation directory looks like.
typedef struct _IMAGE_BASE_RELOCATION {
DWORD VirtualAddress;
DWORD SizeOfBlock;
} IMAGE_BASE_RELOCATION, *PIMAGE_BASE_RELOCATION;
Data directories, if present, have an RVA which points to some particular section of the binary, and can be acquired like so:
PIMAGE_NT_HEADERS64 valloc_headers = get_nt_headers(valloc_buffer);
DWORD reloc_rva = valloc_headers->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress;
/* if the image has a relocation directory, use it */
if (reloc_rva != 0) {
uintptr_t base_delta = (uintptr_t)valloc_buffer - valloc_headers->OptionalHeader.ImageBase;
uint8_t *base_reloc = &valloc_buffer[reloc_rva];
You might be wondering why I didn’t cast base_reloc as an IMAGE_BASE_RELOCATION pointer. That’s because the relocation data structure is a little ridiculous. See, the header I showed you isn’t the full story. In reality, the relocation data structure looks something more like this:
/* pseudocode: C can't size an array member from a sibling field */
struct base_relocation {
    DWORD VirtualAddress;
    DWORD SizeOfBlock;
    WORD BlockData[(SizeOfBlock-sizeof(DWORD)-sizeof(DWORD))/sizeof(WORD)];
};
Effectively, the relocation directory is an array of these particular data structures terminated by a null VirtualAddress value. The relocation data, represented by BlockData in this augmented structure, adds to the confusion by splitting each 16-bit entry into two bitfields: the additional offset to add to the VirtualAddress field, and the relocation type, because good lord were relocations originally complicated. For the most part, the only relocation types we’re concerned with these days are IMAGE_REL_BASED_DIR64 and IMAGE_REL_BASED_HIGHLOW. These are basically fancy ways of saying “add the delta between the two image bases to the pointed value.” The progression of technology eventually led to sanity but we still insanely cling to the past with the dinosaurs of DOS.
Let’s break down the relocation loop:
while (((PIMAGE_BASE_RELOCATION)base_reloc)->VirtualAddress != 0) {
    PIMAGE_BASE_RELOCATION base_reloc_block = (PIMAGE_BASE_RELOCATION)base_reloc;
    /* sizeof the struct, not the pointer type: the two only happen to match on 64-bit */
    WORD *entry_table = (WORD *)&base_reloc[sizeof(IMAGE_BASE_RELOCATION)];
    size_t entries = (base_reloc_block->SizeOfBlock-sizeof(IMAGE_BASE_RELOCATION))/sizeof(WORD);
We keep looping until the VirtualAddress entry for the given block is 0. We then acquire a pointer to the relocation array for this block, calculating the size of the array by subtracting the size of the relocation header from the block size and dividing it by the size of a word value. Do you see how annoying this is already? You had to do ridiculous shit in C in the DOS days. Just imagine what it was like before then! Absolute madness!
Anyway, the word value is a pair of bitfields. The lower 12 bits, acquired by masking with 0xFFF, represent the additional value to add to the original block’s VirtualAddress value. The upper 4 bits, masked with 0xF000 and shifted 12 to the right, represent the relocation type. There are many ancient relics in this type value, but unless you have an autistic tic, you should only concern yourself with IMAGE_REL_BASED_HIGHLOW for 32-bit and IMAGE_REL_BASED_DIR64 for 64-bit.
for (size_t i=0; i<entries; ++i) {
DWORD reloc_rva = base_reloc_block->VirtualAddress + (entry_table[i] & 0xFFF);
uintptr_t *reloc_ptr = (uintptr_t *)&valloc_buffer[reloc_rva];
if ((entry_table[i] >> 12) == IMAGE_REL_BASED_DIR64)
*reloc_ptr += base_delta;
}
base_reloc += base_reloc_block->SizeOfBlock;
We are ultimately calculating an RVA value to the target address which needs to be adjusted. We use this RVA value to acquire a pointer into the virtually allocated buffer and apply the delta between our target base and the prior image base in the binary. The last line is why I kept base_reloc as a uint8_t pointer: it becomes much easier to iterate over the blocks by simply adding the size of the whole block to a byte pointer.
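The bitfield handling can be pulled out into a portable sketch that works on a plain byte buffer. This is not the code from the repository: reloc_entry_count and apply_reloc are hypothetical helpers, the type numbers (0, 3, 10) are the values of IMAGE_REL_BASED_ABSOLUTE, IMAGE_REL_BASED_HIGHLOW and IMAGE_REL_BASED_DIR64 in the Windows headers, and the HIGHLOW branch and bounds check are additions the loop above does not have.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { REL_ABSOLUTE = 0, REL_HIGHLOW = 3, REL_DIR64 = 10 };

/* Number of 16-bit entries in a block: SizeOfBlock minus the 8-byte header. */
static size_t reloc_entry_count(uint32_t size_of_block)
{
    return size_of_block < 8 ? 0 : (size_of_block - 8) / sizeof(uint16_t);
}

/* Applies one entry. Returns 0 on success, -1 for an unsupported type
   or a target that falls outside the image. */
static int apply_reloc(uint8_t *image, size_t image_size,
                       uint32_t block_rva, uint16_t entry, uint64_t delta)
{
    unsigned type = entry >> 12;                         /* upper 4 bits */
    size_t rva = (size_t)block_rva + (entry & 0x0FFFu);  /* lower 12 bits */
    size_t width = (type == REL_DIR64) ? 8 : (type == REL_HIGHLOW) ? 4 : 0;

    if (type == REL_ABSOLUTE)
        return 0;                 /* padding entry, nothing to patch */
    if (width == 0)
        return -1;                /* a relic this sketch does not handle */
    if (rva > image_size || image_size - rva < width)
        return -1;

    if (type == REL_DIR64) {
        uint64_t v;
        memcpy(&v, image + rva, sizeof v);
        v += delta;
        memcpy(image + rva, &v, sizeof v);
    } else {
        uint32_t v;
        memcpy(&v, image + rva, sizeof v);
        v += (uint32_t)delta;
        memcpy(image + rva, &v, sizeof v);
    }
    return 0;
}
```

Using memcpy for the read and write avoids the unaligned pointer dereference, at the cost of being wordier than the cast in the loop above.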
Ultimately, the only thing that makes the relocation directory hard is that its structure is awkward and its elderly beard is unwieldy. Let’s move on to the next data directory.
The Import Directory
The import directory is another null-terminated array with the following structure, banished from the MSDN because Satya Nadella doesn’t want you to know this black magic:
typedef struct _IMAGE_IMPORT_DESCRIPTOR {
union {
DWORD Characteristics;
DWORD OriginalFirstThunk;
} DUMMYUNIONNAME;
DWORD TimeDateStamp;
DWORD ForwarderChain;
DWORD Name;
DWORD FirstThunk;
} IMAGE_IMPORT_DESCRIPTOR;
typedef IMAGE_IMPORT_DESCRIPTOR UNALIGNED *PIMAGE_IMPORT_DESCRIPTOR;
What the fuck is a thunk? I still don’t get what that is as far as a word is concerned, and as I mentioned earlier, search engines are nothing but AI slop, so trying to hunt down what a “thunk” is leads to Not What I’m Looking For. Anyway, a “thunk” in this context refers to an import entry within the import row. Name is an RVA pointing to a string naming the DLL to load with LoadLibrary, and OriginalFirstThunk and FirstThunk are RVAs pointing to null-terminated arrays containing import information. This import information comes in two forms:
- An index ordinal
- An IMAGE_IMPORT_BY_NAME structure
If the top bit of a value in the thunk array is set, discovered by masking with 0x8000000000000000 on 64-bit, it’s an import by ordinal. The ordinal is a 16-bit index of the target function into the export table. Look at all those wasted bits. This is what dealing with dinosaurs gets you.
The other entry is an RVA value pointing to an IMAGE_IMPORT_BY_NAME structure, which is forbidden information:
typedef struct _IMAGE_IMPORT_BY_NAME {
WORD Hint;
CHAR Name[1];
} IMAGE_IMPORT_BY_NAME, *PIMAGE_IMPORT_BY_NAME;
The Hint value is an index into the DLL’s export name table that the loader tries first before falling back to a search by name, and the Name is what you pass to GetProcAddress to get the function pointer. When you resolve this function, you stick it in the FirstThunk array. This array is referred to as the “import address table,” or “IAT.”
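The two forms of thunk can be told apart with a few bit operations. This is a portable sketch for 64-bit images only: struct import_ref and decode_thunk64 are hypothetical names, and the flag constant has the same value as IMAGE_ORDINAL_FLAG64 in the Windows headers.

```c
#include <stdint.h>

#define ORDINAL_FLAG64 0x8000000000000000ULL  /* same value as IMAGE_ORDINAL_FLAG64 */

/* Decoded view of one 64-bit thunk value. */
struct import_ref {
    int      by_ordinal;  /* 1 if the import is by ordinal, 0 if by name */
    uint16_t ordinal;     /* valid when by_ordinal is 1 */
    uint32_t name_rva;    /* RVA of IMAGE_IMPORT_BY_NAME when by_ordinal is 0 */
};

static struct import_ref decode_thunk64(uint64_t thunk)
{
    struct import_ref ref = {0, 0, 0};
    if (thunk & ORDINAL_FLAG64) {
        ref.by_ordinal = 1;
        ref.ordinal = (uint16_t)(thunk & 0xFFFF);   /* only the low 16 bits are used */
    } else {
        ref.name_rva = (uint32_t)(thunk & 0x7FFFFFFF);
    }
    return ref;
}
```

For a by-name import, the string itself starts two bytes past name_rva, right after the WORD-sized Hint.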
With all that explained, the following code should make plenty of sense:
/* resolve the import table */
DWORD import_rva = valloc_headers->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress;
if (import_rva != 0) {
    PIMAGE_IMPORT_DESCRIPTOR import_table = (PIMAGE_IMPORT_DESCRIPTOR)&valloc_buffer[import_rva];
    while (import_table->OriginalFirstThunk != 0) {
        /* LoadLibraryA returns NULL if the DLL can't be found */
        HMODULE module = LoadLibraryA((const char *)&valloc_buffer[import_table->Name]);
        uintptr_t *original_thunks = (uintptr_t *)&valloc_buffer[import_table->OriginalFirstThunk];
        uintptr_t *import_addrs = (uintptr_t *)&valloc_buffer[import_table->FirstThunk];
        while (*original_thunks != 0) {
            if (*original_thunks & 0x8000000000000000)
                /* GetProcAddress takes an LPCSTR, so use the A macro even in Unicode builds */
                *import_addrs = (uintptr_t)GetProcAddress(module, MAKEINTRESOURCEA(*original_thunks & 0xFFFF));
            else {
                PIMAGE_IMPORT_BY_NAME import_by_name = (PIMAGE_IMPORT_BY_NAME)&valloc_buffer[*original_thunks];
                *import_addrs = (uintptr_t)GetProcAddress(module, import_by_name->Name);
            }
            ++import_addrs;
            ++original_thunks;
        }
        ++import_table;
    }
}
Let’s move on to the ultimate reason why I went and wrote this article: the TLS directory.
The TLS Directory
I am very annoyed that the documentation for IMAGE_TLS_DIRECTORY64 is removed, because there are still some parts of it that don’t make sense to me without reference. Ultimately the bits I don’t understand don’t particularly matter, but I’m nowhere near neurotypical, so this lack of information drives me up the wall. Anyway, here is the forbidden data structure:
typedef VOID
(NTAPI *PIMAGE_TLS_CALLBACK) (
PVOID DllHandle,
DWORD Reason,
PVOID Reserved
);
typedef struct _IMAGE_TLS_DIRECTORY64 {
ULONGLONG StartAddressOfRawData;
ULONGLONG EndAddressOfRawData;
ULONGLONG AddressOfIndex; // PDWORD
ULONGLONG AddressOfCallBacks; // PIMAGE_TLS_CALLBACK *;
DWORD SizeOfZeroFill;
union {
DWORD Characteristics;
struct {
DWORD Reserved0 : 20;
DWORD Alignment : 4;
DWORD Reserved1 : 8;
} DUMMYSTRUCTNAME;
} DUMMYUNIONNAME;
} IMAGE_TLS_DIRECTORY64;
typedef IMAGE_TLS_DIRECTORY64 * PIMAGE_TLS_DIRECTORY64;
typedef struct _IMAGE_TLS_DIRECTORY32 {
DWORD StartAddressOfRawData;
DWORD EndAddressOfRawData;
DWORD AddressOfIndex; // PDWORD
DWORD AddressOfCallBacks; // PIMAGE_TLS_CALLBACK *
DWORD SizeOfZeroFill;
union {
DWORD Characteristics;
struct {
DWORD Reserved0 : 20;
DWORD Alignment : 4;
DWORD Reserved1 : 8;
} DUMMYSTRUCTNAME;
} DUMMYUNIONNAME;
} IMAGE_TLS_DIRECTORY32;
typedef IMAGE_TLS_DIRECTORY32 * PIMAGE_TLS_DIRECTORY32;
I had to pull these directly out of <windows.h> because apparently no one really documents this little relic of the PE format. This is what’s called the Thread Local Storage directory. It sets up local memory storage for threads, and as a result, is part of the loading process for the PE image. AddressOfCallBacks is the virtual address (not an RVA) of a null-terminated array of pointers to functions. If you’ve ever dealt with malware before, you are no doubt aware that the callbacks in this directory run before the main routine. And now you might be thinking “hey neat, a function that gets called before main!” It is neat, but inside the callback, your binary is sitting in an uninitialized state because it’s still loading. Things just Don’t Work because they’re not initialized, like C runtime functions.
Either way, dealing with this directory is pretty straightforward: iterate over the AddressOfCallBacks array until you hit a null pointer, calling each function along the way.
/* initialize the tls callbacks */
DWORD tls_rva = valloc_headers->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_TLS].VirtualAddress;
if (tls_rva != 0) {
PIMAGE_TLS_DIRECTORY64 tls_dir = (PIMAGE_TLS_DIRECTORY64)&valloc_buffer[tls_rva];
void (**callbacks)(PVOID, DWORD, PVOID) = (void (**)(PVOID, DWORD, PVOID))tls_dir->AddressOfCallBacks;
while (*callbacks != NULL) {
(*callbacks)(valloc_buffer, DLL_PROCESS_ATTACH, NULL);
++callbacks;
}
}
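The callback walk can be tried out away from Windows with a stand-in function pointer type. Everything here is illustrative: tls_callback mirrors the shape of PIMAGE_TLS_CALLBACK without the NTAPI calling convention, run_callbacks is a hypothetical helper, and record_reason exists only to give the walker something to call. The NULL check on the table itself is an addition, since an image can have a TLS directory with no callbacks.

```c
#include <stddef.h>
#include <stdint.h>

/* Same argument shape as PIMAGE_TLS_CALLBACK, minus NTAPI. */
typedef void (*tls_callback)(void *image_base, uint32_t reason, void *reserved);

/* Calls every entry in a null-terminated callback table; returns how many ran. */
static size_t run_callbacks(tls_callback *callbacks, void *image_base, uint32_t reason)
{
    size_t count = 0;
    if (callbacks == NULL)          /* directory present, but no callback table */
        return 0;
    for (; *callbacks != NULL; ++callbacks) {
        (*callbacks)(image_base, reason, NULL);
        ++count;
    }
    return count;
}

/* Demo callback: remembers the last reason it was called with. */
static uint32_t last_reason;
static void record_reason(void *image_base, uint32_t reason, void *reserved)
{
    (void)image_base;
    (void)reserved;
    last_reason = reason;
}
```

On Windows the reason passed at load time would be DLL_PROCESS_ATTACH, which has the value 1.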
The reasons you can pass to it on load can be found in the documentation for DllMain.
Detonating the Payload
Now that we have everything settled, we can call the entrypoint of our binary! Unfortunately for us, since we’re loading reflectively, we’ll be a little off from what we expect. For example, if your target binary is an executable and not a DLL, your entrypoint isn’t necessarily the same as main or even WinMain. For DLLs, however, it takes the same arguments as DllMain. So all you have to do is check if IMAGE_FILE_DLL is set in FileHeader.Characteristics, then call the appropriate entrypoint.
/* call the entrypoint */
if ((valloc_headers->FileHeader.Characteristics & IMAGE_FILE_DLL) != 0) {
    /* the cast carries WINAPI too, so it matches the declared pointer type */
    BOOL (WINAPI *dll_main)(HINSTANCE, DWORD, LPVOID) = (BOOL (WINAPI *)(HINSTANCE, DWORD, LPVOID))&valloc_buffer[valloc_headers->OptionalHeader.AddressOfEntryPoint];
    dll_main((HINSTANCE)valloc_buffer, DLL_PROCESS_ATTACH, NULL);
}
else {
    int (*main)(PVOID) = (int (*)(PVOID))&valloc_buffer[valloc_headers->OptionalHeader.AddressOfEntryPoint];
    main(valloc_buffer);
}
And that’s it! You’ve successfully bitten the forbidden fruit and loaded an executable in an unsanctioned way! May you use this wisdom however you see fit.
Corrections
In a previous article, I incorrectly said one of the import types is a forwarder string. It is not: this is actually part of the export directory, not the import directory. Do you see what Microsoft vaporizing their documentation does? It’s half the reason I’m making an effort to document things.