Forbidden Content: Loading PE Exports by Hash
A Brief Comment
I got mad again. I was attempting to research PE headers for another project and it happened again. The actual data structure I need is erased from Microsoft documentation. So I’m back to teach some apparently forbidden knowledge as an excuse to document the data structure they’re attempting to wipe off the Internet.
Data Directories
In our last episode we covered how one loads a PE file into memory, preparing it for execution. This is incredibly useful for analysis, because it provides the ability to execute individual pieces of the code, such as decryption routines and other choice code you wish to execute. The technology to perform such a feat is couched behind compiler wizards who refuse to share their black magic with the world. This article is another attempt to spit in the face of those gatekeeping wizards. If you don’t understand the PE format, feel free to read that article first!
With that said, I’d first like to document that which I forgot to: the constants that index the data directory array. The offsets are defined in Microsoft’s huge, unwieldy coverage of the PE format, sure, but not the actual values you can use from <windows.h>, so here they are:
#define IMAGE_DIRECTORY_ENTRY_EXPORT 0 // Export Directory
#define IMAGE_DIRECTORY_ENTRY_IMPORT 1 // Import Directory
#define IMAGE_DIRECTORY_ENTRY_RESOURCE 2 // Resource Directory
#define IMAGE_DIRECTORY_ENTRY_EXCEPTION 3 // Exception Directory
#define IMAGE_DIRECTORY_ENTRY_SECURITY 4 // Security Directory
#define IMAGE_DIRECTORY_ENTRY_BASERELOC 5 // Base Relocation Table
#define IMAGE_DIRECTORY_ENTRY_DEBUG 6 // Debug Directory
// IMAGE_DIRECTORY_ENTRY_COPYRIGHT 7 // (X86 usage)
#define IMAGE_DIRECTORY_ENTRY_ARCHITECTURE 7 // Architecture Specific Data
#define IMAGE_DIRECTORY_ENTRY_GLOBALPTR 8 // RVA of GP
#define IMAGE_DIRECTORY_ENTRY_TLS 9 // TLS Directory
#define IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG 10 // Load Configuration Directory
#define IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT 11 // Bound Import Directory in headers
#define IMAGE_DIRECTORY_ENTRY_IAT 12 // Import Address Table
#define IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT 13 // Delay Load Import Descriptors
#define IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR 14 // COM Runtime descriptor
And because I forgot to document this data structure in the last post, here’s what the data structure looks like:
typedef struct _IMAGE_DATA_DIRECTORY {
    DWORD VirtualAddress;
    DWORD Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;
In some configurations, the data directory array is shorter than you’d expect, or isn’t even present. This is what OptionalHeader.NumberOfRvaAndSizes and FileHeader.SizeOfOptionalHeader ultimately control, so check NumberOfRvaAndSizes before indexing into the array. The constant IMAGE_NUMBEROF_DIRECTORY_ENTRIES, which defines the size of the data directory array in the header struct, is set to the maximum value, 16.
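Here’s a minimal sketch of what that check might look like. The helper name get_data_directory and the stand-in type data_directory_t are my own inventions for illustration, declared with fixed-width types so the snippet compiles without <windows.h>. With the real headers you’d pass OptionalHeader.DataDirectory and OptionalHeader.NumberOfRvaAndSizes instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for IMAGE_DATA_DIRECTORY so the sketch compiles without <windows.h>. */
typedef struct {
    uint32_t VirtualAddress;
    uint32_t Size;
} data_directory_t;

#define NUMBEROF_DIRECTORY_ENTRIES 16 /* mirrors IMAGE_NUMBEROF_DIRECTORY_ENTRIES */

/* Hypothetical helper: returns the entry at `index`, or NULL when the image
 * declares fewer entries than that or the entry is empty. */
const data_directory_t *get_data_directory(const data_directory_t *dirs,
                                           uint32_t number_of_rva_and_sizes,
                                           uint32_t index) {
    if (index >= NUMBEROF_DIRECTORY_ENTRIES || index >= number_of_rva_and_sizes)
        return NULL;
    if (dirs[index].VirtualAddress == 0 || dirs[index].Size == 0)
        return NULL;
    return &dirs[index];
}
```

Treating a zeroed entry as "not present" is a design choice on my part; it saves every caller from repeating the same test.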
So with that annoyance also documented, let’s cover the export directory. Follow along with the following GitHub repo.
The Export Directory
typedef struct _IMAGE_EXPORT_DIRECTORY {
    DWORD Characteristics;
    DWORD TimeDateStamp;
    WORD MajorVersion;
    WORD MinorVersion;
    DWORD Name;
    DWORD Base;
    DWORD NumberOfFunctions;
    DWORD NumberOfNames;
    DWORD AddressOfFunctions; // RVA from base of image
    DWORD AddressOfNames; // RVA from base of image
    DWORD AddressOfNameOrdinals; // RVA from base of image
} IMAGE_EXPORT_DIRECTORY, *PIMAGE_EXPORT_DIRECTORY;
The Characteristics field is reserved and never manifested into anything. It must always be 0, so don’t worry about it. TimeDateStamp refers to when this DLL’s export data was created. MajorVersion and MinorVersion are merely a way to clarify which version of a DLL you’re contending with. Name is an RVA to an ASCII null-terminated string holding the DLL’s name. Base is the starting ordinal for the exported functions; an ordinal is essentially another word for index, and Base is usually set to 1. NumberOfFunctions is the number of entries in the functions array, and NumberOfNames is the number of entries in the names array (and in the name ordinal array, which runs parallel to it).
The names array is sorted alphabetically. The functions array is organized however the fuck Microsoft wants to, I guess; all you can rely on is that it’s indexed by ordinal minus Base.
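To make the Base business concrete, here’s a small sketch of turning an export ordinal into an index into the functions array. The helper name ordinal_to_index is hypothetical, and returning -1 for out-of-range ordinals is just my convention for the example.

```c
#include <stdint.h>

/* Hypothetical helper: translate an export ordinal into an index into the
 * functions array. Returns -1 when the ordinal falls outside the table. */
int64_t ordinal_to_index(uint32_t ordinal, uint32_t base, uint32_t number_of_functions) {
    if (ordinal < base)
        return -1;
    uint32_t index = ordinal - base;
    if (index >= number_of_functions)
        return -1;
    return (int64_t)index;
}
```

With Base set to 1 and ten functions, ordinal 1 maps to index 0 and ordinal 10 maps to index 9; ordinals 0 and 11 are rejected.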
Let’s break down the three arrays that are working together to map our exports:
AddressOfFunctions: the functions array RVA, containing RVAs which point to either functions or forwarder strings
AddressOfNames: the names array RVA, containing RVAs pointing to the ASCII names of functions
AddressOfNameOrdinals: the name ordinal array RVA, containing 16-bit index values into the functions array, one for each entry in the names array
How do we get the desired function by function name with this configuration? It’s simpler than the convoluted array configuration implies, but the lack of easily accessible documentation will bite you in the ass. See, in a previous version of this article, I misspoke: the main array to iterate over isn’t the functions array, but rather the names array. From there, we take our current index in the names array and look up the same index in the name ordinal array; the value we find there is the index of the function this name represents in the functions array.
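If the indirection is hard to picture, here’s a toy sketch with the three arrays already resolved to plain pointers (no RVAs, no PE headers). The helper name lookup_export_rva is hypothetical, and returning 0 for "not found" is an assumption made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper over toy stand-ins for the three export arrays.
 * Returns the function RVA for `wanted`, or 0 if the name isn't exported. */
uint32_t lookup_export_rva(const char *const *names,
                           const uint16_t *name_ordinals,
                           const uint32_t *functions,
                           uint32_t number_of_names,
                           uint32_t number_of_functions,
                           const char *wanted) {
    for (uint32_t i = 0; i < number_of_names; ++i) {
        if (strcmp(names[i], wanted) != 0)
            continue;
        uint16_t index = name_ordinals[i]; /* index into functions, not names */
        if (index >= number_of_functions)
            return 0;
        return functions[index];
    }
    return 0;
}
```

Note that the functions array can be longer than the names array: exports without a name simply never show up in the names or name ordinal arrays.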
Let’s define our data structures to make this make more sense:
uint8_t *get_import_by_hash(uint8_t *module, uint32_t hash) {
    PIMAGE_DOS_HEADER dos_header = (PIMAGE_DOS_HEADER)module;
    PIMAGE_NT_HEADERS64 nt_headers = (PIMAGE_NT_HEADERS64)&module[dos_header->e_lfanew];
    PIMAGE_DATA_DIRECTORY export_datadir = &nt_headers->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT];
    PIMAGE_EXPORT_DIRECTORY export_dir = (PIMAGE_EXPORT_DIRECTORY)&module[export_datadir->VirtualAddress];
    uint32_t *functions = (uint32_t *)&module[export_dir->AddressOfFunctions];
    uint32_t *names = (uint32_t *)&module[export_dir->AddressOfNames];
    uint16_t *name_ordinals = (uint16_t *)&module[export_dir->AddressOfNameOrdinals];
Here we’ve collected the export directory RVA and resolved it into an IMAGE_EXPORT_DIRECTORY structure, then created pointers to the three export arrays. Let’s start with the first value in the names table, names[0]. From there, we select the ordinal value at the same index, name_ordinals[0], and apply it to our functions array: functions[name_ordinals[0]]. This gives us the function RVA associated with the given named export.
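The "by hash" part is then just a loop over the names array that hashes each name and compares it against the hash you’re hunting for. The sketch below is an illustration under assumptions: 32-bit FNV-1a stands in for whatever hash the repo actually uses, and the helper name find_export_rva_by_hash and its 0-on-miss convention are hypothetical.

```c
#include <stdint.h>

/* Assumption: 32-bit FNV-1a stands in for the real hash function. */
uint32_t fnv1a32(const char *s) {
    uint32_t h = 2166136261u; /* FNV offset basis */
    while (*s) {
        h ^= (uint8_t)*s++;
        h *= 16777619u;       /* FNV prime */
    }
    return h;
}

/* Hypothetical helper: walk the names array, hash each name, and return the
 * matching function RVA (0 if nothing matches). names[] holds RVAs relative
 * to `module`; name_ordinals[] holds indices into functions[]. */
uint32_t find_export_rva_by_hash(const uint8_t *module,
                                 const uint32_t *functions,
                                 const uint32_t *names,
                                 const uint16_t *name_ordinals,
                                 uint32_t number_of_names,
                                 uint32_t hash) {
    for (uint32_t i = 0; i < number_of_names; ++i) {
        const char *name = (const char *)&module[names[i]];
        if (fnv1a32(name) == hash)
            return functions[name_ordinals[i]];
    }
    return 0;
}
```

The point of hashing is that the function names never appear as strings in your own binary, only their hashes do.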
Now we have to deal with the function value, because it doesn’t always point directly to our function. If the RVA yielded by the AddressOfFunctions array lands inside the export directory itself, that is, within the VirtualAddress and Size range given by the export entry of the data directory, it’s a forwarder string, not a function. The meat of our function resolver looks like this:
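    uint32_t func_rva = functions[name_ordinals[i]];

    // An RVA inside the export directory itself is a forwarder string, not code.
    if (func_rva >= export_datadir->VirtualAddress && func_rva < export_datadir->VirtualAddress+export_datadir->Size) {
        const char *forwarder = (const char *)&module[func_rva];
        size_t forwarder_len = strlen(forwarder);
        char *forwarder_mut = malloc(forwarder_len+1);
        if (forwarder_mut == NULL)
            return NULL;
        memcpy(forwarder_mut, forwarder, forwarder_len+1);

        // Split "DLLNAME.FunctionName" on the first dot.
        char *func = NULL;
        for (size_t j=0; j<forwarder_len; ++j) {
            if (forwarder_mut[j] != '.')
                continue;
            forwarder_mut[j] = 0;
            func = &forwarder_mut[j+1];
            break;
        }

        // Only resolve when the split worked and the target DLL loaded.
        uint8_t *proc = NULL;
        HMODULE forward_dll = (func != NULL) ? LoadLibraryA(forwarder_mut) : NULL;
        if (forward_dll != NULL)
            proc = (uint8_t *)GetProcAddress(forward_dll, func);

        free(forwarder_mut);
        return proc;
    }

    return &module[func_rva];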
You can see we’re parsing some sort of string when we land in our magic boundary. It takes the following form: DLLNAME.FunctionName. You simply split on the dot to get your target DLL and your function name. If your function RVA does not land within this magic boundary? It’s the target function you can jump to!
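If you want to poke at the string splitting on its own, without Windows in the loop, here’s a portable sketch. The helper name split_forwarder is hypothetical, and unlike the resolver code it copies the DLL name into a caller-supplied buffer rather than allocating one, purely to keep the example short.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the DLL part of "DLLNAME.FunctionName" into `dll`
 * and return a pointer to the function part inside `forwarder`. Returns NULL
 * if there is no dot, either part is empty, or `dll` is too small. */
const char *split_forwarder(const char *forwarder, char *dll, size_t dll_size) {
    const char *dot = strchr(forwarder, '.'); /* first dot, as in the resolver */
    if (dot == NULL || dot[1] == '\0')
        return NULL;
    size_t dll_len = (size_t)(dot - forwarder);
    if (dll_len == 0 || dll_len + 1 > dll_size)
        return NULL;
    memcpy(dll, forwarder, dll_len);
    dll[dll_len] = '\0';
    return dot + 1;
}
```

Given a forwarder like NTDLL.RtlAllocateHeap, this fills dll with NTDLL and returns a pointer to RtlAllocateHeap, which are the two values you would hand to LoadLibraryA and GetProcAddress.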
The program in the repo proceeds to do something simple that you might wind up seeing with this style of coding: reflectively allocating a section of code without making it clear what functions they’re importing. Enjoy seeing how this binary looks under a microscope!