A Brief Comment

I got mad again. I was attempting to research PE headers for another project and it happened again. The actual data structure I need is erased from Microsoft documentation. So I’m back to teach some apparently forbidden knowledge as an excuse to document the data structure they’re attempting to wipe off the Internet.

Data Directories

In our last episode we covered how one loads a PE file into memory, preparing it for execution. This is incredibly useful for analysis, because it provides the ability to execute individual pieces of the code, such as decryption routines and other choice code you wish to execute. The technology to perform such a feat is couched behind compiler wizards who refuse to share their black magic with the world. This article is another attempt to spit in the face of those gatekeeping wizards. If you don’t understand the PE format, feel free to read that article first!

With that said, I’d first like to document that which I forgot to: the constants that index the data directory array. The offsets are defined in Microsoft’s huge, unwieldy coverage of the PE format, sure, but not the actual constants you can use from <windows.h>, so here they are:

#define IMAGE_DIRECTORY_ENTRY_EXPORT          0   // Export Directory
#define IMAGE_DIRECTORY_ENTRY_IMPORT          1   // Import Directory
#define IMAGE_DIRECTORY_ENTRY_RESOURCE        2   // Resource Directory
#define IMAGE_DIRECTORY_ENTRY_EXCEPTION       3   // Exception Directory
#define IMAGE_DIRECTORY_ENTRY_SECURITY        4   // Security Directory
#define IMAGE_DIRECTORY_ENTRY_BASERELOC       5   // Base Relocation Table
#define IMAGE_DIRECTORY_ENTRY_DEBUG           6   // Debug Directory
//      IMAGE_DIRECTORY_ENTRY_COPYRIGHT       7   // (X86 usage)
#define IMAGE_DIRECTORY_ENTRY_ARCHITECTURE    7   // Architecture Specific Data
#define IMAGE_DIRECTORY_ENTRY_GLOBALPTR       8   // RVA of GP
#define IMAGE_DIRECTORY_ENTRY_TLS             9   // TLS Directory
#define IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG    10   // Load Configuration Directory
#define IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT   11   // Bound Import Directory in headers
#define IMAGE_DIRECTORY_ENTRY_IAT            12   // Import Address Table
#define IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT   13   // Delay Load Import Descriptors
#define IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR 14   // COM Runtime descriptor

And because I forgot to document this data structure in the last post, here’s what the data structure looks like:

typedef struct _IMAGE_DATA_DIRECTORY {
  DWORD VirtualAddress;
  DWORD Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;

In some configurations, the data directory array isn’t even present. This is what OptionalHeader.NumberOfRvaAndSizes and FileHeader.SizeOfOptionalHeader ultimately control: NumberOfRvaAndSizes says how many entries actually follow the optional header, so check it before you index the array. The constant IMAGE_NUMBEROF_DIRECTORY_ENTRIES, which defines the size of the array in the header structs, is set to the maximum value, 16.
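To make that check concrete, here’s a minimal sketch of a bounds-checked lookup. It’s portable C with a stand-in struct (data_directory_t) mirroring IMAGE_DATA_DIRECTORY so it builds without <windows.h>; the helper name get_data_directory is mine, not something from the Windows headers. Treating a zeroed entry as "not present" is a common convention and an assumption on my part.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for IMAGE_DATA_DIRECTORY so the sketch builds without <windows.h>. */
typedef struct {
   uint32_t VirtualAddress;
   uint32_t Size;
} data_directory_t;

#define MAX_DIRECTORY_ENTRIES 16 /* same value as IMAGE_NUMBEROF_DIRECTORY_ENTRIES */

/* Hypothetical helper: return the requested entry, or NULL if the header
   doesn't declare that many entries or the entry is empty. */
const data_directory_t *get_data_directory(const data_directory_t *dirs,
                                           uint32_t number_of_rva_and_sizes,
                                           uint32_t index) {
   if (index >= MAX_DIRECTORY_ENTRIES || index >= number_of_rva_and_sizes)
      return NULL;

   /* Assumption: a zeroed entry means the directory isn't present. */
   if (dirs[index].VirtualAddress == 0 || dirs[index].Size == 0)
      return NULL;

   return &dirs[index];
}
```

With the real headers you’d pass the optional header’s DataDirectory array and its NumberOfRvaAndSizes field, plus one of the IMAGE_DIRECTORY_ENTRY_* constants above as the index.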

So with that annoyance also documented, let’s cover the export directory. Follow along with the following GitHub repo.

The Export Directory

typedef struct _IMAGE_EXPORT_DIRECTORY {
    DWORD   Characteristics;
    DWORD   TimeDateStamp;
    WORD    MajorVersion;
    WORD    MinorVersion;
    DWORD   Name;
    DWORD   Base;
    DWORD   NumberOfFunctions;
    DWORD   NumberOfNames;
    DWORD   AddressOfFunctions;     // RVA from base of image
    DWORD   AddressOfNames;         // RVA from base of image
    DWORD   AddressOfNameOrdinals;  // RVA from base of image
} IMAGE_EXPORT_DIRECTORY, *PIMAGE_EXPORT_DIRECTORY;

The Characteristics field is reserved and never manifested into anything. It must always be 0, so don’t worry about it. TimeDateStamp refers to when this DLL and its exports were created. MajorVersion and MinorVersion are merely a way to clarify which version of a DLL you’re contending with. Name is an RVA to a null-terminated ASCII string holding the DLL’s name. Base is the starting ordinal for the exported functions: an ordinal is essentially an index into the functions array, offset by Base. It’s usually set to 1.
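Base matters the moment you resolve an export by ordinal instead of by name: you subtract it from the ordinal to get an index into the functions array. Here’s a rough sketch in portable C. The trimmed struct (export_dir_min_t) and the function name are mine, standing in for IMAGE_EXPORT_DIRECTORY, and it assumes the module is already mapped so RVAs work as offsets from its base.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Trimmed stand-in for IMAGE_EXPORT_DIRECTORY: only the fields used here. */
typedef struct {
   uint32_t Base;
   uint32_t NumberOfFunctions;
   uint32_t AddressOfFunctions; /* RVA of the functions array */
} export_dir_min_t;

/* Hypothetical helper: returns the function RVA for an ordinal,
   or 0 if the ordinal is out of range or its slot is unused. */
uint32_t export_rva_by_ordinal(const uint8_t *module, const export_dir_min_t *dir,
                               uint32_t ordinal) {
   if (ordinal < dir->Base)
      return 0;

   uint32_t index = ordinal - dir->Base;

   if (index >= dir->NumberOfFunctions)
      return 0;

   uint32_t rva;

   /* memcpy sidesteps any alignment assumptions about the buffer. */
   memcpy(&rva, module + dir->AddressOfFunctions + (size_t)index * sizeof(rva), sizeof(rva));

   return rva;
}
```

The returned RVA still needs the forwarder check before you treat it as code.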

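NumberOfFunctions refers to the number of entries in the functions array, and NumberOfNames refers to the number of entries in the names array. The two can differ, since an export doesn’t need a name. The names array is organized alphabetically, the functions array is organized however the fuck the ordinals were assigned I guess: each slot sits at its ordinal minus Base.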

Let’s break down the three arrays that are working together to map our exports:

  • AddressOfFunctions: the functions array RVA, containing RVAs which point to either functions or forwarder strings
  • AddressOfNames: the names array RVA, containing RVAs pointing to the ASCII names of functions
  • AddressOfNameOrdinals: the name ordinal array RVA, containing 16-bit index values into the functions array, one for each entry in the names array

How do we get the desired function by function name with this configuration? It’s simpler than the convoluted array configuration implies, but the lack of easily accessible documentation will bite you in the ass. See, in a previous version of this article, I misspoke: the main array to iterate over isn’t the functions array, but rather the names array. From there, we take our current index in the names array, look up the entry at that same index in the name ordinal array, and use that value as the index into the functions array.
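Since the names array is sorted, a lookup by name doesn’t have to be a linear scan. Here’s a sketch of a binary search over it, using plain arrays in place of a mapped image so it builds anywhere: names holds C strings instead of RVAs, and find_export_rva is a name I made up. It assumes the names are sorted in strcmp (byte-wise) order and returns 0 when nothing matches.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: binary-search the sorted names array, then hop through
   the name ordinal array to land on the right slot in the functions array. */
uint32_t find_export_rva(const char **names, const uint16_t *name_ordinals,
                         const uint32_t *functions, uint32_t number_of_names,
                         uint32_t number_of_functions, const char *target) {
   size_t lo = 0, hi = number_of_names;

   while (lo < hi) {
      size_t mid = lo + (hi - lo) / 2;
      int cmp = strcmp(target, names[mid]);

      if (cmp == 0) {
         /* Same index in name_ordinals; the value indexes functions, not names. */
         uint16_t index = name_ordinals[mid];
         return (index < number_of_functions) ? functions[index] : 0;
      }

      if (cmp < 0)
         hi = mid;
      else
         lo = mid + 1;
   }

   return 0;
}
```

Note that the functions array can be longer than the names array, which is why the two counts are passed separately.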

Let’s define our data structures to make this make more sense:

uint8_t *get_import_by_hash(uint8_t *module, uint32_t hash) {
   // module is the base of a loaded (mapped) image, so every RVA is an offset from it.
   PIMAGE_DOS_HEADER dos_header = (PIMAGE_DOS_HEADER)module;
   // Assumes a 64-bit image; a 32-bit one needs PIMAGE_NT_HEADERS32 instead.
   PIMAGE_NT_HEADERS64 nt_headers = (PIMAGE_NT_HEADERS64)&module[dos_header->e_lfanew];
   PIMAGE_DATA_DIRECTORY export_datadir = &nt_headers->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT];
   PIMAGE_EXPORT_DIRECTORY export_dir = (PIMAGE_EXPORT_DIRECTORY)&module[export_datadir->VirtualAddress];
   // functions: RVAs of code or forwarder strings; names: RVAs of ASCII names;
   // name_ordinals: 16-bit indices into functions, parallel to names.
   uint32_t *functions = (uint32_t *)&module[export_dir->AddressOfFunctions];
   uint32_t *names = (uint32_t *)&module[export_dir->AddressOfNames];
   uint16_t *name_ordinals = (uint16_t *)&module[export_dir->AddressOfNameOrdinals];

Here we’ve collected the export directory RVA and resolved it into an IMAGE_EXPORT_DIRECTORY structure, then created pointers to the three export arrays. Let’s start with the first value in the names array, names[0]. From there, we select the ordinal value at the same index, name_ordinals[0], and apply it to our functions array: functions[name_ordinals[0]]. This gives us the associated function value of the given named export.
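The function is named get_import_by_hash, but the snippets in this post don’t show the loop that produces the index or the hash itself, so here’s a guess at its shape. This sketch walks the names array, hashes each name, and returns the matching index into the functions array. FNV-1a is my stand-in; the repo may well use a different hash. It takes plain arrays rather than a mapped image, and both helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a: an assumed stand-in, not necessarily the hash the repo uses. */
uint32_t fnv1a32(const char *s) {
   uint32_t hash = 2166136261u; /* FNV offset basis */

   for (; *s != 0; ++s) {
      hash ^= (uint8_t)*s;
      hash *= 16777619u;        /* FNV prime */
   }

   return hash;
}

/* Hypothetical helper: returns the functions-array index for the export whose
   name hashes to `hash`, or -1 if no name matches. */
int32_t find_function_index_by_hash(const char **names, const uint16_t *name_ordinals,
                                    uint32_t number_of_names, uint32_t hash) {
   /* Linear scan: hashing throws away the sort order of the names array. */
   for (uint32_t i = 0; i < number_of_names; ++i) {
      if (fnv1a32(names[i]) == hash)
         return (int32_t)name_ordinals[i];
   }

   return -1;
}
```

The point of hashing is that the plain-text function names never appear in your binary, only the precomputed hash values do.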

Now we have to deal with the function value, because it doesn’t always point directly to our function. If the RVA yielded by the AddressOfFunctions array lands inside the export data directory itself, that is, between the VirtualAddress and VirtualAddress+Size given for it in the optional header, it’s a forwarder string, not a function. The meat of our function resolver looks like this:

      uint32_t func_rva = functions[name_ordinals[i]];
      
      // An RVA inside the export directory's own range is a forwarder string, not code.
      if (func_rva >= export_datadir->VirtualAddress && func_rva < export_datadir->VirtualAddress+export_datadir->Size) {
         const char *forwarder = (const char *)&module[func_rva];
         size_t forwarder_size = strlen(forwarder)+1;
         char *forwarder_mut = malloc(forwarder_size);

         if (forwarder_mut == NULL)
            return NULL;

         memcpy(forwarder_mut, forwarder, forwarder_size);

         // Split "DLLNAME.FunctionName" in place at the first dot.
         char *func = strchr(forwarder_mut, '.');

         if (func == NULL) {
            free(forwarder_mut);
            return NULL;
         }

         *func++ = 0;

         // LoadLibraryA appends ".dll" when the name has no extension.
         HMODULE forward_dll = LoadLibraryA(forwarder_mut);
         uint8_t *proc = (forward_dll != NULL) ? (uint8_t *)GetProcAddress(forward_dll, func) : NULL;
         free(forwarder_mut);

         return proc;
      }
         
      return &module[func_rva];

You can see we’re parsing some sort of string when we land in our magic boundary. It takes the following form: DLLNAME.FunctionName. You simply split on the dot to get your target DLL and your function name. If your function RVA does not land on this magic boundary? It’s the target function you can jump to!
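One wrinkle the resolver above doesn’t handle: per the PE format documentation, a forwarder can also name its target by ordinal, as in MYDLL.#27. Here’s a small sketch of a parser covering both shapes. It’s plain C with no Windows dependency; forwarder_t and parse_forwarder are hypothetical names, the buffer sizes are arbitrary, and splitting at the first dot is an assumption carried over from the code above.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
   char dll[260];      /* part before the dot */
   char function[256]; /* part after the dot; empty when by_ordinal is set */
   int by_ordinal;     /* nonzero for the "DLL.#123" form */
   uint16_t ordinal;
} forwarder_t;

/* Hypothetical helper: returns 1 on success, 0 if the string is malformed.
   The contents of *out are unspecified on failure. */
int parse_forwarder(const char *forwarder, forwarder_t *out) {
   const char *dot = strchr(forwarder, '.');

   if (dot == NULL || dot == forwarder || dot[1] == 0)
      return 0;

   size_t dll_len = (size_t)(dot - forwarder);
   const char *target = dot + 1;

   if (dll_len >= sizeof(out->dll) || strlen(target) >= sizeof(out->function))
      return 0;

   memset(out, 0, sizeof(*out));
   memcpy(out->dll, forwarder, dll_len);

   if (target[0] == '#') {
      /* Ordinal form: require plain decimal digits that fit in 16 bits. */
      if (target[1] < '0' || target[1] > '9')
         return 0;

      char *end;
      unsigned long value = strtoul(target + 1, &end, 10);

      if (*end != 0 || value > 0xFFFF)
         return 0;

      out->by_ordinal = 1;
      out->ordinal = (uint16_t)value;
   } else {
      strcpy(out->function, target); /* length was checked above */
   }

   return 1;
}
```

For the ordinal form, GetProcAddress wants the ordinal in the low-order word of its name argument with the high-order word zeroed, rather than a pointer to a string.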

The program in the repo proceeds to do something simple that you might wind up seeing with this style of coding: reflectively allocating a section of code without making it clear what functions they’re importing. Enjoy seeing how this binary looks under a microscope!