Writing a PE packer – Part 2 : imports and relocations

This is part 2 of our tutorial on writing a PE packer for Windows: handling imports and relocations, so that we can execute an ASLR-enabled file.

Adding Imports support

What are imports

At this point, one of the main reasons our loader is not working properly is that the imports of the loaded binary are missing. Every binary needs external functions to work; in the Windows world, they are located in .dll files. For example, “calc.exe” is going to use functions to open a window, to display buttons, etc. Let’s consider an example: “ShellExecuteW” (imported by calc.exe). You can see its documentation on MSDN here.
Calc.exe needs this function to work properly, so it needs to know where its code is located. The catch is that DLLs are only loaded at runtime, anywhere in memory, meaning the compiler has no way to know where the function is going to be (even more so with ASLR activated). So it cannot produce a proper call instruction with the immediate address of “ShellExecuteW”.
That’s why the compiler creates a table (the IAT, as we are going to see), where it expects to find the address of “ShellExecuteW” once it is loaded and will call this address when needed.
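To make the idea concrete, here is a minimal, portable sketch of the mechanism. It is not the real loader code: `iat`, `fake_loader_fill` and `call_through_iat` are hypothetical names, and `strlen` simply plays the role of a function living in a DLL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type of one imported function. */
typedef size_t (*imported_fn)(const char *);

/* The "IAT": a table of slots, empty until a loader fills them. */
imported_fn iat[1] = { NULL };

/* What a loader does once it has resolved the import
   (strlen stands in for the DLL function). */
void fake_loader_fill(void) {
    iat[0] = strlen;
}

/* What the compiled code does: read the slot, then call the address found there. */
size_t call_through_iat(const char *s) {
    assert(iat[0] != NULL); /* the loader must have filled the slot first */
    return iat[0](s);
}
```

The compiled code never needs to know where the function really lives, only where its slot is. If the slot is left empty, the call goes nowhere, which is exactly the state our loader is in right now.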

In a debugger, we see this:

[Screenshot: import address table]

The first call is an internal call, in the same module. The compiler knows where the destination is, and uses the E8 opcode, meaning “relative call”. When calling external modules, the code calls an address read from the IAT, as shown with “ShellExecuteW”.

Functions can be imported either by name (an ASCII C string), or by their number in the export table of the DLL (called the “ordinal”). We’ll see both cases when loading.

Imports in the PE header

Now let’s see how it is described in the PE header. It is a bit complex, so bear with me: it is actually much simpler to code than to describe!

First, we are going to take a look at the “Data Directories”. They are part of the “Optional Header”. It is a simple array of 16 structures (15 in use, the last one reserved), each with an RVA and a size. You can see them in CFF:

[Screenshot: data directories]

The order of the directories is predetermined. The first one is always the “Export Directory”, the second one the “Import Directory”, and so on.
You can see 2 directories related to imports: the “Import Directory”, pointing to the “Import Directory Table” (IDT), and the “Import Address Table Directory”, pointing to the “Import Address Table” (IAT).
Basically, the IDT will tell us what functions need to be imported. We are going to import them and place their address in the IAT. The IDT is the “what”, the IAT is the “where”.
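Here is a small sketch of how those directories can be looked up once the image is mapped. To stay compilable without windows.h, it uses stand-in types: `DataDirectory` mirrors the layout of IMAGE_DATA_DIRECTORY, the enum values are the same as the IMAGE_DIRECTORY_ENTRY_* constants of winnt.h (which reserves 16 slots), and `rva_to_ptr` and `directory_ptr` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for IMAGE_DATA_DIRECTORY (same layout). */
typedef struct {
    uint32_t VirtualAddress; /* RVA of the table */
    uint32_t Size;           /* size of the table, in bytes */
} DataDirectory;

/* Fixed indices, same values as the IMAGE_DIRECTORY_ENTRY_* constants. */
enum {
    DIR_EXPORT    = 0,
    DIR_IMPORT    = 1,
    DIR_BASERELOC = 5,
    DIR_IAT       = 12,
    DIR_COUNT     = 16
};

/* An RVA is an offset from the address where the image is mapped. */
uint8_t *rva_to_ptr(uint8_t *image_base, uint32_t rva) {
    return image_base + rva;
}

/* Returns the mapped table of a directory, or NULL if the entry is empty. */
uint8_t *directory_ptr(uint8_t *image_base, const DataDirectory *dirs, unsigned index) {
    assert(dirs != NULL);
    if (index >= DIR_COUNT || dirs[index].VirtualAddress == 0) {
        return NULL;
    }
    return rva_to_ptr(image_base, dirs[index].VirtualAddress);
}
```

With the real headers, `dirs` would be `OptionalHeader.DataDirectory` and `image_base` the memory we allocated for the module.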

CFF can show you the content of the IDT in the “Import Directory” menu on the left:

[Screenshot: import directory table]

First you see which DLLs are imported, and if you click on one, you see the actual functions imported from it.

It is actually described like that in the header: the “Import Directory” RVA found in the “Data Directories” points to an array of IMAGE_IMPORT_DESCRIPTOR, terminated by an all-zero entry:

typedef struct _IMAGE_IMPORT_DESCRIPTOR
{ _ANONYMOUS_UNION union
  { DWORD         Characteristics;
    DWORD         OriginalFirstThunk;
  }         DUMMYUNIONNAME;
  DWORD         TimeDateStamp;
  DWORD         ForwarderChain;
  DWORD         Name;
  DWORD         FirstThunk;
} IMAGE_IMPORT_DESCRIPTOR, *PIMAGE_IMPORT_DESCRIPTOR;

Both OriginalFirstThunk and FirstThunk are RVAs, each pointing to an array of DWORDs, NULL terminated as well.
OriginalFirstThunk points to the lookup table (what we call the IDT here). Each DWORD in its array is either:

  • An RVA to an IMAGE_IMPORT_BY_NAME struct, which holds a 2-byte hint followed by an ASCII string: the function name.
  • Or, if the most significant bit is 1, the ordinal of the function to import, stored in the low 16 bits.

The second one points to the IAT. It has the exact same structure, and when we get a function address, we place it in the IAT at the same index as its entry in the IDT.
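Decoding one entry of the lookup table can be sketched as follows. This is a portable illustration for 32-bit PE files, not the loader code itself: `ORDINAL_FLAG32` has the same value as IMAGE_ORDINAL_FLAG32 in winnt.h, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as IMAGE_ORDINAL_FLAG32 in winnt.h: the most significant bit. */
#define ORDINAL_FLAG32 0x80000000u

/* True when the entry designates an import by ordinal. */
bool thunk_is_ordinal(uint32_t thunk) {
    return (thunk & ORDINAL_FLAG32) != 0;
}

/* The ordinal itself lives in the low 16 bits. */
uint16_t thunk_ordinal(uint32_t thunk) {
    return (uint16_t)(thunk & 0xFFFFu);
}

/* Otherwise the entry is the RVA of an IMAGE_IMPORT_BY_NAME struct. */
uint32_t thunk_name_rva(uint32_t thunk) {
    return thunk & 0x7FFFFFFFu;
}

/* IMAGE_IMPORT_BY_NAME is a 2-byte hint followed by the NUL-terminated name,
   so the name starts 2 bytes after the struct's address. */
const char *thunk_name(const uint8_t *image_base, uint32_t thunk) {
    return (const char *)(image_base + thunk_name_rva(thunk) + 2);
}
```

For example, an entry of 0x80000010 means “ordinal 16”, while 0x00012345 means “the name is described at RVA 0x12345”.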

Programming the imports

We now have everything we need to program the imports. This needs to be done after we put the sections in memory (so we can look up RVAs directly in the memory we allocated, relative to the ImageBase we chose for our module), but before we modify the permissions (which may remove the write privilege).

In the code below, we read the function names (or ordinals) from the array pointed to by OriginalFirstThunk (the variable lookup_table, this is the IDT), and we store the result in the array pointed to by FirstThunk (the variable address_table, this is the IAT).

IMAGE_DATA_DIRECTORY* data_directory = p_NT_HDR->OptionalHeader.DataDirectory;

// load the address of the import descriptors array
IMAGE_IMPORT_DESCRIPTOR* import_descriptors = (IMAGE_IMPORT_DESCRIPTOR*) (ImageBase + data_directory[IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress);

// this array is null terminated
for(int i=0; import_descriptors[i].OriginalFirstThunk != 0; ++i) {

    // Get the name of the dll, and import it
    char* module_name = ImageBase + import_descriptors[i].Name;
    HMODULE import_module = LoadLibraryA(module_name);
    if(import_module == NULL) {
        return NULL;
    }

    // the lookup table points to function names or ordinals => it is the IDT
    IMAGE_THUNK_DATA* lookup_table = (IMAGE_THUNK_DATA*) (ImageBase + import_descriptors[i].OriginalFirstThunk);

    // the address table is a copy of the lookup table at first
    // but we put the addresses of the loaded function inside => that's the IAT
    IMAGE_THUNK_DATA* address_table = (IMAGE_THUNK_DATA*) (ImageBase + import_descriptors[i].FirstThunk);

    // null terminated array, again
    // (indexed with j so that the outer loop index i is not shadowed)
    for(int j=0; lookup_table[j].u1.AddressOfData != 0; ++j) {
        void* function_handle = NULL;

        // Check the lookup table for the address of the function name to import
        DWORD lookup_addr = lookup_table[j].u1.AddressOfData;

        if((lookup_addr & IMAGE_ORDINAL_FLAG) == 0) { // if the most significant bit is not 1
            // import by name : get the IMAGE_IMPORT_BY_NAME struct
            IMAGE_IMPORT_BY_NAME* image_import = (IMAGE_IMPORT_BY_NAME*) (ImageBase + lookup_addr);
            // this struct contains the ASCII function name
            char* funct_name = (char*) &(image_import->Name);
            // get that function address from its module and name
            function_handle = (void*) GetProcAddress(import_module, funct_name);
        } else {
            // import by ordinal: GetProcAddress expects the ordinal in the low-order word
            // with the high-order word set to zero, so we mask out the flag bit first
            function_handle = (void*) GetProcAddress(import_module, (LPSTR) (lookup_addr & 0xFFFF));
        }

        if(function_handle == NULL) {
            return NULL;
        }

        // change the IAT, and put the function address inside.
        address_table[j].u1.Function = (DWORD) function_handle;
    }
}

A side note on IMAGE_THUNK_DATA : in a 32-bit PE, this is a simple union, equivalent to a DWORD. I preferred to use it because:

  • It is the actual type of the field, and it reflects the different usages the field has.
  • As such, it shows more clearly that at the beginning, we read an RVA, and at the end, we store a function pointer. But both are still a DWORD.
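For reference, here is roughly what that union looks like. `ThunkData32` is a stand-in written for this illustration (the member names follow the 32-bit IMAGE_THUNK_DATA of winnt.h), and `thunk_set_function` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the 32-bit IMAGE_THUNK_DATA: four names for one 32-bit value. */
typedef struct {
    union {
        uint32_t ForwarderString;
        uint32_t Function;      /* after loading: address of the function */
        uint32_t Ordinal;       /* before loading: ordinal, if the high bit is set */
        uint32_t AddressOfData; /* before loading: RVA of an IMAGE_IMPORT_BY_NAME */
    } u1;
} ThunkData32;

/* Overwrite a slot with the resolved address, as the loader does in the IAT. */
void thunk_set_function(ThunkData32 *slot, uint32_t address) {
    assert(slot != NULL);
    slot->u1.Function = address;
}
```

Whatever name we use to access it, the storage is the same 4 bytes: writing Function replaces the RVA that was read through AddressOfData.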

We are almost done, but we still need to talk about relocations.

Managing relocations

What are relocations

Let’s think about what we have done so far:

  1. We opened the calc.exe file, and read its headers.
  2. calc.exe has an “Image Base” in its header, a Virtual Address where it expects to be placed.
  3. It has ASLR activated (flag “Dll can move” in IMAGE_NT_HEADERS.OptionalHeader.DllCharacteristics), so basically, we should be able to place it anywhere.
  4. We allocated memory with VirtualAlloc, with a NULL first parameter, letting the OS choose an address that suits it.
  5. This address chosen by the OS is now calc.exe’s actual ImageBase.
  6. We imported calc.exe’s functions and placed their addresses in the IAT.
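As a side illustration of step 3, the “Dll can move” check is a one-bit test. This is a minimal sketch: `DLLCHAR_DYNAMIC_BASE` has the same value as IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE in winnt.h, and `image_can_move` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE in winnt.h. */
#define DLLCHAR_DYNAMIC_BASE 0x0040u

/* True when the "Dll can move" flag is set in OptionalHeader.DllCharacteristics. */
bool image_can_move(uint16_t dll_characteristics) {
    return (dll_characteristics & DLLCHAR_DYNAMIC_BASE) != 0;
}
```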

And now, at some point, calc.exe is going to call those imported functions, using the call we saw earlier:

[Screenshot: call to the IAT]

Take a good look at the opcode: FF15, followed by 0x004b3038 in little endian. That second number is an absolute virtual address: the address of the “ShellExecuteW” entry in the IAT. And that is a huge issue for a PE that is expected to be moved.
Let’s say we place our calc.exe PE at VA_0x00500000 instead of VA_0x00400000, where it is “supposed” to be. This instruction is still the same. So it is still going to look at the address VA_0x004b3038, which is not even part of calc.exe’s memory! There could be anything there, or most probably nothing.
What we are seeing here is that when moving a PE, the assembly code needs to be patched at runtime to account for this change, and that is what relocations are all about.
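The arithmetic behind that patch is a single addition. Here is a tiny sketch using the numbers of the example above; `rebase_va` is a hypothetical helper name.

```c
#include <stdint.h>

/* Shift an absolute address by the distance the image moved. */
uint32_t rebase_va(uint32_t va, uint32_t preferred_base, uint32_t actual_base) {
    /* Unsigned arithmetic wraps around, so this also works
       when the image is moved to a lower address. */
    uint32_t delta = actual_base - preferred_base;
    return va + delta;
}
```

With the image moved from 0x00400000 to 0x00500000, the operand 0x004b3038 has to become 0x005b3038 for the call to read the right IAT entry.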

Relocations structure

This structure is simpler than the imports one, and it is not as essential to know its exact inner workings. When unpacking malware, for example, it is common to have to rebuild or patch an import table. Relocations are rarely an issue: you just disable ASLR (uncheck “Dll can move”), and relocations are not needed, because no “relocating” takes place.

So, the “Data Directory” has an entry for the relocation table (an RVA, like the others). This relocation table is made of blocks following one another directly. Each block is composed of a header and an array of WORDs. Here is the header:

typedef struct _IMAGE_BASE_RELOCATION
{ DWORD     VirtualAddress;
  DWORD     SizeOfBlock;
} IMAGE_BASE_RELOCATION, *PIMAGE_BASE_RELOCATION;

  • VirtualAddress : a Relative Virtual Address, where the relocations of this block begin. This is actually a page address, as the relocation offset is limited to 12 bits (so 0x1000 bytes, 4 KB, the Windows page size).
  • SizeOfBlock : the size of the block, in bytes, header included.

After this header comes a list of WORDs (2 bytes each!), which ends with the block (so we need to use SizeOfBlock to know when to stop). Each WORD is laid out as follows:

  • The high 4 bits are the type of the relocation (only one type is really used to make changes)
  • The low 12 bits are the offset, relative to the block’s VirtualAddress

This basically gives us an array of Virtual Addresses to patch. The patching consists of shifting the DWORD found at each of these addresses by as much as we shifted the “Image Base” of the module.
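Decoding one relocation entry can be sketched like this. The helper names are hypothetical, and `REL_BASED_HIGHLOW` has the same value as IMAGE_REL_BASED_HIGHLOW in winnt.h.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as IMAGE_REL_BASED_HIGHLOW in winnt.h. */
#define REL_BASED_HIGHLOW 3

/* High 4 bits: relocation type. */
unsigned reloc_type(uint16_t entry) {
    return entry >> 12;
}

/* Low 12 bits: offset inside the page described by the block header. */
unsigned reloc_offset(uint16_t entry) {
    return entry & 0x0FFFu;
}

/* Number of entries in a block: total size minus the 8-byte header, 2 bytes each. */
uint32_t reloc_entry_count(uint32_t size_of_block) {
    assert(size_of_block >= 8); /* a block always includes its header */
    return (size_of_block - 8u) / 2u;
}
```

For example, the entry 0x3038 in a block whose VirtualAddress is 0x1000 means: patch the DWORD at RVA 0x1038 (type 3, offset 0x038).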

Programming the relocations

Again, this needs to happen between the mapping of the sections and the change of their permissions, especially since the code section, .text, is going to be read and execute only, and the whole point of the relocations is to write inside it.

Here is the code moving through the relocation blocks, and patching the memory:

//this is how much we shifted the ImageBase
DWORD delta_VA_reloc = ((DWORD) ImageBase) - p_NT_HDR->OptionalHeader.ImageBase;

// if there is a relocation table, and we actually shifted the ImageBase
if(data_directory[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress != 0 && delta_VA_reloc != 0) {

    //calculate the relocation table address
    IMAGE_BASE_RELOCATION* p_reloc = (IMAGE_BASE_RELOCATION*) (ImageBase + data_directory[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress);

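    // blocks follow one another; we stop on an empty header, which relies on the zero padding after the table
    // (a stricter loader would also bound the walk with the Size field of the directory)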
    while(p_reloc->VirtualAddress != 0) {

        // how many relocations in this block
        // ie the total size, minus the size of the "header", divided by 2 (those are words, so 2 bytes for each)
        DWORD size = (p_reloc->SizeOfBlock - sizeof(IMAGE_BASE_RELOCATION))/2;
        // the first relocation element in the block, right after the header (using pointer arithmetic again)
        WORD* reloc = (WORD*) (p_reloc + 1);
        for(int i=0; i<size; ++i) {
            //type is the first 4 bits of the relocation word
            int type = reloc[i] >> 12;
            // offset is the last 12 bits
            int offset = reloc[i] & 0x0fff;
            //this is the address we are going to change
            DWORD* change_addr = (DWORD*) (ImageBase + p_reloc->VirtualAddress + offset);

            // there is only one type used that needs to make a change
            switch(type){
                case IMAGE_REL_BASED_HIGHLOW :
                    *change_addr += delta_VA_reloc;
                    break;
                default:
                    break;
            }
        }

        // switch to the next relocation block, based on the size
        p_reloc = (IMAGE_BASE_RELOCATION*) (((DWORD) p_reloc) + p_reloc->SizeOfBlock);
    }
}
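As a variant, the same walk can be bounded by the Size field of the relocation directory instead of relying on an empty header. The sketch below is portable and works on a plain byte buffer, so it can be tried without a real PE. `apply_relocs` is a hypothetical function, it assumes a little-endian host when fed real PE data, and it does not check that the patched addresses fall inside the image.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same value as IMAGE_REL_BASED_HIGHLOW in winnt.h. */
#define REL_BASED_HIGHLOW 3

/* Walks the relocation table of table_size bytes located at image + table_rva,
   adding delta to every HIGHLOW target. Returns the number of patched DWORDs. */
uint32_t apply_relocs(uint8_t *image, uint32_t table_rva, uint32_t table_size, uint32_t delta) {
    uint32_t patched = 0;
    uint32_t pos = 0;
    assert(image != NULL);
    while (pos + 8 <= table_size) {
        uint32_t page_rva, block_size;
        /* block header: VirtualAddress, then SizeOfBlock */
        memcpy(&page_rva, image + table_rva + pos, 4);
        memcpy(&block_size, image + table_rva + pos + 4, 4);
        if (block_size < 8 || block_size > table_size - pos) {
            break; /* malformed block: stop instead of reading out of bounds */
        }
        for (uint32_t off = 8; off + 2 <= block_size; off += 2) {
            uint16_t entry;
            memcpy(&entry, image + table_rva + pos + off, 2);
            if ((entry >> 12) == REL_BASED_HIGHLOW) {
                uint8_t *target = image + page_rva + (entry & 0x0FFFu);
                uint32_t value;
                memcpy(&value, target, 4); /* memcpy avoids unaligned accesses */
                value += delta;
                memcpy(target, &value, 4);
                ++patched;
            }
        }
        pos += block_size;
    }
    return patched;
}
```

In the loader, `table_rva` and `table_size` would come from data_directory[IMAGE_DIRECTORY_ENTRY_BASERELOC], and `delta` would be delta_VA_reloc.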

And now, that should be it!

Final result

The final code can be found here:
https://github.com/jeremybeaume/packer-tutorial/blob/master/part2/main.c

You should now be able to compile your loader, and run it to load any 32-bit ASLR-enabled .exe file, for example: loader.exe C:\Windows\SysWOW64\calc.exe .
This may not look impressive at all: you could have used the system or CreateProcess functions. But those two actually create another process, as the name suggests. We didn’t: we executed calc.exe from within the memory of our loader, which is exactly what we are going to do after we unpack a binary. Instead of getting the PE data from the file system, our packer will read it from its own memory. That’s the point of the next step in this tutorial: Part 3 : packing with python.
