Writing a PE packer – Part 4 : packing with no relocation

At the end of the last part, I drew your attention to the fact that Mingw32 doesn’t produce movable binaries: it cannot create a relocation table. You can force it to set the “DLL can move” flag, but without a relocation table the binary would not work once moved. We are going to change our packer to handle such non-movable binaries.

A problem, and its solution

To handle a non-movable binary, we are going to place ourselves (the unpacker) at its image base, and reserve the memory it needs inside one of our own sections. Right now, our packed binary looks like this in memory:

VA         | RVA    | Content
0x00400000 | 0      | unpacker PE header
0x00401000 | 0x1000 | unpacker .text section
0x00402000 | 0x2000 | unpacker .rdata section
0x00403000 | 0x3000 | unpacker .eh_fram section
0x00404000 | 0x4000 | unpacker .idata section
0x00405000 | 0x5000 | unpacker .packed section

But if we want to pack a binary that expects to be loaded at VA 0x00400000, as we are, we cannot load it: we already occupy that address, and we would be overwriting our own code. We could try to place ourselves somewhere else and hope to allocate the packed binary's ImageBase with VirtualAlloc, but there is no guarantee it would work: the OS could already have placed something there, like Kernel32.dll.
So, to make sure everything runs smoothly, we are going to place ourselves at the packed binary's image base on purpose, but leave room in memory for loading it. We’ll get something like this:

VA                              | RVA                                          | Content
image base of the packed binary | 0                                            | unpacker PE header
                                | 0x1000                                       | .alloc section, for loading the packed binary
                                | 0x1000 + size of the packed binary in memory | sections of the unpacker
                                |                                              | sections of the unpacker
                                | 0x5000 + size of the packed binary in memory | .packed section, with the packed PE file

The packed binary ends up at its expected image base, which is also ours. We can load its sections in memory, because we already reserved space for them in the .alloc section. We can overwrite our own PE header once we have changed its memory page permissions; that works fine (some malware actually does this). Basically, we have everything we need, so let’s program it.
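To make the target layout concrete, here is a minimal Python sketch that computes it from a few numbers. The function name plan_layout and the example sizes are hypothetical, and it assumes the memory size of the packed binary is already rounded to the section alignment:

```python
def plan_layout(image_base, packed_mem_size, unpacker_section_rvas, first_rva=0x1000):
    """Return (name, VA) pairs for the layout described above.

    packed_mem_size: memory needed by the packed binary's sections
                     (assumed already aligned to the section alignment).
    unpacker_section_rvas: {name: RVA} of the unpacker sections before shifting.
    """
    layout = [("unpacker PE header", image_base),
              (".alloc", image_base + first_rva)]
    # every unpacker section is pushed back by the size reserved in .alloc
    for name, rva in sorted(unpacker_section_rvas.items(), key=lambda kv: kv[1]):
        layout.append((name, image_base + rva + packed_mem_size))
    return layout


# hypothetical numbers: image base 0x400000, 0x6000 bytes reserved for the packed binary
layout = plan_layout(0x400000, 0x6000, {".text": 0x1000, ".packed": 0x5000})
for name, va in layout:
    print(f"{va:#010x}  {name}")
```

With these example numbers, .text moves from RVA 0x1000 to 0x1000 + 0x6000, which is exactly the "0x1000 + size of the packed binary in memory" row of the table.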

Modifying the unpacking stub

There is little to do here. The first thing is to check the packed PE header for ASLR support. If the packed binary cannot move, we don’t allocate memory: the packer has already reserved it in the .alloc section. In that case, we use the current module address as the image base:

char* ImageBase = NULL;
if(p_NT_HDR->OptionalHeader.DllCharacteristics & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE) {
    ImageBase = (char*) VirtualAlloc(NULL, p_NT_HDR->OptionalHeader.SizeOfImage, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if(ImageBase == NULL) {
        // Allocation failed
        return NULL;
    }
} else {
    //if no ASLR : the packer would have placed us at the expected image base already
    ImageBase = (char*) GetModuleHandleA(NULL);
}
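The same flag test can be sketched in Python on raw file bytes, which is handy for checking a binary before packing it. This is only an illustration: has_dynamic_base and make_fake_pe are hypothetical names, and the fake header contains just enough bytes to exercise the check.

```python
import struct

IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE = 0x0040  # value defined in winnt.h


def has_dynamic_base(pe_bytes):
    """Return True if DllCharacteristics has the DYNAMIC_BASE bit set."""
    # e_lfanew, at offset 0x3C of the DOS header, points to the "PE\0\0" signature
    (e_lfanew,) = struct.unpack_from("<I", pe_bytes, 0x3C)
    if pe_bytes[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("not a PE file")
    # the optional header follows the 4-byte signature and the 20-byte file header;
    # DllCharacteristics sits at offset 70 in both PE32 and PE32+ optional headers
    optional_header = e_lfanew + 4 + 20
    (dll_characteristics,) = struct.unpack_from("<H", pe_bytes, optional_header + 70)
    return bool(dll_characteristics & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE)


def make_fake_pe(dll_characteristics):
    """Hypothetical helper: a header-only buffer, NOT a loadable PE."""
    buf = bytearray(0x200)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, 0x80)  # e_lfanew
    buf[0x80:0x84] = b"PE\0\0"
    struct.pack_into("<H", buf, 0x80 + 4 + 20 + 70, dll_characteristics)
    return bytes(buf)


print(has_dynamic_base(make_fake_pe(0x0040)))
print(has_dynamic_base(make_fake_pe(0x0000)))
```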

When loading the sections, we need a few VirtualProtect calls: one to make the PE header writable (necessary), and others (optional) to make sure we can write the packed binary's section data into the .alloc section:

DWORD oldProtect;
//The PE header is readonly, we have to make it writable to be able to change it
VirtualProtect(ImageBase, p_NT_HDR->OptionalHeader.SizeOfHeaders, PAGE_READWRITE, &oldProtect);
mymemcpy(ImageBase, PE_data, p_NT_HDR->OptionalHeader.SizeOfHeaders);

// Section headers start right after the IMAGE_NT_HEADERS struct, so we do some pointer arithmetic-fu here.
IMAGE_SECTION_HEADER* sections = (IMAGE_SECTION_HEADER*) (p_NT_HDR + 1);

// For each section
for(int i=0; i<p_NT_HDR->FileHeader.NumberOfSections; ++i) {
    // calculate the VA where we need to copy the content, from the RVA
    // sections[i].VirtualAddress is an RVA, mind it
    char* dest = ImageBase + sections[i].VirtualAddress;

    // check if there is Raw data to copy
    if(sections[i].SizeOfRawData > 0) {
        // A VirtualProtect to be sure we can write in the allocated section
        VirtualProtect(dest, sections[i].SizeOfRawData, PAGE_READWRITE, &oldProtect);
        // We copy SizeOfRawData bytes, from the offset PointerToRawData in the file
        mymemcpy(dest, PE_data + sections[i].PointerToRawData, sections[i].SizeOfRawData);
    } else {
        // if no raw data to copy, we just put zeroes, based on the VirtualSize
        VirtualProtect(dest, sections[i].Misc.VirtualSize, PAGE_READWRITE, &oldProtect);
        mymemset(dest, 0, sections[i].Misc.VirtualSize);
    }
}
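To see what this loop does without touching real memory, here is a small Python simulation of the same copy logic on a bytearray. It is a sketch under assumptions: map_sections is a hypothetical name, sections are plain tuples rather than IMAGE_SECTION_HEADER structs, and the input data is invented.

```python
def map_sections(file_bytes, size_of_headers, sections, size_of_image):
    """Simulate the stub's loading loop.

    sections: list of (rva, virtual_size, raw_offset, raw_size) tuples.
    Returns the in-memory image as a bytearray.
    """
    image = bytearray(size_of_image)  # zero-filled, like the reserved .alloc memory
    # copy the PE header over the start of the image
    image[:size_of_headers] = file_bytes[:size_of_headers]
    for rva, virtual_size, raw_offset, raw_size in sections:
        if raw_size > 0:
            # raw data present: copy raw_size bytes from the file to the section RVA
            image[rva:rva + raw_size] = file_bytes[raw_offset:raw_offset + raw_size]
        else:
            # no raw data: zero the section, based on its virtual size
            image[rva:rva + virtual_size] = bytes(virtual_size)
    return image


# invented file: 0x200 bytes of "header" followed by 0x200 bytes of section data
file_bytes = b"H" * 0x200 + b"A" * 0x200
sections = [(0x1000, 0x150, 0x200, 0x200),  # section with raw data
            (0x2000, 0x300, 0, 0)]          # uninitialized section, no raw data
image = map_sections(file_bytes, 0x200, sections, 0x3000)
print(len(image), image[0x1000:0x1004])
```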

And that’s actually everything we need to change in the unpacking stub. The biggest changes will be in the Python packer.

Modifying the Python packer

We’re going to need to compile the unpacking stub with options that depend on the packed binary. So let’s start by writing a simple function that does this automatically for us:

import subprocess

def compile_stub(input_cfile, output_exe_file, more_parameters=None):
    cmd = (["mingw32-gcc.exe", input_cfile, "-o", output_exe_file]
        + list(more_parameters or []) # extra options, e.g. to force the ImageBase of the destination PE
        + ["-Wl,--entry=__start", # define the entry point
        "-nostartfiles", "-nostdlib", # no standard lib
        "-lkernel32" # Add necessary imports
        ])
    print("[+] Compiling stub : "+" ".join(cmd))
    # check=True raises CalledProcessError if the compiler fails
    subprocess.run(cmd, check=True)
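If you want to inspect the command line without having the compiler installed, the list construction can be split out. A minimal sketch, where build_stub_command is a hypothetical helper that only mirrors the options used above:

```python
def build_stub_command(input_cfile, output_exe_file, more_parameters=None):
    """Build the mingw32-gcc command line without running it."""
    return (["mingw32-gcc.exe", input_cfile, "-o", output_exe_file]
            + list(more_parameters or [])   # per-binary options go here
            + ["-Wl,--entry=__start",       # define the entry point
               "-nostartfiles", "-nostdlib",  # no standard lib
               "-lkernel32"])               # necessary imports


cmd = build_stub_command("unpack.c", "unpack.exe", ["-Wl,--image-base=0x400000"])
print(" ".join(cmd))
```

The extra parameters land between the output file and the fixed linker options, which is where the image base and section addresses will go later.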

The beginning is the same: we compile the unpacking stub automatically, and open the input PE with lief:

import argparse
import lief

parser = argparse.ArgumentParser(description='Pack PE binary')
parser.add_argument('input', metavar="FILE", help='input file')
parser.add_argument('-o', metavar="FILE", help='output', default="packed.exe")

args = parser.parse_args()

# Opens the input PE
input_PE = lief.PE.parse(args.input)

# Compiles the unpacker stub a first time, with no particular options
compile_stub("unpack.c", "unpack.exe", more_parameters=[])

# open the unpack.exe binary
unpack_PE = lief.PE.parse("unpack.exe")

# we're going to keep the same alignment as the ones in unpack_PE,
# because this is the PE we are modifying
file_alignment = unpack_PE.optional_header.file_alignment
section_alignment = unpack_PE.optional_header.section_alignment

Then we need to check for ASLR in the input file:

ASLR = (input_PE.optional_header.dll_characteristics & lief.PE.DLL_CHARACTERISTICS.DYNAMIC_BASE != 0)
if ASLR:
    output_PE = unpack_PE # we can use the current state of unpack_PE as our output
else:

In the else case, ASLR is disabled and we need to add the .alloc section. We start by measuring the memory range used by the sections of input_PE:

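# The RVA of the lowest section of input PE
min_RVA = min([x.virtual_address for x in input_PE.sections])
# The RVA of the end of the highest section
# (x.size is the raw size in lief; a section can be larger in memory than on disk, so we take the larger of the two)
max_RVA = max([x.virtual_address + max(x.size, x.virtual_size) for x in input_PE.sections])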

We could simply have used SizeOfImage as we did before (in the VirtualAlloc call), but it includes the memory used by the PE header, which is already allocated (and occupied by the unpacker PE header). We only need memory for the sections in .alloc, and that’s what we computed here.
We can now create the .alloc section that will cover all this space:

alloc_section = lief.PE.Section(".alloc")
alloc_section.virtual_address = min_RVA
alloc_section.virtual_size = align(max_RVA - min_RVA, section_alignment)
alloc_section.characteristics = (lief.PE.SECTION_CHARACTERISTICS.MEM_READ
                                | lief.PE.SECTION_CHARACTERISTICS.MEM_WRITE
                                | lief.PE.SECTION_CHARACTERISTICS.CNT_UNINITIALIZED_DATA)
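The align helper is not shown in this part. As an assumption, it rounds a value up to the next multiple of the alignment; the sketch below implements that and runs the size computation on invented (RVA, size) pairs instead of lief sections:

```python
def align(value, alignment):
    """Round value up to the next multiple of alignment (assumed behaviour of the helper)."""
    return ((value + alignment - 1) // alignment) * alignment


# invented (rva, size in memory) pairs standing in for input_PE.sections
sections = [(0x1000, 0x1a00), (0x3000, 0x400), (0x4000, 0x2c8)]

min_rva = min(rva for rva, _ in sections)
max_rva = max(rva + size for rva, size in sections)
alloc_size = align(max_rva - min_rva, 0x1000)  # 0x1000 = usual section alignment
print(hex(min_rva), hex(max_rva), hex(alloc_size))
```

Here the sections span 0x1000 to 0x42c8, so 0x32c8 bytes are needed, which rounds up to four pages.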

The .alloc section has no data: it’s just memory. Its raw size will be zero.

We now need to make room in the unpacker for this section. We cannot just change the sections' RVAs: many things depend on them (the import tables, for example, contain A LOT of RVAs, as we saw). Shifting all the sections in a PE is no trivial thing, but we can simply ask the compiler nicely. We also need it to place the unpacker at the same image base as the packed binary.

First, some math:

# to put our sections just after .alloc, find the lowest section RVA in the stub
min_unpack_RVA = min([x.virtual_address for x in unpack_PE.sections])
# and compute how much we need to move to be exactly after the .alloc section
shift_RVA = (min_RVA + alloc_section.virtual_size) - min_unpack_RVA

We compute the lowest section RVA used by the unpacker (it should equal its section alignment, usually 0x1000). Then we compute by how much we need to move the sections so that the lowest one starts at the end of the .alloc section. Now we’ll ask the compiler to put the unpacker at the packed binary's image base, and to shift every section RVA by shift_RVA:

# We need to recompile the stub to make room for the `.alloc` section, by shifting all its sections
compile_parameters = [f"-Wl,--image-base={hex(input_PE.optional_header.imagebase)}"]

for s in unpack_PE.sections:
    compile_parameters += [f"-Wl,--section-start={s.name}={hex(input_PE.optional_header.imagebase + s.virtual_address + shift_RVA )}"]

# recompile the stub with the shifted sections
compile_stub("unpack.c", "shifted_unpack.exe", compile_parameters)

Note that the --section-start option expects a VA in hex, not an RVA.
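Here is the same arithmetic as a worked example on plain numbers, so the VA versus RVA distinction is visible. The function name section_start_flags and all the addresses are hypothetical; only the two linker options come from the code above.

```python
def section_start_flags(image_base, stub_sections, alloc_rva, alloc_size):
    """Compute shift_RVA and the linker flags for the shifted stub.

    stub_sections: list of (name, rva) taken from the first, unshifted build.
    """
    min_stub_rva = min(rva for _, rva in stub_sections)
    # distance between the end of .alloc and the current first stub section
    shift = (alloc_rva + alloc_size) - min_stub_rva
    flags = [f"-Wl,--image-base={hex(image_base)}"]
    for name, rva in stub_sections:
        # --section-start takes a VA: image base + shifted RVA
        flags.append(f"-Wl,--section-start={name}={hex(image_base + rva + shift)}")
    return shift, flags


# invented stub layout, with a 0x4000-byte .alloc section starting at RVA 0x1000
stub = [(".text", 0x1000), (".rdata", 0x2000), (".idata", 0x4000)]
shift, flags = section_start_flags(0x400000, stub, 0x1000, 0x4000)
print(hex(shift))
print("\n".join(flags))
```

With these numbers the shift is 0x4000, so .text goes from VA 0x401000 to 0x405000, right where .alloc ends.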

Now, if all went well, shifted_unpack.exe contains the unpacker with the same image base as the packed binary, and a gap in its section memory large enough to fit our .alloc section.

There is now a big gap before the .text section, and that’s where we’re going to put the .alloc section. lief doesn’t let us easily add a section at the beginning, and it appears the Windows loader expects the sections to be ordered by increasing RVA. So we’re just going to make a new PE from scratch, and copy everything we need into it:

unpack_shifted_PE = lief.PE.parse("shifted_unpack.exe")

# This would insert the .alloc section at the end of the table, so the RVAs would not be in order,
# but Windows doesn't seem to like it: the binary doesn't load.
# output_PE = unpack_shifted_PE
# output_PE.add_section(alloc_section)

# Here is how we make a completely new PE, copying the important properties
# And adding the sections in order
output_PE = lief.PE.Binary("pe_from_scratch", lief.PE.PE_TYPE.PE32)

# Copy optional headers important fields
output_PE.optional_header.imagebase = unpack_shifted_PE.optional_header.imagebase
output_PE.optional_header.addressof_entrypoint = unpack_shifted_PE.optional_header.addressof_entrypoint
output_PE.optional_header.section_alignment = unpack_shifted_PE.optional_header.section_alignment
output_PE.optional_header.file_alignment = unpack_shifted_PE.optional_header.file_alignment
output_PE.optional_header.sizeof_image = unpack_shifted_PE.optional_header.sizeof_image

# make sure output_PE cannot move
output_PE.optional_header.dll_characteristics = 0

# copy the data directories (imports most notably)
for i in range(0, 15):
    output_PE.data_directories[i].rva = unpack_shifted_PE.data_directories[i].rva
    output_PE.data_directories[i].size = unpack_shifted_PE.data_directories[i].size

# add the sections in order
output_PE.add_section(alloc_section)
for s in unpack_shifted_PE.sections:
    output_PE.add_section(s)

We need to make sure our packed PE doesn’t move: that’s the whole point of placing ourselves at the same image base as the packed binary (which cannot be moved). That’s what dll_characteristics = 0 is for.
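Since an out-of-order section table produced a binary that did not load, a quick sanity check before writing the file can save some debugging. This is a sketch of my own, not part of the packer: check_section_table is a hypothetical name, and it works on plain (name, RVA, virtual size) tuples.

```python
def check_section_table(sections, section_alignment=0x1000):
    """Return a list of problems found in a section table.

    sections: list of (name, rva, virtual_size) tuples, in table order.
    """
    problems = []
    prev_name, prev_end = None, 0
    for name, rva, virtual_size in sections:
        if rva % section_alignment:
            problems.append(f"{name}: RVA {rva:#x} is not aligned")
        if rva < prev_end:
            # either the table is not sorted by RVA, or two sections overlap
            problems.append(f"{name}: starts at {rva:#x}, before the end of {prev_name} ({prev_end:#x})")
        prev_name, prev_end = name, rva + virtual_size
    return problems


# invented tables: .alloc first (good), then .alloc appended at the end (bad)
good = [(".alloc", 0x1000, 0x4000), (".text", 0x5000, 0x1000), (".rdata", 0x6000, 0x1000)]
bad = good[1:] + good[:1]
print(check_section_table(good))
print(check_section_table(bad))
```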

You just need to modify the rest of the file to add the .packed section to output_PE, and we’re done! Here are the sections of a packed “hello world”:

[Figure: packed hello world sections]

Final words

The final code can be found as usual here: https://github.com/jeremybeaume/packer-tutorial/tree/master/part4

Our packer is now able to handle any 32-bit PE .exe file. You could, for example, pack an already packed binary; it works.
It does not yet fully work on DLLs (we’re missing a few things in the loader), and it won’t work at all on .NET executables (they are also .exe files, but they don’t contain x86 instructions).

This packer is still pretty useless, but we’re going to remedy that in the next part of the tutorial, Part 5: simple obfuscation.

1 thought on “Writing a PE packer – Part 4 : packing with no relocation”

  1. Hi Jeremy,
    I ran the part 4 code from your GitHub, but the output didn’t start. I checked it with IDA Pro and saw that the addresses of external functions (like GetModuleHandleA) could not be resolved. I don’t know why; have you encountered this problem? How can I fix it?

    Sorry for my bad english.
