Adding code to an existing ELF file

Recently I was reverse-engineering an Android app^[a]. The relevant details are as follows:

Important stuff is handled in JNI shared objects
Java component is UI/platform glue code
The source language for the shared objects is C++

My goal was to dump a buffer I had found that contained some data, and I wrote a Python/GDB script to do just that. This worked fine on an (x86_64) emulator, but I hit a problem when I finally got hold of an Android device: Python/GDB is very slow on real hardware, especially when you're writing to disk every time a breakpoint is hit.

So what's a gal to do with a laggy debugger and an app that crashes if it gets too far behind?

Translate her Python script to AArch64 assembly and patch it into the app, of course!

Because I had reverse-engineered the x86_64 binary and not the AArch64 one, I had to track down the right registers to pull my buffer data from all over again. I also had to learn how to read (and write) ARMv8-A assembly. Thankfully, ARM is a RISC, load-store architecture, so it was easy enough to pick up, and I had a patch written fairly quickly^[b].

Applying it was another matter. My first attempt went as follows:

Assemble my patch and dump the .text (code) section
Use objcopy to append the patch to the end of the .text section
Modify an instruction at the right address to jump to my code
(Repackage APK, etc.)

This prevented the app from loading the ELF file: code that tried to access a string in .rodata ended up pulling a different string instead.

As it turns out, you can't just stick code onto the end of the .text section. Growing .text shifts every section after it, which breaks the relative addressing that code uses to reach those sections (and in this file, as in most ELF files, .data, .rodata, .bss, etc. are all stored after .text). To make that approach work, I would have had to find and fix every single relative address in the binary. Alternatively, I could try to somehow add my code after those sections. I decided on the latter, for what I'm sure are obvious reasons.
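To make the failure concrete: position-independent AArch64 code typically reaches .rodata with an ADRP/ADD pair. ADRP takes the 4 KiB page containing the instruction itself and adds a page offset; ADD then supplies the low 12 bits. Both offsets are fixed at link time, so the computed address stays the same even if .rodata moves. This minimal sketch does that arithmetic (adrp_add_target is a hypothetical helper, not part of any tool):

```c
#include <stdint.h>

/* Address produced by an AArch64 "ADRP x0, ...; ADD x0, x0, #lo12" pair.
 * pc:         address of the ADRP instruction
 * page_delta: signed page offset encoded in ADRP (in 4 KiB pages)
 * lo12:       12-bit immediate encoded in the ADD
 * The two immediates are baked in at link time; only pc is "live". */
uint64_t adrp_add_target(uint64_t pc, int64_t page_delta, uint64_t lo12)
{
    uint64_t page = pc & ~(uint64_t)0xFFF;     /* page containing pc */
    /* Unsigned wraparound makes negative page_delta work correctly. */
    return page + (uint64_t)(page_delta * 4096) + (lo12 & 0xFFF);
}
```

For example, an ADRP at 0x10234 with a page offset of 5, followed by ADD #0x48, always yields 0x15048. If growing .text pushed .rodata up by 0x100 bytes, the intended string would now start at 0x15148, yet the code would still read from 0x15048 and get whatever happens to be there.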

Time for a tour of the ELF format! (For simplicity, I'll be focusing on 64-bit ELF files.)

An ELF file has "segments" and "sections". The program header table describes segments, which the loader uses at runtime to map chunks of the file into memory. The section header table describes sections, which divide the file's contents into named pieces (.text, .rodata, and so on) for the linker and other tools. To add in our patch, we have to put it in the file and describe it with a section. Then we have to cover that section with a segment so that it actually gets loaded.

Adding a section is easy, because the section header table is stored at the end of this file (though this is by no means a requirement), so it can grow without moving anything else. The contents of the new section should just be the .text section of the assembled patch. However, we do need to set some flags so that it's allocated and executable:

as patch.s -o patch.o
objcopy patch.o --dump-section .text=patch.text
objcopy file.so --add-section .patch=patch.text --set-section-flags .patch=code,readonly,alloc patch.out
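The p_offset value used later has to match wherever objcopy actually put .patch in the file. readelf -S shows this, but if you're scripting the process, looking a section up by name in the section header table is short. A minimal sketch, assuming the whole file is already loaded into a suitably aligned buffer (e.g. from malloc); find_section is a hypothetical helper, and it ignores the rare extended-numbering case used by files with 0xff00 or more sections:

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the section header called `name`, or NULL if there is none or
 * the buffer doesn't look like a 64-bit ELF file. */
const Elf64_Shdr *find_section(const unsigned char *buf, size_t len,
                               const char *name)
{
    if (len < sizeof(Elf64_Ehdr) || memcmp(buf, ELFMAG, SELFMAG) != 0 ||
        buf[EI_CLASS] != ELFCLASS64)
        return NULL;

    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)buf;
    if (eh->e_shoff == 0 || eh->e_shentsize != sizeof(Elf64_Shdr) ||
        eh->e_shstrndx >= eh->e_shnum ||
        eh->e_shoff + (uint64_t)eh->e_shnum * sizeof(Elf64_Shdr) > len)
        return NULL;

    const Elf64_Shdr *sh = (const Elf64_Shdr *)(buf + eh->e_shoff);

    /* Section names are offsets into the section-name string table,
     * which is itself the section at index e_shstrndx. */
    const Elf64_Shdr *strtab = &sh[eh->e_shstrndx];
    if (strtab->sh_size == 0 || strtab->sh_offset + strtab->sh_size > len)
        return NULL;
    const char *names = (const char *)(buf + strtab->sh_offset);
    if (names[strtab->sh_size - 1] != '\0')  /* keeps strcmp in bounds */
        return NULL;

    for (Elf64_Half i = 0; i < eh->e_shnum; i++)
        if (sh[i].sh_name < strtab->sh_size &&
            strcmp(names + sh[i].sh_name, name) == 0)
            return &sh[i];
    return NULL;
}
```

With the patched file in memory, find_section(buf, len, ".patch")->sh_offset is the value to use for the new segment's p_offset.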

Now we need to map this new .patch section to a segment. A section counts as part of a segment if it lies within the segment's chunk of the file. Unfortunately, the program header table is stored immediately after the ELF file header itself, and the rest of the file's data follows right behind it, so there's no room to add a new entry without shifting everything after it. This means we'll have to commandeer an existing segment header.
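The numbers for this file: a 64-bit ELF header is 0x40 bytes and each program header entry is 0x38 bytes, so 8 entries starting right after the header end at 0x40 + 8 * 0x38 = 0x200. That is exactly the offset of the first NOTE segment (.note.gnu.build-id) in the listing below. The same check as a sketch, with hypothetical helper names:

```c
#include <elf.h>
#include <stdbool.h>
#include <stdint.h>

/* First file offset past the end of the program header table. */
uint64_t phdr_table_end(const Elf64_Ehdr *eh)
{
    return eh->e_phoff + (uint64_t)eh->e_phnum * eh->e_phentsize;
}

/* Could one more entry be appended without overwriting whatever data
 * starts at next_data_offset? */
bool room_for_another_phdr(const Elf64_Ehdr *eh, uint64_t next_data_offset)
{
    return phdr_table_end(eh) + eh->e_phentsize <= next_data_offset;
}
```

Here room_for_another_phdr(&ehdr, 0x200) is false: a ninth entry would land on top of .note.gnu.build-id.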

Here is the program header table of the ELF file I'm working with (as printed by readelf -l):

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000589c18 0x0000000000589c18  R E    0x1000
  LOAD           0x000000000058a530 0x000000000058b530 0x000000000058b530
                 0x000000000003a418 0x000000000003c268  RW     0x1000
  DYNAMIC        0x00000000005ba368 0x00000000005bb368 0x00000000005bb368
                 0x0000000000000320 0x0000000000000320  RW     0x8
  NOTE           0x0000000000000200 0x0000000000000200 0x0000000000000200
                 0x0000000000000024 0x0000000000000024  R      0x4
  NOTE           0x0000000000589b80 0x0000000000589b80 0x0000000000589b80
                 0x0000000000000098 0x0000000000000098  R      0x4
  GNU_EH_FRAME   0x0000000000509244 0x0000000000509244 0x0000000000509244
                 0x000000000000f97c 0x000000000000f97c  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x000000000058a530 0x000000000058b530 0x000000000058b530
                 0x0000000000038ad0 0x0000000000038ad0  R      0x1

Each segment has a type, which describes what it stores. These are the ones in the file:

LOAD segments are loaded (mapped) into memory. Other segments are not mapped on their own^[c].
DYNAMIC segments contain dynamic linking information.
NOTE segments contain, you guessed it, notes about the file. The GNU linker uses certain .note sections to get information about the file.
GNU_EH_FRAME contains exception unwinding information.
GNU_STACK tells the kernel how to handle the stack (e.g. if it needs to be executable).
GNU_RELRO marks sections that should be made read-only once relocations have been applied after loading.
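Each of these types is just a number stored in p_type. For reference, here is a small lookup table for the ones in this file, using the PT_* constants from glibc's <elf.h>. The helper names are hypothetical; the name-to-number direction is the sort of convenience a patching tool could offer, though nothing here requires it.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* p_type values for the segment types that appear in this file. */
static const struct {
    Elf64_Word  type;
    const char *name;
} segment_types[] = {
    { PT_LOAD,         "LOAD" },          /* 1 */
    { PT_DYNAMIC,      "DYNAMIC" },       /* 2 */
    { PT_NOTE,         "NOTE" },          /* 4 */
    { PT_GNU_EH_FRAME, "GNU_EH_FRAME" },  /* 0x6474e550 */
    { PT_GNU_STACK,    "GNU_STACK" },     /* 0x6474e551 */
    { PT_GNU_RELRO,    "GNU_RELRO" },     /* 0x6474e552 */
};

#define N_SEGMENT_TYPES (sizeof segment_types / sizeof segment_types[0])

/* Number -> name as readelf labels it; NULL if not in the table above. */
const char *segment_type_name(Elf64_Word type)
{
    for (size_t i = 0; i < N_SEGMENT_TYPES; i++)
        if (segment_types[i].type == type)
            return segment_types[i].name;
    return NULL;
}

/* Name -> number; PT_NULL (0) if the name is not in the table. */
Elf64_Word segment_type_value(const char *name)
{
    for (size_t i = 0; i < N_SEGMENT_TYPES; i++)
        if (strcmp(segment_types[i].name, name) == 0)
            return segment_types[i].type;
    return PT_NULL;
}
```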

A NOTE segment is the best (read: only) choice we can make here, since the other segments all do real work at runtime. But how do we choose which one?

 Section to Segment mapping:
  Segment Sections...
   00     .note.gnu.build-id .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .plt .text .rodata .eh_frame_hdr .eh_frame .gcc_except_table .note.android.ident
   01     .init_array .fini_array .data.rel.ro .dynamic .got .data cfstring .bss
   02     .dynamic
   03     .note.gnu.build-id
   04     .note.android.ident
   05     .eh_frame_hdr
   06
   07     .init_array .fini_array .data.rel.ro .dynamic .got
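readelf derives that mapping rather than reading it from the file: roughly, a section belongs to a segment if its bytes fall inside the segment's range. Below is a simplified sketch of that rule (section_in_segment is a hypothetical name). The real check in binutils is more involved; among other things, it also compares virtual addresses for allocated sections and treats SHT_NOBITS sections such as .bss, which occupy memory but no file space, specially. This version handles NOBITS sections by comparing memory ranges and simply ignores zero-size sections.

```c
#include <elf.h>
#include <stdbool.h>

/* Simplified: does section `sh` lie inside segment `ph`? */
bool section_in_segment(const Elf64_Shdr *sh, const Elf64_Phdr *ph)
{
    if (sh->sh_size == 0)            /* simplification: skip empty sections */
        return false;

    if (sh->sh_type == SHT_NOBITS)   /* no file bytes: compare memory ranges */
        return (sh->sh_flags & SHF_ALLOC) &&
               sh->sh_addr >= ph->p_vaddr &&
               sh->sh_addr + sh->sh_size <= ph->p_vaddr + ph->p_memsz;

    /* Ordinary section: its file range must sit inside the segment's. */
    return sh->sh_offset >= ph->p_offset &&
           sh->sh_offset + sh->sh_size <= ph->p_offset + ph->p_filesz;
}
```

With the numbers above, .note.gnu.build-id starts at offset 0x200 and fills the 0x24-byte first NOTE segment, so it lands in both segment 00 (the first LOAD) and segment 03, matching the table.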

.note.gnu.build-id seems the safest bet here, as that's only used to provide a unique identifier for the binary. So my new process was:

Assemble my patch and dump the .text section
Create a new section, .patch, in the target ELF file with the dumped .text section
Modify the first NOTE segment to be a LOAD segment and point it to the .patch section
Modify the .bss section and its containing segment to be 8 bytes larger, so the patch can store its state there.
Modify an instruction at the right address to jump to my code
(Repackage APK, etc.)
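For the jump step, one option is to overwrite the instruction at the hook address with a single AArch64 B (unconditional branch). B stores a signed 26-bit offset counted in 4-byte words, so it reaches about ±128 MiB from itself, which easily covers the distance from the original code to 0x5e0000 in this file. A minimal sketch of the encoding follows (encode_b is a hypothetical helper, not necessarily what patch-binary.sh does); the resulting 32-bit word is stored little-endian in the file. Keep in mind that the overwritten instruction is lost, so the patch typically has to perform it itself before branching back.

```c
#include <stdbool.h>
#include <stdint.h>

/* Encode "B target" for an instruction located at address pc.
 * Encoding: bits 31..26 = 0b000101 (0x14000000), bits 25..0 = imm26,
 * where imm26 = (target - pc) / 4 as a signed 26-bit value.
 * Returns false if the offset is misaligned or out of range. */
bool encode_b(uint64_t pc, uint64_t target, uint32_t *insn)
{
    int64_t delta = (int64_t)(target - pc);
    if ((delta & 3) != 0 ||
        delta < -(INT64_C(1) << 27) || delta >= (INT64_C(1) << 27))
        return false;
    /* delta is a multiple of 4, so the division is exact. */
    *insn = 0x14000000u | ((uint32_t)(delta / 4) & 0x03FFFFFFu);
    return true;
}
```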

Modifying the segment header is the meat of this process. Here is the segment header definition, from elf(5):

#include <stdint.h>

typedef uint64_t Elf64_Off;   /* file offset */
typedef uint64_t Elf64_Addr;  /* virtual address */

typedef struct {
    uint32_t   p_type;
    uint32_t   p_flags;
    Elf64_Off  p_offset;
    Elf64_Addr p_vaddr;
    Elf64_Addr p_paddr;
    uint64_t   p_filesz;
    uint64_t   p_memsz;
    uint64_t   p_align;
} Elf64_Phdr;

p_type is the type of segment. This should be set to PT_LOAD, which is 1.
p_flags holds RWX flags for the segment. This should be set to 5 (R+X).
p_offset is the start of the segment in the file. For this file, objcopy put the new section at offset 0x5d0000 every time (readelf -S shows where it lands), so that's what we'll set this to.
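p_vaddr is where the segment should be mapped in process memory. The second LOAD segment sits 0x1000 bytes higher in memory than in the file, and its p_memsz is larger than its p_filesz (the extra space holds .bss), so in memory it runs up to 0x58b530 + 0x3c268 = 0x5c7798. The new segment must not overlap it, so we set this to 0x5e0000 to be safe.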
p_paddr is used for physical addressing, which is irrelevant here.
p_filesz describes the size of the segment in the file, which in this case is the size of the patch.
p_memsz describes how much memory this segment needs, which is again the size of the patch.
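p_align is the alignment this segment needs. It holds the alignment value itself, not an exponent: it must be 0 or 1 (no alignment required) or a power of two, and p_offset and p_vaddr should be equal modulo it. We set this to 1. (Since 0x5d0000 and 0x5e0000 are both page-aligned, the first LOAD segment's 0x1000 would satisfy that rule too.)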
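Put together, the values above amount to something like this sketch. The helper names are hypothetical; make_patch_segment fills in a repurposed entry the same way the build script below does, and segments_disjoint checks that the new segment's memory range doesn't collide with another segment's (using p_vaddr + p_memsz, since the memory size, not the file size, is what matters).

```c
#include <elf.h>
#include <stdbool.h>
#include <stdint.h>

/* One past the last byte a segment occupies in memory. */
uint64_t segment_mem_end(const Elf64_Phdr *ph)
{
    return ph->p_vaddr + ph->p_memsz;
}

/* Turn an existing entry (e.g. the first NOTE) into an R+X LOAD segment
 * covering `size` bytes at file offset `offset`, mapped at `vaddr`. */
void make_patch_segment(Elf64_Phdr *ph, uint64_t offset, uint64_t vaddr,
                        uint64_t size)
{
    ph->p_type   = PT_LOAD;
    ph->p_flags  = PF_R | PF_X;  /* 4 | 1 = 5 */
    ph->p_offset = offset;
    ph->p_vaddr  = vaddr;
    ph->p_paddr  = vaddr;        /* irrelevant here; mirrored from p_vaddr */
    ph->p_filesz = size;
    ph->p_memsz  = size;
    ph->p_align  = 1;
}

/* True if the two segments' memory ranges do not intersect. */
bool segments_disjoint(const Elf64_Phdr *a, const Elf64_Phdr *b)
{
    return segment_mem_end(a) <= b->p_vaddr ||
           segment_mem_end(b) <= a->p_vaddr;
}
```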

One problem: as far as I can tell, objcopy doesn't support modifying the program header table. To avoid editing it by hand, I wrote a small C program, modelf, that can do it.
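modelf itself isn't shown here, but the heart of any such tool is small: read the ELF header, seek to e_phoff + index * e_phentsize, and overwrite that entry. A minimal sketch under some assumptions: the file is a 64-bit ELF in the host's byte order (AArch64 Android binaries are little-endian, as are x86_64 and AArch64 Linux hosts), and rewrite_phdr is a hypothetical name that does far less validation than a real tool should.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Overwrite program header entry `index` of the ELF file at `path`
 * with *ph. Returns 0 on success, -1 on any error. */
int rewrite_phdr(const char *path, unsigned index, const Elf64_Phdr *ph)
{
    FILE *f = fopen(path, "r+b");
    if (!f)
        return -1;

    Elf64_Ehdr eh;
    int rc = -1;
    if (fread(&eh, sizeof eh, 1, f) == 1 &&
        memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
        eh.e_ident[EI_CLASS] == ELFCLASS64 &&
        eh.e_phentsize == sizeof(Elf64_Phdr) &&
        index < eh.e_phnum &&
        /* The seek also satisfies C's rule that a read must be followed
         * by a reposition before writing on an update stream. */
        fseek(f, (long)(eh.e_phoff + (Elf64_Off)index * sizeof *ph),
              SEEK_SET) == 0 &&
        fwrite(ph, sizeof *ph, 1, f) == 1)
        rc = 0;

    if (fclose(f) != 0)   /* fclose flushes the write; report failure */
        rc = -1;
    return rc;
}
```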

My final build script looks like this:

#!/usr/bin/env bash
set -e  # stop at the first failing step
as patch.s -o patch.o
objcopy patch.o --dump-section .text=patch.text
objcopy file.so --add-section .patch=patch.text --set-section-flags .patch=code,readonly,alloc patch.out
./modelf patch.out                          \
    --segment 3                             \
        --type   1                          \
        --offset 0x5d0000                   \
        --vaddr  0x5e0000                   \
        --paddr  0x5e0000                   \
        --filesz $(stat -c '%s' patch.text) \
        --memsz  $(stat -c '%s' patch.text) \
        --align  1                          \
        --flags  0x5                        \
    --section 26                            \
        --addr 0x5e0000                     \
    --segment 1                             \
        --memsz 0x3c270                     \
    --section 24                            \
        --size 0x1e50
./patch-binary.sh modelf-out.elf patchfile.pf

And there you have it! This method produces a working, patched shared object ready to be loaded by the Android app.

If the ELF file didn't happen to have a spare NOTE segment, I would have needed to do something much uglier. I plan on experimenting with adding new segments in the future.

a. ^ I plan to do a writeup of this project in the future, but until then, I won't give specifics on it (for legal reasons, if nothing else).

b. ^ Of course, I had to fix bugs after successfully patching it in, but writing the patch and fixing its bugs are outside the scope of this post; they're better suited to the aforementioned planned future writeup.

c. ^ Only sections in LOAD segments are mapped to memory, but other segments can also contain those sections (e.g. .note.gnu.build-id here).

dropbear's blog
