Recently I was reverse-engineering an Android app[a]. The relevant details are as follows:
- Important stuff is handled in JNI shared objects
- Java component is UI/platform glue code
- The source language for the shared objects is C++
My goal was to dump a buffer I found that contained some data, and I was able to write a Python/GDB script to do just that. This worked fine on an (x86_64) emulator, but I faced a problem when I finally got ahold of an Android device: Python/GDB is very slow on hardware, especially when you're writing to disk at each breakpoint invocation.
So what's a gal to do with a laggy debugger and an app that crashes if it gets too far behind?
Translate her Python script to AArch64 assembly and patch it into the app, of course!
Because I had reverse-engineered the x86_64 binary and not the AArch64 one, I had to find the right registers to pull my buffer data from again. I also had to learn how to read (and write) ARMv8-A assembly. Thankfully, ARM is both a RISC and a load-store architecture, so it was easy to pick up. I had a patch written fairly quickly[b].
Applying it was another matter. My first attempt went as follows:
- Assemble my patch and dump the .text (code) section
- Use objcopy to append the patch to the end of the .text section
- Modify an instruction at the right address to jump to my code
- (Repackage APK, etc.)
This prevented the app from loading the ELF file: code that tried to access a string in .rodata would instead pull a different string.
As it turns out, you can't just stick code onto the end of the .text section, because every section after it gets pushed back, which breaks relative addressing from the code to those later sections (and in this and most ELF files, .data, .rodata, .bss, etc. are all stored after .text). In order to get this to work, I would have to find and fix every single relative address in the binary. Alternatively, I could try to somehow add my code after those sections. I decided on the latter, for what I'm sure are obvious reasons.
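To see why, it helps to look at how AArch64 code typically reaches data. An ADRP instruction loads the address of a 4 KiB page relative to the instruction's own address, and a following ADD or LDR supplies the low 12 bits. The sketch below decodes ADRP (is_adrp and adrp_target are names made up for this example). The result depends only on the instruction word and its PC, so if .rodata gets pushed back while the code stays put, the computed address still points at the old location, which now holds something else.

```c
#include <stdbool.h>
#include <stdint.h>

/* ADRP Xd, label: op=1 (bit 31), immlo (bits 30:29), 0b10000 (bits 28:24),
   immhi (bits 23:5), Rd (bits 4:0). */
static bool is_adrp(uint32_t insn)
{
    return (insn & 0x9F000000u) == 0x90000000u;
}

/* Address that an ADRP at `pc` loads into its register: the 4 KiB page
   containing the instruction plus a signed 21-bit page offset. Nothing in
   the encoding names the target section, only the distance from `pc`. */
static uint64_t adrp_target(uint32_t insn, uint64_t pc)
{
    uint32_t immlo = (insn >> 29) & 0x3u;
    uint32_t immhi = (insn >> 5) & 0x7FFFFu;
    int64_t imm = (int64_t)((immhi << 2) | immlo);   /* 21-bit page count */
    if (imm & (INT64_C(1) << 20))                      /* sign-extend */
        imm -= INT64_C(1) << 21;
    return (pc & ~UINT64_C(0xFFF)) + (uint64_t)(imm * 4096);
}
```

For example, 0xB0000000 is ADRP X0 with a page offset of +1, so at PC 0x1234 it yields 0x2000 no matter what is stored there.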
Time for a tour of the ELF format! (For simplicity, I'll be focusing on 64-bit ELF files.)
An ELF file has "segments" and "sections". The program header table describes segments, which hold runtime information and tell the loader which ranges of the file to map into memory. The section header table describes sections, which divide up the file's contents (.text, .rodata, and so on). In order to add in our patch, we'd have to add it to the file and describe it with a section. Then, we'd have to make sure that section is covered by a segment.
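As a rough illustration, both tables are found through fields in the fixed-size ELF header at the very start of the file. This sketch uses the Elf64_* types and constants from Linux's <elf.h> and assumes the file's byte order matches the host's (true for a little-endian AArch64 file on an x86_64 host); struct elf_tables and read_elf_tables are illustrative names, not part of any library.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Where the two tables live, as recorded in the ELF header. */
struct elf_tables {
    uint64_t phoff;   /* file offset of the program header table (segments) */
    uint16_t phnum;   /* number of program headers */
    uint64_t shoff;   /* file offset of the section header table (sections) */
    uint16_t shnum;   /* number of section headers */
};

/* Parse the ELF64 header at the start of buf. Returns false if buf is too
   short or isn't a 64-bit ELF file. */
static bool read_elf_tables(const uint8_t *buf, size_t len, struct elf_tables *out)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh)
        return false;
    memcpy(&eh, buf, sizeof eh);   /* memcpy avoids unaligned access */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 || eh.e_ident[EI_CLASS] != ELFCLASS64)
        return false;
    out->phoff = eh.e_phoff;
    out->phnum = eh.e_phnum;
    out->shoff = eh.e_shoff;
    out->shnum = eh.e_shnum;
    return true;
}
```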
Adding a section is easy, because the section header table is stored at the end of this file (though that is by no means a requirement). The contents of this new section should just be the .text section of the compiled patch. However, we do need to make sure to set some flags so it's executable:
as patch.s -o patch.o
objcopy patch.o --dump-section .text=patch.text
objcopy file.so --add-section .patch=patch.text --set-section-flags .patch=code,readonly,alloc patch.out
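One way to double-check the result is to look the new section up by name and inspect its flags: in ELF terms, code,readonly,alloc should come out as SHF_ALLOC | SHF_EXECINSTR without SHF_WRITE. This sketch walks the section header table of an in-memory copy of the file, resolving names through the section-name string table (e_shstrndx). It assumes the buffer already holds a valid ELF64 file, ignores extended section numbering, and find_section and is_readonly_code are hypothetical helper names.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Find a section header by name in an in-memory ELF64 file. */
static bool find_section(const uint8_t *buf, size_t len, const char *name, Elf64_Shdr *out)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh)
        return false;
    memcpy(&eh, buf, sizeof eh);
    if (eh.e_shentsize != sizeof(Elf64_Shdr) || eh.e_shstrndx >= eh.e_shnum)
        return false;
    if (eh.e_shoff > len || (uint64_t)eh.e_shnum * sizeof(Elf64_Shdr) > len - eh.e_shoff)
        return false;

    /* Section names live in the string table at index e_shstrndx. */
    Elf64_Shdr strtab;
    memcpy(&strtab, buf + eh.e_shoff + (size_t)eh.e_shstrndx * sizeof strtab, sizeof strtab);
    if (strtab.sh_offset > len || strtab.sh_size > len - strtab.sh_offset)
        return false;
    const char *names = (const char *)buf + strtab.sh_offset;
    size_t want = strlen(name);

    for (size_t i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr sh;
        memcpy(&sh, buf + eh.e_shoff + i * sizeof sh, sizeof sh);
        if (sh.sh_name >= strtab.sh_size)
            continue;
        size_t room = strtab.sh_size - sh.sh_name;   /* bytes left in the table */
        /* Compare including the terminating NUL so ".patch" != ".patch2". */
        if (want < room && memcmp(names + sh.sh_name, name, want + 1) == 0) {
            *out = sh;
            return true;
        }
    }
    return false;
}

/* Loaded into memory, executable, and not writable. */
static bool is_readonly_code(const Elf64_Shdr *sh)
{
    return (sh->sh_flags & SHF_ALLOC) && (sh->sh_flags & SHF_EXECINSTR) &&
           !(sh->sh_flags & SHF_WRITE);
}
```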
Now we need to map this new .patch section to a segment. A section belongs to a segment if it lies within the segment's chunk of the file. Unfortunately, the program header table is stored immediately after the ELF file header itself, with section data right behind it, so there's no room to add another entry. This means we'll have to commandeer an existing segment header.
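Roughly, that membership rule can be sketched like this, using the types from Linux's <elf.h> (section_in_segment is my own name for it). Sections with file contents are matched by their byte range in the file; NOBITS sections such as .bss have no file contents, so they are matched by address against the segment's in-memory size. readelf's actual check handles more edge cases, such as TLS sections.

```c
#include <elf.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified version of the rule behind readelf's "Section to Segment
   mapping": a section belongs to a segment if it falls entirely inside it. */
static bool section_in_segment(const Elf64_Shdr *sh, const Elf64_Phdr *ph)
{
    if (sh->sh_type == SHT_NOBITS) {
        /* .bss-style sections take no file space, so compare memory
           addresses against p_memsz instead of file offsets. */
        return (sh->sh_flags & SHF_ALLOC) &&
               sh->sh_addr >= ph->p_vaddr &&
               sh->sh_addr + sh->sh_size <= ph->p_vaddr + ph->p_memsz;
    }
    /* Everything else is matched by its byte range in the file. */
    return sh->sh_offset >= ph->p_offset &&
           sh->sh_offset + sh->sh_size <= ph->p_offset + ph->p_filesz;
}
```

The same rule is why the .bss tweak later on has to grow the containing segment's p_memsz too: an 8-byte larger .bss would otherwise stick out past the end of that segment.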
Here is the program header table of the ELF file I'm working with:
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000589c18 0x0000000000589c18  R E    0x1000
  LOAD           0x000000000058a530 0x000000000058b530 0x000000000058b530
                 0x000000000003a418 0x000000000003c268  RW     0x1000
  DYNAMIC        0x00000000005ba368 0x00000000005bb368 0x00000000005bb368
                 0x0000000000000320 0x0000000000000320  RW     0x8
  NOTE           0x0000000000000200 0x0000000000000200 0x0000000000000200
                 0x0000000000000024 0x0000000000000024  R      0x4
  NOTE           0x0000000000589b80 0x0000000000589b80 0x0000000000589b80
                 0x0000000000000098 0x0000000000000098  R      0x4
  GNU_EH_FRAME   0x0000000000509244 0x0000000000509244 0x0000000000509244
                 0x000000000000f97c 0x000000000000f97c  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x000000000058a530 0x000000000058b530 0x000000000058b530
                 0x0000000000038ad0 0x0000000000038ad0  R      0x1
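The lack of room is easy to confirm with a little arithmetic. Assuming e_phoff is 0x40 (right after the 64-byte ELF64 header, as described above), the 8 program headers listed here take 56 bytes each, so the table ends at 0x40 + 8 * 56 = 0x200, which is exactly where the first NOTE segment (and .note.gnu.build-id) begins. A small sketch of that check, with illustrative helper names:

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* First file offset past the program header table. */
static uint64_t phdr_table_end(const Elf64_Ehdr *eh)
{
    return eh->e_phoff + (uint64_t)eh->e_phnum * eh->e_phentsize;
}

/* Bytes free between the end of the program header table and the next
   section that occupies file space, or UINT64_MAX if no section follows. */
static uint64_t phdr_table_slack(const Elf64_Ehdr *eh, const Elf64_Shdr *shdrs, size_t n)
{
    uint64_t end = phdr_table_end(eh);
    uint64_t next = UINT64_MAX;
    for (size_t i = 0; i < n; i++) {
        /* SHT_NULL and SHT_NOBITS sections take up no bytes in the file. */
        if (shdrs[i].sh_type == SHT_NULL || shdrs[i].sh_type == SHT_NOBITS)
            continue;
        if (shdrs[i].sh_offset >= end && shdrs[i].sh_offset < next)
            next = shdrs[i].sh_offset;
    }
    return next == UINT64_MAX ? UINT64_MAX : next - end;
}
```

A slack of 0 means a new program header can't be added without moving everything after the table.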
Each segment has a type, which describes what it stores. These are the ones in the file:
- LOAD segments should be loaded/mapped into memory. All other segments are not[c].
- DYNAMIC segments contain dynamic linking information.
- NOTE segments contain, you guessed it, notes about the file. The GNU linker uses certain .note sections to get information about the file.
- GNU_EH_FRAME covers .eh_frame_hdr, a lookup table for exception unwinding information.
- GNU_STACK tells the kernel how to handle the stack (e.g. if it needs to be executable).
- GNU_RELRO marks sections that should be made read-only after being loaded.
A NOTE segment is the best (read: only) choice we can make here, but how do we choose which one?
 Section to Segment mapping:
  Segment Sections...
   00     .note.gnu.build-id .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .plt .text .rodata .eh_frame_hdr .eh_frame .gcc_except_table .note.android.ident
   01     .init_array .fini_array .data.rel.ro .dynamic .got .data cfstring .bss
   02     .dynamic
   03     .note.gnu.build-id
   04     .note.android.ident
   05     .eh_frame_hdr
   06
   07     .init_array .fini_array .data.rel.ro .dynamic .got
Segment 03, which holds only .note.gnu.build-id, seems the safest bet here, as that section is only used to provide a unique identifier for the binary. So my new process was:
- Assemble my patch and dump the .text section
- Create a new section, .patch, in the target ELF file with the dumped .text section
- Modify the first NOTE segment to be a LOAD segment and point it to the .patch section
- Modify the .bss section and its containing segment to be 8 bytes larger, so the patch can store its state there
- Modify an instruction at the right address to jump to my code
- (Repackage APK, etc.)
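The exact jump instruction isn't spelled out here, but assuming a plain AArch64 B, its encoding is 0x14000000 plus a signed 26-bit offset counted in 4-byte words, which gives a reach of about ±128 MiB. The patch at 0x5e0000 is well within range of anything in the first LOAD segment (which ends at 0x589c18), and since that segment maps file offset 0 to address 0, addresses in .text are also file offsets. encode_b is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Encode "B to" for an instruction located at address `from`.
   B carries a signed 26-bit offset in 4-byte units. Returns false if the
   branch can't be encoded. */
static bool encode_b(uint64_t from, uint64_t to, uint32_t *insn)
{
    if ((from | to) & 3)
        return false;                        /* instructions are 4-byte aligned */
    int64_t delta = (int64_t)(to - from);
    if (delta < -(INT64_C(1) << 27) || delta >= (INT64_C(1) << 27))
        return false;                        /* out of range for imm26 */
    *insn = 0x14000000u | ((uint32_t)(delta / 4) & 0x03FFFFFFu);
    return true;
}
```

The resulting 32-bit word is stored little-endian over the chosen instruction.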
Modifying the segment header is the meat of this process. Here is the segment header definition, from elf(5):
// typedef uint64_t Elf64_Off;
// typedef uint64_t Elf64_Addr;
typedef struct {
    uint32_t   p_type;
    uint32_t   p_flags;
    Elf64_Off  p_offset;
    Elf64_Addr p_vaddr;
    Elf64_Addr p_paddr;
    uint64_t   p_filesz;
    uint64_t   p_memsz;
    uint64_t   p_align;
} Elf64_Phdr;
- p_type is the type of segment. This should be set to PT_LOAD, which is 1.
- p_flags holds RWX flags for the segment. This should be set to 5 (PF_R | PF_X, i.e. R+X).
- p_offset is the start of the segment in the file. In this file, objcopy added the new section at 0x5d0000, so that's what we'll set this to.
- p_vaddr is where the segment should be mapped to in process memory. Because the second LOAD segment has a p_memsz larger than its p_filesz, it extends further in memory than in the file (up to 0x58b530 + 0x3c268 = 0x5c7798), so we set this to 0x5e0000 to be safe.
- p_paddr is used for physical addressing, which is irrelevant here.
- p_filesz describes the size of the segment in the file, which in this case is the size of the patch.
- p_memsz describes how much memory this segment needs, which is again the size of the patch.
- p_align is the alignment this segment needs, in bytes; it must be a power of two. The first LOAD segment uses 0x1000, but we set this to 1 (no alignment requirement), since p_offset and p_vaddr are both page-aligned anyway.
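Put together, the new entry might look like the following sketch (make_patch_phdr and overlaps_load are illustrative names, and the offset/address values are specific to this file). The overlap check captures the reasoning behind p_vaddr: the new segment's memory range must not collide with any existing LOAD segment, including the part of the second one that exists only in memory.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Program header for the patch, using the values listed above. */
static Elf64_Phdr make_patch_phdr(uint64_t offset, uint64_t vaddr, uint64_t size)
{
    Elf64_Phdr ph = {
        .p_type   = PT_LOAD,        /* 1 */
        .p_flags  = PF_R | PF_X,    /* 4 | 1 == 5 */
        .p_offset = offset,
        .p_vaddr  = vaddr,
        .p_paddr  = vaddr,          /* unused here; mirrors p_vaddr */
        .p_filesz = size,
        .p_memsz  = size,
        .p_align  = 1,
    };
    return ph;
}

/* True if ph's memory range [p_vaddr, p_vaddr + p_memsz) intersects any
   other LOAD segment in phdrs. */
static bool overlaps_load(const Elf64_Phdr *ph, const Elf64_Phdr *phdrs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (phdrs[i].p_type != PT_LOAD || &phdrs[i] == ph)
            continue;
        uint64_t start = phdrs[i].p_vaddr;
        uint64_t end = start + phdrs[i].p_memsz;
        if (ph->p_vaddr < end && start < ph->p_vaddr + ph->p_memsz)
            return true;
    }
    return false;
}
```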
One problem: as far as I can tell, objcopy doesn't support modifying the program header table. To get around this without editing the file by hand, I wrote a small C program that can do it.
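This is not that program, just a minimal sketch of its core step under the same assumptions as before (Linux <elf.h>, host byte order matching the file): locate entry N of the program header table inside an in-memory copy of the file and overwrite it. set_phdr is a hypothetical name.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Each entry is one Elf64_Phdr: 4 + 4 + 6 * 8 = 56 bytes. */
_Static_assert(sizeof(Elf64_Phdr) == 56, "unexpected Elf64_Phdr size");

/* Overwrite program header `index` in an in-memory copy of an ELF64 file.
   Returns false if the buffer isn't ELF64 or the entry is out of range. */
static bool set_phdr(uint8_t *buf, size_t len, size_t index, const Elf64_Phdr *ph)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh)
        return false;
    memcpy(&eh, buf, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 || eh.e_ident[EI_CLASS] != ELFCLASS64)
        return false;
    if (eh.e_phentsize != sizeof *ph || index >= eh.e_phnum)
        return false;
    /* Entries are packed back to back starting at e_phoff. */
    uint64_t at = eh.e_phoff + (uint64_t)index * sizeof *ph;
    if (at > len || sizeof *ph > len - at)
        return false;
    memcpy(buf + at, ph, sizeof *ph);
    return true;
}
```

Reading the file into a buffer, calling set_phdr for entry 3 with the values listed above, and writing the buffer back out covers the segment change; the section header edits work the same way using e_shoff and Elf64_Shdr.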
My final build script looks like this:
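#!/usr/bin/env bash
set -e

# Assemble the patch and extract its raw machine code
as patch.s -o patch.o
objcopy patch.o --dump-section .text=patch.text

# Add the code to the shared object as a new allocated, executable section
objcopy file.so --add-section .patch=patch.text --set-section-flags .patch=code,readonly,alloc patch.out

size=$(stat -c '%s' patch.text)

# Segment 3 (the first NOTE) becomes a LOAD segment covering .patch,
# section 26 (.patch) gets the matching address, and
# segment 1 (second LOAD) and section 24 (.bss) each grow by 8 bytes
./modelf patch.out \
    --segment 3 \
    --type 1 \
    --offset 0x5d0000 \
    --vaddr 0x5e0000 \
    --paddr 0x5e0000 \
    --filesz "$size" \
    --memsz "$size" \
    --align 1 \
    --flags 0x5 \
    --section 26 \
    --addr 0x5e0000 \
    --segment 1 \
    --memsz 0x3c270 \
    --section 24 \
    --size 0x1e50

# Apply the remaining byte patches (such as the jump into .patch)
./patch-binary.sh modelf-out.elf patchfile.pf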
And there you have it! This method produces a working, patched shared object ready to be loaded by the Android app.
If the ELF file didn't happen to have a spare NOTE segment, I would have needed to do something much uglier. I plan on experimenting with adding new segments in the future.
a. I plan to do a writeup of this project in the future, but until then, I won't give specifics on it (for legal reasons, if nothing else).
b. Of course, I had to fix bugs after successfully patching it in, but writing the patch and fixing its bugs are outside the scope of this post and better suited to the aforementioned future writeup.
c. Only sections in LOAD segments are mapped to memory, but other segments can also contain those sections (e.g. .note.gnu.build-id here).