Binary Disassembly
The administrators' goal was to disassemble all of the game's binaries long before everyone rushed to match functions. Even so, the information here should be useful to those of you who are curious how we went about this process, and the same approach can be used for many other games.
The first thing done is adding the binary to the Makefile. Let's take the overlay Pumpkin Gorge as an example. Pumpkin Gorge is obviously shortened to PG/pg.
Note: For this section, you'll want to ignore the *'s in the actual Makefile as this is added here only in the wiki for emphasis!
The following can be added at the top in the appropriate place.
OVL_PG := pg
In the case of an overlay, you can find the overlays instruction and add it as part of the necessary overlays (this is obviously not necessary in cases of things which are not overlays).
overlays: ac ag cc ch cr credits dc ee eh gg gs gy1 gy2 hh hr ia la landmap pd **pg** ps sf sv td tl zl
A little bit further down in the Makefile, you should add the instruction for that particular overlay.
pg: ovlpg_dirs $(BUILD_DIR)/PG.BIN
$(BUILD_DIR)/PG.BIN: $(BUILD_DIR)/ovlpg.elf
	$(OBJCOPY) -O binary $< $@
Next, you'll add it to the extract instruction.
extract: extract_main extract_game extract_ovlac extract_ovlag extract_ovlcc extract_ovlch extract_ovlcr extract_ovlcredits extract_ovldc extract_ovlee extract_ovleh extract_ovlgy1 extract_ovlgy2 extract_ovlhh extract_ovlhr extract_ovlia extract_ovlla extract_ovllandmap extract_ovlpd **extract_ovlpg** extract_ovlps extract_ovlsf extract_ovlsv extract_ovltd extract_ovltl extract_ovlzl
Finally, you'll want to add it to the .PHONY line at the bottom of the Makefile.
.PHONY: main game ac ag cc ch cr credits dc ee eh gg gs gy1 gy2 hh hr ia la landmap pd **pg** ps sf sv td tl zl
Inside your root directory, you'll find a folder/directory called config which contains a few files that start with the word splat. There are two ways to make the appropriate file.
If you're working with a project which already contains splat yaml files, then copying an existing one, renaming it appropriately, and editing the contents to fit your needs is 100% a valid option. It's what we did for most of the overlays in MediEvil Decomp!
You can also use Splat directly to do that. Instructions on how to do that will be added here soon!
Also inside your config folder/directory, you'll find some txt files which start with the word symbols. You'll need to add one for the binary you're trying to disassemble. For example, let's say we're doing Zarok's lair, the file would be called symbols.ovlzl.txt.
Inside of the config directory, there is a file called medievil.check.sha which contains the sha1sum value of the different binaries. This is the same value that you will generate for the splat config sha1 property, which is described in the next section. To generate a sha1sum, type sha1sum <path to file> in a terminal, then add the result to the file. The MediEvil Decompilation project orders the entries by importance (Main > Game > Overlays), then alphabetically, although this isn't necessary.
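If sha1sum isn't available on your system, the same value can be computed in a few lines of Python. This is only a minimal sketch using the standard hashlib module; the function name sha1_of_file is made up for illustration and is not part of the project or of Splat.

```python
import hashlib


def sha1_of_file(path, chunk_size=65536):
    """Return the lowercase hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling sha1_of_file("disk/OVERLAYS/SV.BIN") from the root directory should give the same hex string that sha1sum prints for that file.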
The Splat Config file is the gateway of the extraction process from the binary to MIPS assembly and C code (as well as rodata and data). Let's take a look at Sleeping Village (SV/sv).
name: SV.BIN
sha1: 7a5b3a3f15a61ebb69e1fe98893a7fa9e3289407
options:
  platform: psx
  basename: ovlsv
  base_path: ..
  build_path: build/
  target_path: disk/OVERLAYS/SV.BIN
  asm_path: asm/ovl/sv
  asset_path: assets/ovl/sv
  src_path: src/ovl/sv
  compiler: GCC
  symbol_addrs_path: config/symbols.ovlsv.txt
  undefined_funcs_auto_path: config/undefined_funcs_auto.ovlsv.txt
  undefined_syms_auto_path: config/undefined_syms_auto.ovlsv.txt
  ld_script_path: config/ld/ovlsv.ld
  find_file_boundaries: yes
  use_legacy_include_asm: no
  migrate_rodata_to_functions: yes
  asm_jtbl_label_macro: jlabel
  section_order:
    - ".rodata"
    - ".text"
    - ".data"
  subalign: 2
  rodata_string_guesser_level: 2
  data_string_guesser_level: 2
segments:
  - name: ovlsv
    type: code
    start: 0x00000000
    vram: 0x80010000
    subsegments:
      - [0x0, rodata]
      - [0xD8, c]
      - [0x72D0, data]
  - [0xD214]
Note: This section is still a WIP, sorry!
- name: The binary (SV.BIN).
- sha1: The sha1sum of the binary. To find it, run sha1sum ./disk/OVERLAYS/SV.BIN from the root directory in a terminal.
- options: Additional options for Splat.
- platform: The console/platform the game runs on. In MediEvil's case that is the PlayStation 1, so psx.
- basename: ovlsv, meaning overlay Sleeping Village.
- base_path: .. meaning the root, or where to start the traversal for the other path options.
The process of finding the size of the binary is quite simple. You will need to use a hex editor program, such as HxD, to open the binary, then scroll all the way to the bottom and find the very last address. Using Pumpkin Gorge again as an example, 0x94DF is the final offset, so the size is 0x94E0: offsets start at 0, so you need to add 1 to the last one. Make sure to read the next section, which explains the alignment requirement for this value.
Under the segments property, the size of the overlay (indicated by the final line, written as - [0x0000] with the real size filled in) must be a multiple of 4, meaning its last hexadecimal digit is 0, 4, 8, or C. For Pumpkin Gorge, the nearest aligned value is 0x94E0, which is the correct size.
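The arithmetic above can be sketched in Python. The helper names below are made up for illustration; the only facts used are the ones from the text (last offset 0x94DF, size must be a multiple of 4).

```python
def size_from_last_offset(last_offset):
    """Offsets are zero-based, so the byte count is the last offset plus one."""
    return last_offset + 1


def is_word_aligned(value, alignment=4):
    """True when value is a multiple of the alignment (4 bytes by default)."""
    return value % alignment == 0


# Pumpkin Gorge: the hex editor shows 0x94DF as the final offset.
size = size_from_last_offset(0x94DF)
print(hex(size), is_word_aligned(size))  # prints: 0x94e0 True
```

If you'd rather not open a hex editor at all, Python's os.path.getsize reports the file size in bytes directly, which should match the value computed this way.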
Note that in MediEvil, 0x0 is always the start of rodata. However, this isn't always true for every console/game (possibly being dependent on the compiler). Using Pumpkin Gorge again, the rodata is followed at some point by actual code, and then by data.
You can add disasm_unknown: true as an option to the YAML configuration file to force all data to be extracted as assembly. Under subsegments, changing - [0x0, rodata] to - [0x0, asm] while this property is set to true forces Splat to emit plain assembly (under the asm directory), so that you can tell what is rodata, what is actual functions, and what is data.
This Wikipedia article may be of help: https://en.wikipedia.org/wiki/Data_segment
Rodata is a segment of memory which contains static constants, or read-only data, to put it simply.
The MIPS instruction jr $ra typically marks the end of a function, and it shows up in a hex editor as the bytes 0800E003. Because of the MIPS branch delay slot, the instruction right after the jr still belongs to the function. You can search for the first instance of 0800E003 to find the end of the first function. Of course, you also need to find the start of the function! To do that, you'll need to find the spot where rodata ends and where the functions start.
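The search for 0800E003 can also be scripted. Here is a minimal sketch, assuming the binary has been read into a bytes object; find_jr_ra_offsets is a hypothetical helper, not something Splat provides. It only checks 4-byte aligned positions, since MIPS instructions are 4 bytes wide. Keep in mind that rodata or data could contain the same byte pattern by coincidence, so treat the hits as candidates.

```python
# Byte pattern for "jr $ra" as it appears in the file (little-endian).
JR_RA = bytes.fromhex("0800E003")


def find_jr_ra_offsets(data, start=0):
    """Return every 4-byte aligned offset at or after start holding jr $ra."""
    offsets = []
    first = (start + 3) & ~3  # round start up to the next multiple of 4
    for offset in range(first, len(data) - 3, 4):
        if data[offset:offset + 4] == JR_RA:
            offsets.append(offset)
    return offsets
```

For example, find_jr_ra_offsets(open("disk/OVERLAYS/PG.BIN", "rb").read()) would list the candidate function ends for Pumpkin Gorge, assuming the binary lives at that path.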
Assembly is the low-level language used by the PlayStation. C is a higher-level language, and it is what the original developers wrote the code in.
You'll need to scroll up in the assembly file you generated until you find the address which is the start of the first function for the binary. Can you find where that is? In Pumpkin Gorge's case, it is C8. How do we know? C8 is an addiu instruction. Before it, at C4, is a nop. Above it is a long strand of lb. If you scroll up more, you might start seeing # INVALID and # handwritten instruction. Once you have this value, you should add it to the config file. It should currently look like this:
segments:
  - name: ovlpg
    type: code
    start: 0x00000000
    vram: 0x80010000
    subsegments:
      - [0x0, rodata]
      - [0xC8, asm]
  - [0x94E0]
This is the segment of data which is for the other types of BSS Data.
Now, you should save it and run make init again. You'll have a new assembly file, C8.s. Inside of C8.s, scroll down until you find the first function with # INVALID in it. Its start address is where data starts. In Pumpkin Gorge's case, this is 0x7310. Note that # Handwritten Function does not necessarily mean it is wrong; this could just be Splat not recognizing correct MIPS opcodes (opcodes being things like jal, nop, bne, etc.). Your splat config should now look like this:
segments:
  - name: ovlpg
    type: code
    start: 0x00000000
    vram: 0x80010000
    subsegments:
      - [0x0, rodata]
      - [0xC8, asm]
      - [0x7310, data]
  - [0x94E0]
If you do a make init and you get an OK, change the asm to c and remove the disasm_unknown: true property. Do another make init and make sure that the correct files were generated in the src directory. If everything looks good, then good job, you successfully extracted a binary!
Your final segments property should look like this:
segments:
  - name: ovlpg
    type: code
    start: 0x00000000
    vram: 0x80010000
    subsegments:
      - [0x0, rodata]
      - [0xC8, c]
      - [0x7310, data]
  - [0x94E0]