Binary Disassembly

Allison Mackenzie edited this page Oct 16, 2023 · 32 revisions

Although the administrators' goal was to disassemble all of the game's binaries well before anyone started matching functions, the information here should still be useful to anyone curious about how we went about the process. The same approach can be applied to many of your favorite games.

Adding the Binary to the Makefile

The first step is to add the binary to the Makefile. Let's take the Pumpkin Gorge overlay as an example. Pumpkin Gorge is shortened to PG/pg.

Note: The asterisks (**) around pg in the snippets below are not part of the actual Makefile; they are added here in the wiki only for emphasis!

The following can be added at the top in the appropriate place.

OVL_PG			:= pg

In the case of an overlay, find the overlays target and add the new overlay to its list of prerequisites (this is not necessary for binaries which are not overlays).

overlays: ac ag cc ch cr credits dc ee eh gg gs gy1 gy2 hh hr ia la landmap pd **pg** ps sf sv td tl zl

A little further down in the Makefile, add the build rule for that particular overlay.

pg: ovlpg_dirs $(BUILD_DIR)/PG.BIN
$(BUILD_DIR)/PG.BIN: $(BUILD_DIR)/ovlpg.elf
	$(OBJCOPY) -O binary $< $@

Next, add it to the extract target.

extract: extract_main extract_game extract_ovlac extract_ovlag extract_ovlcc extract_ovlch extract_ovlcr extract_ovlcredits extract_ovldc extract_ovlee extract_ovleh extract_ovlgy1 extract_ovlgy2 extract_ovlhh extract_ovlhr extract_ovlia extract_ovlla extract_ovllandmap extract_ovlpd **extract_ovlpg** extract_ovlps extract_ovlsf extract_ovlsv extract_ovltd extract_ovltl extract_ovlzl

Finally, add it to the .PHONY target at the bottom of the Makefile.

.PHONY: main game ac ag cc ch cr credits dc ee eh gg gs gy1 gy2 hh hr ia la landmap pd **pg** ps sf sv td tl zl
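The five Makefile edits above are easy to miss one of. The following is a minimal sketch of a checker for them, assuming the Makefile is laid out exactly as in the snippets on this page (without the emphasis asterisks). The function name `missing_makefile_entries` and the check labels are hypothetical and not part of the project.

```python
import re


def missing_makefile_entries(makefile_text, ovl):
    """Return the names of the Makefile edits that appear to be missing.

    Assumes the layout shown on this page: an OVL_<NAME> variable, an
    `overlays:` target, a `<name>:` build rule, an `extract:` target and
    a `.PHONY:` line, each written on a single line.
    """
    lines = makefile_text.splitlines()

    def words_after(prefix):
        # Collect the whitespace-separated words following a line prefix.
        words = []
        for line in lines:
            if line.startswith(prefix):
                words += line[len(prefix):].split()
        return words

    checks = {
        "OVL_ variable": any(
            re.match(rf"OVL_{ovl.upper()}\s*:?=", line) for line in lines
        ),
        "overlays": ovl in words_after("overlays:"),
        "build rule": any(line.startswith(f"{ovl}:") for line in lines),
        "extract": f"extract_ovl{ovl}" in words_after("extract:"),
        ".PHONY": ovl in words_after(".PHONY:"),
    }
    return [name for name, present in checks.items() if not present]
```

Calling `missing_makefile_entries(open("Makefile").read(), "pg")` should return an empty list once every edit is in place. Text matching like this is only a convenience check; it does not replace actually running make.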

Creating a Splat Config File

Inside your root directory, you'll find a folder/directory called config which contains a few files that start with the word splat. There are two ways to make the appropriate file.

Copy and Pasting

If you're working with a project which already contains splat YAML files, then copying an existing one, renaming it appropriately, and editing the contents to fit your needs is a perfectly valid option. It's what we did for most of the overlays in MediEvil Decomp!

Using Splat To Generate One For You

You can also use Splat directly to generate one. Instructions on how to do that will be added here soon!

Creating a Symbols File

Also inside your config folder/directory, you'll find some txt files which start with the word symbols. You'll need to add one for the binary you're trying to disassemble. For example, if we're doing Zarok's Lair, the file would be called symbols.ovlzl.txt.

medievil.check.sha

Inside of the config directory, there is a file called medievil.check.sha which contains the sha1sum value of each of the different binaries. This is the same value that you will use for the splat config sha1 property, which is described in the next section. To generate a sha1sum, type sha1sum <path to file> in a terminal, then add the result to the file. The MediEvil Decompilation project orders the entries by importance (Main > Game > Overlays), then alphabetically, although this isn't necessary.
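If sha1sum is not available on your system, the same digest can be computed with Python's standard hashlib module. This is a minimal sketch; the helper names are hypothetical, and the `check_line` layout (digest, two spaces, path) is the usual sha1sum output format, which medievil.check.sha is assumed to follow.

```python
import hashlib


def sha1_hex(data):
    """Return the SHA-1 digest of a bytes object as a hex string."""
    return hashlib.sha1(data).hexdigest()


def sha1_of_file(path):
    """Return the SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_line(digest, path):
    """Format one entry the way sha1sum prints it (assumed file format)."""
    return f"{digest}  {path}"
```

For example, `check_line(sha1_of_file("disk/OVERLAYS/SV.BIN"), "disk/OVERLAYS/SV.BIN")` produces a line that can be compared against what sha1sum itself prints.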

Modifying the Splat Config File

The Splat Config file is the gateway of the extraction process from the binary to MIPS assembly and C code (as well as rodata and data). Let's take a look at Sleeping Village (SV/sv).

name: SV.BIN
sha1: 7a5b3a3f15a61ebb69e1fe98893a7fa9e3289407
options:
  platform: psx
  basename: ovlsv
  base_path: ..
  build_path: build/
  target_path: disk/OVERLAYS/SV.BIN
  asm_path: asm/ovl/sv
  asset_path: assets/ovl/sv
  src_path: src/ovl/sv
  compiler: GCC
  symbol_addrs_path: config/symbols.ovlsv.txt
  undefined_funcs_auto_path: config/undefined_funcs_auto.ovlsv.txt
  undefined_syms_auto_path: config/undefined_syms_auto.ovlsv.txt
  ld_script_path: config/ld/ovlsv.ld
  find_file_boundaries: yes
  use_legacy_include_asm: no
  migrate_rodata_to_functions: yes
  asm_jtbl_label_macro: jlabel
  section_order:
    - ".rodata"
    - ".text"
    - ".data"

  subalign: 2

  rodata_string_guesser_level: 2
  data_string_guesser_level: 2

segments:
  - name: ovlsv
    type: code
    start: 0x00000000
    vram: 0x80010000
    subsegments:
      - [0x0, rodata]
      - [0xD8, c]
      - [0x72D0, data]
  - [0xD214]

The Different Properties

Note: This section is still a WIP, sorry!

  • name: The binary (SV.BIN).
  • sha1: The sha1sum of the binary. To find it, type sha1sum ./disk/OVERLAYS/SV.BIN in a terminal in the root directory.
  • options: Added options for Splat.
  • platform: The console/platform the game is based on, in MediEvil's case it is on the PlayStation 1, so psx.
  • basename: ovlsv, meaning overlay Sleeping Village.
  • base_path: .. - meaning the root or where to start the traversal for other options.

Finding The Size Of the Binary

The process of finding the size of the binary is quite simple. Open the binary in a hex editor, such as HxD, scroll all the way to the bottom, and note the offset of the very last byte. Using Pumpkin Gorge again as an example, the last offset is 0x94DF, so the size is 0x94E0: offsets start at 0, so you need to add 1 byte to the last offset. Make sure to read the next section, which covers the alignment the size must also satisfy.

Size Must Be 0 Aligned

Under the segments property, the size of the overlay (which is indicated by the final line, of the form - [0x0000]) must be a multiple of 4, meaning its last hexadecimal digit is 0, 4, 8, or C. For Pumpkin Gorge, the last offset is 0x94DF, so the nearest aligned value is 0x94E0, which is the correct size.
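The size arithmetic from the last two sections can be sketched in a few lines of Python. The helper names are hypothetical; the default alignment of 4 is taken from the rule stated above.

```python
import os


def size_from_last_offset(last_offset):
    """Offsets start at 0, so the size is the last offset plus one byte."""
    return last_offset + 1


def align_up(size, alignment=4):
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


def binary_size(path):
    """Read the size straight from the file system instead of a hex editor."""
    return os.path.getsize(path)
```

For Pumpkin Gorge, `hex(align_up(size_from_last_offset(0x94DF)))` gives `0x94e0`, matching the value found by hand.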

Finding The Different Subsegments

Note that in MediEvil, 0x0 is always the start of rodata; however, this isn't true for every console/game (possibly depending on the compiler). In Pumpkin Gorge, the rodata is followed at some point by actual code, and then by data.

disasm_unknown: true

You can add disasm_unknown: true as an option to the YAML configuration file to force all data to be extracted as assembly. With this property set to true, changing - [0x0, rodata] to - [0x0, asm] under subsegments makes Splat emit everything as plain assembly (under the asm directory), so that you can tell what is rodata, what are actual functions, and what is data.

Data Segments

This Wikipedia article may be of help: https://en.wikipedia.org/wiki/Data_segment

Rodata

Rodata is a segment of memory which contains static constants. Put simply, it is read-only data.

jr / 0800E003

The MIPS instruction jr $ra (jump to the return address) typically marks the end of a function. It is encoded as 0x03E00008, which a hex editor shows as the little-endian bytes 08 00 E0 03, hence 0800E003. You can search for the first instance of 0800E003 to find the end of the first function. Of course, you also need to find the start of the function! To do that, you'll need to find the spot where rodata ends and the functions start.
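The hex-editor search can also be scripted. This is a minimal sketch with a hypothetical helper name; it only looks at 4-byte-aligned offsets, since MIPS instructions are always word-aligned. Keep in mind that MIPS executes the instruction after a jump (the delay slot), so a function normally ends 8 bytes after the offset reported here, and that the same byte pattern could in principle occur inside rodata or data.

```python
import struct

JR_RA = 0x03E00008                       # encoding of `jr $ra`
JR_RA_BYTES = struct.pack("<I", JR_RA)   # little-endian: 08 00 E0 03


def find_jr_ra(data):
    """Return every word-aligned offset in data that holds `jr $ra`."""
    return [
        offset
        for offset in range(0, len(data) - 3, 4)
        if data[offset:offset + 4] == JR_RA_BYTES
    ]
```

Reading a binary with `open(path, "rb").read()` and passing the result to `find_jr_ra` gives a list of candidate function ends; the first entry corresponds to the first 0800E003 found by hand.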

ASM / C

Assembly is the low-level language that maps directly to the machine instructions the PlayStation executes. C is a higher-level language, and is what the original developers wrote the code in.

Where Do I Find The Bridge Point?

You'll need to scroll up in the assembly file you generated until you find the start of the first function in the binary. Can you find where that is? In Pumpkin Gorge's case, it is 0xC8. How do we know? 0xC8 holds an addiu instruction. Before it, at 0xC4, is a nop. Above that is a long run of lb instructions. If you scroll up further, you might start seeing # INVALID and # handwritten instruction comments. Once you have this value, add it to the config file. It should currently look like this:

segments:
  - name: ovlpg
    type: code
    start: 0x00000000
    vram: 0x80010000
    subsegments:
      - [0x0, rodata]
      - [0xC8, asm]
  - [0x94E0]
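As a cross-check for the value found by scrolling, the start of the first function can be guessed with a heuristic. The sketch below assumes that the first function opens by reserving stack space with addiu $sp, $sp, -N, which is common for compiled code but not guaranteed (leaf functions may not touch the stack at all, and rodata could contain the same bytes by chance). The helper name is hypothetical, and the result should always be confirmed in the disassembly.

```python
import struct


def first_stack_prologue(data, start=0):
    """Return the first word-aligned offset holding `addiu $sp, $sp, -N`.

    The upper 16 bits 0x27BD encode `addiu $sp, $sp, imm`; an immediate
    of 0x8000 or above is negative, i.e. the stack is being grown.
    Returns None if no such instruction is found.
    """
    for offset in range(start, len(data) - 3, 4):
        (word,) = struct.unpack_from("<I", data, offset)
        if word >> 16 == 0x27BD and (word & 0xFFFF) >= 0x8000:
            return offset
    return None
```

If the heuristic and the manual inspection disagree, trust the manual inspection.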

Data

This is the segment which holds the remaining data: initialized variables that, unlike rodata, the program can modify at runtime.

Finding the Next Bridge Point?

Now, save the config and run make init again. You'll have a new assembly file, C8.s. Inside of C8.s, scroll down until you find the first function with # INVALID in it; its start address is where data starts. In Pumpkin Gorge's case, this is 0x7310. Note that # Handwritten Function does not necessarily mean the disassembly is wrong; it could just be Splat not recognizing otherwise correct MIPS opcodes (opcodes being things like jal, nop, bne, etc.). Your splat config should now look like this:

segments:
  - name: ovlpg
    type: code
    start: 0x00000000
    vram: 0x80010000
    subsegments:
      - [0x0, rodata]
      - [0xC8, asm]
      - [0x7310, data]
  - [0x94E0]

If you run make init and get an OK, change asm to c and remove the disasm_unknown: true property. Run make init once more and make sure that the correct files were generated in the src directory. If everything looks right, then good job: you have successfully extracted a binary!

Your final segments property should look like this:

segments:
  - name: ovlpg
    type: code
    start: 0x00000000
    vram: 0x80010000
    subsegments:
      - [0x0, rodata]
      - [0xC8, c]
      - [0x7310, data]
  - [0x94E0]
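Before running make init, the offsets in the segments property can be sanity-checked with a short script. This is a sketch under assumed rules, based on what this page describes: subsegment starts should be in ascending order, fall inside the binary, and be 4-byte aligned, and the final size should be aligned as well. The function name is hypothetical, and Splat itself remains the authority on what it accepts.

```python
def check_subsegments(offsets, size, alignment=4):
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    if size % alignment:
        problems.append(f"size {size:#x} is not {alignment}-byte aligned")
    previous = -1
    for offset in offsets:
        if offset % alignment:
            problems.append(f"{offset:#x} is not {alignment}-byte aligned")
        if offset <= previous:
            problems.append(f"{offset:#x} is not after {previous:#x}")
        if offset >= size:
            problems.append(f"{offset:#x} is outside the binary")
        previous = offset
    return problems
```

For the final Pumpkin Gorge config, `check_subsegments([0x0, 0xC8, 0x7310], 0x94E0)` returns an empty list.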