
General Workflow

Anghelo edited this page Aug 23, 2024 · 6 revisions


This page describes an example of how to iteratively edit the splat segments config in order to maximise the amount of code and data migrated out of the binary.

1 Initial configuration

After successfully following the Quickstart, you should have an initial configuration like the one below:

- name: main
  type: code
  start: 0x1060
  vram: 0x80070C60
  follows_vram: entry
  bss_size: 0x3AE70
  subsegments:
      - [0x1060, asm]
      # ... a lot of additional `asm` sections
      # This section was found to contain __osViSwapContext
      - [0x25C20, asm, energy_orb_wave]
      # ... a lot of additional `asm` sections
      - [0x2E450, data]

      - [0x3E330, rodata]
      # ... a lot of additional `rodata` sections
      - { start: 0x3F1B0, type: bss, vram: 0x800E9C20 }

- [0x3F1B0, bin]

1.1 Match rodata to asm sections

It's good practice to start pairing rodata sections with asm sections before changing the asm sections into c files. This is because rodata may need to be explicitly included within the c file (via INCLUDE_RODATA or GLOBAL_ASM macros).

splat provides hints about which rodata segments are referenced by which asm segments, based on references to rodata symbols within the disassembled functions.

These messages are output when splitting and look like:

Rodata segment '3EE10' may belong to the text segment 'energy_orb_wave'
    Based on the usage from the function func_0xXXXXXXXX to the symbol D_800AEA10

To pair these two sections, simply add the name of the suggested text (i.e. asm) segment to the rodata segment:

- [0x3EE10, rodata, energy_orb_wave] # segment will be paired with a text (i.e. asm or c) segment named "energy_orb_wave"

NOTE:

By default, the migrate_rodata_to_functions option is enabled. This causes splat to include paired rodata alongside the disassembled function code, allowing it to be linked via .rodata segments from the get-go. This guide assumes that you disable this option until you have successfully paired up the segments.
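A minimal sketch of the relevant setting, assuming the usual splat YAML layout with a top-level options block (all other options omitted):

```yaml
options:
  # ... other options ...
  # Keep rodata in separate .rodata.s files while pairing segments;
  # set this back to True (the default) once pairing is done.
  migrate_rodata_to_functions: False
```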

Troubleshooting

Multiple rodata segments for a single text segment

Using the following configuration:

# ...
- [0x3E900, rodata]
- [0x3E930, rodata]
# ...

splat outputs a hint that doesn't immediately seem to make sense:

Rodata segment '3E900' may belong to the text segment '16100'
    Based on the usage from the function func_80085DA0 to the symbol jtbl_800AE500

Rodata segment '3E930' may belong to the text segment '16100'
    Based on the usage from the function func_800862C0 to the symbol jtbl_800AE530

This hint tells you that splat believes one text segment (the unnamed segment at 0x16100) references two rodata segments. Because a text segment can only be paired with one rodata segment, this usually means either that the rodata should not be split at 0x3E930, or that a split is missing within the asm segment starting at 0x16100.

If we assume that the rodata split is incorrect, we can remove the extraneous split:

# ...
- [0x3E900, rodata, "16100"]
# ...
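If instead the asm split is what's missing, one hypothetical alternative is to split the text segment so that each rodata segment gets its own owner. This sketch assumes that func_800862C0 is the first function of a new file; its ROM offset of 0x166C0 is derived from the vram mapping of the main segment above (0x800862C0 - 0x80070C60 + 0x1060), and the segment name "166C0" follows the hex-offset naming seen in the hints.

```yaml
# ...
- [0x16100, asm]
- [0x166C0, asm]               # hypothetical new split at func_800862C0
# ...
- [0x3E900, rodata, "16100"]   # used by func_80085DA0 via jtbl_800AE500
- [0x3E930, rodata, "166C0"]   # used by func_800862C0 via jtbl_800AE530
# ...
```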

NOTE: splat uses heuristics to determine rodata and asm splits and is not perfect. False positives are possible, so if in doubt, double-check the assembly yourself before changing the splits.

Multiple asm segments referring to the same rodata segment

Sometimes the opposite is true, and splat believes two asm segments belong to a single rodata segment. In this case, you can split the rodata segment so that the two files are not paired with the same rodata (or merge the two asm segments if their split is the mistake). Note that this too can be a false positive.

2 Disassemble text, data, rodata

Let's say you want to start decompiling the subsegment at 0x25C20 (energy_orb_wave). Start by replacing the asm type with c, and then re-run splat.

- [0x25C20, c, energy_orb_wave]
# ...
- [0x3EE10, rodata, energy_orb_wave]

This will disassemble the ROM at 0x25C20 as code, creating individual .s files for each function found. The output will be located in {asm_path}/nonmatchings/energy_orb_wave/<function_name>.s.

Assuming data and rodata segments have been paired with the c segment, splat will generate {asm_path}/energy_orb_wave.data.s and {asm_path}/energy_orb_wave.rodata.s respectively.

Finally, splat will generate a C file at {src_path}/energy_orb_wave.c containing macros that include the assembly of every disassembled function.

NOTE:

  • the path where assembly is written can be configured via asm_path; the default is {base_path}/asm
  • the source code path can be configured via src_path; the default is {base_path}/src
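A sketch that spells out these locations explicitly. It assumes a project whose root sits next to the YAML file (base_path: . is an assumption here) and that asm_path and src_path are resolved relative to base_path:

```yaml
options:
  base_path: .     # assumed project root, relative to this YAML file
  asm_path: asm    # resolves to {base_path}/asm
  src_path: src    # resolves to {base_path}/src
```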

Macros

The macros used to include text/rodata assembly differ between the GCC and IDO compilers:

  • GCC: INCLUDE_ASM and INCLUDE_RODATA (for text and rodata respectively)
  • IDO: GLOBAL_ASM

These macros must be defined in an included header, which splat currently does not produce.

For a GCC example, see the include.h from the Dr. Mario project.

For IDO, you will need to use asm-processor in order to include assembly code within the c files.
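Which of these macro styles splat writes into the generated C files is, as far as we can tell, selected by the compiler option. A minimal sketch for a GCC project (verify against your splat version):

```yaml
options:
  compiler: GCC   # use IDO for IDO-built projects (GLOBAL_ASM via asm-processor)
```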

Assembly macros

splat's generated assembly relies on a few assembler macros, which usually live in the include/macro.inc file. Without these macros, an assembler would not be able to build the disassembly.

Those macros usually look like this:

.macro glabel label
    .global \label
    .type \label, @function
    \label:
.endm

.macro dlabel label
    .global \label
    \label:
.endm

.macro jlabel label
    .global \label
    \label:
.endm

Here, glabel is used for functions, dlabel for data, rodata and bss variables, and jlabel for branch labels targeted by jump tables.

Assembly diffing tools can sometimes struggle to show diffs around jlabels with certain compilers. A workaround is to mark the jlabel as a function, like this:

.macro jlabel label
    .global \label
    .type \label, @function
    \label:
.endm

Float assembly macros

Additionally, splat recommends using the o32 ABI names for the float registers, which name each register after its role in the calling convention (return values, arguments, temporaries and saved registers).

For an explanation of what these ABI names are and why they are recommended, see: https://gist.github.com/EllipticEllipsis/27eef11205c7a59d8ea85632bc49224d

Some compilers/assemblers support these names and others do not. If yours doesn't support them but does support custom register aliases (as the modern mips-linux-gnu-as and similar assemblers do), it is recommended to add the following to your macro.inc file:

# Float register aliases (o32 ABI, odd ones are rarely used)

.set $fv0,          $f0
.set $fv0f,         $f1
.set $fv1,          $f2
.set $fv1f,         $f3
.set $ft0,          $f4
.set $ft0f,         $f5
.set $ft1,          $f6
.set $ft1f,         $f7
.set $ft2,          $f8
.set $ft2f,         $f9
.set $ft3,          $f10
.set $ft3f,         $f11
.set $fa0,          $f12
.set $fa0f,         $f13
.set $fa1,          $f14
.set $fa1f,         $f15
.set $ft4,          $f16
.set $ft4f,         $f17
.set $ft5,          $f18
.set $ft5f,         $f19
.set $fs0,          $f20
.set $fs0f,         $f21
.set $fs1,          $f22
.set $fs1f,         $f23
.set $fs2,          $f24
.set $fs2f,         $f25
.set $fs3,          $f26
.set $fs3f,         $f27
.set $fs4,          $f28
.set $fs4f,         $f29
.set $fs5,          $f30
.set $fs5f,         $f31

If even this doesn't work with your assembler, you will need to disable the ABI names by setting the mips_abi_float_regs option in your YAML to numeric.
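A sketch of the corresponding options entry; with numeric, the disassembly uses plain register names such as $f0 and $f12 instead of the ABI aliases:

```yaml
options:
  mips_abi_float_regs: numeric   # instead of the o32 ABI names
```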

Old GCC builds (such as KMC) can struggle with register aliases. A workaround is to split the label macros and the register aliases into two different files; you can follow the example from Dr. Mario: labels.inc and macro.inc.

3 Decompile text

This involves going back and forth between the .c and .s files:

  • editing the data.s and rodata.s files to add or fix up symbols at the proper locations
  • decompiling functions and declaring symbols (externs) in the .c file

The linker script links

  • .text (only) from the .o built from energy_orb_wave.c
  • .data (only) from the .o built from energy_orb_wave.data.s
  • .rodata (only) from the .o built from energy_orb_wave.rodata.s

4 Decompile (ro)data

Migrate data to the .c file, using raw values, arrays or structs as appropriate.

Once you have paired the rodata and text segments together, you can enable migrate_rodata_to_functions. This adds the paired rodata to each individual function's assembly file, so the rodata ends up in the compiled .o file.

To link the .data/.rodata from the .o built from the .c file (instead of from the .s files), the subsegments must be changed from:

- [0x42100, c, energy_orb_wave]
- [0x42200, data, energy_orb_wave]     # extract data at this ROM address as energy_orb_wave.data.s
- [0x42300, rodata, energy_orb_wave]   # extract rodata at this ROM address as energy_orb_wave.rodata.s

to:

- [0x42100, c, energy_orb_wave]
- [0x42200, .data, energy_orb_wave]    # take the .data section from the compiled c file named energy_orb_wave
- [0x42300, .rodata, energy_orb_wave]  # take the .rodata section from the compiled c file named energy_orb_wave

NOTE: If you are using auto_link_sections and the data is fully migrated, the subsegments can also be reduced to the following, and splat will add the appropriate entries to the linker script:

- [0x42100, c, energy_orb_wave]
- [0x42200]
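auto_link_sections takes the list of sections splat should link automatically from each c segment's object. A sketch, assuming the commonly used value (which we believe matches splat's default):

```yaml
options:
  auto_link_sections: [".data", ".rodata", ".bss"]
```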

5 Decompile bss

bss works in a similar way to data/rodata. However, bss is usually not stored in the final binary (it only reserves memory), which makes it somewhat trickier to migrate.

The bss segment type creates assembly files that only reserve space. The .bss segment type instead links the .bss section of the referenced c file.
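By analogy with the data/rodata change in section 4, a hypothetical sketch of switching a bss subsegment from bss to .bss. The start and vram values are copied from the initial configuration purely for illustration and assume this file's bss begins there; use the real boundaries of energy_orb_wave's bss in practice.

```yaml
# before: reserve the space in an assembly file
- { start: 0x3F1B0, type: bss, vram: 0x800E9C20, name: energy_orb_wave }
# after: link the .bss section of the object built from energy_orb_wave.c
- { start: 0x3F1B0, type: .bss, vram: 0x800E9C20, name: energy_orb_wave }
```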

6 Done!

.text, .data, .rodata and .bss are now all linked from the .o built from energy_orb_wave.c, which contains everything needed to match when building.

The assembly files (the function .s files, data.s and rodata.s) can now be deleted.