Translation unit detection #65

Open
ghost opened this issue Jul 24, 2021 · 4 comments
Labels: decompiler (Improvements to decompiler tooling), devops (Changes to the build system and CI), p-high

Comments

ghost commented Jul 24, 2021

Most of the code we haven't researched yet currently sits in a handful of large assembly blobs.
Each blob mixes lots of unrelated pieces of code, so we need to structure them better.

A basic improvement is to recover the original translation unit (TU) boundaries and generate one C file with inline asm per TU.

The CodeWarrior build system leaks some information on TU structure.
Examples:

  • Data sections of a TU (especially small data) are aligned and padded. Hint: a run of padding bytes (i.e. no xrefs) followed by an aligned piece of data suggests a TU boundary.
  • Strings and floating point literals are deduplicated within a TU. Hint: when the same literal shows up twice, a TU boundary has to lie between the two copies.
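
A minimal sketch of the padding hint, assuming we already have the referenced data items (address and size) and the set of xref'd addresses from a disassembler export. `DataItem`, `padding_boundaries` and the 8-byte alignment default are hypothetical names and values chosen for illustration, not part of any existing tooling. The results are only candidates: ordinary alignment between items of the same TU produces gaps too.

```python
from typing import Iterable, List, NamedTuple, Set


class DataItem(NamedTuple):
    addr: int  # start address of a referenced data item
    size: int  # size in bytes


def padding_boundaries(items: Iterable[DataItem],
                       xref_targets: Set[int],
                       align: int = 8) -> List[int]:
    """Return addresses that may start a new TU's data.

    A candidate is reported when two consecutive items are separated by
    bytes that nothing references and the second item is aligned.
    The alignment value is an assumption; adjust it per section.
    """
    boundaries = []
    ordered = sorted(items)
    for prev, nxt in zip(ordered, ordered[1:]):
        gap_start = prev.addr + prev.size
        if nxt.addr <= gap_start:
            continue  # no gap between the two items
        gap_is_unreferenced = all(
            a not in xref_targets for a in range(gap_start, nxt.addr)
        )
        if gap_is_unreferenced and nxt.addr % align == 0:
            boundaries.append(nxt.addr)
    return boundaries


# Made-up addresses: a 2-byte item at 0x104 leaves 2 padding bytes before 0x108.
example_items = [DataItem(0x100, 4), DataItem(0x104, 2),
                 DataItem(0x108, 4), DataItem(0x10C, 4)]
example_xrefs = {item.addr for item in example_items}
candidates = padding_boundaries(example_items, example_xrefs)
```
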
ghost added the enhancement, p-high, and devops (Changes to the build system and CI) labels Jul 24, 2021
ghost self-assigned this Jul 24, 2021
Owner

riidefi commented Jul 24, 2021

Some more clues:

  • The majority of data is not shared across TUs
  • Non-SDA data loads are typically done as first_tu_data + (data - first_tu_data), i.e. as an offset from a base register that holds the address of the TU's first data item. Example:
    .rel.text1:806DD3A8 addi r30, r30, aMashballoongc@l # "MashBalloonGC"
    .rel.text1:806DD3AC addi r4, r30, (aHeyhoshipgba_0 - 0x808A0420) # "HeyhoShipGBA"
    .rel.text1:806DD3B0 bl strcmp
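
One way this clue could be used, as a rough sketch: every (base, target) address pair recovered from such an instruction pair ties both addresses to the same TU, so overlapping pairs can be merged into minimum data spans. `merge_base_relative_spans` is a hypothetical helper, and the addresses in the usage lines are made up rather than taken from the binary.

```python
from typing import Iterable, List, Tuple


def merge_base_relative_spans(pairs: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge (base_addr, target_addr) pairs into minimum per-TU data spans.

    Each returned tuple is (lowest, highest) start address known to belong
    to one TU. The highest value is an item's start, not its end, so the
    real span is at least as large as what is reported here.
    """
    spans = sorted((min(b, t), max(b, t)) for b, t in pairs)
    merged: List[Tuple[int, int]] = []
    for lo, hi in spans:
        if merged and lo <= merged[-1][1]:
            # Overlaps the previous span: both must be the same TU.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


# Made-up (base, target) pairs for illustration.
example_pairs = [(0x1000, 0x1040), (0x1000, 0x1010),
                 (0x2000, 0x2008), (0x1030, 0x1050)]
example_spans = merge_base_relative_spans(example_pairs)
```
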

ghost removed their assignment Jul 24, 2021
ghost added the decompiler (Improvements to decompiler tooling) label Jul 24, 2021
Collaborator

riptl commented Mar 19, 2022

Resuming work on this. To begin with, I'm going to export all symbols, XREFs, etc. from @stblr's Ghidra using https://github.com/r0metheus/GhiDump.
This should get us off the ground with the sdata2 float dedup heuristic.
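
For reference, the dedup heuristic could look roughly like the sketch below. It assumes the literals have already been exported as (address, raw bytes) pairs sorted by address; `split_on_duplicates` and `be_float` are hypothetical helpers, not GhiDump APIs. Raw bytes are compared instead of float values so that 0.0 and -0.0, or a float and a double of equal value, stay distinct.

```python
import struct
from typing import List, Sequence, Tuple


def split_on_duplicates(literals: Sequence[Tuple[int, bytes]]) -> List[List[int]]:
    """Greedily group literal addresses so no group repeats a value.

    Within one TU a literal is stored once, so a repeated value means a
    new TU has started at or before the repeat. The greedy split yields
    the fewest groups consistent with that rule; the true boundary may
    lie anywhere after the earlier copy and up to the repeat.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    seen = set()
    for addr, raw in literals:
        if raw in seen:
            groups.append(current)
            current, seen = [], set()
        current.append(addr)
        seen.add(raw)
    if current:
        groups.append(current)
    return groups


def be_float(value: float) -> bytes:
    """Encode a 32-bit big-endian float, as stored on PowerPC."""
    return struct.pack(">f", value)


# Made-up section contents for illustration.
example_literals = [(0x00, be_float(0.0)), (0x04, be_float(1.0)),
                    (0x08, be_float(0.5)), (0x0C, be_float(1.0)),
                    (0x10, be_float(0.0)), (0x14, be_float(0.0))]
example_groups = split_on_duplicates(example_literals)
```
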

Collaborator

riptl commented Mar 27, 2022

First attempt at translation unit detection using the sdata2 heuristic has been successful (well, kinda?).

File format is

<SDATA2_START>..<SDATA2_STOP> <TEXT_START>..<TEXT_STOP>

Please note that the detected text ranges are only a lower bound on each TU's span. The actual TUs are always larger in practice.
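
A small parsing sketch for this format. It assumes one entry per line and hexadecimal addresses with an optional `0x` prefix; neither is confirmed above, and `TuGuess` and `parse_line` are hypothetical names. The address values in the usage line are made up.

```python
import re
from typing import NamedTuple


class TuGuess(NamedTuple):
    sdata2_start: int
    sdata2_stop: int
    text_start: int   # lower bound: the real TU text starts at or before this
    text_stop: int    # lower bound: the real TU text ends at or after this


_HEX = r"(?:0x)?([0-9a-fA-F]+)"
_LINE_RE = re.compile(rf"^\s*{_HEX}\.\.{_HEX}\s+{_HEX}\.\.{_HEX}\s*$")


def parse_line(line: str) -> TuGuess:
    """Parse '<SDATA2_START>..<SDATA2_STOP> <TEXT_START>..<TEXT_STOP>'."""
    match = _LINE_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    return TuGuess(*(int(group, 16) for group in match.groups()))


example_guess = parse_line("0x80890000..0x80890010 0x80500000..0x80500400")
```
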

sdata_detect_attempt.txt

Owner

riidefi commented Mar 28, 2022

Nice work! I think for the time being, we can fairly easily do .text splits using the symbol map. If the script could then autogenerate the data splits, that would be really convenient.
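
As a rough illustration of that idea, assuming the .text splits are available as (name, start, stop) ranges and the xrefs as (from address, to address) pairs: each TU's data range can be approximated by the lowest and highest data address its code references. `data_spans_from_text_splits` and the object names in the usage lines are hypothetical, and shared data or missed xrefs would need extra handling.

```python
from bisect import bisect_right
from typing import Dict, Iterable, List, Tuple


def data_spans_from_text_splits(
    text_splits: List[Tuple[str, int, int]],
    xrefs: Iterable[Tuple[int, int]],
) -> Dict[str, Tuple[int, int]]:
    """Map each TU name to the (lowest, highest) data address it references.

    text_splits must be sorted by start address; stop is exclusive.
    Xrefs whose source falls outside every split are ignored.
    """
    starts = [start for _, start, _ in text_splits]
    spans: Dict[str, Tuple[int, int]] = {}
    for src, dst in xrefs:
        index = bisect_right(starts, src) - 1
        if index < 0:
            continue  # source lies before the first split
        name, start, stop = text_splits[index]
        if not (start <= src < stop):
            continue  # source falls in a hole between splits
        lo, hi = spans.get(name, (dst, dst))
        spans[name] = (min(lo, dst), max(hi, dst))
    return spans


# Made-up splits and xrefs for illustration.
example_splits = [("a.o", 0x1000, 0x1100), ("b.o", 0x1100, 0x1200)]
example_xrefs = [(0x1004, 0x8000), (0x1080, 0x8010),
                 (0x1104, 0x8020), (0x1300, 0x9000)]
example_data_spans = data_spans_from_text_splits(example_splits, example_xrefs)
```
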

riptl removed the enhancement label Jul 14, 2022