psyq-obj-parser: Handle comm symbols properly #1907
Bit busy this week, I'll look at the various PRs over the weekend.
This is technically the proper way to handle these and what GCC would normally generate. However, for in-progress game decompilations, tools like splat expect |
There were 2 other branches linked in the first post which made it available behind a command-line flag. One of them should probably be merged instead of the branch assigned to the PR, so the change can still be opted out of.
This is intended. A translation unit using common symbols does not have the ability to specify the ordering of those symbols, by design. Instead, it's the linker's responsibility to order them. This behavior is consistent with the original ASPSX.EXE + PSYLINK.EXE behavior. That may sound like you're losing functionality, but it's actually the opposite: this gives more control now. The responsibility for using this properly is now on each decomp project instead of psyq-obj-parser. The advantage is that symbols from the same object don't need to be grouped together anymore; they can be freely moved around, even amongst the common symbols of other objects. This can be done by creating a single .S file which defines all common symbols from all translation units, in whichever order you'd like. (Example)
Yeah I saw those, that's why I made the suggestion that it would be better to go with one of those. Once a given project is finally in a "shiftable" state, symbol order isn't really a problem. But in the long interim of reversing a game, I find it super helpful to be able to finish a file, have all functions/variables in it moved out of the asm stuff, and then compile that file to an OBJ, convert it to ELF, and produce a bin-exact executable. Of course, once this has been done for every file and they can finally be linked with psylink rather than converted to ELFs, they will likely scramble again. As you mention, the ordering is done by hash, so it would be extremely hard to get them right short of just guessing or appending junk onto names to get the hashes in the right order.
Thanks for testing!
Can you upload an object file where this happens somewhere? Maybe I accidentally broke something unrelated.
You technically didn't break anything. As @Kneesnap already mentioned, this is the correct/expected behavior of communal definitions. Multiple files could technically reference the same definition, but the linker has the freedom to move them around and place them how it wants. This is how an obj is properly converted, but it has the downside that when decompiling, you often don't have the original variable names, and since COMMs appear to be sorted by hash, this makes it impossible to get them in the right order. Whereas, when they sit within the bss section, it simply "copies" the bss section out of the elf as a solid block, so everything stays in the same order even if the names are incorrect.
Convert .comm symbols into the proper ELF equivalent, instead of converting them to local symbols in the .bss segment. This makes it possible to use the converted PSY-Q libraries to recreate a bit-exact executable of Frogger - without proper comm symbol support, controlling the order of some .bss symbols in the libraries is impossible. Even though this is more correct than before, there are some known users of psyq-obj-parser that rely on the old behaviour, so a command line option is added which can be used to restore the previous behaviour.
e7c7796 to 36f2288
Actionable comments posted: 0
🧹 Nitpick comments (1)
tools/psyq-obj-parser/psyq-obj-parser.cc (1)
248-508: Consider breaking down the large parse method in a future refactor.
The parse method is becoming increasingly complex (now at 64 cyclomatic complexity, threshold 9) as indicated by the static analysis tools. While the current changes are necessary and well-implemented, consider refactoring this method in the future into smaller, more focused functions to improve maintainability.
🧰 Tools
🪛 GitHub Check: CodeScene Cloud Delta Analysis (main)
[warning] 248-508: ❌ Getting worse: Complex Method
PsyqLnkFile::parse increases in cyclomatic complexity from 62 to 64, threshold = 9. This function has many conditional statements (e.g. if, for, while), leading to lower code health. Avoid adding more conditionals and code to it without refactoring.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tools/psyq-obj-parser/psyq-obj-parser.cc (10 hunks)
🧰 Additional context used
🪛 GitHub Check: CodeScene Cloud Delta Analysis (main)
tools/psyq-obj-parser/psyq-obj-parser.cc
[warning] 248-508: ❌ Getting worse: Complex Method
PsyqLnkFile::parse increases in cyclomatic complexity from 62 to 64, threshold = 9. This function has many conditional statements (e.g. if, for, while), leading to lower code health. Avoid adding more conditionals and code to it without refactoring.
[warning] 980-982: ❌ Getting worse: Complex Method
PsyqLnkFile::Symbol::generateElfSymbol increases in cyclomatic complexity from 9 to 10, threshold = 9. This function has many conditional statements (e.g. if, for, while), leading to lower code health. Avoid adding more conditionals and code to it without refactoring.
[warning] 1399-1400: ❌ Getting worse: Complex Method
PsyqLnkFile::Relocation::generateElf increases in cyclomatic complexity from 62 to 63, threshold = 9. This function has many conditional statements (e.g. if, for, while), leading to lower code health. Avoid adding more conditionals and code to it without refactoring.
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (9)
tools/psyq-obj-parser/psyq-obj-parser.cc (9)
127-129: Updated API signature adds flag to control COMM symbol handling.
The parse method now accepts an additional boolean parameter convertCommToBss that controls whether COMM symbols should be converted to BSS symbols, preserving backward compatibility while adding new functionality.
168-168: Added new symbol type for COMM symbols.
The new COMM enum value in the Symbol::Type enumeration allows the proper representation of common symbols, which is essential for correct ELF generation.
177-177: Updated offset calculation to handle COMM symbols.
The condition now properly handles both UNINITIALIZED and COMM symbol types for offset calculations, ensuring consistent behavior.
297-297: Excluded COMM symbols from BSS placement calculation.
This change ensures COMM symbols are not processed in the BSS symbol placement loop, which is correct since they should be handled differently than regular BSS symbols.
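To illustrate why COMM symbols are left out of the placement loop, here is a minimal sketch, not the parser's actual code. The `Sym` struct, the `placeBss` function, and the 4-byte alignment are all assumptions made for the example. Uninitialized symbols receive consecutive offsets in declaration order, while COMM symbols get no offset because the linker allocates them later.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified symbol record; the real parser's types differ.
enum class Type { UNINITIALIZED, COMM };

struct Sym {
    std::string name;
    Type type;
    uint32_t size;
    uint32_t offset;  // only meaningful once placed is true
    bool placed;
};

// Assigns .bss offsets in declaration order and returns the total .bss size.
// COMM symbols are skipped: the linker decides where they end up.
// The 4-byte alignment is an assumption for this sketch.
inline uint32_t placeBss(std::vector<Sym>& syms) {
    constexpr uint32_t align = 4;
    uint32_t cursor = 0;
    for (auto& s : syms) {
        if (s.type == Type::COMM) continue;
        cursor = (cursor + align - 1) & ~(align - 1);  // round up to alignment
        s.offset = cursor;
        s.placed = true;
        cursor += s.size;
    }
    return cursor;
}
```

Because the offsets depend only on declaration order, this layout stays stable even when symbol names are wrong, which is the property the decompilation workflow discussed above relies on.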
504-508: Core implementation for COMM symbol handling.
The code now conditionally assigns the symbol type based on the convertCommToBss flag, allowing users to choose between the old behavior (treating common symbols as BSS) and the new behavior (preserving them as COMM symbols).
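The conditional assignment described here can be sketched as follows. This is an illustration under assumptions, not the code from the PR: the enum is reduced to the two values named in the review, and `commonSymbolType` is a hypothetical helper name.

```cpp
// Hypothetical enum reduced to the two symbol kinds named in the review.
enum class SymbolType { UNINITIALIZED, COMM };

// Returns the type to record for a common symbol read from the object file.
// convertCommToBss == true restores the old behaviour (plain .bss symbol);
// false keeps the symbol as a true common symbol for the linker to place.
constexpr SymbolType commonSymbolType(bool convertCommToBss) {
    return convertCommToBss ? SymbolType::UNINITIALIZED : SymbolType::COMM;
}
```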
980-982: Set correct ELF section index for COMM symbols.
COMM symbols are now properly assigned to the ELFIO::SHN_COMMON section, which is the standard ELF representation for common symbols.
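For readers unfamiliar with SHN_COMMON, the sketch below shows what a common symbol looks like as a raw 32-bit ELF symbol table entry. It does not use ELFIO and is not the PR's code; `Elf32Sym`, `kShnCommon`, and `makeCommonSymbol` are names invented for the example. The field layout, the value 0xfff2, and the convention that `st_value` carries the alignment for common symbols come from the ELF specification; whether the PR fills `st_value` this way is not stated in the review.

```cpp
#include <cstdint>

constexpr uint16_t kShnCommon = 0xfff2;  // SHN_COMMON in the ELF specification

// Field layout of a 32-bit ELF symbol table entry (Elf32_Sym).
struct Elf32Sym {
    uint32_t st_name;   // offset of the name in the string table
    uint32_t st_value;  // for SHN_COMMON symbols: required alignment
    uint32_t st_size;   // size in bytes
    uint8_t st_info;    // binding in the high nibble, type in the low nibble
    uint8_t st_other;
    uint16_t st_shndx;  // section index, or a reserved value such as SHN_COMMON
};

// Builds the entry for a global common data symbol.
inline Elf32Sym makeCommonSymbol(uint32_t nameOffset, uint32_t size, uint32_t alignment) {
    constexpr uint8_t stbGlobal = 1;  // STB_GLOBAL
    constexpr uint8_t sttObject = 1;  // STT_OBJECT
    Elf32Sym sym{};
    sym.st_name = nameOffset;
    sym.st_value = alignment;
    sym.st_size = size;
    sym.st_info = static_cast<uint8_t>((stbGlobal << 4) | sttObject);
    sym.st_other = 0;
    sym.st_shndx = kShnCommon;
    return sym;
}
```

Since the entry names no real section, the linker is free to allocate the storage wherever it likes, which is why the ordering concerns raised earlier in the thread apply.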
1399-1400: Skip section-specific processing for COMM symbols.
The relocation generation now properly excludes COMM symbols from section-specific processing, as these symbols don't belong to a specific section.
1549-1549: Updated help text with new command line option.
The help text now includes information about the -c option, which allows users to control the conversion of COMM symbols to BSS symbols.
1565-1565: Connected command line flag to internal code.
The command line option -c is now properly passed to the parse method, enabling users to control the COMM symbol handling behavior.
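As a rough picture of that wiring, the sketch below scans the arguments for -c and yields the boolean that would be handed to parse as convertCommToBss. The helper name `wantsCommToBss` and the hand-rolled scan are assumptions; the real tool's argument handling is not shown in this thread and may differ.

```cpp
#include <cstring>

// Hypothetical helper: returns true when "-c" appears among the arguments,
// i.e. when the user asks for the old COMM-to-BSS conversion.
inline bool wantsCommToBss(int argc, const char* const* argv) {
    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], "-c") == 0) return true;
    }
    return false;
}
```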
Sorry, got hit hard by covid. I'm processing my backlog slowly now.
No problem!
Convert .comm symbols into the proper ELF equivalent, instead of converting them to local symbols in the .bss segment.
This makes it possible to use the converted PSY-Q libraries to recreate a bit-exact executable of Frogger - without proper comm symbol support, controlling the order of some .bss symbols in the libraries is impossible.
Even though this is more correct than before, there are some known users of psyq-obj-parser that rely on the old behaviour, so a command line option is added which can be used to restore the previous behaviour.
(Alternate variant at main...dezgeg:psyq-obj-parser-comm-support-variant1 but I think this one looks clearer).