Skip to content

psyq-obj-parser: Handle comm symbols properly #1907

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

dezgeg
Copy link

@dezgeg dezgeg commented Mar 17, 2025

Convert .comm symbols into the proper ELF equivalent, instead of converting them to local symbol in the .bss segment.

This makes it possible use the converted PSY-Q libraries to recreate a bit-exact executable of Frogger - without proper comm symbol support controlling the order of some .bss symbols in the libraries is impossible.

Even though this is more correct than before there are some known users of psyq-obj-parser are relying on the old behaviour thus a command line option is added which can be used to restore the previous behaviour.

(Alternate variant at main...dezgeg:psyq-obj-parser-comm-support-variant1 but I think this one looks clearer).

Copy link
Contributor

coderabbitai bot commented Mar 17, 2025

Walkthrough

The changes enhance the PsyqLnkFile class by modifying the parse method to include a new parameter, convertCommToBss, which allows for the conversion of COMM symbols to BSS symbols. A new symbol type, COMM, is introduced in the Symbol struct, affecting how offsets are calculated. The generateElfSymbol method is updated to handle the COMM type, and the generateElf method in the Relocation struct is adjusted to exclude COMM symbols from certain operations.

Changes

Files Change Summary
tools/psyq-obj-parser/psyq-obj-parser.cc - Added convertCommToBss parameter to parse method.
- Introduced COMM symbol type in Symbol struct.
- Updated offset calculations to include COMM type.
- Modified generateElfSymbol to set elfSectionIndex for COMM symbols.
- Adjusted generateElf in Relocation to exclude COMM symbols from certain operations.

Suggested Reviewers

  • nicolasnoble

Possibly Related PRs

Poem

In the code's twilight, I hop with glee,
Adjusting symbols as sprightly as can be.
BSS logic now dances away in the night,
Uninitialized symbols gleam with new light.
Hopping through changes with a twitch and a cheer,
A rabbit's delight—code magic is here!
🐇✨

✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@nicolasnoble
Copy link
Member

Bit busy this week, I'll look at the various PRs over the week-end.

@galaxyhaxz
Copy link
Contributor

galaxyhaxz commented Mar 20, 2025

This is technically the proper way to handle these and what GCC would normally generate. However, for in-progress game decompilations, tools like splat expect .sbss/.bss, whereas you must substitute this for .scommon/COMMON instead. I only ran into one problem with this change, it no longer keeps the bss variables in the correct order, in fact, uses the same order regardless if the -s option is used. Might be good to have this as a command line option, just so the existing functionality can stay to force variables into the correct order and sections. (That is, with this change, regardless of what order a variable appears in COMM, it seems to link in a completely different order, whereas when they are stored in BSS, they link in the same order they appear in the ELF).

@Kneesnap
Copy link

Kneesnap commented Mar 20, 2025

Might be good to have this as a command line option, just so the existing functionality can stay to force variables into the correct order and sections

There were 2 other branches linked in the first post which made it available behind a command-line flag. One of them should probably be merged instead of the branch assigned to the PR, so the change can still be opted out of.

That is, with this change, regardless of what order a variable appears in COMM, it seems to link in a completely different order, whereas when they are stored in BSS, they link in the same order they appear in the ELF)

This is intended. A translation unit using common symbols do not have the ability to specify the ordering of common symbols by design. Instead, it's the linker's responsibility to order them. This behavior is consistent with the original ASPSX.EXE + PSYLINK.EXE behavior.

That may sound like you're losing functionality, but it's actually the opposite: this gives more control now. But the responsibility for using this properly is now on each decomp project instead of psyq-obj-parser. The advantage to this is that symbols from the same object don't need to be grouped together anymore, they can be freely moved around, even amongst the common symbols of other objects.

This can be done by creating a single .S file which defines all common symbols from all translation units, in whichever order you'd like. (Example)

@galaxyhaxz
Copy link
Contributor

There were 2 other branches linked in the first post which made it available behind a command-line flag.

Yeah I saw those, that's why I made the suggestion it would be better to go with one of those. Once a given project is finally in a "shiftable" state, then symbol order isn't really a problem. But in the long interim of reversing a game, I find it super helpful to be able to finish a file, have all functions/variables in it and moved out of the asm stuff, and then compile that file to an OBJ, convert to elf, and produce a bin-exact executable. Of course, once this has been done for every file and they can finally be linked with psylink rather than converting to elfs, they will likely scramble again as you mention, the ordering is done by hash, so it would be extremely hard to get them right short of just guessing/appending junk onto names to get the hash in the right order.

@dezgeg
Copy link
Author

dezgeg commented Mar 20, 2025

Thanks for testing!

I only ran into one problem with this change, it no longer keeps the bss variables in the correct order, in fact, uses the same order regardless if the -s option is used.

Can you upload an object file where this happens somewhere, maybe I accidentally broke something unrelated.

@galaxyhaxz
Copy link
Contributor

You technically didn't break anything, as @Kneesnap already mentioned, this is the correct/expected behavior of communal definitions. Multiple files could technically reference the same definition, but the linker has the freedom to move them around and place them how it wants. This is how an obj is properly converted, but has the downside that when decompiling, you often dont have the original variable names, and since COMMs appear to be sorted by hash, this makes it impossible to get them in the right order. Whereas, when they sit within the bss sections, it simply "copies" the bss section out of the elf as a solid block, so everything stays in the same order even if the names are incorrect.

Convert .comm symbols into the proper ELF equivalent, instead of
converting them to local symbol in the .bss segment.

This makes it possible use the converted PSY-Q libraries to recreate a
bit-exact executable of Frogger - without proper comm symbol support
controlling the order of some .bss symbols in the libraries is
impossible.

Even though this is more correct than before there are some known users
of psyq-obj-parser that are relying on the old behaviour thus a command
line option is added which can be used to restore the previous
behaviour.
@dezgeg dezgeg force-pushed the psyq-obj-parser-comm-support-variant0 branch from e7c7796 to 36f2288 Compare March 20, 2025 21:17
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
tools/psyq-obj-parser/psyq-obj-parser.cc (1)

248-508: Consider breaking down the large parse method in a future refactor.

The parse method is becoming increasingly complex (now at 64 cyclomatic complexity, threshold 9) as indicated by the static analysis tools. While the current changes are necessary and well-implemented, consider refactoring this method in the future into smaller, more focused functions to improve maintainability.

🧰 Tools
🪛 GitHub Check: CodeScene Cloud Delta Analysis (main)

[warning] 248-508: ❌ Getting worse: Complex Method
PsyqLnkFile::parse increases in cyclomatic complexity from 62 to 64, threshold = 9. This function has many conditional statements (e.g. if, for, while), leading to lower code health. Avoid adding more conditionals and code to it without refactoring.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e7c7796 and 36f2288.

📒 Files selected for processing (1)
  • tools/psyq-obj-parser/psyq-obj-parser.cc (10 hunks)
🧰 Additional context used
🪛 GitHub Check: CodeScene Cloud Delta Analysis (main)
tools/psyq-obj-parser/psyq-obj-parser.cc

[warning] 248-508: ❌ Getting worse: Complex Method
PsyqLnkFile::parse increases in cyclomatic complexity from 62 to 64, threshold = 9. This function has many conditional statements (e.g. if, for, while), leading to lower code health. Avoid adding more conditionals and code to it without refactoring.


[warning] 248-508: ❌ Getting worse: Complex Method
PsyqLnkFile::parse increases in cyclomatic complexity from 62 to 64, threshold = 9. This function has many conditional statements (e.g. if, for, while), leading to lower code health. Avoid adding more conditionals and code to it without refactoring.


[warning] 980-982: ❌ Getting worse: Complex Method
PsyqLnkFile::Symbol::generateElfSymbol increases in cyclomatic complexity from 9 to 10, threshold = 9. This function has many conditional statements (e.g. if, for, while), leading to lower code health. Avoid adding more conditionals and code to it without refactoring.


[warning] 1399-1400: ❌ Getting worse: Complex Method
PsyqLnkFile::Relocation::generateElf increases in cyclomatic complexity from 62 to 63, threshold = 9. This function has many conditional statements (e.g. if, for, while), leading to lower code health. Avoid adding more conditionals and code to it without refactoring.

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (9)
tools/psyq-obj-parser/psyq-obj-parser.cc (9)

127-129: Updated API signature adds flag to control COMM symbol handling.

The parse method now accepts an additional boolean parameter convertCommToBss that controls whether COMM symbols should be converted to BSS symbols, preserving backward compatibility while adding new functionality.


168-168: Added new symbol type for COMM symbols.

The new COMM enum value in the Symbol::Type enumeration allows the proper representation of common symbols, which is essential for correct ELF generation.


177-177: Updated offset calculation to handle COMM symbols.

The condition now properly handles both UNINITIALIZED and COMM symbol types for offset calculations, ensuring consistent behavior.


297-297: Excluded COMM symbols from BSS placement calculation.

This change ensures COMM symbols are not processed in the BSS symbol placement loop, which is correct since they should be handled differently than regular BSS symbols.


504-508: Core implementation for COMM symbol handling.

The code now conditionally assigns the symbol type based on the convertCommToBss flag, allowing users to choose between the old behavior (treating common symbols as BSS) and the new behavior (preserving them as COMM symbols).

🧰 Tools
🪛 GitHub Check: CodeScene Cloud Delta Analysis (main)

[warning] 248-508: ❌ Getting worse: Complex Method
PsyqLnkFile::parse increases in cyclomatic complexity from 62 to 64, threshold = 9. This function has many conditional statements (e.g. if, for, while), leading to lower code health. Avoid adding more conditionals and code to it without refactoring.


980-982: Set correct ELF section index for COMM symbols.

COMM symbols are now properly assigned to the ELFIO::SHN_COMMON section, which is the standard ELF representation for common symbols.

🧰 Tools
🪛 GitHub Check: CodeScene Cloud Delta Analysis (main)

[warning] 980-982: ❌ Getting worse: Complex Method
PsyqLnkFile::Symbol::generateElfSymbol increases in cyclomatic complexity from 9 to 10, threshold = 9. This function has many conditional statements (e.g. if, for, while), leading to lower code health. Avoid adding more conditionals and code to it without refactoring.


1399-1400: Skip section-specific processing for COMM symbols.

The relocation generation now properly excludes COMM symbols from section-specific processing, as these symbols don't belong to a specific section.

🧰 Tools
🪛 GitHub Check: CodeScene Cloud Delta Analysis (main)

[warning] 1399-1400: ❌ Getting worse: Complex Method
PsyqLnkFile::Relocation::generateElf increases in cyclomatic complexity from 62 to 63, threshold = 9. This function has many conditional statements (e.g. if, for, while), leading to lower code health. Avoid adding more conditionals and code to it without refactoring.


1549-1549: Updated help text with new command line option.

The help text now includes information about the -c option, which allows users to control the conversion of COMM symbols to BSS symbols.


1565-1565: Connected command line flag to internal code.

The command line option -c is now properly passed to the parse method, enabling users to control the COMM symbol handling behavior.

@nicolasnoble
Copy link
Member

Sorry, got hit hard by covid. I'm processing my backlog slowly now.

@dezgeg
Copy link
Author

dezgeg commented Apr 3, 2025

No problem!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants