Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PE: parse rich header and refactor DOS stub parser #406

Merged
merged 38 commits into from
Oct 28, 2024
Merged

Conversation

kkent030315
Copy link
Contributor

@kkent030315 kkent030315 commented Apr 13, 2024

This PR adds parsing of Rich headers, as someone opened issue #400.

  • Added rich header goblin::pe::header::RichHeader goblin::pe::header::RichMetadata parsing if present, with success/fail/corrupted tests.
  • DOS stub is now non-fixed size of slice reference, and explicitly isolated from the DOS header.
    • The DOS stub is not always guaranteed to be 64-bytes long; there are some linkers (Borland C++) and PE packers generate application-specific DOS stub; (dos stub length restricts to the first 64 bytes #422) and
    • The DOS stub includes the DOS header in definition. The DOS header and DOS stub are sometimes known to be separate, but also sometimes together. This is confusing stuff, and there are no official statement. By design, separating the header and the stub would be preferred. Concating them in the higher implementations (on the user side) are pretty easiest than making it together (otherwise user needs to mess with the raw buffer in the higher implementations.. bad).

I took the constant bytes in the test code from mthiesen/link-patcher (MIT). If this can potentially be license incompliance, I am happy to make own specimens for testing.

@kkent030315
Copy link
Contributor Author

I'm a bit tired, will fix the CI at the rest of today or tomorrow.

@kkent030315 kkent030315 marked this pull request as draft April 13, 2024 19:30
@m4b m4b marked this pull request as ready for review April 15, 2024 04:25
Copy link
Owner

@m4b m4b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately there are some breaking changes here, but some seem unavoidable; primarily I'd like to see DosStub have it's parse method move into an impl DosStub and the &[u8] be reverted back to the DosStub type that was there before. otherwise it's looking ok!

src/pe/header.rs Show resolved Hide resolved
src/pe/header.rs Outdated Show resolved Hide resolved
src/pe/mod.rs Outdated Show resolved Hide resolved
src/pe/header.rs Outdated Show resolved Hide resolved
src/pe/header.rs Show resolved Hide resolved
src/pe/header.rs Outdated Show resolved Hide resolved
src/pe/header.rs Show resolved Hide resolved
@m4b
Copy link
Owner

m4b commented May 20, 2024

@kkent030315 gentle ping; would definitely like to see this go in, but I had requested some changes :)

NOTE: all `cargo test` passed
@kkent030315
Copy link
Contributor Author

@m4b Hey, sorry for being shadow. I had do deal with the my life, you know.. anyways, a little bit changes have been applied to fix the code indeed, and the tests cargo test are passed. Would you like to checking this out?

The current codebase in this PR is what I am used/using as an backend parser helper for my own binary rewriting framework that can partiolize the any PE64 binary into object-by-object (aka struct-by-struct, byte-by-byte etc) and this rich header code works perfectly with the 1000+ unique PE64 binaries over the world (regardless of compiler toolchains, including but not limited to MSVC(link), Clang(LLD), MinGW-GCC, NOTE: Only MSVC linker inserts rich header) as well, so it's quite stable for now I believe, while there are something to deal with the coding style you pointed though.

image

I'm going to deal with another my PRs ASAP. I'd like to contribute more on goblin, as theres too many things to get rid of, and some insufficient features.

@m4b
Copy link
Owner

m4b commented Oct 22, 2024

I'm sorry I knew there was another PR I needed to check before releasing 9.0, I will give this a review, but probably won't be till weekend. Please feel free to ping me then if you haven't gotten any feedback. And thanks for your contributions! :)

@m4b
Copy link
Owner

m4b commented Oct 22, 2024

Also I hope everything in your life is good; don't feel need to push yourself either, your life comes first :)

@kkent030315
Copy link
Contributor Author

@m4b Thanks for taking attention on this! I think theres lots of things to discuess here. But I think breaking changes are inevitable for this PR, perhaps how about introducing this feature on 0.10?

Copy link
Owner

@m4b m4b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok this is the last time, I think we're almost done here; as I mention in another comment, tl;dr I did not realize the heart of the issue is that the DosStub is variable and bounded by pe_pointer_offset; your changes look correct, assuming that statement is correct :) that being said, we should go back to your original intuition, as I mention, and the bytes should not be allocated/owned, but slices, so we have to pass a lifetime param to Header, etc. This is fine, this pattern is common in other parts of goblin, e.g., mach-o has a lot of that.

For bonus points, we can keep Copy, i'm pretty sure, on all the structs, if you do impl Iterator<Item=Result<RichMetadata>> for a method on RichHeader to avoid having a Vec inside of it.

After that, I Think this is ready to go, though do note it is a breaking change, a pretty good one, but I think it's worthwhile.

Oh and you have the honor of deleting the comment about adding RichHeader to the Header field, since you implemented it :)

src/pe/header.rs Outdated Show resolved Hide resolved
src/pe/header.rs Outdated Show resolved Hide resolved
src/pe/header.rs Outdated Show resolved Hide resolved
src/pe/header.rs Outdated Show resolved Hide resolved
src/pe/header.rs Outdated Show resolved Hide resolved
src/pe/header.rs Show resolved Hide resolved
src/pe/header.rs Outdated Show resolved Hide resolved
src/pe/header.rs Show resolved Hide resolved
src/pe/header.rs Outdated Show resolved Hide resolved
src/pe/header.rs Show resolved Hide resolved
@kkent030315
Copy link
Contributor Author

@m4b Thank you very much for the review! I think it's almost perfect now. Would you able to take a look into the fixes?

@kkent030315
Copy link
Contributor Author

Maybe something to discuss here #406 (comment):

We can explicitly isolate DOS stub and Rich header datas, with some mroe complex implementations, but it makes no sense for me to do that.

I can definitely do that work if we should.

@kkent030315
Copy link
Contributor Author

Self review is done. Looks perfect for me right now.

@m4b
Copy link
Owner

m4b commented Oct 27, 2024

@kkent030315 apologies I wanted to merge the parse_without_dos patch since it was before this, and was waiting a while; so you'll have to rebase, and add a rich_header parameter to the _impl function and pass a None for parse_without_dos is my guess; in meantime I'll do a final review but this looks ready to go just cursorily glancing at it

I appreciate your patience!

@m4b
Copy link
Owner

m4b commented Oct 27, 2024

So I just tried doing this rebase, and it's very annoying/tedious due to all the individual commits; i suggest squashing your patch down to a single commit first, and then rebasing on master to make it easiest.

@kkent030315
Copy link
Contributor Author

How about now? I resolved all conflicts wtih the upstream and should be able to merge with squash without problems.

@kkent030315
Copy link
Contributor Author

apologies I wanted to merge the parse_without_dos patch since it was before this, and was waiting a while; so you'll have to rebase, and add a rich_header parameter to the _impl function and pass a None for parse_without_dos is my guess; in meantime I'll do a final review but this looks ready to go just cursorily glancing at it

As of current implementation of rich header parser is always individual regardless of DOS stub, since it takes hardcoded offset of end of DOS stub, so rich header parser still works even if DOS stub is default (no rich header) by Header::parse_without_dos, as I added test for this situation:

    #[test]
    fn parse_without_dos() {
        let header = Header::parse_without_dos(&BORLAND_PE32_VALID_NO_RICH_HEADER).unwrap();
        assert_eq!(header.dos_stub, DosStub::default());
        assert_eq!(header.rich_header.is_none(), true);

        // DOS stub is default but rich parser still works
        let header = Header::parse_without_dos(&CORRECT_RICH_HEADER).unwrap();
        assert_eq!(header.dos_stub, DosStub::default());
        assert_eq!(header.rich_header.is_some(), true);
    }

I don't know if we should not parse rich header when Header::parse_without_dos or we should. Do you have any idea?

Copy link
Owner

@m4b m4b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get feedback from them whether they want the rich data, otherwise we can leave as is; i have to run outside right now but i'll likely merge this tonight, thanks for all your great work!

@m4b
Copy link
Owner

m4b commented Oct 27, 2024

@ideeockus do you have any opinion on the rich header question?

pub struct DosStub(pub [u8; 0x40]);
impl Default for DosStub {
pub struct DosStub<'a> {
pub data: &'a [u8],
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to change this to private and have a method bytes() -> &'a [u8] in a separate patch; if we ever want to switch this to something like Cow<'a, [u8]> and have an owned version to allow Header<'a> will be easier

@m4b m4b merged commit f43bc30 into m4b:master Oct 28, 2024
6 checks passed
@m4b
Copy link
Owner

m4b commented Oct 28, 2024

this was an epic PR @kkent030315 thanks for your patience!

@kkent030315 kkent030315 mentioned this pull request Nov 3, 2024
15 tasks
@kkent030315 kkent030315 deleted the richh branch November 18, 2024 20:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants