Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: multiple changes in macho module #76

Merged
merged 13 commits into from
Jan 31, 2024
Merged

refactor: multiple changes in macho module #76

merged 13 commits into from
Jan 31, 2024

Conversation

plusvic
Copy link
Member

@plusvic plusvic commented Jan 31, 2024

This is a large refactoring of the macho with the intention of simplifying the existing code and fix some issues. The more important changes are:

  • all swap_xxxxx functions are removed. Instead of swapping integers after being read from the file, endianness of the file is taken into account while reading individual integers. This is more efficient and easier to maintain.

  • some portions of the parsing code is simplified by making use of more advanced features of the nom crate.

  • the cmd and cmdsize fields were removed from the some structures. These fields are not very useful, and they don't appear in the original implementation in YARA. The cmd field for example, has always the same value (1) in Segment structures, because command segments are always defined by the same type of command.

  • rpath is now a list of string instead of a list of structures.

  • magic numbers look exactly as they appear in the original file, if the magic is CA FE BA BE, it will translated into a 0xcafebabe value.

  • section and segment names are not forced to be UTF-8. Some files may have section or segment names that contain invalid UTF-8.

  • increased test coverage

This is a large refactoring of the `macho` with the intention of simplifying the existing code and fix some issues. The more important changes are:

* all `swap_xxxxx` functions are removed. Instead of swapping integers after being read from the file, endianness of the file is taken into account while reading individual integers. This is more efficient and easier to maintain.

* some portions of the parsing code is simplified by making use of more advanced features of the `nom` crate.

* the `cmd` and `cmdsize` fields were removed from the some structures. These fields are not very useful, and they don't appear in the original implementation in YARA. The `cmd` field for example, has always the same value (1) in `Segment` structures, because command segments are always defined by the same type of command.

* `rpath` is now a list of string instead of a list of structures.

* magic numbers look exactly as they appear in the original file, if the magic is `CA FE BA BE`, it will translated into a `0xcafebabe` value.
Copy link
Contributor

@latonis latonis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a few questions for clarity, everything else looks good to me 😸

the parser really simplified it down and it looks great, excited to implement more features for Mach-O this way!!

Cargo.toml Outdated
@@ -97,6 +96,6 @@ yara-x-proto-yaml = { path = "yara-x-proto-yaml" }


[profile.release]
# debug = 1 # Include debug information in the binary.
debug = 1 # Include debug information in the binary.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just wanting to confirm this is intentional for the release profile?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was unintentional, I'm reverting that change.

- cmd: 1
cmdsize: 56
segname: ""
- segname: "SP1\300\211\347j\010Wj\001P\260\004\353\260"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checked to make sure other tools parsed this as such, they do 👍

I used xmachoviewer to test

}

/// Parser that reads a 32-bits or 64-bits
fn uint(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do have some fields in certain mach-o structs that will be u8 sizes, do we want to account for that here?

example: https://opensource.apple.com/source/xnu/xnu-4570.61.1/osfmk/kern/cs_blobs.h.auto.html

typedef struct __CodeDirectory {
	uint32_t magic;					/* magic number (CSMAGIC_CODEDIRECTORY) */
	uint32_t length;				/* total length of CodeDirectory blob */
	uint32_t version;				/* compatibility version */
	uint32_t flags;					/* setup and mode flags */
	uint32_t hashOffset;			/* offset of hash slot element at index zero */
	uint32_t identOffset;			/* offset of identifier string */
	uint32_t nSpecialSlots;			/* number of special hash slots */
	uint32_t nCodeSlots;			/* number of ordinary (code) hash slots */
	uint32_t codeLimit;				/* limit to main image signature range */
	uint8_t hashSize;				/* size of each hash in bytes */
	uint8_t hashType;				/* type of hash (cdHashType* constants) */
	uint8_t platform;				/* platform identifier; zero if not platform binary */
	uint8_t	pageSize;				/* log2(page size in bytes); 0 => infinite */
	uint32_t spare2;				/* unused (must be zero) */
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This uint parser is intended to be used in cases where the integer's endianness, or wideness (or both) is not known at compile time. u8 doesn't have those problems as they are not affected by endianness and the size if already known.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok great, sounds good to me. We do have some instances where endianness is always big for parsing certain mach-o structs as well, but that is also easily handled already with nom. thanks for the sanity check!

@plusvic plusvic merged commit 79fd9d2 into main Jan 31, 2024
22 checks passed
@plusvic plusvic deleted the macho-refactor branch February 1, 2024 10:42
@TommYDeeee
Copy link
Contributor

Thank you for this refactor, even though it is already merged. It looks really good and those advanced num validation checks 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants