-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: multiple changes in macho
module
#76
Conversation
This is a large refactoring of the `macho` with the intention of simplifying the existing code and fix some issues. The more important changes are: * all `swap_xxxxx` functions are removed. Instead of swapping integers after being read from the file, endianness of the file is taken into account while reading individual integers. This is more efficient and easier to maintain. * some portions of the parsing code is simplified by making use of more advanced features of the `nom` crate. * the `cmd` and `cmdsize` fields were removed from the some structures. These fields are not very useful, and they don't appear in the original implementation in YARA. The `cmd` field for example, has always the same value (1) in `Segment` structures, because command segments are always defined by the same type of command. * `rpath` is now a list of string instead of a list of structures. * magic numbers look exactly as they appear in the original file, if the magic is `CA FE BA BE`, it will translated into a `0xcafebabe` value.
…e file Also fix an issue in `m68k_thread_state` and add more test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a few questions for clarity, everything else looks good to me 😸
the parser really simplified it down and it looks great, excited to implement more features for Mach-O this way!!
Cargo.toml
Outdated
@@ -97,6 +96,6 @@ yara-x-proto-yaml = { path = "yara-x-proto-yaml" } | |||
|
|||
|
|||
[profile.release] | |||
# debug = 1 # Include debug information in the binary. | |||
debug = 1 # Include debug information in the binary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just wanting to confirm this is intentional for the release profile?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was unintentional, I'm reverting that change.
- cmd: 1 | ||
cmdsize: 56 | ||
segname: "" | ||
- segname: "SP1\300\211\347j\010Wj\001P\260\004\353\260" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checked to make sure other tools parsed this as such, they do 👍
I used xmachoviewer to test
} | ||
|
||
/// Parser that reads a 32-bits or 64-bits | ||
fn uint( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do have some fields in certain mach-o structs that will be u8 sizes, do we want to account for that here?
example: https://opensource.apple.com/source/xnu/xnu-4570.61.1/osfmk/kern/cs_blobs.h.auto.html
typedef struct __CodeDirectory {
uint32_t magic; /* magic number (CSMAGIC_CODEDIRECTORY) */
uint32_t length; /* total length of CodeDirectory blob */
uint32_t version; /* compatibility version */
uint32_t flags; /* setup and mode flags */
uint32_t hashOffset; /* offset of hash slot element at index zero */
uint32_t identOffset; /* offset of identifier string */
uint32_t nSpecialSlots; /* number of special hash slots */
uint32_t nCodeSlots; /* number of ordinary (code) hash slots */
uint32_t codeLimit; /* limit to main image signature range */
uint8_t hashSize; /* size of each hash in bytes */
uint8_t hashType; /* type of hash (cdHashType* constants) */
uint8_t platform; /* platform identifier; zero if not platform binary */
uint8_t pageSize; /* log2(page size in bytes); 0 => infinite */
uint32_t spare2; /* unused (must be zero) */
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This uint
parser is intended to be used in cases where the integer's endianness, or wideness (or both) is not known at compile time. u8
doesn't have those problems as they are not affected by endianness and the size if already known.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah ok great, sounds good to me. We do have some instances where endianness is always big for parsing certain mach-o structs as well, but that is also easily handled already with nom. thanks for the sanity check!
Thank you for this refactor, even though it is already merged. It looks really good and those advanced num validation checks 👍 |
This is a large refactoring of the
macho
with the intention of simplifying the existing code and fix some issues. The more important changes are:all
swap_xxxxx
functions are removed. Instead of swapping integers after being read from the file, endianness of the file is taken into account while reading individual integers. This is more efficient and easier to maintain.some portions of the parsing code is simplified by making use of more advanced features of the
nom
crate.the
cmd
andcmdsize
fields were removed from the some structures. These fields are not very useful, and they don't appear in the original implementation in YARA. Thecmd
field for example, has always the same value (1) inSegment
structures, because command segments are always defined by the same type of command.rpath
is now a list of string instead of a list of structures.magic numbers look exactly as they appear in the original file, if the magic is
CA FE BA BE
, it will translated into a0xcafebabe
value.section and segment names are not forced to be UTF-8. Some files may have section or segment names that contain invalid UTF-8.
increased test coverage