refactor: multiple changes in `macho` module #76

plusvic · 2024-01-31T10:00:14Z

This is a large refactoring of the `macho` module, intended to simplify the existing code and fix some issues. The most important changes are:

* All `swap_xxxxx` functions are removed. Instead of swapping integers after reading them from the file, the file's endianness is taken into account when each individual integer is read. This is more efficient and easier to maintain.
* Some portions of the parsing code are simplified by making use of more advanced features of the `nom` crate.
* The `cmd` and `cmdsize` fields were removed from some structures. These fields are not very useful, and they don't appear in the original implementation in YARA. The `cmd` field, for example, always has the same value (1) in `Segment` structures, because segments are always defined by the same type of command.
* `rpath` is now a list of strings instead of a list of structures.
* Magic numbers look exactly as they appear in the original file: if the magic is `CA FE BA BE`, it is translated into the value `0xcafebabe`.
* Section and segment names are not forced to be UTF-8, because some files have section or segment names that contain invalid UTF-8.
* Increased test coverage.
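The endianness and magic-number points above can be illustrated with a small std-only sketch. The PR does this with `nom`-based parsers; this version is only an illustration, and `Endian`, `read_u32`, `read_magic`, `detect` and `first_load_command` are hypothetical names. The magic is read big-endian so its value matches the bytes as they appear in the file, the file's byte order is derived from it (using the standard Mach-O values `MH_MAGIC` 0xfeedface, `MH_MAGIC_64` 0xfeedfacf and their byte-swapped `MH_CIGAM` forms), and later integers such as `cmd`/`cmdsize` are read directly in that byte order, with no swap step.

```rust
/// Byte order of the integers stored in a (thin) Mach-O file.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Endian {
    Little,
    Big,
}

/// Reads a `u32` at `offset` directly in the given byte order, or returns
/// `None` if the data is too short. No swapping is needed afterwards.
fn read_u32(data: &[u8], offset: usize, endian: Endian) -> Option<u32> {
    let b = data.get(offset..offset.checked_add(4)?)?;
    let bytes = [b[0], b[1], b[2], b[3]];
    Some(match endian {
        Endian::Little => u32::from_le_bytes(bytes),
        Endian::Big => u32::from_be_bytes(bytes),
    })
}

/// Reads the magic big-endian, so the value matches the on-disk bytes
/// (e.g. CE FA ED FE -> 0xcefaedfe, CA FE BA BE -> 0xcafebabe).
fn read_magic(data: &[u8]) -> Option<u32> {
    read_u32(data, 0, Endian::Big)
}

/// Maps a thin Mach-O magic to (byte order of the file, is 64-bit).
fn detect(magic: u32) -> Option<(Endian, bool)> {
    match magic {
        0xfeedface => Some((Endian::Big, false)),    // MH_MAGIC
        0xfeedfacf => Some((Endian::Big, true)),     // MH_MAGIC_64
        0xcefaedfe => Some((Endian::Little, false)), // MH_CIGAM
        0xcffaedfe => Some((Endian::Little, true)),  // MH_CIGAM_64
        _ => None,
    }
}

/// Returns (cmd, cmdsize) of the first load command. These values drive the
/// walk over the load commands but need not be kept in the output structures.
fn first_load_command(data: &[u8]) -> Option<(u32, u32)> {
    let (endian, is_64) = detect(read_magic(data)?)?;
    // mach_header is 28 bytes; mach_header_64 adds a 4-byte reserved field.
    let header_size = if is_64 { 32 } else { 28 };
    let cmd = read_u32(data, header_size, endian)?;
    let cmdsize = read_u32(data, header_size + 4, endian)?;
    Some((cmd, cmdsize))
}

fn main() {
    // Little-endian 32-bit header (rest zeroed) followed by cmd = 1, cmdsize = 56.
    let mut file = vec![0xce, 0xfa, 0xed, 0xfe];
    file.extend_from_slice(&[0u8; 24]);
    file.extend_from_slice(&1u32.to_le_bytes());
    file.extend_from_slice(&56u32.to_le_bytes());

    assert_eq!(read_magic(&file), Some(0xcefaedfe));
    assert_eq!(first_load_command(&file), Some((1, 56)));
    // A FAT header's magic keeps its on-disk spelling.
    assert_eq!(read_magic(&[0xca, 0xfe, 0xba, 0xbe]), Some(0xcafebabe));
}
```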


latonis

just a few questions for clarity, everything else looks good to me 😸

the parser really simplified it down and it looks great, excited to implement more features for Mach-O this way!!

latonis · 2024-01-31T14:47:53Z

Cargo.toml

@@ -97,6 +96,6 @@ yara-x-proto-yaml = { path = "yara-x-proto-yaml" }


 [profile.release]
-# debug = 1   # Include debug information in the binary.
+debug = 1   # Include debug information in the binary.


just wanting to confirm this is intentional for the release profile?

That was unintentional, I'm reverting that change.

latonis · 2024-01-31T16:09:01Z

yara-x/src/modules/macho/tests/testdata/tiny_macho.out

-  - cmd: 1
-    cmdsize: 56
-    segname: ""
+  - segname: "SP1\300\211\347j\010Wj\001P\260\004\353\260"


checked to make sure other tools parsed this as such, they do 👍

I used xmachoviewer to test
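The new `segname` value above follows from keeping the raw bytes of the fixed 16-byte, NUL-padded name field instead of forcing them into a UTF-8 string. A hypothetical std-only sketch (the `segname` and `escape` helpers are illustrative, not yara-x code) that trims the padding and renders non-printable bytes as three-digit octal escapes, roughly the style used in the `.out` file:

```rust
/// Returns the bytes of a fixed 16-byte Mach-O name field up to the first NUL
/// (or all 16 bytes if there is none), without requiring valid UTF-8.
fn segname(field: &[u8; 16]) -> &[u8] {
    let end = field.iter().position(|&b| b == 0).unwrap_or(field.len());
    &field[..end]
}

/// Keeps printable ASCII, escapes `"` and `\`, and writes every other byte
/// as a three-digit octal escape.
fn escape(bytes: &[u8]) -> String {
    let mut out = String::new();
    for &b in bytes {
        match b {
            b'"' => out.push_str("\\\""),
            b'\\' => out.push_str("\\\\"),
            0x20..=0x7e => out.push(b as char),
            _ => out.push_str(&format!("\\{:03o}", b)),
        }
    }
    out
}

fn main() {
    // The 16 bytes behind the segname shown in the tiny_macho.out diff.
    let field: [u8; 16] = [
        b'S', b'P', b'1', 0xc0, 0x89, 0xe7, b'j', 0x08,
        b'W', b'j', 0x01, b'P', 0xb0, 0x04, 0xeb, 0xb0,
    ];
    let name = segname(&field);
    assert!(std::str::from_utf8(name).is_err()); // not valid UTF-8
    assert_eq!(escape(name), r"SP1\300\211\347j\010Wj\001P\260\004\353\260");

    // A conventional NUL-padded name.
    let mut text = [0u8; 16];
    text[..6].copy_from_slice(b"__TEXT");
    assert_eq!(segname(&text), &b"__TEXT"[..]);
}
```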

latonis · 2024-01-31T16:12:11Z

yara-x/src/modules/macho/parser.rs

+}
+
+/// Parser that reads a 32-bits or 64-bits
+fn uint(


we do have some fields in certain mach-o structs that will be u8 sizes, do we want to account for that here?

example: https://opensource.apple.com/source/xnu/xnu-4570.61.1/osfmk/kern/cs_blobs.h.auto.html

typedef struct __CodeDirectory {
    uint32_t magic;         /* magic number (CSMAGIC_CODEDIRECTORY) */
    uint32_t length;        /* total length of CodeDirectory blob */
    uint32_t version;       /* compatibility version */
    uint32_t flags;         /* setup and mode flags */
    uint32_t hashOffset;    /* offset of hash slot element at index zero */
    uint32_t identOffset;   /* offset of identifier string */
    uint32_t nSpecialSlots; /* number of special hash slots */
    uint32_t nCodeSlots;    /* number of ordinary (code) hash slots */
    uint32_t codeLimit;     /* limit to main image signature range */
    uint8_t hashSize;       /* size of each hash in bytes */
    uint8_t hashType;       /* type of hash (cdHashType* constants) */
    uint8_t platform;       /* platform identifier; zero if not platform binary */
    uint8_t pageSize;       /* log2(page size in bytes); 0 => infinite */
    uint32_t spare2;        /* unused (must be zero) */
}

This `uint` parser is intended for cases where the integer's endianness, width, or both are not known at compile time. `u8` doesn't have those problems: it isn't affected by endianness and its size is already known.

ah ok great, sounds good to me. We do have some instances where endianness is always big for parsing certain mach-o structs as well, but that is also easily handled already with nom. thanks for the sanity check!
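To make the distinction in this thread concrete, here is a hypothetical std-only sketch; it is not the actual `nom`-based `uint` in `parser.rs`, whose full signature isn't shown here. `uint` picks both width and byte order at run time, while the `CodeDirectory` blob quoted above (part of the code signature, which is always big-endian) is read with a fixed byte order, and its `uint8_t` fields are plain single bytes. `Endian`, `uint`, `CdFields` and `parse_cd` are illustrative names.

```rust
#[derive(Clone, Copy)]
enum Endian {
    Little,
    Big,
}

/// Reads a 32-bit or 64-bit unsigned integer whose width and byte order are
/// only known at run time. Returns the value and the remaining input.
fn uint(input: &[u8], endian: Endian, is_64: bool) -> Option<(u64, &[u8])> {
    let width = if is_64 { 8 } else { 4 };
    if input.len() < width {
        return None;
    }
    let (bytes, rest) = input.split_at(width);
    let mut value: u64 = 0;
    match endian {
        // Most significant byte first.
        Endian::Big => {
            for &b in bytes {
                value = (value << 8) | u64::from(b);
            }
        }
        // Least significant byte first.
        Endian::Little => {
            for &b in bytes.iter().rev() {
                value = (value << 8) | u64::from(b);
            }
        }
    }
    Some((value, rest))
}

/// A few fields of the CodeDirectory blob.
struct CdFields {
    magic: u32,
    hash_size: u8,
    hash_type: u8,
    page_size: u8, // log2 of the page size
}

/// The u32 fields use a fixed (big-endian) byte order, so `uint` isn't
/// needed; the u8 fields have no byte order at all. Nine uint32_t fields
/// precede hashSize, which therefore sits at offset 36.
fn parse_cd(blob: &[u8]) -> Option<CdFields> {
    let m = blob.get(0..4)?;
    Some(CdFields {
        magic: u32::from_be_bytes([m[0], m[1], m[2], m[3]]),
        hash_size: *blob.get(36)?,
        hash_type: *blob.get(37)?,
        page_size: *blob.get(39)?,
    })
}

fn main() {
    // The same bytes, interpreted according to run-time parameters.
    let data: [u8; 5] = [0x01, 0x00, 0x00, 0x00, 0xaa];
    assert_eq!(uint(&data, Endian::Little, false), Some((1, &data[4..])));
    assert_eq!(uint(&data, Endian::Big, false), Some((0x0100_0000, &data[4..])));
    assert_eq!(uint(&data, Endian::Little, true), None); // only 5 bytes available

    let mut blob = vec![0u8; 44];
    blob[..4].copy_from_slice(&0xfade0c02u32.to_be_bytes()); // CSMAGIC_CODEDIRECTORY
    blob[36] = 32; // hashSize
    blob[37] = 2; // hashType
    blob[39] = 12; // pageSize: 2^12 = 4096 bytes
    let cd = parse_cd(&blob).unwrap();
    assert_eq!(cd.magic, 0xfade0c02);
    assert_eq!((cd.hash_size, cd.hash_type, cd.page_size), (32, 2, 12));
}
```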

TommYDeeee · 2024-02-02T09:17:37Z

Thank you for this refactor, even though it's already merged. It looks really good, and those advanced `nom` validation checks 👍

plusvic added 2 commits January 31, 2024 10:59

style: fix clippy warnings

81ca16f

plusvic mentioned this pull request Jan 31, 2024

feat: parse certificates, entitlements, symbol table for Mach-O #73

Closed

fix: reserved3 field is only filled in 64-bits binaries.

2d35d94

TommYDeeee mentioned this pull request Jan 31, 2024

fix: add support for FAT Mach-O files in yr dump auto module selection #77

Closed

plusvic added 9 commits January 31, 2024 12:12

fix: abort parsing when the end of the file is reached.

b6631c8

fix: broken test case

a225e4e

chore: remove unnecessary structure Dylinker

c4867e9

chore: dump cputype and cpusubtype as hex numbers.

936e189

fix: always read integers using the endianness that corresponds to the file. Also fix an issue in `m68k_thread_state` and add more test cases.

d74e7db

chore: remove logging from the list of default features

abc234c

tests: fix broken test case

032fca4

style: minor style change and fix clippy warning

de2d9a9

tests: add missing import statements

3a6045c

latonis approved these changes Jan 31, 2024

chore: don't include debug information in release binaries

2a2ad8e

plusvic merged commit 79fd9d2 into main Jan 31, 2024
22 checks passed

plusvic deleted the macho-refactor branch February 1, 2024 10:42
