-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the hashutil functions to properly deal with path record field types #432
Conversation
This should fix #413 |
I made |
Codecov Report
@@ Coverage Diff @@
## main #432 +/- ##
==========================================
+ Coverage 73.99% 74.12% +0.12%
==========================================
Files 258 258
Lines 20559 20622 +63
==========================================
+ Hits 15213 15286 +73
+ Misses 5346 5336 -10
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 39 files with indirect coverage changes 📣 Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in Chrome or Firefox today! |
There might be some folks that specifically use that utility, so I'd set a version when it will be deprecated (not the next release, but the one after that) so they have at least ample time to change their usage? |
6b90733
to
22b2c02
Compare
dissect/target/helpers/hashutil.py
Outdated
warnings.warn( | ||
( | ||
"The function passing in a path field did not resolve that path. ", | ||
"The autmatic resolving of path fields will be removed in the future.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"The autmatic resolving of path fields will be removed in the future.", | |
"The automatic resolving of path fields will be removed in the future.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why, by the way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this part of the code should not have any knowledge about what the meaning of the content of a field is. That is the responsibility of the plugin creating the record. The plugin knows whether it is useful to resolve a path or not. It's correct separation of responsibilities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about when for preservation of the original data I want to put the original path from e.g. the registry in the record which might include an environment variable, but want to hash that file? Should I create a second field like real_path
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good point. You do want to preserve original values. Though putting unresolved paths with e.g. env variables feels problematic. In my mind a (pathlib style) path type means a literal path, where percentages, dollars and all are part of the actual path, not of potential env variables, otherwise you would still be guessing if it is an actual path or if it still needs to be resolved.
In an ideal world I think you would have 2 values, a literal string which contains whatever was parsed from the artifact and a path which is resolved and present on the target. You would still have the side cases of: 1. what if the path correctly resolves but is not present in the target and 2. what if the path does not or partially resolve how can you be sure the dollar/percentage is part of the path or an unresolved env variable.
Although the last side case won't be solvable in practice, I do think we need to move to a system where a path is as fully resolved as possible and if the literal value, is of value it should be present in the record as a second (string) value. Also because people tend to want to put paths that come out of target-query into something like target-fs to get to the file, which will fail if it is not resolved (we could of course change target-fs to (have the option to) resolve paths).
But given that this is more a policy discussion then something that should be realized by this PR, I'll revert the deprecation warning on unresolved paths for now.
2179a0d
to
8ccfd8d
Compare
b8556d3
to
3717028
Compare
3717028
to
c181dbe
Compare
(DIS-2526)