Report carve dir #1017
qkaiser
Nice work. I would maybe add some tests where we set the `ExtractionConfig` to use non-standard `carve_suffix` and `extract_suffix` values, to see if we can spot abnormal behavior arising from assumptions made about the default suffix values.
e89ad23 to
2767e86
Compare
@e3krisztian cool! I'll wait for the tests to land before I do a final review round.
@e3krisztian two small comments. I think we can rebase and merge when it's cleared.
Current devenv config has problems on at least Ubuntu 22.04.
.envrc.user with the below content enables work without devenv:
layout_poetry() {
    # create venv if it doesn't exist
    poetry run true
    export VIRTUAL_ENV=$(poetry env info --path)
    export POETRY_ACTIVE=1
    PATH_add "$VIRTUAL_ENV/bin"
}
layout_poetry
export SKIP=nixpkgs-fmt
export UNBLOB_USE_DEVENV=false
This removes the burden of carving from the already complex function `_extract_chunks` and also allows for some better variable names.
Carve directories were hard to explain, as they look like extraction directories and there was no public information to tell them apart. Adding this report makes the purpose of the directory visible.
`_FileTask.carve_dir` was initially used for both extraction and carving. The naming of the directories can now differ, so it is no longer used apart from an existence check, which would terminate this branch of the extraction. This output directory existence check is now present in both the carving and extraction paths, and the report is renamed to accommodate both types of output directories. `ExtractDirectoryExistsReport` was generalized to `OutputDirectoryExistsReport` instead of introducing yet another `Report` type - `CarveDirectoryExistsReport`.
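The shared existence check could look roughly like the sketch below. This is not unblob's actual code: the `OutputDirectoryExistsReport` dataclass here is a simplified stand-in with an assumed single `path` field, and `check_output_dir` is a hypothetical helper name.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class OutputDirectoryExistsReport:
    """Simplified stand-in: records which output directory already existed."""

    path: Path  # assumed field, not taken from the real report class


def check_output_dir(out_dir: Path) -> Optional[OutputDirectoryExistsReport]:
    """Return a report if out_dir already exists, otherwise None.

    The same check can serve both carve and extract directories,
    so a single report type is enough.
    """
    if out_dir.exists():
        return OutputDirectoryExistsReport(path=out_dir)
    return None
```

A caller on either the carving or the extraction path would stop processing that branch when a report is returned, and record the report instead of failing silently.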
Chunk statistics require a division by the total chunk size, which can be 0
in certain rare cases. This makes chunk-related output conditional,
and not part of the summary.
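A minimal sketch of such a guard, assuming the statistics are per-handler percentages of the total chunk size. The function name `chunk_percentages` and the dictionary shape are illustrative assumptions, not unblob's real interface.

```python
from typing import Dict, Optional


def chunk_percentages(sizes_by_handler: Dict[str, int]) -> Optional[Dict[str, float]]:
    """Return each handler's share of the total chunk size in percent.

    Returns None when the total size is 0, so the caller can skip the
    chunk-related output instead of dividing by zero.
    """
    total = sum(sizes_by_handler.values())
    if total == 0:
        return None
    return {name: size / total * 100 for name, size in sizes_by_handler.items()}
```

The caller prints the chunk section only when the result is not `None`, which is what makes that part of the output conditional.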
An example command line sequence which leads to a silent failure:
(echo a; gzip < README.md ; echo b) > fw
unblob fw
# the next command would silently fail:
unblob fw
With the separation of carve and extract directories, the output directory becomes dependent on the *content* of the input file: if it has multiple chunks because it is not covered by a single handler, the output directory will be generated as a *carve* directory, otherwise as an *extract* directory.
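The naming rule described here can be sketched as follows. The helper `output_dir_name` and its boolean parameter are hypothetical; the only parts taken from this PR are that the directory name is the input name plus either the carve suffix or the extract suffix.

```python
def output_dir_name(
    input_name: str,
    single_handler_covers_file: bool,
    carve_suffix: str,
    extract_suffix: str,
) -> str:
    """Pick the output directory name based on the content of the input.

    A file fully covered by one handler is extracted directly, so it gets
    the extract suffix. Anything else is split into chunks first, so it
    gets the carve suffix.
    """
    suffix = extract_suffix if single_handler_covers_file else carve_suffix
    return input_name + suffix
```

With `--carve-suffix _carve`, a multi-chunk input named `chunks` would map to `chunks_carve`, while the carved `0-160.gzip` would map to `0-160.gzip_extract`, matching the paths used in the test file script.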
The previous commit prints the output path, so callers no longer need to rely on looking at well-known paths.
The test files were created with this script:
# cd tests/files/suffixes
# clean
rm -rf chunks_carve/ extractions/ collisions.zip __input__ __outputs__
# reproduce output
mkdir __input__ __outputs__
seq 100 | gzip > 0-160.gzip
seq 128 | gzip > 160-375.gzip
dd if=/dev/zero of=375-512.padding bs=1 count=137
cat 0-160.gzip 160-375.gzip 375-512.padding > __input__/chunks
unblob --carve-suffix _carve chunks
cp 0-160.gzip chunks_carve/
echo something else > chunks_carve/0-160.gzip_extract/gzip.uncompressed
zip __input__/collisions.zip chunks chunks_carve/0-160.gzip chunks_carve/0-160.gzip_extract/gzip.uncompressed
rm 0-160.gzip 160-375.gzip 375-512.padding
rm -rf chunks_carve
for input in collisions.zip chunks
do
    unblob \
        -e __outputs__/$input/defaults/ __input__/$input
    unblob --carve-suffix _carve \
        -e __outputs__/$input/_carve_extract/ __input__/$input
    unblob --carve-suffix _c --extract-suffix _e \
        -e __outputs__/$input/_c_e/ __input__/$input
done
711e3d2 to
4f1bd85
Compare
Rebased and resolved raised problems in tests.
Reworked #891 to report carve dirs instead of the carved files, as well as support different suffixes for carve and extraction directories.