Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Report carve dir #1017

Merged
merged 12 commits into from
Dec 3, 2024
Merged

Report carve dir #1017

merged 12 commits into from
Dec 3, 2024

Conversation

e3krisztian
Copy link
Contributor

Reworked #891 to report carve dirs instead of the carved files, as well as support different suffixes for carve and extraction directories.

@e3krisztian e3krisztian mentioned this pull request Nov 26, 2024
Copy link
Contributor

@qkaiser qkaiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. I would maybe add some tests where we set the ExtractionConfig to use non-standard carve_suffix and extract_suffix ? See if we can spot abnormal behavior that would appear out of assumption made on the default suffix values.

unblob/processing.py Outdated Show resolved Hide resolved
unblob/processing.py Outdated Show resolved Hide resolved
unblob/cli.py Show resolved Hide resolved
unblob/processing.py Show resolved Hide resolved
unblob/processing.py Outdated Show resolved Hide resolved
unblob/cli.py Outdated Show resolved Hide resolved
unblob/cli.py Outdated Show resolved Hide resolved
@qkaiser
Copy link
Contributor

qkaiser commented Nov 27, 2024

Commits f2cb06e and 8024aed should be next to each other in the history I think. Easier to follow.

@e3krisztian e3krisztian force-pushed the report_carve_dir branch 3 times, most recently from e89ad23 to 2767e86 Compare November 27, 2024 16:51
@qkaiser
Copy link
Contributor

qkaiser commented Nov 27, 2024

@e3krisztian cool ! I'll wait for the tests to land before I do a final review round.

tests/test_processing_suffixes.py Outdated Show resolved Hide resolved
tests/test_processing_suffixes.py Outdated Show resolved Hide resolved
@qkaiser
Copy link
Contributor

qkaiser commented Nov 29, 2024

@e3krisztian two small comments. I think we can rebase and merge when it's cleared.

@qkaiser qkaiser added enhancement New feature or request python Pull requests that update Python code labels Nov 29, 2024
e3krisztian and others added 12 commits December 3, 2024 12:09
Current devenv config has problems on at least Ubuntu 22.04.

.envrc.user with the below content enables work without devenv:

layout_poetry() {
 # create venv if it doesn't exist
 poetry run true

 export VIRTUAL_ENV=$(poetry env info --path)
 export POETRY_ACTIVE=1
 PATH_add "$VIRTUAL_ENV/bin"
}

layout_poetry
export SKIP=nixpkgs-fmt
export UNBLOB_USE_DEVENV=false
This removes the burden of carving from already complex function
_extract_chunks and also allowed for some better variable names.
Carve directories were hard to explain, as they look like extraction
directories and there was no public information to tell them apart.

Adding this report makes the purpose of the directory visible.
`_FileTask.carve_dir` was initially used for both extraction and carving.
The naming of the directories can now differ, so it is not used anymore
apart from an existence check, which would terminate this branch of the
extraction. This output directory existence check is now present in both
the carving and extraction paths, and the output report's name is also
renamed, to accommodate both types of output directories.

`ExtractDirectoryExistsReport` was generalized to
`OutputDirectoryExistsReport` instead of introducing yet another
`Report` type - `CarveDirectoryExistsReport`.
Chunk statistics require a divide by total chunk size, which can be 0
in certain rare cases. This makes chunk related output is conditional,
and not part of the summary.

An example command line sequence which leads to a silent failure:

    (echo a; gzip < README.md ; echo b) > fw
    unblob fw
    # the next command would silently fail:
    unblob fw
With the separation of carve and extract directories, the output
directory become dependent on the *content* of the input file:
if it has multiple chunks, because it is not covered by a single handler
the output directory will be generated as a *carve* directory,
otherwise as an *extract* directory.
The output path is printed in the previous commit, so depending on the
caller having to look at well known paths is no longer needed.
The test files were created with this script:

    # cd tests/files/suffixes

    # clean
    rm -rf chunks_carve/ extractions/ collisions.zip __input__ __outputs__

    # reproduce output
    mkdir __input__ __outputs__
    seq 100 | gzip > 0-160.gzip
    seq 128 | gzip > 160-375.gzip
    dd if=/dev/zero of=375-512.padding bs=1 count=137
    cat 0-160.gzip 160-375.gzip 375-512.padding > __input__/chunks

    unblob --carve-suffix _carve chunks
    cp 0-160.gzip chunks_carve/
    echo something else > chunks_carve/0-160.gzip_extract/gzip.uncompressed

    zip __input__/collisions.zip chunks chunks_carve/0-160.gzip chunks_carve/0-160.gzip_extract/gzip.uncompressed

    rm 0-160.gzip 160-375.gzip 375-512.padding
    rm -rf chunks_carve

    for input in collisions.zip chunks
    do
      unblob \
            -e __outputs__/$input/defaults/ __input__/$input
      unblob --carve-suffix _carve \
            -e __outputs__/$input/_carve_extract/ __input__/$input
      unblob --carve-suffix _c --extract-suffix _e \
            -e __outputs__/$input/_c_e/ __input__/$input
    done
@e3krisztian
Copy link
Contributor Author

Rebased and resolved raised problems in tests.

@e3krisztian e3krisztian merged commit 7565a38 into main Dec 3, 2024
15 checks passed
@e3krisztian e3krisztian deleted the report_carve_dir branch December 3, 2024 13:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request python Pull requests that update Python code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants