I wrote a test to confirm that files in an aff4 archive created with pyaff4 match what I expect them to be, by using aff4.py --extract-all. Unfortunately, dumping files fails because a directory from my input is treated like a file. The issue appears to affect all directories.
This processing path follows creating an aff4 archive from scratch using a zip. (Particularly, this is a zipped LoC Bag, though I don't think that has an impact apart from an internal path name not entirely relevant to this bug.) Reproduction instructions are included.
Suspected diagnosis
Every member of a zip, whether a file or directory, appears to be assigned the type aff4:FileImage per the --meta dump from the .aff4 file. I'm guessing in-zip directories should instead be aff4:FolderImage, as this query is being used to feed a loop:
for imageUrn in resolver.QueryPredicateObject(volume.urn, lexicon.AFF4_TYPE, lexicon.standard11.FileImage):
And in that loop, every FileImage is created and treated as a regular file. A directory thrown into the mix raises an IsADirectoryError.
However, I don't know the code well enough to suggest where that information should be integrated (aside from a check soon after fn is defined in BasicZipFile.parse_cd) and propagated so that the member is typed as an aff4:FolderImage. Perhaps the ZipInfo class in that file?
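The failure mode can be reproduced without pyaff4. The sketch below is not pyaff4's extractAll: `extract_members`, its `(name, data)` input, and the `file.txt` member are hypothetical stand-ins for the loop over FileImage entries. It assumes directory members keep a trailing "/" in their names, creates those as directories instead of opening them, and then shows that opening the directory path for binary writing fails (with IsADirectoryError on Linux).

```python
import os
import tempfile


def extract_members(members, dest_root):
    """Write (name, data) pairs under dest_root.

    Hypothetical stand-in for an extraction loop; names ending in "/"
    are treated as directories rather than opened as files.
    """
    written = []
    for name, data in members:
        dest = os.path.join(dest_root, name)
        if name.endswith("/"):
            # Directory entry: create it, never open() it for writing.
            os.makedirs(dest, exist_ok=True)
            continue
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        with open(dest, "wb") as dest_stream:
            dest_stream.write(data)
        written.append(dest)
    return written


members = [
    ("deep.zip/input_dir_1/", b""),
    ("deep.zip/input_dir_1/file.txt", b"hello"),
]

with tempfile.TemporaryDirectory() as root:
    written = extract_members(members, root)
    dir_path = os.path.join(root, "deep.zip", "input_dir_1")
    # Opening the directory itself as a file fails
    # (IsADirectoryError on Linux, which is a subclass of OSError).
    try:
        with open(dir_path, "wb"):
            pass
        unguarded_error = None
    except OSError as exc:
        unguarded_error = exc
    extracted_ok = os.path.isfile(written[0])
    dir_created = os.path.isdir(dir_path)
```

Whether the guard belongs in the extraction loop or earlier, when members are typed during ingestion, is a design question for the maintainers; this only illustrates the difference in behavior.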
Steps to reproduce
The code segments below work when run as individual shell scripts, confirmed on an Ubuntu 18.04 system.
Extract everything from the flat aff4 archive. Currently works.
Pull Request 14 fixes an unrelated issue with the way extractAll is called, and updates Pull Request 13 as a matter of convenience. I also found some of @gonmator's fixes while fixing this call.
#!/bin/bash
# step3.sh
# (First loading venv, fixing path to aff4.py ...)
rm -rf extraction_flat
mkdir extraction_flat
# Note that the last argument here will not be necessary if PR 16 is incorporated.
python .../aff4.py \
--extract-all \
--folder extraction_flat \
flat.aff4 \
extraction_flat
Extract everything from the aff4 archive. Currently fails.
PR 14 should be integrated in order to see step4.sh below fail in the illustrative way.
#!/bin/bash
# step4.sh
# (First loading venv, fixing path to aff4.py ...)
rm -rf extraction_deep
mkdir extraction_deep
# Note that the last argument here will not be necessary if PR 16 is incorporated.
python .../aff4.py \
--extract-all \
--folder extraction_deep \
deep.aff4 \
extraction_deep
Traceback of step4.sh:
Traceback (most recent call last):
  File "../deps/pyaff4/aff4.py", line 421, in <module>
    main(sys.argv)
  File "../deps/pyaff4/aff4.py", line 414, in main
    extractAll(dest, args.folder)
  File "../deps/pyaff4/aff4.py", line 312, in extractAll
    with open(destFile, "wb") as destStream:
IsADirectoryError: [Errno 21] Is a directory: 'extraction_deep/deep.zip/input_dir_1'
Resolution confirmation
When step4.sh above creates this file hierarchy, this Issue's good to close.
* Added /build/ directory to .gitignore
* Fixed bug initialising aff4.LogicalImage instances
* fixed bugs in extract()
* fixed aff4.extractAll() NameError exception
* Fix call to extractAll function
This patch is partially necessary to correct the `--extract-all` flag.
With this correction, `--extract-all` will work if there were no
directories ingested from an input zip.
Issue 15 reports on the problem with directories from an ingested zip.
#15
This patch builds on Pull Request 13, as I'd also found the binary-
output mode was necessary, though in a different spot.
#13
Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Suspected correction
In the function BasicZipFile.parse_cd, somewhere before the info message on line 694, a check needs to be made for the file being a directory. The since-Python-3.6 method of checking for the last character of the name being "/" should do.
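The trailing-slash check can be illustrated with Python's standard-library zipfile module, whose ZipInfo.is_dir() (added in Python 3.6) reports a member as a directory when its name ends with "/". This is only a sketch: it uses the standard library rather than pyaff4's own ZipInfo class, `classify` is a hypothetical helper, the `file.txt` member is invented, and the returned type labels are plain strings standing in for whatever pyaff4 would actually record.

```python
import io
import zipfile

# Build a small in-memory zip with one directory entry and one file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("input_dir_1/", b"")
    zf.writestr("input_dir_1/file.txt", b"hello")


def classify(info):
    """Return a hypothetical AFF4 type label for a zip member."""
    # ZipInfo.is_dir() (Python 3.6+) is true when the name ends with "/".
    return "aff4:FolderImage" if info.is_dir() else "aff4:FileImage"


with zipfile.ZipFile(buffer) as zf:
    labels = {info.filename: classify(info) for info in zf.infolist()}
```

A comparable test on the member name inside parse_cd would be the place to branch, assuming the name read from the central directory keeps its trailing slash.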