
FileType guesstimating needs refactoring #257

Closed
jtmoon79 opened this issue Mar 22, 2024 · 1 comment
Labels: code improvement (enhancement not seen by the user), difficult (A difficult problem; a major coding effort or difficult algorithm to perfect), P1 (important)


jtmoon79 (Owner) commented Mar 22, 2024

Summary

File type estimating (guessing) is kind of messy.

Current behavior

  1. Using Mimetype adds a lot of code for nearly zero benefit.
  2. The resulting MimeGuess and FileType are confusing; which one matters, and when?
  3. FileType guessing is hacky file-name matching.

Suggested behavior

  1. Remove Mimetype and MimeGuess entirely (this affects #15, "BlockReader should receive MimeGuess").
  2. Use a more robust and systematic approach to determining FileType from the file name.
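Item 2 asks for a more systematic name-based approach. A minimal sketch, assuming a table-driven design: exact file names are checked first, then a missing or numeric (rotation) extension, then a case-insensitive extension table. The enum `GuessedType`, the function `guess_from_name`, and the specific name and extension mappings are hypothetical illustrations, not the project's actual code.

```rust
use std::path::Path;

/// Hypothetical, simplified stand-in for the project's `FileType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GuessedType {
    Text,
    Evtx,
    Journal,
    FixedStruct,
    Unknown,
}

/// Exact file names, checked first (assumed examples of utmp-style records).
const NAME_RULES: &[(&str, GuessedType)] = &[
    ("utmp", GuessedType::FixedStruct),
    ("wtmp", GuessedType::FixedStruct),
    ("btmp", GuessedType::FixedStruct),
    ("lastlog", GuessedType::FixedStruct),
];

/// Extension rules, checked last and compared case-insensitively.
const EXT_RULES: &[(&str, GuessedType)] = &[
    ("evtx", GuessedType::Evtx),
    ("journal", GuessedType::Journal),
    ("log", GuessedType::Text),
    ("txt", GuessedType::Text),
];

/// Guess a file's type from its name alone, one rule table at a time.
pub fn guess_from_name(path: &Path) -> GuessedType {
    let name = match path.file_name().and_then(|n| n.to_str()) {
        Some(n) => n,
        None => return GuessedType::Unknown,
    };
    if let Some(&(_, t)) = NAME_RULES.iter().find(|(n, _)| *n == name) {
        return t;
    }
    let ext = match path.extension().and_then(|e| e.to_str()) {
        Some(e) => e.to_ascii_lowercase(),
        // No extension at all (e.g. "syslog", "messages"): assume text.
        None => return GuessedType::Text,
    };
    // Rotated logs such as "syslog.1" keep a numeric suffix.
    if ext.bytes().all(|b| b.is_ascii_digit()) {
        return GuessedType::Text;
    }
    EXT_RULES
        .iter()
        .find(|(e, _)| *e == ext)
        .map(|&(_, t)| t)
        .unwrap_or(GuessedType::Unknown)
}
```

Keeping the rules in data tables means adding a new type is a one-line change rather than another ad hoc string comparison.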

Once 1. and 2. are completed, a new issue should be created for allowing filepreprocessor.rs to read block zero (the first block) of the file and also do some kind of magic-number fingerprint matching.
That change leads to another very large change: file preprocessing may return multiple candidate FileTypes. The Reader for the first candidate is attempted, and if it fails, the next Reader is attempted.
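The issue only proposes "some kind of" fingerprint matching, so the following is a sketch under stated assumptions. It checks the first block against well-known published signatures (gzip `1F 8B`, xz `FD 37 7A 58 5A 00`, EVTX `ElfFile\0`, systemd journal `LPKSHHRH`), always appends plain text as a last-resort candidate, and then tries a caller-supplied reader for each candidate in order. The names `Candidate`, `sniff`, and `first_success` are hypothetical.

```rust
/// Hypothetical candidate kinds; not the project's real `FileType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Candidate {
    Gz,
    Xz,
    Evtx,
    Journal,
    Text,
}

/// Well-known leading signatures ("magic numbers") at offset 0.
const MAGIC: &[(&[u8], Candidate)] = &[
    (b"\x1f\x8b", Candidate::Gz),
    (b"\xfd7zXZ\x00", Candidate::Xz),
    (b"ElfFile\x00", Candidate::Evtx),
    (b"LPKSHHRH", Candidate::Journal),
];

/// Return candidates in the order they should be attempted.
pub fn sniff(first_block: &[u8]) -> Vec<Candidate> {
    let mut out: Vec<Candidate> = MAGIC
        .iter()
        .filter(|(sig, _)| first_block.starts_with(*sig))
        .map(|&(_, c)| c)
        .collect();
    // Plain text is always tried last so that something is attempted.
    out.push(Candidate::Text);
    out
}

/// Try each candidate's reader in order; return the first success with the
/// candidate that produced it, or every (candidate, error) pair if all fail.
pub fn first_success<T, E>(
    candidates: &[Candidate],
    mut try_reader: impl FnMut(Candidate) -> Result<T, E>,
) -> Result<(Candidate, T), Vec<(Candidate, E)>> {
    let mut errors = Vec::new();
    for &c in candidates {
        match try_reader(c) {
            Ok(v) => return Ok((c, v)),
            Err(e) => errors.push((c, e)),
        }
    }
    Err(errors)
}
```

Collecting every error, rather than only the last one, lets the caller report why each Reader rejected the file.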

jtmoon79 changed the title from "file type estimating needs refactoring" to "FileType guesstimating needs refactoring" on Mar 22, 2024
jtmoon79 added the labels code improvement (enhancement not seen by the user) and P1 (important) on Mar 22, 2024
jtmoon79 added the label difficult (A difficult problem; a major coding effort or difficult algorithm to perfect) on Apr 16, 2024
jtmoon79 (Owner, Author) commented Apr 25, 2024

> new Issue should be created around allowing the filepreprocessor.rs to read the zero block of the file and do some kind of magic fingerprint matching as well.

See Issue #16.

jtmoon79 added a commit that referenced this issue Apr 30, 2024
Refactor `enum FileType` to embed archive and storage information in the variant field `archival_type`.
Add `encoding_type` to `FileType::Text`.

Refactor `pathbuf_to_filetype` to be more straightforward and recursive.

Remove `Mimeguess` entirely.
Issue #15 (completed)

This is part 1 of completing the following issues:
Issue #257
Issue #285
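The commit above describes each `FileType` variant carrying its archival layer in an `archival_type` field, with `encoding_type` on `FileType::Text`, plus a recursive path classifier. A sketch of that shape under simplifying assumptions: the enums, `with_archival`, and `name_to_filetype` are hypothetical and far smaller than the real types, and text is assumed to be UTF-8 because a file name alone cannot reveal its encoding.

```rust
/// Compression layer around the file (hypothetical, simplified).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArchivalType {
    Normal, // stored as-is
    Gz,
    Xz,
}

/// Text encoding, carried only by the `Text` variant (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EncodingType {
    Utf8,
}

/// Sketch only, not the project's real definition: every variant
/// embeds its archival layer as a field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileType {
    Text { archival_type: ArchivalType, encoding_type: EncodingType },
    Evtx { archival_type: ArchivalType },
    Journal { archival_type: ArchivalType },
}

/// Return `ft` with its `archival_type` field replaced by `at`.
fn with_archival(ft: FileType, at: ArchivalType) -> FileType {
    match ft {
        FileType::Text { encoding_type, .. } => FileType::Text { archival_type: at, encoding_type },
        FileType::Evtx { .. } => FileType::Evtx { archival_type: at },
        FileType::Journal { .. } => FileType::Journal { archival_type: at },
    }
}

/// Recursively classify a file name: strip one compression suffix, classify
/// the remainder, then record the stripped layer in `archival_type`.
pub fn name_to_filetype(name: &str) -> FileType {
    if let Some(inner) = name.strip_suffix(".gz") {
        return with_archival(name_to_filetype(inner), ArchivalType::Gz);
    }
    if let Some(inner) = name.strip_suffix(".xz") {
        return with_archival(name_to_filetype(inner), ArchivalType::Xz);
    }
    match name.rsplit_once('.').map(|(_, ext)| ext) {
        Some("evtx") => FileType::Evtx { archival_type: ArchivalType::Normal },
        Some("journal") => FileType::Journal { archival_type: ArchivalType::Normal },
        // Assumption: everything else is UTF-8 text.
        _ => FileType::Text {
            archival_type: ArchivalType::Normal,
            encoding_type: EncodingType::Utf8,
        },
    }
}
```

Because `archival_type` holds a single layer, the outermost suffix wins in this sketch (`a.log.xz.gz` is recorded only as `Gz`); restricting input to a single archival level is one way to avoid silently dropping a layer.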
jtmoon79 added a commit that referenced this issue Apr 30, 2024
Refactor `path_to_filetype` to allow `filetype_archive` (gz, xz) for the parseable file types EVTX, FixedStruct, and journal.
Allow compressed `.tar` files.
Only a "single level" of archival type is allowed.

None of these are handled yet.

This is part 2 of:
Issue #257
Issue #285
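This commit allows one compression layer (gz, xz) around EVTX, FixedStruct, journal, and `.tar` files, but only a "single level" of archival type. One plausible reading of that rule, sketched with hypothetical types (`Compression`, `Inner`, `Classified`, `classify`) and an assumed list of FixedStruct file names: at most one compression suffix is peeled off a bare file name, and any archival suffix found underneath it is rejected.

```rust
/// Outer compression layer (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Compression {
    Plain,
    Gz,
    Xz,
}

/// What sits inside the (optional) compression layer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Inner {
    Text,
    Evtx,
    FixedStruct,
    Journal,
    Tar,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Classified {
    pub inner: Inner,
    pub compression: Compression,
}

fn compression_of(ext: &str) -> Option<Compression> {
    match ext {
        "gz" => Some(Compression::Gz),
        "xz" => Some(Compression::Xz),
        _ => None,
    }
}

/// Split off the last extension; the extension is "" if there is none.
fn split_ext(name: &str) -> (&str, &str) {
    name.rsplit_once('.').unwrap_or((name, ""))
}

/// Classify a bare file name, allowing only one archival level.
pub fn classify(file_name: &str) -> Result<Classified, String> {
    // Peel off at most one compression suffix.
    let (stem, ext) = split_ext(file_name);
    let (rest, compression) = match compression_of(ext) {
        Some(c) => (stem, c),
        None => (file_name, Compression::Plain),
    };
    let (stem, ext) = split_ext(rest);
    // A second compression suffix underneath the first is a second level.
    if compression_of(ext).is_some() {
        return Err(format!("more than one archival level: {file_name}"));
    }
    let inner = match ext {
        "tar" => Inner::Tar,
        "evtx" => Inner::Evtx,
        "journal" => Inner::Journal,
        // Assumed utmp-style names for FixedStruct.
        _ if matches!(rest, "wtmp" | "utmp" | "btmp" | "lastlog") => Inner::FixedStruct,
        _ => Inner::Text,
    };
    // Likewise reject a tar wrapped around another archive, e.g. "a.tar.tar".
    if inner == Inner::Tar {
        let (_, below) = split_ext(stem);
        if below == "tar" || compression_of(below).is_some() {
            return Err(format!("more than one archival level: {file_name}"));
        }
    }
    Ok(Classified { inner, compression })
}
```

For example, `logs.tar.xz` and `wtmp.gz` classify successfully, while `syslog.gz.xz` and `a.tar.tar` return an error.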