Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect file type #581

Merged
merged 5 commits into from
Apr 13, 2020
Merged

Detect file type #581

merged 5 commits into from
Apr 13, 2020

Conversation

jameskerr
Copy link
Member

Fixes #573

This function reads the first 4 bytes to see if it matches the pcap signatures, then checks for zeek headers on the first four lines with a regex, then checks for json by reading the first 4 lines and seeing if they are json parsable.

This reads the first bits of the file and checks some heuristics.

This will be used for choosing which endpoint to send the data
to. /packets or /zeek.
@jameskerr jameskerr requested a review from mason-fish April 8, 2020 23:24
}

function isZeekHeader(line) {
return /^#\w+\s+\S+/.test(line)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will match a trivially-contrived sourcable sh file, like

#source me to do stuff
export A=b

Instead of ^#\w+\s+\S+, check for ^#separator \S+ ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean. How does the proposed regex prevent that problem?

I'm also checking the first 4 lines of the file to see if they match that regex. Is it true that all zeek filees start with the "separator" header?

Lastly, is it a security concern if they spoof this with a non-zeek file? All we do with it pass along the file path to zqd. Whatever they do in zqd will probably throw an error. I don't think anything is able to be "sourced".

Let me ask a zqd dev if this needs to be more robust. @henridf will this zeek detector regex be good enough for ingest?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it true that all zeek filees start with the "separator" header?

I think they must since all lines after have to use the separator as defined.

Lastly, is it a security concern if they spoof this with a non-zeek file?

Not necessarily. My $0.02 was, if we are doing file detection we can at least block obviously invalid text files before they go all the way through to zqd.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aww I see what you mean. If someone accidentally gives us a file with standard # comments at the top of the file (which many files do). Got it. Yeah I'll update that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Still happy to hear anyone else's comments on this issue.

src/js/lib/fileType.test.js Outdated Show resolved Hide resolved
Copy link
Contributor

@mikesbrown mikesbrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing my comments.

@jameskerr jameskerr merged commit 6449d84 into master Apr 13, 2020
@jameskerr jameskerr deleted the detect-file-type branch April 13, 2020 18:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Detect File Type
2 participants