-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Detect file type #581
Detect file type #581
Conversation
This reads the first bits of the file and checks some heuristics. This will be used for choosing which endpoint to send the data to. /packets or /zeek.
src/js/lib/fileType.js
Outdated
} | ||
|
||
function isZeekHeader(line) { | ||
return /^#\w+\s+\S+/.test(line) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will match a trivially-contrived sourcable sh file, like
#source me to do stuff
export A=b
Instead of ^#\w+\s+\S+
, check for ^#separator \S+
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean. How does the proposed regex prevent that problem?
I'm also checking the first 4 lines of the file to see if they match that regex. Is it true that all zeek filees start with the "separator" header?
Lastly, is it a security concern if they spoof this with a non-zeek file? All we do with it pass along the file path to zqd. Whatever they do in zqd will probably throw an error. I don't think anything is able to be "sourced".
Let me ask a zqd dev if this needs to be more robust. @henridf will this zeek detector regex be good enough for ingest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it true that all zeek filees start with the "separator" header?
I think they must since all lines after have to use the separator as defined.
Lastly, is it a security concern if they spoof this with a non-zeek file?
Not necessarily. My $0.02 was, if we are doing file detection we can at least block obviously invalid text files before they go all the way through to zqd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aww I see what you mean. If someone accidentally gives us a file with standard #
comments at the top of the file (which many files do). Got it. Yeah I'll update that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Still happy to hear anyone else's comments on this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing my comments.
Fixes #573
This function reads the first 4 bytes to see if it matches the pcap signatures, then checks for zeek headers on the first four lines with a regex, then checks for json by reading the first 4 lines and seeing if they are json parsable.