-
-
Notifications
You must be signed in to change notification settings - Fork 131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improved regexes #52
improved regexes #52
Changes from all commits
49ac37a
c8ad3b1
0896307
7a1151f
a4026ae
5c73e7d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,18 +1,10 @@ | ||
password|||10|||triggers unwanted classes like password reset, hence the low score | ||
privatekey|||80 | ||
private_key|||80 | ||
apikey|||75 | ||
http:|||10 | ||
https:|||7 | ||
database_secret|||80 | ||
database_password|||80 | ||
databasepassword|||80 | ||
databasesecret|||80 | ||
(https|http):\/\/.*api.*|||60||| This regex matches any URL containing 'api' | ||
(https|http):\/\/.*test.*|||60||| This regex matches any URL containing 'test' | ||
(https|http):\/\/.*uat.*|||60||| This regex matches any URL containing 'uat' | ||
passw(d|ord)?|||10|||triggers unwanted classes like password reset, hence the low score | ||
(private|secret|api|aws)[_-]?key|||80 | ||
https?:|||7 | ||
(db|database)[_-]?(passw(d|ord)?|secret)|||80 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. same note on pwd
|
||
https?:\/\/.*(uat|test|api).*|||60||| This regex matches any URL containing 'api|uat|test' | ||
^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$|||40||| Matching IP adresses | ||
^[a-f0-9]{32}$|||70||| MD5 hash | ||
\b([a-f0-9]{40})\b|||70||| SHA1 hash | ||
^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{4})$|||70||| base64 string | ||
Authorization: Basic|||95||| Basic authentication | ||
Authorization: Basic|||95||| Basic authentication |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,3 @@ | ||
http:|||"res","layout"||| Suggested by Adi | ||
(https|http):\/\/.*api.*|||"res","layout"||| Suggested by Adi | ||
http:\/\/schemas\.android\.com\/apk\/res\/android|||||| | ||
https?:\/\/.*api.*|||"res","layout"||| Suggested by Adi | ||
http:\/\/schemas\.android\.com\/apk\/res\/android|||||| |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,20 +1,12 @@ | ||
password|||10|||triggers unwanted classes like password reset, hence the low score | ||
privatekey|||80 | ||
private_key|||80 | ||
apikey|||75 | ||
http:|||10 | ||
https:|||7 | ||
database_secret|||80 | ||
database_password|||80 | ||
databasepassword|||80 | ||
databasesecret|||80 | ||
(https|http):\/\/.*api.*|||60||| This regex matches any URL containing 'api' | ||
(https|http):\/\/.*test.*|||60||| This regex matches any URL containing 'test' | ||
(https|http):\/\/.*uat.*|||60||| This regex matches any URL containing 'uat' | ||
passw(d|ord)?|||10|||triggers unwanted classes like password reset, hence the low score | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. pwd again ?
|
||
(private|secret|api|aws)[_-]?key|||80 | ||
https?:|||7 | ||
(db|database)[_-]?(passw(d|ord)?|secret)|||80 | ||
https?:\/\/.*(uat|test|api).*|||60||| This regex matches any URL containing 'api|uat|test' | ||
^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$|||40||| Matching IP adresses | ||
^[a-f0-9]{32}$|||70||| MD5 hash | ||
\b([a-f0-9]{40})\b|||70||| SHA1 hash | ||
^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{4})$|||70||| base64 string | ||
Authorization: Basic|||95||| Basic authentication | ||
SELECT \* FROM|||40||| Intersting SQL transaction | ||
INSERT INTO .* VALUES|||40||| Intersting SQL transaction | ||
INSERT INTO .* VALUES|||40||| Intersting SQL transaction | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Caution with spaces, might also want to handle newlines and tabs (and non-blocking spaces, and all the annoying stuff that can be put in there).
Maybe also try to detect cases where an sql query is performed by inserting vars directly in the string as it also provides an access point (there WILL be problems with false positives, newline/tab handling, ... there is only so much one can do wit a type 3 grammar and this was quickly thrown together and is by no mean a proper regex for this usage) :
I don't see why their would be update or create in a client, but you never know. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good point! So your suggestion is to use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I will go for this one: https://regex101.com/r/cYSPbB/1 Many thanks for your idea, it drove me in the right direction! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Commit is pushed. I will discuss the "pwd" keyword at work, I'm also unsure if it would add any value. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
great
Not the kind of keyword that would produce a lot of noise, and pwd is not all that rare a var name to hold sensitive data. Mind if I ask where/what is your work ?
Not sure which field you're talking about. Is it this one ? (SELECT\s[\w\*\)\(\,\s]+\sFROM\s[\w]+)| (UPDATE\s[\w]+\sSET\s[\w\,\'\=]+)| (INSERT\sINTO\s[\d\w]+[\s\w\d\)\(\,]*\sVALUES\s\([\d\w\'\,\)]+)| (DELETE\sFROM\s[\d\w\'\=]+) In this case, you might want to put them all on there own lines, chaining them like that makes it hard to read and doesn't make it faster (and that will avoid the random spaces after the '|' 😏 ) SELECT\s[\w\*\)\(\,\s]+\sFROM\s[\w]+
UPDATE\s[\w]+\sSET\s[\w\,\'\=]+
INSERT\sINTO\s[\d\w]+[\s\w\d\)\(\,]*\sVALUES\s\([\d\w\'\,\)]+
DELETE\sFROM\s[\d\w\'\=]+ And as I said, there might be more than one space (typo, newline and tabs for readability, ...), so you actually want to have all your SELECT\s+[\w\*\)\(\,\s]+\s+FROM\s+[\w]+
UPDATE\s+[\w]+\s+SET\s[\w\,\'\=]+
INSERT\s+INTO\s+[\d\w]+[\s\w\d\)\(\,]*\sVALUES\s\([\d\w\'\,\)]+
DELETE\s+FROM\s+[\d\w\'\=]+ Plus, it is not uncommon to name tables and columns with SELECT\s+(.|\n)+\s+FROM\s+[\w]+
UPDATE\s+.+\s+SET\s[\w\,\'\=]+
INSERT\s+INTO\s+[\d\w]+[\s\w\d\)\(\,]*\sVALUES\s\([\d\w\'\,\)]+
DELETE\s+FROM\s+[\d\w\'\=]+
Overall, there is no need to be too selective on those regexes, the simple fact that the requests are made from a client is already a proof that there is a problem ; your goal with those is only to know :
So what you want to know is just where those are and their content. I'd personally go for: SELECT\s(.|\s)+\sFROM\s.+
UPDATE\s+(.|\s)+\s+SET\s.+
INSERT\s+INTO\s+(.|\s)*\sVALUES\s.+
DELETE\s+FROM\s.+ To me, false negatives are much worse than false positive in this situation. Depends on what you want, but knowing that requests are performed from the client's side means that I have a (maybe limited) direct access to the database at some point. It's huge. Handling multiple lines is going to make the pattern matching very slow, so you might want to group all those in a single item, so as to lower the number of tests, but that will increase the number of false positives so check that regexes attempt to match the shortest pattern first or it's going to be a mess) (SELECT|UPDATE|((INSERT|DELETE)\s+(INTO|FROM)))\s(.|\s)+)\s(FROM|SET|VALUES)\s.+)? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I currently work for Zionsecurity. My professional profile can be found here: https://vincentcox.com/linkedin.
I'm affraid that the parser reads line by line, causing this to break. I do agree that readability is bad of the current searchregex.
Will add it to the roadmap so I don't forget:
That would be the final regex, am I correct?
You are 100% right, I do agree with you! Thanks for taking (a lot) of time for writing your response and motivating your suggestions. I will add you to the top contributors, I really appreciate it! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Well, I was actually talking about turning the regex into 4 independant regexes, so that should be fine. Soooo... I hadn't actually tested the regexes and had written on the fly as they more intended as examples than actual implementations. This A working variant could be good if it was to run on strings extracted from the source code, but running it on the code itself is probably going to produce so many false positives each time that no one is going to pay attention to the field's results (so maybe in a later version, but right now I don't think such a feature is supported). You'd need to test it though, as it's mostly assumptions based on how I think the regex is going to behave. You could have a good surprise. Working version : (SELECT|UPDATE|((INSERT|DELETE)\s+(INTO|FROM)))\s(.|\s)+\s((FROM|SET|VALUES)\s)? Added starting delimiters detection (to run against raw code and avoid detection of single vars whose name ends with a regex' start). This might actually provoke some false negatives, but I think most cases are handled here: [;'\"\s()](SELECT|UPDATE|((INSERT|DELETE)\s+(INTO|FROM)))\s(.|\s)+\s((FROM|SET|VALUES)\s)?
Or, for more specifics, you could split them up in 4 regexes (but as said in my previous post, you usually want to avoid multiple-lines matches, they make the overall pattern matching much slower): [;'\"\s()]SELECT\s(.|\s)+\sFROM\s
[;'\"\s()]UPDATE\s(.|\s)+\sSET\s
[;'\"\s()]INSERT\s+INTO\s(.|\s)+\sVALUES\s
[;'\"\s()]INSERT\s+INTO\s(.|\s)+\sSET\s
[;'\"\s()]DELETE\s+FROM\s(.|\s)+ And yes, that's pretty much the first suggestion I made. If you want to display the matches until the end of the line, you could add a Quick explanation on the you could place them after pretty much everything ( If the website you linked matches the same way stacoan does, you might want to look if there is a way to ask for the shortest match instead of the longest (it's different algorithms, I don't know if both are implemented) and give a way to ask to use one or the other. Here, it would really change a lot on the output and be much more interesting in many cases. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I optimized your regex here (I replaced If you give your go, I will add it to the list 👍 About the seperate queries: I really appreciate all the effort on explaining how you came to these regex's, therefore I added you in the top contributor's list: 19f956f The develop will be merged to master this weekend if all go well. Everyday I am amazed how great the INFOSEC community is 🎉 . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add support for pwd ? Handle camel case to detect vars that might hold interesting information at some point ? Is passw actually used in applications to justify the
?
?(P|p)(wd|assw(d|ord))
Maybe the case is handled, didn't check in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your comment. In a hour I will push a new commit to the development branch with also updated wordlists.
Maybe the case is handled, didn't check in the code.
-> The code checks all regex's case insensitive.