Heuristics for "random" values to help with base-encoded typo false positives #484
Comments
Yes, we have several issues related to hashes / base encodings of some sort
Having some kind of heuristic to discard hashes / base-encodings beyond a strict syntax check would be a big help. What that'd look like is the question though. To start off brainstorming,
Any other ideas for heuristics and for what the Xs and Ys should be? #316 has a list of alternative approaches. Feel free to share in that issue how useful or not those approaches would be.
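To make the brainstorming concrete, here is a minimal sketch of one possible heuristic, not what typos actually does. It assumes a token is "random" when it is long, mixes digits with both letter cases, and has high per-character Shannon entropy. The function names, the thresholds (`min_len=20`, `min_entropy=3.5`), and the sample string are all assumptions made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(token: str) -> float:
    """Bits per character of the token's empirical character distribution."""
    counts = Counter(token)
    n = len(token)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def looks_random(token: str, min_len: int = 20, min_entropy: float = 3.5) -> bool:
    """Hypothetical check: long, mixed-class, high-entropy tokens are not words.

    The thresholds are guesses and would need tuning against real code.
    """
    if len(token) < min_len:
        return False
    has_digit = any(ch.isdigit() for ch in token)
    has_upper = any(ch.isupper() for ch in token)
    has_lower = any(ch.islower() for ch in token)
    if not (has_digit and has_upper and has_lower):
        # Plain words and camelCase identifiers usually lack one of these classes.
        return False
    return shannon_entropy(token) >= min_entropy


# Illustrative base58-looking string, not taken from the issue.
sample = "3J98t1WpEZ73CNmQviecrnyiWrnqRhWNLy"
print(looks_random(sample))
print(looks_random("getUserAccountSettingsById"))
```

A check like this would miss lowercase-only hex digests and could still misfire on long identifiers that contain digits, which is why an explicit opt-out remains useful as a fallback.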
I think the most important thing would be a way to opt out of tricky situations. For example, you may want to include a text string with typos in test code, and some cases will be difficult to detect properly, such as these base-encoded numbers, JSON strings, and similar. So having a solution like the ones in #316 for opting out would be a great, robust fallback. For our particular use cases, comments that enable/disable the spell check would work.
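As a rough illustration of the comment-based opt-out described here, the sketch below filters out lines that sit between an "off" and an "on" marker before they would reach a spell checker. The marker strings `spellcheck:off` and `spellcheck:on` and the function name are hypothetical, chosen only for this example; they are not settings taken from typos.

```python
from typing import List, Tuple


def lines_to_check(
    text: str,
    off_marker: str = "spellcheck:off",  # hypothetical marker
    on_marker: str = "spellcheck:on",    # hypothetical marker
) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs that are outside disabled regions.

    Line numbers are 1-based and refer to the original text, so any
    reported typo could still be mapped back to its source location.
    """
    enabled = True
    kept = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if off_marker in line:
            enabled = False
            continue
        if on_marker in line:
            enabled = True
            continue
        if enabled:
            kept.append((lineno, line))
    return kept


source = "\n".join(
    [
        "good line",
        "# spellcheck:off",
        'payload = "teh intentional typo"',
        "# spellcheck:on",
        "another line",
    ]
)
print(lines_to_check(source))
```

Skipping whole lines keeps the sketch simple; a real implementation might instead mask byte ranges so that column offsets stay accurate.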
Maybe this will help: there is 'ripsecrets', a tool written in Rust that uses ripgrep to find secrets in an existing project. Its regexes could be reused for 'high heuristic' values:
@boris-smidt-klarrio thanks! For now, I've at least linked to that in the docs in 8b729e1
Not sure if it will work with the ignores because of the way the tokenizer works. I had a look at it, but I assumed it keeps splitting tokens until it finds UUIDs, words, or numbers. Is there a setting to add other entries to the tokenizer with these regexes?
@boris-smidt-klarrio
@epage Thank you, it works!
This is a base58-encoded string from our codebase. Is there some heuristic that would let the typo checker recognize such a long "random" string as not being a word and not suggest anything for it? This was part of a larger JSON string in a test.
Here is another similar one, also from an embedded JSON string:
These are the last two major false positives we've been seeing in our codebase with typos; it works really well otherwise!