Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tartufo does not report the real infringing string but the one prefixed with +/- from diff output #183

Open
pmevzek-godaddy opened this issue Mar 25, 2021 · 3 comments
Labels
bug Something isn't working

Comments

@pmevzek-godaddy
Copy link

pmevzek-godaddy commented Mar 25, 2021

🐛 Bug Report

What tartufo reports as "matched_string" seems to be the blob of text coming out of git diff, which includes as prefix + or - as it can happen, and not the real string as found in the file

To Reproduce

$ git init test
$ cd test
$ git commit --allow-empty --allow-empty-message -m 'Start'
$ cat secret.txt
OnVybD48c21kOnZvaWNlPiszMi4yMDAwMDAwMDwvc21kOnZvaWNlPjwvc21kOmlzc3VlckluZm8
$ git add secret.txt
$ tartufo --json --entropy --regex pre-commit | jq .
{
  "scan_time": "2021-03-25T14:44:56.244076",
  "project_path": "/private/tmp/test",
  "output_dir": null,
  "excluded_paths": [],
  "excluded_signatures": [],
  "found_issues": [
    {
      "file_path": "secret.txt",
      "matched_string": "+OnVybD48c21kOnZvaWNlPiszMi4yMDAwMDAwMDwvc21kOnZvaWNlPjwvc21kOmlzc3VlckluZm8",
      "signature": "ca29177c396aa5465f41495af1e486d666308b51b7dab52228730624466cbc25",
      "issue_type": "High Entropy",
      "issue_detail": null,
      "diff": "@@ -0,0 +1 @@\n+OnVybD48c21kOnZvaWNlPiszMi4yMDAwMDAwMDwvc21kOnZvaWNlPjwvc21kOmlzc3VlckluZm8\n"
    }
  ]
}
$ grep -F "+OnVybD48c21kOnZvaWNlPiszMi4yMDAwMDAwMDwvc21kOnZvaWNlPjwvc21kOmlzc3VlckluZm8" secret.txt
$ grep -F "OnVybD48c21kOnZvaWNlPiszMi4yMDAwMDAwMDwvc21kOnZvaWNlPjwvc21kOmlzc3VlckluZm8" secret.txt
OnVybD48c21kOnZvaWNlPiszMi4yMDAwMDAwMDwvc21kOnZvaWNlPjwvc21kOmlzc3VlckluZm8

TL;DR tartufo tells me there is a secret string of +OnVybD48c21kOnZvaWNlPiszMi4yMDAwMDAwMDwvc21kOnZvaWNlPjwvc21kOmlzc3VlckluZm8 which is not true as the sole content of the file has a string of OnVybD48c21kOnZvaWNlPiszMi4yMDAwMDAwMDwvc21kOnZvaWNlPjwvc21kOmlzc3VlckluZm8 without any + as prefix.

Besides the "display" problem (once one knows it has to ignore the + or - in prefix in order to find the real string matching in files), this creates another bigger problem: the signature given is computed over this matched_string, hence including the +, where the signature should be over the real string as existing in file.

The following shows the problem about this signature computation:

$ tartufo --json --entropy --regex --exclude-signatures "ca29177c396aa5465f41495af1e486d666308b51b7dab52228730624466cbc25" pre-commit | jq .
{
  "scan_time": "2021-03-25T15:42:44.513452",
  "project_path": "/private/tmp/test",
  "output_dir": null,
  "excluded_paths": [],
  "excluded_signatures": [
    "ca29177c396aa5465f41495af1e486d666308b51b7dab52228730624466cbc25"
  ],
  "found_issues": []
}
$ git commit -m 'With secret'

# Now I edit the file to add something completely unrelated
$ cat secret.txt
OnVybD48c21kOnZvaWNlPiszMi4yMDAwMDAwMDwvc21kOnZvaWNlPjwvc21kOmlzc3VlckluZm8

blah blah blah
$ git add secret.txt
$ tartufo --json --entropy --regex --exclude-signatures "ca29177c396aa5465f41495af1e486d666308b51b7dab52228730624466cbc25" pre-commit | jq .
{
  "scan_time": "2021-03-25T16:17:33.317483",
  "project_path": "/private/tmp/test",
  "output_dir": null,
  "excluded_paths": [],
  "excluded_signatures": [
    "ca29177c396aa5465f41495af1e486d666308b51b7dab52228730624466cbc25"
  ],
  "found_issues": [
    {
      "file_path": "secret.txt",
      "matched_string": "OnVybD48c21kOnZvaWNlPiszMi4yMDAwMDAwMDwvc21kOnZvaWNlPjwvc21kOmlzc3VlckluZm8",
      "signature": "17aac285d32f9fb48ef3d028b1e0a10467f336888a146366da3dee0ddd671b0e",
      "issue_type": "High Entropy",
      "issue_detail": null,
      "diff": "@@ -1 +1,3 @@\n OnVybD48c21kOnZvaWNlPiszMi4yMDAwMDAwMDwvc21kOnZvaWNlPjwvc21kOmlzc3VlckluZm8\n+\n+blah blah blah\n"
    }
  ]
}

TL;DR: I run tartufo, it finds a secret, I use the given signature to exclude it from further reports, the secret is now not visible anymore by tartufo, I do another edit in the file, completely unrelated, I re-run tartufo with the same exclusions and now it finds the secret again!

Why? Because now it sees and reports the previous secret exactly as it appears in file, not with the leading + from the git diff, as it can be easily proved by:

$ git --no-pager diff --cached
diff --git a/secret.txt b/secret.txt
index 049ae76..1edfd07 100644
--- a/secret.txt
+++ b/secret.txt
@@ -1 +1,3 @@
 OnVybD48c21kOnZvaWNlPiszMi4yMDAwMDAwMDwvc21kOnZvaWNlPjwvc21kOmlzc3VlckluZm8
+
+blah blah blah

So the previous signature that I excluded in my run obviously does not cover the string as is really in file (OnVybD48c21kOnZvaWNlPiszMi4yMDAwMDAwMDwvc21kOnZvaWNlPjwvc21kOmlzc3VlckluZm8) as the previous signature was computed with this string and a + prefix.

This whole situation leads to head scratching when one believes previous reports of tartufo to add signatures to exclude and later on, for other completely unrelated changes, get new tartufo run that suddenly detect past strings again while there should have been excluded by signature.

It also work mostly by chance right now, because in the diff output that tartufo seems to use, all lines have a prefix, which could be a whitespace as there is above, but the whitespace is excluded from the list of characters considered for high entropy scanning as it is not among the base64 alphabet. But + is in the base64 alphabet, hence it is picked up. Interestingly for lines prefixed by - in diff, it wouldn't be picked up either... until issue #177 is taken into account to have base64url alphabet also, which has - as character.

Expected Behavior

As the offending string in the file is OnVybD48c21kOnZvaWNlPiszMi4yMDAwMDAwMDwvc21kOnZvaWNlPjwvc21kOmlzc3VlckluZm8 tartufo should report that the matched string is OnVybD48c21kOnZvaWNlPiszMi4yMDAwMDAwMDwvc21kOnZvaWNlPjwvc21kOmlzc3VlckluZm8, not something else and the signature should be computed over that string precisely (without any + or - prefix) so that if we decide to exclude it by signature it would never pop up again at any other future tartufo run for this string in this file.

Conceptually this problem is also related to the other issue at #174 as both cases seem to show a problem in how tartufo uses differences between two commits.

Code Example

N/A

Environment

tartufo 2.4.0

@pmevzek-godaddy pmevzek-godaddy added the bug Something isn't working label Mar 25, 2021
@TWood67
Copy link

TWood67 commented Aug 17, 2021

I was able to reproduce this. This is also causing me to have similar issues. What version of git do you have @pmevzek-godaddy?

$ git --version
git version 2.24.1
hub version 2.13.0

@pmevzek-godaddy
Copy link
Author

I was able to reproduce this. This is also causing me to have similar issues. What version of git do you have @pmevzek-godaddy?

$ git --version
git version 2.32.0

(but it may have been a lower version at the time of entering the issue; I don't think however it should change anything as for me the core issue seems more around how tartufo parses and uses the git diff output, you can see at #174 another problem I had that is probably related)

@prajasekaran-godaddy
Copy link

prajasekaran-godaddy commented Mar 15, 2022

Might be a different issue, but I had a problem similar to this with tartufo 3.0 version. If I install tartufo 2.10 using pip, I get both + prefix and without it.

In 2.10, it produces both results - prefixed and without prefix

tartufo --regex --json scan-local-repo --no-fetch ${REPO_NAME} | jq -r '.found_issues[].matched_string' | sort -u

In 3.0, it results in only the + or - prefixed strings.

tartufo --regex --output-format json scan-local-repo ${REPO_NAME} | jq -r '.found_issues[].matched_string' | sort -u

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants