As referenced in #493, if a secret is written into a file at multiple locations, detect-secrets identifies only the first one. The problem here is that multiple GitHub tokens with different values in the same file are still interpreted as if they were the same secret.
In the regular expression used here:
(ghp|gho|ghu|ghs|ghr)_[A-Za-z0-9_]{36}
There is one capturing group: (ghp|gho|ghu|ghs|ghr). This group matches and captures the prefix of a GitHub token.
Because of this capturing group, findall() does not return the entire match ("ghp_...36 characters...") when it processes a string matching this pattern. Instead, it returns only the text captured by the group, which in the test cases would be "ghp", "gho", etc., depending on the token.
Example:
If you were to run findall() on a string like "Test ghp_abc123...", given the regex above, the output would be:
['ghp'] # Instead of ['ghp_abc123...']
This output occurs because, when a pattern contains exactly one capturing group, findall() returns the text of that group for each match rather than the text of the whole match.
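The behavior described above can be reproduced with the standard `re` module alone. The following is a minimal sketch, not detect-secrets code: the two token values are made up for illustration (a valid prefix followed by 36 filler characters), and all variable names are assumptions.

```python
import re

# Pattern quoted in this issue; the prefix alternation is a capturing group.
GITHUB_TOKEN = re.compile(r'(ghp|gho|ghu|ghs|ghr)_[A-Za-z0-9_]{36}')

# Two made-up tokens with different values (prefix + 36 filler characters).
token_a = 'ghp_' + 'a' * 36
token_b = 'ghp_' + 'b' * 36
text = f'first = {token_a}\nsecond = {token_b}\n'

# With exactly one capturing group, findall() returns only the group text,
# so two different tokens become indistinguishable.
prefixes = GITHUB_TOKEN.findall(text)
print(prefixes)  # ['ghp', 'ghp']

# The distinct values are still available on the match objects themselves.
full_matches = [m.group(0) for m in GITHUB_TOKEN.finditer(text)]
print(full_matches == [token_a, token_b])  # True
```

Since both results are the string 'ghp', any later step that compares secrets by value would see the two tokens as one.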
What is the expected behavior?
The expected behavior is that every distinct secret in a file is captured.
Please tell us about your environment:
detect-secrets Version: 1.5.0
Python Version: 3.12.4
OS Version: macOS Sonoma 14.4.1
Other information
In the analyze_string function, using finditer() instead of findall() might solve the issue, because it makes it possible to retrieve the entire matching string:
for match in regex.finditer(string):
    yield match.group(0)  # Returns the entire matched string
finditer() yields match objects from which you can extract specific groups or the entire match (via match.group(0)), providing flexibility and precision in handling regex matches.
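As a rough sketch of that suggestion, the generator below yields whole matches via finditer(). The function name `iter_full_matches` and the sample tokens are hypothetical, and this is not the actual analyze_string implementation in detect-secrets; it only illustrates the idea on a standalone pattern. For comparison, the last lines show another option under the same assumptions: turning the prefix into a non-capturing group, `(?:...)`, so that findall() itself returns the whole match.

```python
import re
from collections.abc import Iterator

# Pattern quoted in this issue (prefix is a capturing group).
GITHUB_TOKEN = re.compile(r'(ghp|gho|ghu|ghs|ghr)_[A-Za-z0-9_]{36}')


def iter_full_matches(regex: re.Pattern[str], string: str) -> Iterator[str]:
    """Hypothetical helper: yield every complete match, ignoring groups."""
    for match in regex.finditer(string):
        yield match.group(0)  # group(0) is always the entire matched text


# Made-up tokens for illustration: valid prefix + 36 filler characters.
token_a = 'ghp_' + 'A1' * 18
token_b = 'gho_' + 'z' * 36
text = f'{token_a} and later {token_b}'

found = list(iter_full_matches(GITHUB_TOKEN, text))
print(found == [token_a, token_b])  # True

# Alternative: with a non-capturing prefix group, findall() has no group to
# return and falls back to the whole match.
NON_CAPTURING = re.compile(r'(?:ghp|gho|ghu|ghs|ghr)_[A-Za-z0-9_]{36}')
print(NON_CAPTURING.findall(text) == found)  # True
```

The finditer() variant leaves existing patterns untouched, while the non-capturing variant would require editing each pattern that uses a group; which trade-off fits the project is for the maintainers to decide.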
karamuz added a commit to karamuz/detect-secrets that referenced this issue on Jun 20, 2024