Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scan between 2 SHAs #203

Open
harinee opened this issue Jun 27, 2020 · 7 comments
Open

Scan between 2 SHAs #203

harinee opened this issue Jun 27, 2020 · 7 comments

Comments

@harinee
Copy link
Collaborator

harinee commented Jun 27, 2020

Is your feature request related to a problem? Please describe.
For really huge pieces of code, we may not always want to scan all of git history.

Describe the solution you'd like
There should be a way to scan between 2 SHAs. So, when I run, say, talisman --scan <sha1> <sha2>, the scan should begin with sha1 and scan uptil sha2 (both sha1 and sha2 included in the scans).
The execution command and parameter structure doesn't have to be exactly as mentioned here. Feel free to experiment with that technically.

@harinee harinee added enhancement scanner Talisman CLI scanner mode priority On priority for picking up help wanted and removed priority On priority for picking up labels Jun 27, 2020
@tinamthomas tinamthomas self-assigned this Aug 14, 2020
tinamthomas added a commit to tinamthomas/talisman that referenced this issue Aug 26, 2020
tinamthomas added a commit to tinamthomas/talisman that referenced this issue Aug 28, 2020
@tinamthomas tinamthomas removed their assignment Sep 21, 2020
@tinamthomas
Copy link
Collaborator

Additional info. Currently scanner gets all commits. See https://github.com/thoughtworks/talisman/blob/master/scanner/scanner.go#L66.
Can possibly use git revision ranges to solve this issue: https://git-scm.com/docs/gitrevisions

@AbsoLouie
Copy link
Contributor

I think the issue comes down to this line: https://github.com/thoughtworks/talisman/blob/master/scanner/scanner.go#L45

git ls-tree -r will list all files in the repo at the time of that commit, regardless of if the change was made in that commit.

@AbsoLouie
Copy link
Contributor

Might be able to use git diff <sha1> <sha2> --name-only to get the file names and then feed that into git ls-tree -r <commit> <list of file names>

@teleivo
Copy link
Contributor

teleivo commented Apr 9, 2021

Hello 😄 I am interested in working/collaborating on this issue. @tinamthomas are you still working on it? Can I assign myself or collaborate with you? I have used talisman in a project before and can relate to the issue of wanting faster scans.

regarding

I think the issue comes down to this line: https://github.com/thoughtworks/talisman/blob/master/scanner/scanner.go#L45

git ls-tree -r will list all files in the repo at the time of that commit, regardless of if the change was made in that commit.

I think from reading https://github.com/thoughtworks/talisman#talisman-in-action that talisman will only check the diff of staged files when used as pre-commit hook. The pre-post checks the entirety of files in the diffs to be pushed.

When it comes to --scan and --ignoreHistory the docs are not super clear IMHO --scan scanner scans the git commit history for potential secrets. Does that mean checks every file touched in each commit or every file you would have in the working directory when checking out each commit? I assume from your comment that its the latter. Since pre-post checks the entire file in a diff I would also assume its a deliberate decision to do the same when checking the entire history. But it would be great to hear what the design decisions were.

Scanning every file entirely on each commit in the history is probably not necessary. If a secret has been added in some commit it should be found in an addition in a diff since we check every diff anyway.

Am interested in hearing your thoughts and how I can help 😄

@tinamthomas
Copy link
Collaborator

Hi @teleivo, Feel free to assign yourself to it and we can collaborate on it.

@teleivo
Copy link
Contributor

teleivo commented Apr 15, 2021

@tinamthomas thank you :) I cannot assign myself. Could this be because I am not a contributor/collaborator?

@tinamthomas tinamthomas self-assigned this Apr 18, 2021
@tinamthomas
Copy link
Collaborator

@teleivo I've assigned myself to the issue for now.

What I see happening now from the code:

  1. Get all commits
  2. For each commit, find the files that were changed, and check the entire file contents at that time for any secrets.

I like your idea of only looking in the additions for potential secrets!

This could potentially improve scan speed. Let me start putting the changes together and we can collaborate on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants