Fix scan-remote-repo --branch logic #250

rbailey-godaddy · 2021-10-30T00:57:58Z

Issue Linking

fixes #247

What's new?

A fresh-cloned repository does not have branch information for remote branches. We must follow the information in `remotes.origin.refs` instead of `branches` (which will have only the main/master branch). We add a new `is_remote` flag so the common code knows which strategy to employ. A side-effect appears to be that because HEAD appears in the reference list, we'll scan the default branch twice, but I can live with this.
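
The strategy switch described above can be sketched roughly as follows. This is an illustration rather than the code from this PR: `candidate_refs` and `select_refs` are hypothetical helper names, and `repo` is only assumed to look like a GitPython `Repo` (local heads in `repo.branches`, remote-tracking references in `repo.remotes.origin.refs`, each exposing a `name`). A `SimpleNamespace` stand-in is used so the sketch runs without a real clone.

```python
from types import SimpleNamespace


def candidate_refs(repo, is_remote):
    """Return the references that are eligible for scanning.

    Assumption: `repo` behaves like a GitPython `Repo` object.
    """
    if is_remote:
        # Fresh clone: only the default branch exists locally, so follow
        # the remote-tracking references instead.
        return list(repo.remotes.origin.refs)
    return list(repo.branches)


def select_refs(repo, branch=None, is_remote=False):
    """Pick the refs matching `branch`, or every candidate if none is given."""
    refs = candidate_refs(repo, is_remote)
    if branch is None:
        return refs
    # Remote-tracking refs are named "origin/<branch>"; local heads "<branch>".
    wanted = f"origin/{branch}" if is_remote else branch
    return [ref for ref in refs if ref.name == wanted]


def _ref(name):
    """Hypothetical stand-in for a git reference object."""
    return SimpleNamespace(name=name)


# Stand-in for a fresh clone: one local branch, several remote refs.
fresh_clone = SimpleNamespace(
    branches=[_ref("main")],
    remotes=SimpleNamespace(
        origin=SimpleNamespace(
            refs=[_ref("origin/HEAD"), _ref("origin/main"), _ref("origin/feature")]
        )
    ),
)
```

With this stand-in, asking for `feature` with `is_remote=False` finds nothing, while `is_remote=True` finds `origin/feature`. With no branch given, `origin/HEAD` and `origin/main` are both returned, which is the double scan of the default branch mentioned above.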

tarkatronic

Just one question/point of curiosity, to be sure we're not being excessive in our scan.

Also, is this something we're going to need to port over to the v3.x branch?

tarkatronic · 2021-11-01T14:28:59Z

tartufo/scanner.py

+                        )
+                else:
+                    # Everything
+                    branches = unfiltered_branches


Will this truly only be branches, or might this possibly include refs to PRs, tags, etc?

rbailey-godaddy

It might be excessive (in simple testing, HEAD showed up in the references list). However, I feel this is "harmlessly excessive":

If the user specifies a branch, we look for a reference with the name of that branch, and therefore scan only the branch the user asked us to scan.

If the user didn't, we scan "everything" (which is what is intended). We might end up scanning parts of "everything" multiple times, to the extent that multiple references appear, in which case I hope most of this will be short-circuited by the "I've seen this diff before" lru cache and the impact will be minimized.

The fix for v3.x is much better than this one, so I'm willing to sacrifice some efficiency for correctness in the short term, knowing that when people transition to v3 it won't be a problem any longer.
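
To illustrate why overlapping references should be cheap, here is a minimal sketch of an "I've seen this diff before" cache built on `functools.lru_cache`. It is not tartufo's actual implementation: `scan_diff`, `scan_refs`, the `SCANNED` list, the `"SECRET"` check, and the cache size are all assumptions made up for the example.

```python
from functools import lru_cache

# Every diff that actually reached the (hypothetical) expensive scanner.
SCANNED = []


@lru_cache(maxsize=1024)  # cache size chosen arbitrarily for the sketch
def scan_diff(diff_text):
    """Hypothetical scanner: flag added lines that contain 'SECRET'."""
    SCANNED.append(diff_text)
    return tuple(
        line
        for line in diff_text.splitlines()
        if line.startswith("+") and "SECRET" in line
    )


def scan_refs(diffs_by_ref):
    """Scan every diff of every ref; repeated diffs are served from the cache."""
    findings = set()
    for diffs in diffs_by_ref.values():
        for diff in diffs:
            findings.update(scan_diff(diff))
    return findings
```

If `origin/HEAD` and `origin/main` yield the same diffs, the second pass hits the cache for each one, so the extra reference costs lookups rather than full scans.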

tarkatronic

Makes sense. 👍🏻

rbailey-godaddy

Subtle change: in testing, I noted that the fresh-cloned scratch repo had only a main (or presumably master) branch defined, so the old default `scan-remote-repo` (without a branch) was probably scanning only that default branch and not the entire repository. People might therefore be surprised to discover things getting scanned now that weren't scanned before (at least not since v2.0.2).

tarkatronic

I think it's likely there will be a number of surprises for users as they move to v3.0 and find how much more accurate it is. 😄

tarkatronic · 2021-11-01T14:33:49Z

(Ignore that last part of my comment -- just saw the other PR!)

smimani-godaddy

LGTM 🦅

sushantmimani

LGTM 🦅

rscottbailey added 2 commits October 29, 2021 20:51

Changelog update

2ce6875

rbailey-godaddy requested a review from tarkatronic October 30, 2021 00:57

rbailey-godaddy requested a review from a team as a code owner October 30, 2021 00:57

rbailey-godaddy requested review from janerikcarlsen and mxhenry-godaddy October 30, 2021 00:57

Remove test duplicated in error

47573f5

tarkatronic reviewed Nov 1, 2021

tarkatronic approved these changes Nov 1, 2021

smimani-godaddy approved these changes Nov 1, 2021

sushantmimani approved these changes Nov 1, 2021

Merge branch 'main' into fix-remote-branch-v2

6dffc8f

rbailey-godaddy merged commit 9f13229 into main Nov 2, 2021

rbailey-godaddy deleted the fix-remote-branch-v2 branch November 2, 2021 13:28
