Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scanner: refactor RevListsShas into type #1595

Closed
wants to merge 9 commits into from

Conversation

ttaylorr
Copy link
Contributor

This pull-request refactors the original implementation of the RevListShas method into a type that retains the same behavior.

This is done such that we can easily manipulate a channel of revs coming out of this function. That is not made use of (yet), but will become the case in a subsequent PR.

/cc @technoweenie @rubyist @sinbad @sschuberth @larsxschneider

// MatchesNameAndType returns whether or not the receiving *Ref "r" has the same
// name and type as the "other" *Ref. Nil *Refs are invalid values and thus do
// not possess equality.
func (r *Ref) MatchesNameAndType(other *Ref) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any (valid) situation where Name & Type are equal but Sha is not? Since we never modify refs I don't think so. In which case the struct equality operator is enough?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bringing this up. I can't think of any legal situations where this would/could happen off-hand, so I simplified the behavior in cf8fac1.

return rev, ok
}

func (c *RevCache) Cache(sha, name string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this a little confusing since sha->name would logically be a multi-map (shas can have multiple refs pointing at them).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, I totally misunderstood the code when I was reading it for the first time. The Cache(sha, name string) caches the SHA1 hash of a file to its name on disk (as used here).

I rename this type in 53f2abb, and added some docs in e16e2de.

//
// If there was an error encountered before beginning to scan, it will be
// returned by itself, with two nil channels.
func (r *RevListScanner) Scan(left, right string) (<-chan string, <-chan error, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StringChannelWrapper has a more convenient interface for handling strings and errors rather than separate channels.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Realise the scanner methods predated this but might be worth taking advantage of)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was debating whether or not to use the *XChannelWrappertypes when I was refactoring this. IMHO, I think passing the channels around directly and using thetools.CollectErrsFromChan` method is more lightweight than a new type, so I am going to leave this as-is for now.

This will become less of a sore spot a few more PRs down the line.

@ttaylorr
Copy link
Contributor Author

@sinbad thanks for the review! I pushed some more changes out to address your comments 👍

@technoweenie
Copy link
Contributor

Moving things to a new package gives us another opportunity to examine the interface. The lfs package has a ton of things exported that probably don't need to be. The scanner package's main task is to stream LFS pointers given a range of git commits. With that in mind, do scanner. NameCache, scanner.RevListScanner or even scanner.RevListShas() need to be exported? These seem like implementation details for an LFS object scanner.

Also, I don't think a scanner.RevListScanner needs to take a scanner.ScanRefsOptions in its constructor. What about something like this?

revlistScanner := newRevListScanner("origin", ScanRefsMode)
revlistScanner.SkipDeletedBlobs = true
revlistScanner.Scan(left, right)

If scanner.ScanRefsOptions is going to get removed in a future PR, that's totally cool 👍

@technoweenie
Copy link
Contributor

Actually, ignore the last part. I think we still need ScanRefsOptions so callers outside the scanner pkg can set this info. Then let each inner type like a revListScanner use that.

@ttaylorr
Copy link
Contributor Author

With that in mind, do scanner. NameCache, scanner.RevListScanner or even scanner.RevListShas() need to be exported?

I don't think that these types need to be exported. To keep these PRs small, I think it makes sense to export the same functions that exist in lfs/scanner.go until we've moved all of the code over. Once that's done, I figure we can reduce this package down to a single public function, and leave all of the implementation private.

I did, however, make the nameCache type private in d900abd, since that doesn't need to be exported today.

technoweenie
technoweenie previously approved these changes Oct 20, 2016
@technoweenie
Copy link
Contributor

This is done such that we can easily manipulate a channel of revs coming out of this function.

Can you provide some example of how this will work, and why a type is needed? The type shares all of the same properties of ScanRefsOptions. It looks like all that was changed is RevListShas() calls (*RevListScanner) Scan(), which is broken up to a few smaller funcs like refArgs() and scanAndReport(). Seems like RevListScanner is unnecessary, and definitely does not need to be exported.

@technoweenie technoweenie dismissed their stale review October 20, 2016 22:26

I thought about this some more and had another question.

@ttaylorr
Copy link
Contributor Author

ttaylorr commented Oct 20, 2016

Can you provide some example of how this will work, and why a type is needed?

Sure. When I started this refactor, the presence of the ScanRefOptions argument to the function was a smell that indicated there might be a type to extract that would be more appropriate than a large function.

My first inclination was to extract a 2-3 smaller, static functions, and call those, but I hit another smell of passing that ScanRefOptions argument all the way down. I then extracted a type that has the same fields as ScanRefOptions. This is still a smell, but temporary one.

Since there is code from the lfs package that uses the ScanRefOptions type, and I wanted to keep this change isolated, I decided to de-structure the data on that type into fields when we construct the RevListScanner type. Eventually, when all of the stuff that was in lfs/scanner.go is in the scanner package, then I'll remove the type, and each of the three types that I'm proposing will only have the information they need.

I think having three types (one that scans refs, one that does the batch check, and one that converts blobs into LFS pointers) is a good idea in that each is isolated in their concern, and doesn't have access to data that they don't need. This makes the types easily testable, and compose-able, which I think is a solid step forward.

@technoweenie
Copy link
Contributor

I see ScanRefOptions as something similar to http.Request (with worse type names). If a caller wants to scan for lfs objects, it creates a new ScanRefOptions, calls Scan(), and gets back a result. It only has 3 immutable properties (I'm purposely ignoring nameCache, which I think we both dislike). Destructuring it just feels overkill to me.

  • The three classes are private, and closely related right now.
  • There are three because Git doesn't give us a better way to scan a repo. What if they did, an we could drop it to 1 command? Definitely isn't changing any time soon though.
  • Adding a property to ScanRefOptions would require also adding it to the 3 commands that need it. Also unlikely.

One upside, is if the three types end up with similar method names, they won't need prefixes to keep them apart:

revListScannerScan()
catFileBatchScan()

(*revListScanner) Scan()
(*catFileBatch) Scan()

If you really want to add the types, go for it. I was just asking a question. Some times I wish I would've gotten more questions about types I've written. If you do keep this, I still think RevListScanner should be private. I don't think anything uses it outside of the scanner package.

Eventually, when all of the stuff that was in lfs/scanner.go is in the scanner package, then I'll remove the type (ScanRefOptions), and each of the three types that I'm proposing will only have the information they need.

If you do that, then you'll need a public Scan function with 3 positional args:

scanner.Scan(scanner.ScanAllMode, "origin", false)

Positional args, especially for optional ones like SkipDeletedBlobs, are not my favorite. So I think it'd make more sense to just rename lfs.ScanRefOptions to something like scanner.Options, instead of removing it in one PR, and bringing it back in another.

@ttaylorr
Copy link
Contributor Author

ttaylorr commented Oct 21, 2016

Thanks again for this feedback 👍 . I didn't do a great job of articulating what I'm going for in this series of PRs, which I think can make it difficult to look at each by itself. I did my best at responding to some of your questions below.

My inclination is to merge this PR and introduce the type after changing the name of ScanRefOptions to scanner.Options, and making the RevListScanner a private member of the package. I think this is a good idea, but if this does become overkill after all of the PRs in this series have been merged, then I definitely agree that we can change it fairly easily.

If a caller wants to scan for lfs objects, it creates a new ScanRefOptions, calls Scan(), and gets back a result.

I may be misunderstanding you, but my thinking is that the type could be made private so the entrypoint to this API is simple. More importantly, the caller doesn't have to concern him or herself with the implementation, or existence, of the *RevListScanner. I started working towards that here, but still need to make the type private to finish it off.

There are three because Git doesn't give us a better way to scan a repo. What if they did, an we could drop it to 1 command?

I think in either case we'd have to change a lot of the internals of the code, but I agree that this isn't a likely thing to happen in the near future.

Positional args, especially for optional ones like SkipDeletedBlobs, are not my favorite.

I definitely agree. In the final iteration of this series of PRs, I'm imagining that something like ScanRefOptions (maybe called *scanner.Options) will still exist, but we'd de-structure the fields on that struct into the fields that each component needs, that way they only have access to what they need to do their jobs.

@technoweenie
Copy link
Contributor

If a caller wants to scan for lfs objects, it creates a new ScanRefOptions, calls Scan(), and gets back a result.

I may be misunderstanding you, but my thinking is that the type could be made private so the entrypoint to this API is simple. More importantly, the caller doesn't have to concern him or herself with the implementation, or existence, of the *RevListScanner

I'm referring to scanner.ScanRefs() (scanner.ScanTree(), scanner.ScanIndex(), scanner.ScanUnpushed(), etc, are already simple). You're going to revert back to 2 positional and an optional arg? I'm not talking about RevListScanner, which has never been exported.

@ttaylorr
Copy link
Contributor Author

Hi, all. I wanted to post a quick update after I've given some thought to this PR over the weekend.

First of all, I'd like to thank @technoweenie and @sinbad for their feedback, and for asking hard questions.

My goal with this series of PRs was for them to be small, and easily digestible in pieces. I failed, however, to clearly articulate my vision and reasoning behind removing/adding some code in both the comments and PR description. I will try and further articulate my vision in the future.

For now, I am going to close this PR. My thinking here is that we have code in lfs/scanner.go that works, and is simple. Let's start by moving that code over to the scanner package, making it work in a parallel fashion (where there are no individual bottlenecks that can't be solved by increasing the parallelism), and then refactor later, if necessary.

I think this allows us to keep the simple code that we already have while making small, incremental changes moving us towards our goals. My new plan, going forward, is:

  1. In three separate PRs, move over the code from lfs/scanner.go into package scanner, leaving the lfs/scanner.go code to delegate into the new package.
  2. In the next PR, apply small, calculated refactorings as described above.
  3. Transition all calling code to use the scanner package, instead of the lfs/scanner.go functions.
  4. 🎉

Once again, I really do appreciate the feedback, and can't wait to get cracking on the rest of the PRs in this series. 🤘

@ttaylorr ttaylorr closed this Oct 24, 2016
@ttaylorr ttaylorr deleted the rev-scanner-refactor branch November 8, 2016 03:05
@ttaylorr ttaylorr removed the 1.5.0 label Nov 9, 2016
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Jun 6, 2023
The RevListScanner structure of the "git" package is constructed by
the NewRevListScanner() function, which takes a ScanRefsOptions
structure as an argument.  The git.ScanRefsOptions structure encapsulates
the large number of flags and other fields which a caller may potentially
set for the "git rev-list" scanning operation.

The "lfs" package also has a similar ScanRefsOptions structure, which
is used by the various Scan*() methods of the GitScanner structure to
pass optional flags and fields to the internal functions which implement
the scanning operations.  However, unlike the "git" package, this
ScanRefsOptions structure is not used when constructing a GitScanner,
and is moreover never used outside of the "lfs" package.

We can therefore remove this structure in favour of a more conventional
set of internal elements encapsulated directly in the GitScanner structure.
This simplifies the signatures of many of the internal scan*() functions
and helps clarify what parts of the GitScanner structure actually
need to be exported.

As part of this change, we also create a new, dedicated and un-exported
nameMap structure, which holds the map of Git object SHAs to their
pathspecs, as populated during scan operations.  It is then utilized
in the existing lockableNameSet structure, whose Check() method is
called by the runCatFileBatch() and runCatFileBatchCheck() functions
to collect the pathspecs of files locked by other users.

Note that the intent to remove the ScanRefsOptions structure from
the "lfs" package was discussed in several PRs in 2016, and in
particular, this commit's changes achieve some of the goals of the
never-merged PR git-lfs#1595, but without introducing an entirely generic
"scanner" package.  See also:

git-lfs#1670 (comment)
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Jun 7, 2023
The RevListScanner structure of the "git" package is constructed by
the NewRevListScanner() function, which takes a ScanRefsOptions
structure as an argument.  The git.ScanRefsOptions structure encapsulates
the large number of flags and other fields which a caller may potentially
set for the "git rev-list" scanning operation.

The "lfs" package also has a similar ScanRefsOptions structure, which
is used by the various Scan*() methods of the GitScanner structure to
pass optional flags and fields to the internal functions which implement
the scanning operations.  However, unlike the "git" package, this
ScanRefsOptions structure is not used when constructing a GitScanner,
and is moreover never used outside of the "lfs" package.

We can therefore remove this structure in favour of a more conventional
set of internal elements encapsulated directly in the GitScanner structure.
This simplifies the signatures of many of the internal scan*() functions
and helps clarify what parts of the GitScanner structure actually
need to be exported.

As part of this change, we also create a new, dedicated and un-exported
nameMap structure, which holds the map of Git object SHAs to their
pathspecs, as populated during scan operations.  It is then utilized
in the existing lockableNameSet structure, whose Check() method is
called by the runCatFileBatch() and runCatFileBatchCheck() functions
to collect the pathspecs of files locked by other users.

Note that the intent to remove the ScanRefsOptions structure from
the "lfs" package was discussed in several PRs in 2016, and in
particular, this commit's changes achieve some of the goals of the
never-merged PR git-lfs#1595, but without introducing an entirely generic
"scanner" package.  See also:

git-lfs#1670 (comment)
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Jun 7, 2023
The RevListScanner structure of the "git" package is constructed by
the NewRevListScanner() function, which takes a ScanRefsOptions
structure as an argument.  The git.ScanRefsOptions structure encapsulates
the large number of flags and other fields which a caller may potentially
set for the "git rev-list" scanning operation.

The "lfs" package also has a similar ScanRefsOptions structure, which
is used by the various Scan*() methods of the GitScanner structure to
pass optional flags and fields to the internal functions which implement
the scanning operations.  However, unlike the "git" package, this
ScanRefsOptions structure is not used when constructing a GitScanner,
and is moreover never used outside of the "lfs" package.

We can therefore remove this structure in favour of a more conventional
set of internal elements encapsulated directly in the GitScanner structure.
This simplifies the signatures of many of the internal scan*() functions
and helps clarify what parts of the GitScanner structure actually
need to be exported.

As part of this change, we also create a new, dedicated and un-exported
nameMap structure, which holds the map of Git object SHAs to their
pathspecs, as populated during scan operations.  It is then utilized
in the existing lockableNameSet structure, whose Check() method is
called by the runCatFileBatch() and runCatFileBatchCheck() functions
to collect the pathspecs of files locked by other users.

Note that the intent to remove the ScanRefsOptions structure from
the "lfs" package was discussed in several PRs in 2016, and in
particular, this commit's changes achieve some of the goals of the
never-merged PR git-lfs#1595, but without introducing an entirely generic
"scanner" package.  See also:

git-lfs#1670 (comment)
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Jun 7, 2023
The RevListScanner structure of the "git" package is constructed by
the NewRevListScanner() function, which takes a ScanRefsOptions
structure as an argument.  The git.ScanRefsOptions structure encapsulates
the large number of flags and other fields which a caller may potentially
set for the "git rev-list" scanning operation.

The "lfs" package also has a similar ScanRefsOptions structure, which
is used by the various Scan*() methods of the GitScanner structure to
pass optional flags and fields to the internal functions which implement
the scanning operations.  However, unlike the "git" package, this
ScanRefsOptions structure is not used when constructing a GitScanner,
and is moreover never used outside of the "lfs" package.

We can therefore remove this structure in favour of a more conventional
set of internal elements encapsulated directly in the GitScanner structure.
This simplifies the signatures of many of the internal scan*() functions
and helps clarify what parts of the GitScanner structure actually
need to be exported.

As part of this change, we also create a new, dedicated and un-exported
nameMap structure, which holds the map of Git object SHAs to their
pathspecs, as populated during scan operations.  It is then utilized
in the existing lockableNameSet structure, whose Check() method is
called by the runCatFileBatch() and runCatFileBatchCheck() functions
to collect the pathspecs of files locked by other users.

Note that the intent to remove the ScanRefsOptions structure from
the "lfs" package was discussed in several PRs in 2016, and in
particular, this commit's changes achieve some of the goals of the
never-merged PR git-lfs#1595, but without introducing an entirely generic
"scanner" package.  See also:

git-lfs#1670 (comment)
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Jun 7, 2023
The RevListScanner structure of the "git" package is constructed by
the NewRevListScanner() function, which takes a ScanRefsOptions
structure as an argument.  The git.ScanRefsOptions structure encapsulates
the large number of flags and other fields which a caller may potentially
set for the "git rev-list" scanning operation.

The "lfs" package also has a similar ScanRefsOptions structure, which
is used by the various Scan*() methods of the GitScanner structure to
pass optional flags and fields to the internal functions which implement
the scanning operations.  However, unlike the "git" package, this
ScanRefsOptions structure is not used when constructing a GitScanner,
and is moreover never used outside of the "lfs" package.

We can therefore remove this structure in favour of a more conventional
set of internal elements encapsulated directly in the GitScanner structure.
This simplifies the signatures of many of the internal scan*() functions
and helps clarify what parts of the GitScanner structure actually
need to be exported.

As part of this change, we also create a new, dedicated and un-exported
nameMap structure, which holds the map of Git object SHAs to their
pathspecs, as populated during scan operations.  It is then utilized
in the existing lockableNameSet structure, whose Check() method is
called by the runCatFileBatch() and runCatFileBatchCheck() functions
to collect the pathspecs of files locked by other users.

Note that the intent to remove the ScanRefsOptions structure from
the "lfs" package was discussed in several PRs in 2016, and in
particular, this commit's changes achieve some of the goals of the
never-merged PR git-lfs#1595, but without introducing an entirely generic
"scanner" package.  See also:

git-lfs#1670 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants