Skip to content
This repository has been archived by the owner on Sep 9, 2020. It is now read-only.

Add dep prune subcommand #322

Merged
merged 2 commits into from
Apr 13, 2017
Merged

Add dep prune subcommand #322

merged 2 commits into from
Apr 13, 2017

Conversation

tomwilkie
Copy link
Contributor

Part of #120

This changes adds a basic pruning strategy for the vendor directory. Before this changes, vendoring with dep resulting in a +4m loc change to weaveworks/cortex vs gvt, after running prune its -1.3m.

There is a bunch left to do, I thought it best to open this PR and get feedback early. If you like this approach I will open a PR on sdboyer/gps for the PackageDeps method.

We could also potentially add code to remove non-go files and *_test.go from vendoring - currently I use this in my makefile:

find vendor -type f ! -name '*.go' -a ! -name '*.s' -a ! -name '*.h' -a ! -name '*.proto' -a ! -name LICENSE | xargs -L 10 rm
find vendor -type f -name '*_test.go' | xargs -L 10 rm

@tomwilkie
Copy link
Contributor Author

Test failure seems to be related to credentials, I think its spurious.

@sdboyer
Copy link
Member

sdboyer commented Mar 13, 2017

Awesome, so cool that you jumped on this right away!

At the same time, I feel a bit bad - I totally meant to follow up over on #120 with some guiding suggestions on this, and didn't 😞. Hopefully it won't feel like too much wasted time, as this clearly gave you an opportunity to reach in and get familiar with some more of the gps API - which is great!

So, a couple issues with the approach here. First - I think that all the information you're gleaning from the filesystem should be available in the project's lock file, via the packages key each entry has. The general rule behind the behavior of a gps-backed system is that we try to operate in terms of relations between a series of states:

all-sync

Where "P" is the project code, "M" is the manifest, "L" is the lock, and "D" is the dependency code. As written, this is sorta jumping from "P" straight to "D". It's not that that can't work, but that it could undermine overall system consistency, if only by working differently than the other commands. I'm happy to go into more detail if you want, but for now will spare you the details 😄

To this end, I might change the following:

  1. Start with doing input hash comparison against what's in the lock (see the dep status cmd implementation for an example). If the hashes don't match, I'd say this should probably bail out; better to have manifest and lock in a consistent state before messing with vendor/.
  2. If the hashes match, then trust the package list you see in the lock; don't bother with reparsing code in vendor for imports.

In an entirely separate direction, the thing I was thinking about posting last night was that we might instead look at this as modifications to be made to gps.WriteDepTree. That third parameter, which currently controls whether or not nested vendor/ dirs should be stripped out, could be changed to take, say, a bitmask that controls which of a set of pruning behaviors should be applied.

The advantages of taking this approach are:

  1. We can potentially avoid ever writing the files, rather than removing them after writing them.
  2. We can consider the possible pruning strategies and implement them, independent of what we immediately decide in dep. dep can then decide what it needs and just flip flags later.
  3. In the scenario that this ends up in a standalone tool, it'll still be easy for that tool to pick up and just use the logic we've written into gps.

@tomwilkie
Copy link
Contributor Author

tomwilkie commented Mar 13, 2017 via email

@tomwilkie
Copy link
Contributor Author

@sdboyer I've added the check for the lock file hash; it doesn't seem to make a huge amount of sense to read the dependencies out of the lock file, as it seems you need to calculate the dependencies to check the lock file hash... let me know if I'm missing anything here.

instead look at this as modifications to be made to gps.WriteDepTree.

If I'm reading the code correctly, wouldn't this entail making prune an option to ensure? In #120 you seemed against that approach...

@sdboyer
Copy link
Member

sdboyer commented Mar 14, 2017

I've added the check for the lock file hash; it doesn't seem to make a huge amount of sense to read the dependencies out of the lock file, as it seems you need to calculate the dependencies to check the lock file hash... let me know if I'm missing anything here.

Actually computing dependencies is done by a call to gps.Solver.Solve(); all you're doing here is calling gps.Prepare(). That just readies a solver for use; most of what it's doing is validating inputs. Use cases like this are pretty much the whole idea behind the input hash.

If I'm reading the code correctly, wouldn't this entail making prune an option to ensure? In #120 you seemed against that approach...

It wouldn't necessitate it - we could implement pruning as a call to that func, so it'd just recreate vendor/ rather than operating on an existing one - but...basically, yeah, this is why I felt bad about not updating the issue with the additional guidance. These are two fairly divergent directions.

@tomwilkie
Copy link
Contributor Author

computing dependencies is done by a call to gps.Solver.Solve()

Ah I see, you're suggesting we compute the interdependencies between the modules in vendor using the solver? I think I can do that..

we could implement pruning as a call to that func

Okay, I think I follow - let me take a another stab. Thanks for your patience.

@sdboyer
Copy link
Member

sdboyer commented Mar 14, 2017

Ah I see, you're suggesting we compute the interdependencies between the modules in vendor using the solver? I think I can do that..

Err...no? I'm suggesting the opposite. I'm saying don't call gps.Solver.Solve(), because you don't need to in order to get the input hash for comparison. It's basically this:

	s, err := gps.Prepare(params, sm)
	if err != nil {
		return fmt.Errorf("could not set up solver for input hashing: %s", err)
	}
	if bytes.Equal(s.HashInputs(), p.Lock.Memo) {
		// do prune stuff, because we know the lock is an acceptable solution for the inputs
	}

I'm also suggesting that you not care about the import statements of the actual code in vendor/, because we should treat the code in there as inert and dead; the relevant import information should be fully described by the packages properties in lock.json (if it's not, then it's a bug).

@tomwilkie
Copy link
Contributor Author

Ah I see - thats what I'm doing here: 73da178#diff-ea232c8c4adc96b53990f3c480439164R80

I didn't realise the lock file also contains transitive dependancies, but that makes sense now.

@tomwilkie
Copy link
Contributor Author

modifications to be made to gps.WriteDepTree. That third parameter, which currently controls whether or not nested vendor/ dirs should be stripped out, could be changed to take, say, a bitmask that controls which of a set of pruning behaviors should be applied.

I've done this now; didn't go for a bit mask just yet, as there are only two options.

We can potentially avoid ever writing the files, rather than removing them after writing them.

I still copy the files into (a temporary) vendor dir then go through and remove them, as that was a small change and matches how the nested vendor dir stripping works.

WDYT?

@tomwilkie
Copy link
Contributor Author

@sdboyer ping?

@sdboyer
Copy link
Member

sdboyer commented Mar 26, 2017 via email

@tomwilkie
Copy link
Contributor Author

No worries! Just wanted to make sure it wasn't because it took me 4 goes to get there...

Let me know if I can help out in any way.

@nathanielc
Copy link
Contributor

FWIW I tested this PR locally and ran into an error:

$ dep prune -v
error while exporting github.com/hashicorp/serf: fatal: failed to unpack tree object 2ab54b6bc64fe39d0b4ea242b144a204e2cdfc1d
: exit status 128

I have an override specified for the serf package and it specifies an alternate location:

"github.com/hashicorp/serf": {
    "branch": "nc-fix-races",
    "source": "https://github.com/nathanielc/serf.git"
}

@sdboyer
Copy link
Member

sdboyer commented Apr 5, 2017

...and i did it again. Sorry, @tomwilkie. Let's get this moving. First thing - we have to split it up, as we can't make changes directly in vendor/. (It's strange that tests are even passing - I've pinged @jessfraz about the black bash magic she conjured in hack/validate-vendor.bash). So let's pull the gps changes out into a PR against gps.

In the meantime, let's go back to just doing the pruning work after the gps.WriteDepTree() call. So, copy over calculatePrune() and deleteDirs() into somewhere within dep, then just do those operations in-place on vendor tree you've written to the tempdir in PruneProject().

Sound good? Again, sorry for the ridiculous delay.

@sdboyer
Copy link
Member

sdboyer commented Apr 5, 2017

@nathanielc interesting - I don't immediately see what in this PR would cause that to happen (but not happen on a normal dep ensure, but it's a good data point- thanks!

@tomwilkie
Copy link
Contributor Author

tomwilkie commented Apr 5, 2017 via email

- Check the external dependencies match those in the lock file.
- Read all the dependencies out of the lock file, don't calculate them myself.
- Prune by rebuilding the vendor dir with WriteDepTree
@tomwilkie
Copy link
Contributor Author

@sdboyer there you go - I squashed the whole thing down and rebased on something more recent too. No changes in gps required.

@nathanielc there were a few bugs in the version you ran, when I moved the code into gps I accidentally got a bit over-zealous about what I deleted. I've ran this new version on a couple of large project and it worked for me - mind testing again?

@nathanielc
Copy link
Contributor

@tomwilkie Thanks its working! My vendor dir went from 2M loc to 500K loc. Thanks!

@tomwilkie
Copy link
Contributor Author

tomwilkie commented Apr 7, 2017 via email

@jessfraz
Copy link
Contributor

hey hey sorry for the delay, back to real life now, so is the validate vendor still not working... i see that this is green but I can look into any problems with it

@sdboyer
Copy link
Member

sdboyer commented Apr 10, 2017

@jessfraz welcome back 😄 so @tomwilkie's latest doesn't break the vendor rule anymore, but if you look at 555e8de locally, then it does contain such changes, but the script didn't catch em.


const pruneShortHelp = `Prune the vendor tree of unused packages`
const pruneLongHelp = `
Prune is used to remove unused packages from your vendor tree.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I now just have one docs nit...let's please have this say the following:

Prune is used to remove unused packages from your vendor tree.

STABILITY NOTICE: this command creates problems for vendor/ verification. As such, it may be removed and/or moved out into a separate project later on.

I just want to warn people everywhere we can 😄

@jessfraz
Copy link
Contributor

@jessfraz welcome back 😄 so @tomwilkie's latest doesn't break the vendor rule anymore, but if you look at 555e8de locally, then it does contain such changes, but the script didn't catch em.

sweet will play around and see whats up

@tomwilkie
Copy link
Contributor Author

@sdboyer sorry for the delay, am travelling tight now. Have added the stability notice. Thanks.

@sdboyer
Copy link
Member

sdboyer commented Apr 13, 2017

Fantastic. In we go. Thanks so much!

@sdboyer sdboyer merged commit 864d348 into golang:master Apr 13, 2017
@tomwilkie
Copy link
Contributor Author

tomwilkie commented Apr 13, 2017 via email

ibrasho pushed a commit to ibrasho-forks/dep that referenced this pull request May 10, 2017
@VojtechVitek
Copy link

@tomwilkie @sdboyer Any reason this was implemented as a separate command and not in form of a flag on ensure cmd?

$ dep ensure -prune -update
$ dep ensure -prune github.com/pkg/errors@^v2.0.0

This is good progress, by the way! Thanks for your work.

@tomwilkie
Copy link
Contributor Author

@VojtechVitek lots of discussion as to why went on on issue #120

@tomwilkie tomwilkie deleted the prune branch June 6, 2017 16:47
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants