Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vendor the dependencies of the sub-packages only, not of the parent package #166

Closed
kshlm opened this issue Dec 14, 2015 · 25 comments
Closed

Comments

@kshlm
Copy link

kshlm commented Dec 14, 2015

The dependent packages can have subpackages, but what is their use? They don't seem to be used any where.

For example, my project uses github.com/coreos/etcd/client. I have the following entry as a dependency,

- package: github.com/coreos/etcd
  version: v2.2.2
  subpackages:
  - /client

I had assumed that because subpackages had been provided, glide would just resolve the dependencies of the subpackages. But instead, glide resolves dependencies of etcd and pulls in way too many dependent packages, which I have no use of in my project.

I want to know if this is the expected behaviour?

@kshlm
Copy link
Author

kshlm commented Dec 14, 2015

Should I rather just use the following,

- package: github.com/coreos/etcd/client
  version: v2.2.2

@GeertJohan
Copy link

Glide /could/ use the set of subpackages to only recursively vendor dependencies of the mentioned subpackages, instead of recursively vendoring the imports for all subfolders. See #165 for reasons why.

@mattfarina
Copy link
Member

Working with Go packages and versions is a bit complicated because there is technically no higher level "project" grouping. Version control systems group packages at a higher level and versions are at this level. So, you can't (and shouldn't) have two different versions of packages within github.com/coreos/etcd.

When you use go get or the other tools that fetch dependencies they do so at the repo level. They have to get everything. Try cloning a subdirectory from a git repo on GitHub. It doesn't work that way. So, have to fetch everything from a repo.

We could be better with sub-package handling for cleanup and walking the tree. Our first priority is to cleanup some internals in Glide so it's easier to maintain and contribute to. Better handling will come.

Part of the reason we have issues like this is the use of monorepos where clients and everything is in one repo. It may be easier for the developers (or they think it is) of the project. But, for consumers of things like a client it's harder. I had not run into such a prevalence of clients being part of the repo/project it communicates with until working in Go.

@kshlm
Copy link
Author

kshlm commented Dec 15, 2015

@mattfarina I just wanted to confirm what the expected behaviour of subpackages is.

I think we agree that using subpackages should cause only the dependencies of the subpackages should be vendored.

This is not how glide works currently, but is something that is being aimed at in a later release. Can we convert this issue to an enhancement request then?

@GeertJohan
Copy link

When you use go get or the other tools that fetch dependencies they do so at the repo level. They have to get everything. Try cloning a subdirectory from a git repo on GitHub. It doesn't work that way. So, have to fetch everything from a repo.

When you go get a package named A, only A's dependencies are retrieved. go get doesn't search through all the folders in the the repository where package A resides to get dependencies for those subfolders. So yes we should clone the whole repo, but only the dependencies for the actually used packages should be retrieved recursively.

I agree with @kshlm that this issue could be labeled as enhancement request. Or maybe even bugfix.

@mattfarina
Copy link
Member

If we can accurately scan the dependency tree in use and only pull down those dependencies that would be a useful optimization. I welcome a pull request to add this in (with an opt-in feature flag initially). I don't see us adding this right away, unless someone comes in with a pull request, because

  1. It would change the way scanning works if we encounter projects from Godeps, GPM, and GB which glide can import from. While those systems have the ability to specify dependencies I've found in practice that they will include dependencies that aren't in their config files. Specifically, sub-packages. That means we'd need to implement a multi-pass system on those to accurately detect sub-packages. This is a bit of work.
  2. We are cleaning up the internals right now. We can reduce the code and complexing which will make it easier to maintain and for people to contribute to. After that we have a few highly requested features to implement. That means this is a little lower priority.

I like this idea and have marked it as an enhancement.

@technosophos
Copy link
Member

The answer to the original question is this. We have tracked the list of subpackages for the following reasons:

  • It's easy to get during a scan, but much harder to track manually
  • glide rebuild uses it to build .a files. Prior to 1.4/1.5, rebuild was very useful
  • We figured that at some point in the future we could use this information to make optimizations

@kshlm kshlm changed the title Q: What is the use of the subpackages? Vendor the dependencies of the sub-packages only, not of the parent package Dec 30, 2015
@kshlm
Copy link
Author

kshlm commented Dec 30, 2015

I changed the title of the issue to make it more relevant to the enhancement request.

@aboukirev
Copy link

My projects use context, http2, and websocket subpackages from the golang.org/x/net. Those do not require any further dependencies (project builds fine). However, glide starts pull all dependencies for the entire golang.org/x/net, which results into dozens of unused packages in vendor folder. If anything, I preferred the original glide where I would add all necessary packages to vendor manually with glide get. Now that command triggers recursion regardless.

@davecgh
Copy link

davecgh commented Dec 30, 2015

I agree with @aboukirev in regards to this issue. For example, we use ripemd160 from golang.org/x/crypto, but glide pulls all the other packages in the golang.org/x/crypto which aren't used or needed.

@mattfarina
Copy link
Member

There are two things here.

First, there are all the packages in something like golang.org/x/crypto. We can't avoid pulling down all the different crypto packages because they are part of the same repo. The go get tool fetches them all today.

Second, we could be smarter about pulling down dependencies of this project to pull down fewer. Maybe out of the box Glide only pulls down the code path in use and there's a switch (e.g., --all) that pulls down everything. That would be useful in active development when adding new imports.

@davecgh
Copy link

davecgh commented Jan 1, 2016

Right, I wasn't very clear. I understand there is no way to avoid pulling the repo and checking out the appropriate version. That is of course one of the major reasons for using glide to lock specific versions.

I was more referring to your second point. Right now, glide also pulls the dependencies of all of the packages in the repo instead of just the ones needed by the active code path. For example, consider the gokit repository. You may only want to use the circuitbreaker and ratelimit packages. However, glide also pulls the dependencies for all of the other unused subpackages too which include dependencies that are not needed. For example, the metrics/prometheus subpackage has a dependency on the prometheus client which is not needed at all for any of the other packages, but glide pulls it too.

Along the same lines, glide even pulls the dependencies only used by the tests, which, in my opinion, really should require a setting to enable (much like how go get has -t for this purpose). For example, we make use of goleveldb, which ends up needing 3 additional dependencies (ginkgo, gomega, and protobuf) that are only used in the test path. Those dependencies are not needed in order to make use of the package. The main reason for using glide is to lock your already tested and known good dependencies for reproducible builds, so I don't believe that it should be pulling down the test-only deps by default given the author has already vetted and tested them.

@mattfarina
Copy link
Member

@davecgh Glide doesn't pull the dependencies used by the tests following the Go rules for that (e.g., _test.go files). If packages are being pulled it's because they are in normal Go files like applications files. I'm starting to discover that the Go conventions for tests aren't being followed by some packages so the tests appear as normal Go files to any automation.

@davecgh
Copy link

davecgh commented Jan 1, 2016

@mattfarina: Alright, so after looking more closely you're correct that it's the result of a testutil subpackage in goleveldb that imports the unneeded deps as opposed to a _test.go file.

I made a test repository to show the difference between go get and glide. Perhaps it'll be useful for testing in the future.

$ export GOPATH=/d/tmpgo && go get -u -v github.com/davecgh/glidetest
github.com/davecgh/glidetest (download)
github.com/syndtr/goleveldb (download)
github.com/golang/snappy (download)
github.com/syndtr/goleveldb/leveldb/util
github.com/syndtr/goleveldb/leveldb/comparer
github.com/golang/snappy
github.com/syndtr/goleveldb/leveldb/cache
github.com/syndtr/goleveldb/leveldb/storage
github.com/syndtr/goleveldb/leveldb/filter
github.com/syndtr/goleveldb/leveldb/opt
github.com/syndtr/goleveldb/leveldb/errors
github.com/syndtr/goleveldb/leveldb/iterator
github.com/syndtr/goleveldb/leveldb/journal
github.com/syndtr/goleveldb/leveldb/memdb
github.com/syndtr/goleveldb/leveldb/table
github.com/syndtr/goleveldb/leveldb
github.com/davecgh/glidetest

$ tree -d $GOPATH/src
/d/tmpgo/src
└── github.com
    ├── davecgh
    │   └── glidetest
    ├── golang
    │   └── snappy
    └── syndtr
        └── goleveldb
            ├── leveldb
            │   ├── cache
            │   ├── comparer
            │   ├── errors
            │   ├── filter
            │   ├── iterator
            │   ├── journal
            │   ├── memdb
            │   ├── opt
            │   ├── storage
            │   ├── table
            │   ├── testutil
            │   └── util
            └── manualtest
                ├── dbstress
                └── filelock

23 directories

Notice how go get does not grab ginkgo, gomega, or protobuf because they are only used in a package that is not imported within glidetest. Also notice there are only 23 directories in the tree of deps.

Now, here is the same thing with glide using the glide.yaml file in the test repo. I put *** next to the lines that are dealing with packages that are not imported.

$ cd $GOPATH/src/github.com/davecgh/glidetest && glide up
[INFO] Fetching updates for github.com/syndtr/goleveldb.
[INFO] Scanning github.com/syndtr/goleveldb for dependencies.
***[INFO] ==> Unknown github.com/onsi/ginkgo (github.com/onsi/ginkgo) 
***[INFO] ==> Unknown github.com/onsi/ginkgo (github.com/onsi/ginkgo/config)
***[INFO] ==> Unknown github.com/onsi/gomega (github.com/onsi/gomega)
***[INFO] Fetching updates for github.com/onsi/ginkgo.
***[INFO] Fetching updates for github.com/onsi/gomega.
[INFO] Fetching updates for github.com/golang/snappy.
[INFO] Scanning github.com/golang/snappy for dependencies.
***[INFO] Scanning github.com/onsi/ginkgo for dependencies.
***[INFO] Scanning github.com/onsi/ginkgo for dependencies.
***[INFO] Scanning github.com/onsi/gomega for dependencies.
***[INFO] ==> Unknown github.com/golang/protobuf (github.com/golang/protobuf/proto)
***[INFO] ==> Unknown github.com/golang/protobuf (github.com/golang/protobuf/proto)
***[INFO] Fetching updates for github.com/golang/protobuf.
***[INFO] Scanning github.com/golang/protobuf for dependencies.
***[INFO] Scanning github.com/golang/protobuf for dependencies.
[INFO] Project relies on 5 dependencies.
[INFO] Writing glide.lock file

$ tree -d vendor
vendor
└── github.com
    ├── golang
    │   ├── protobuf
    │   │   ├── jsonpb
    │   │   │   └── jsonpb_test_proto
    │   │   ├── proto
    │   │   │   ├── proto3_proto
    │   │   │   └── testdata
    │   │   └── protoc-gen-go
    │   │       ├── descriptor
    │   │       ├── generator
    │   │       ├── internal
    │   │       │   └── grpc
    │   │       ├── plugin
    │   │       └── testdata
    │   │           ├── multi
    │   │           └── my_test
    │   └── snappy
    ├── onsi
    │   ├── ginkgo
    │   │   ├── config
    │   │   ├── extensions
    │   │   │   └── table
    │   │   ├── ginkgo
    │   │   │   ├── convert
    │   │   │   ├── interrupthandler
    │   │   │   ├── nodot
    │   │   │   ├── testrunner
    │   │   │   ├── testsuite
    │   │   │   └── watch
    │   │   ├── integration
    │   │   │   └── _fixtures
    │   │   │       ├── convert_fixtures
    │   │   │       │   ├── nested
    │   │   │       │   └── nested_without_gofiles
    │   │   │       │       └── subpackage
    │   │   │       ├── convert_goldmasters
    │   │   │       ├── coverage_fixture
    │   │   │       │   └── external_coverage_fixture
    │   │   │       ├── does_not_compile
    │   │   │       ├── eventually_failing
    │   │   │       ├── exiting_synchronized_setup_tests
    │   │   │       ├── fail_fixture
    │   │   │       ├── failing_after_suite
    │   │   │       ├── failing_before_suite
    │   │   │       ├── failing_ginkgo_tests
    │   │   │       ├── flags_tests
    │   │   │       ├── focused_fixture
    │   │   │       ├── hanging_suite
    │   │   │       ├── more_ginkgo_tests
    │   │   │       ├── no_tests
    │   │   │       ├── passing_ginkgo_tests
    │   │   │       ├── passing_suite_setup
    │   │   │       ├── progress_fixture
    │   │   │       ├── skip_fixture
    │   │   │       ├── suite_command_tests
    │   │   │       ├── synchronized_setup_tests
    │   │   │       ├── tags_tests
    │   │   │       ├── test_description
    │   │   │       ├── watch_fixtures
    │   │   │       │   ├── A
    │   │   │       │   ├── B
    │   │   │       │   ├── C
    │   │   │       │   └── D
    │   │   │       └── xunit_tests
    │   │   ├── internal
    │   │   │   ├── codelocation
    │   │   │   ├── containernode
    │   │   │   ├── failer
    │   │   │   ├── leafnodes
    │   │   │   ├── remote
    │   │   │   ├── spec
    │   │   │   ├── specrunner
    │   │   │   ├── suite
    │   │   │   ├── testingtproxy
    │   │   │   └── writer
    │   │   ├── reporters
    │   │   │   └── stenographer
    │   │   └── types
    │   └── gomega
    │       ├── format
    │       ├── gbytes
    │       ├── gexec
    │       │   └── _fixture
    │       │       └── firefly
    │       ├── ghttp
    │       │   └── protobuf
    │       ├── internal
    │       │   ├── assertion
    │       │   ├── asyncassertion
    │       │   ├── fakematcher
    │       │   ├── oraclematcher
    │       │   └── testingtsupport
    │       ├── matchers
    │       │   └── support
    │       │       └── goraph
    │       │           ├── bipartitegraph
    │       │           ├── edge
    │       │           ├── node
    │       │           └── util
    │       └── types
    └── syndtr
        └── goleveldb
            ├── leveldb
            │   ├── cache
            │   ├── comparer
            │   ├── errors
            │   ├── filter
            │   ├── iterator
            │   ├── journal
            │   ├── memdb
            │   ├── opt
            │   ├── storage
            │   ├── table
            │   ├── testutil
            │   └── util
            └── manualtest
                ├── dbstress
                └── filelock

119 directories

So, the end result is glide is ending up with 119 directories in the tree of deps versus 23.

@mattfarina
Copy link
Member

Yup. Glide can get smarter with the packages imported. This this a great example of the difference. Thanks.

hectorj pushed a commit to hectorj/glide that referenced this issue Jan 21, 2016
When subpackages are specified, fetch only their dependencies instead of those from the whole project
@hectorj
Copy link
Contributor

hectorj commented Jan 21, 2016

I welcome a pull request to add this in (with an opt-in feature flag initially)

Just a detail, but how would you name this flag?

And another question concerning this case: #166 (comment)

Should the new feature also kick-in in that case, or only if subpackages are explicitely listed?

@dmitris
Copy link

dmitris commented Jan 22, 2016

I added another example of a case where glide includes dependencies that are not needed by the project:
https://github.com/dmitris/deptest4glide
with a short write-up in the README. The example uses only golang.org/x/net package (with the html golang.org/x/net/html and [transitively] golang.org/x/net/net/atom subpackages) - but glide update also pulls in golang.org/x/crypto and golang.org/x/text which are packages that are not required to build the program or its dependencies.

I looked at the source of the golang.org/x/net package in search of where it uses the golang.org/x/crypto one (using grep -r -l 'golang.org/x/crypto' * in a fresh clone of https://go.googlesource.com/net) - the only case I find is in http2/h2i/h2i.go which is a package main' program (an interactive HTTP/2 console). I believe glide should not pay attention to the 'package main' imports when scanning imported packages for transitive dependencies. Similarly, the golang.org/x/text is used in the golang.org/x/net/html/charset subpackage which is not used by my example program.

To avoid such false positives, I wonder if it would be possible to (optionally) rely on go list [...] .Deps output as in the command below for the cases where you already have all the dependencies installed in your GOPATH - or at least double-check against what go list indicates?

go list -e -f '{{join .Deps "\n"}}' ./... | xargs go list -f '{{if not .Standard}}{{.ImportPath}}{{end}}'

I would be happy to help with testing or implementation if needed. I pulled a patch by @hectorj from https://github.com/hectorj/glide/tree/feature-166 and it seems to work fine for this case - it created glide.lock with only one dependency - golang.org/x/net, as expected:

hash: 7fde8e6242edcd1e55bb5f40fc1e29a66e3871e9b1ccfc835ac35268d52eb0b0
updated: 2016-01-22T13:12:32.822942006+01:00
imports:
- name: golang.org/x/net
  version: 2e9cee70ee697e0a2ef894b560dda50dec7dff58
  subpackages:
  - /html
devImports: []

practically identical to my "wish" version https://github.com/dmitris/deptest4glide/blob/master/glide.lock.want

@mattfarina
Copy link
Member

On Monday, barring any major issues being found, I'm going to merge in the feat/no-cookoo branch. This is a major internal reworking that includes a simplified dependency resolution architecture.

In the new setup the dependency package contains a resolver. That needs to be updated to that the local project resolution can do directories (ResolveLocal method) while the tree walking (ResolveAll method) walks the import path.

@dmitris
Copy link

dmitris commented Jan 22, 2016

👍 - rooting for the success of that big refactoring piece you are doing. The world needs glide as a great dependency management tool - and it seems many people are eager to contribute in different ways! 😄

@mattfarina
Copy link
Member

It took an extra day because I found a bug that should have been fixed before people started complaining on master.

Blockers to this are now removed.

@dmitris
Copy link

dmitris commented Feb 15, 2016

I believe it is now fixed (thanks @hectorj for the patch!) as mentioned in https://github.com/Masterminds/glide/releases/tag/0.9.0-rc1 Does anyone still see this being a problem? I checked it with https://github.com/dmitris/deptest4glide (README has a description of the original problem) and it looks fine there; same with several of our projects I checked. I believe we can close the issue now.

@hectorj
Copy link
Contributor

hectorj commented Feb 15, 2016

The merged fix is actually from @technosophos : #240

But yeah, this is fixed.

@dmitris
Copy link

dmitris commented Feb 16, 2016

I think I jumped up too soon... It may be partially fixed but still not working correctly. I updated my minimal case to demonstrate: github.com/dmitris/deptest4glide. This package imports github.com/dmitris/deptesthuge/deptestsmall which is a dependency-free subpackage of github.com/dmitris/deptesthuge where the root package has massive dependencies (coreos, docker). Of course both go get and glide have to check out the repo for the "root" package (github.com/dmitris/deptesthuge) but go get stops there and correctly determines that the dependencies of the deptesthuge are not needed; glide on the other hand pulls everything imported in the parent directory of the used subpackage.

Here's the go list [...] .Deps ouptut for github.com/dmitris/deptest4glide:

$ go list -e -f '{{join .Deps "\n"}}' ./... | xargs go list -f '{{if not .Standard}}{{.ImportPath}}{{end}}'
github.com/dmitris/deptesthuge/deptestsmall

(the dependency set is calculated correctly)
and here's the output of glide -debug update - which pulls docker, coreos etc. - https://gist.github.com/dmitris/dbb1fce06b4f28f67aab

The resulting glide.lock file is: https://gist.github.com/dmitris/ff6e10ffa0d2f678b12e

I'm not sure which syntax in glide.yaml should be used when you just want to import a subpackage:

import:
- package: github.com/dmitris/deptesthuge/deptestsmall

or

import:
- package: github.com/dmitris/deptesthuge
  - subpackage: deptestsmall

(@kshlm asked above as well) - I tried both but still, glide pulls unneeded packages with both configurations.

/cc @technosophos

@dmitris
Copy link

dmitris commented Feb 23, 2016

I think we should close this unless anyone else sees any further problems.

@mattfarina
Copy link
Member

@dmitris way to catch this. A fix went out in 0.9.1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants