Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failing to go get package by revision #699

Closed
ofw opened this issue Sep 24, 2018 · 25 comments · Fixed by #1015
Closed

Failing to go get package by revision #699

ofw opened this issue Sep 24, 2018 · 25 comments · Fixed by #1015
Assignees
Labels
bug Something isn't working proxy Work to do on the module proxy
Milestone

Comments

@ofw
Copy link

ofw commented Sep 24, 2018

When I try to go get a package by its revision number then I get 500 Internal Server Error.

GOPROXY=http://127.0.0.1:3000 go get github.com/pkg/errors@c059e472caf75dbe73903f6521a20abac245b17f
go: finding github.com/pkg/errors c059e472caf75dbe73903f6521a20abac245b17f
go: finding github.com/pkg c059e472caf75dbe73903f6521a20abac245b17f
go: finding github.com c059e472caf75dbe73903f6521a20abac245b17f
go get github.com/pkg/errors@c059e472caf75dbe73903f6521a20abac245b17f: unexpected status (http://127.0.0.1:3000/github.com/pkg/errors/@v/c059e472caf75dbe73903f6521a20abac245b17f.info): 500 Internal Server Error

To Reproduce
GOPROXY=http://127.0.0.1:3000 go get github.com/pkg/errors@c059e472caf75dbe73903f6521a20abac245b17f

Expected behavior
Installation of the specified version of github.com/pkg/errors

Environment (please complete the following information):

  • OS: mac os mojave 10.14
  • Go version : go1.11 darwin/amd64
  • Olympus or proxy version : latest from github
@marpio marpio added bug Something isn't working proxy Work to do on the module proxy labels Sep 24, 2018
@marpio marpio self-assigned this Sep 24, 2018
@marpio
Copy link
Member

marpio commented Sep 24, 2018

I'll take this one

@marpio
Copy link
Member

marpio commented Sep 26, 2018

So the problem is that go mod download doesn't really work with commit hashes:

➜  ~ GO111MODULE=on go mod download -json github.com/pkg/errors@248dadf4e9068a0b3e79f02ed0a610d935de5302
go: finding github.com/pkg/errors 248dadf4e9068a0b3e79f02ed0a610d935de5302
{
        "Path": "github.com/pkg/errors",
        "Version": "v0.8.1-0.20161029093637-248dadf4e906",
        "Error": "not in cache"
}

nothing gets downloaded.
What we could do is to run go list -m

➜  ~ GO111MODULE=on go list -m -json github.com/pkg/errors@248dadf4e9068a0b3e79f02ed0a610d935de5302
{
        "Path": "github.com/pkg/errors",
        "Version": "v0.8.1-0.20161029093637-248dadf4e906",
        "Time": "2016-10-29T09:36:37Z",
        "Dir": "/home/piotr/go/pkg/mod/github.com/pkg/errors@v0.8.1-0.20161029093637-248dadf4e906",
        "GoMod": "/home/piotr/go/pkg/mod/cache/download/github.com/pkg/errors/@v/v0.8.1-0.20161029093637-248dadf4e906.mod"
}

to get the pseudo version determined by go (in this case v0.8.1-0.20161029093637-248dadf4e906) and then

➜  ~ GO111MODULE=on go mod download -json github.com/pkg/errors@v0.8.1-0.20161029093637-248dadf4e906
{
        "Path": "github.com/pkg/errors",
        "Version": "v0.8.1-0.20161029093637-248dadf4e906",
        "Info": "/home/piotr/go/pkg/mod/cache/download/github.com/pkg/errors/@v/v0.8.1-0.20161029093637-248dadf4e906.info",
        "GoMod": "/home/piotr/go/pkg/mod/cache/download/github.com/pkg/errors/@v/v0.8.1-0.20161029093637-248dadf4e906.mod",
        "Zip": "/home/piotr/go/pkg/mod/cache/download/github.com/pkg/errors/@v/v0.8.1-0.20161029093637-248dadf4e906.zip",
        "Dir": "/home/piotr/go/pkg/mod/github.com/pkg/errors@v0.8.1-0.20161029093637-248dadf4e906",
        "Sum": "h1:BKfCEBHnHoXswNe0Btj/zOfiyn40z6qOti8SeLbQgdM=",
        "GoModSum": "h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0="
}

The question is when and where should this be done? I was thinking about those 3 solutions:

  1. exec go liston every request to determine the version (in a middelware?) 248dadf4e9068a0b3e79f02ed0a610d935de5302 => v0.8.1-0.20161029093637-248dadf4e906
    v1.0.1 => v1.0.1
  2. exec go list only if the version is a commit hash (in a middelware?)
  3. inside goGetFetcher (./pkg/module/go_get_fetcher.go) execute go list first, then go mod download and then pass the version (v0.8.1-0.20161029093637-248dadf4e906) to the storage

I'm leaning toward 3. because we have the whole singleflight/pool logic in place there but I would like to hear what others think.

/cc @gomods/maintainers @gomods/contributors

@michalpristas
Copy link
Member

i would go with option 2. have a middleware intercepting calls to all endpoints except list then it needs to check whether or not it is a commit hash, if it is, it translates and redirects to correct one?

@manugupt1
Copy link
Member

I like 3rd the best because it is just go mod download that does not support hash commits. I like it because
A) It is easier to read the code
B) It does not introduce latency in other parts of the code.

@fedepaol
Copy link
Member

fedepaol commented Sep 27, 2018 via email

@michalpristas
Copy link
Member

This is the reason I suggested returning redirect.
fetcher implementation sounds also good

@fedepaol
Copy link
Member

fedepaol commented Sep 27, 2018 via email

@marwan-at-work
Copy link
Contributor

Why not open this issue on the Go repository and make mod download work with plain commits? We can have a workaround until it's resolved upstream

@marpio
Copy link
Member

marpio commented Sep 30, 2018

Thanks for the feedback!
I'll go with the middleware since it will be easier to remove it once golang/go#27947 gets fixed. Also the 3rd option would require passing the pseudo version from the go_get_fetcher/stasher, all the way up to the download protocol (to get it from the storage) which would make the code less readable.

@fedepaol
Copy link
Member

fedepaol commented Sep 30, 2018 via email

@marwan-at-work
Copy link
Contributor

marwan-at-work commented Sep 30, 2018

@marpio @fedepaol curious what you think about potential option no. 4 above? Also what's the impact of this bug?

Edit: just noticed your open issue on the Go repo. If we have a workaround we should definitely add a TODO comment to remove the workaround once go fixes this.

@marpio
Copy link
Member

marpio commented Oct 1, 2018

@fedepaol sure, it's yours!

@marpio marpio assigned fedepaol and unassigned marpio Oct 1, 2018
@fedepaol fedepaol mentioned this issue Oct 5, 2018
@marwan-at-work
Copy link
Contributor

I have a couple of concerns here (briefly mentioned in the PR):

  1. What happens when a repo is not available upstream but is available in our storage? go list will fail but Athens should still serve that module, that's the whole point of Athens.
  2. How do we save this module in storage? Do we save it as a long hash string? or do we save it as a pseudo version? How do we ensure both are retrieved? AFAIK, blob storage services do not have have references so we might need to duplicate modules. What about full hash vs partial hash?
  3. What happens when a user does a go get but not @<hash> rather: @master. Do we lock the master version forever? That would obviously be wrong.

I think we should hold off on this issue until we answer all of these questions, and maybe until mod download is fixed.

@fedepaol
Copy link
Member

fedepaol commented Oct 5, 2018

I do agree with you on all the line. When working on #744 I thought that caching the hash - pseudoversion mapping would have been an optimization.
The problem is by doing that we make athens dependant on the vcs (which is something that goes against the purpouse of athens).
Said that, I think this bug needs to be addressed. If I understood correctly the effects, it would mean that athens could not be used if the dependency does not use sem versioning / does not tags properly which is something that is likely to happen.

What we could do is to cache the mapping somewhere (and leave the middleware there) OR moving the hash - semver resolution inside the go_get_fetcher so that if a hash is asked it uses the resolved pseudoversion in place of it. This would make it work transparently until the mod download gets fixed.

I also need to take a look at how go mods behaves with master, but in any case we could put ifs that behaves differently if the version is master. Not sure how the current implementation behaves if master is asked, but I guess the problems are the same.

@marpio
Copy link
Member

marpio commented Oct 6, 2018

go get github.com/pkg/errors@248dadf4e9068a0b3e79f02ed0a610d935de5302 will add a pseudo version v0.8.1-0.20161029093637-248dadf4e906 record to the go.mod file. The next time we will build the project go will ask for the pseudo version and will find it in the storage (go list middleware won't be hit). I we ask by the hash and the repo is not available it will fail. We could create a separate storage to save the hash->pseudo_version mapping which we could check along go list. Second option would be to use the first 12 positions of the hash that are added to the pseudo version ( would it be save in a scope of a module?).

When go download gets fixed we will have another problem - with the current logic we will save the module as a long hash which is not what we want (the go.mod entry is a pseudo version) so we will probably still need to figure out how to go from hash->pseudo_version.

@fedepaol
Copy link
Member

fedepaol commented Oct 6, 2018

Again, I see two alternatives here.

  • We leave the list middleware there, and we store the hash / pseudover mapping somewhere. It can be the final solution even when go download gets fixed
  • We move the hash / pseudover resolution inside the go_get_fetcher . This means that the same version will be pointed by the hash AND by the pseudover. That would work at the price of having duplicate data pointed by two different keys.

Given what @marpio said, I am afraid we need to store that mapping somewhere whatever direction we decide to take.

@marwan-at-work
Copy link
Contributor

@marpio

go get github.com/pkg/errors@248dadf4e9068a0b3e79f02ed0a610d935de5302 will add a pseudo version v0.8.1-0.20161029093637-248dadf4e906 record to the go.mod file.

True. However, Athens will not save the version in storage under v0.8.1-0.20161029093637-248dadf4e906, it will save the version in storage under 248dadf4e9068a0b3e79f02ed0a610d935de5302 -- and therefore, both times will be fetched from upstream and storage for that module will be duplicated.

So, as you said towards the end, we need to figure out how to go from hash to pseudo before persisting, and also retrieving a pseudo from hash from storage and not just upstream.

Lastly, as I mentioned before: we need to do that not just for hashes but for branch names: master, v2 branch, v3 branch etc etc.

I think we should pause and think about this a little bit more. This will certainly add a lot complexity to make one feature work.

For example, for beta: maybe it's ok to not allow hashes/branches to be fetched. The hash/branch gogets are only done by humans so they can bypass the proxy. CI/CD systems will all work.

@marpio
Copy link
Member

marpio commented Oct 7, 2018

@marwan-at-work

True. However, Athens will not save the version in storage under v0.8.1-0.20161029093637-248dadf4e906, it will save the version in storage under 248dadf4e9068a0b3e79f02ed0a610d935de5302 -- and therefore, both times will be fetched from upstream and storage for that module will be duplicated.

I might be missing something but the the redirect middleware @fedepaol implemented should prevent that from happening, no?
It should redirect from /github.com/pkg/errors/@v/248dadf4e9068a0b3e79f02ed0a610d935de5302.zip to /github.com/pkg/errors/@v/v0.8.1-0.20161029093637-248dadf4e906.zip so that it will be stored under v0.8.1-0.20161029093637-248dadf4e906
The same should be true for master i.e. go list -m -json github.com/pkg/errors@master gives us v0.8.1-0.20180911062113-c059e472caf7
But I'm with you on that we shouldn't rush with this one and think it through.

@fedepaol
Copy link
Member

fedepaol commented Oct 7, 2018

The middleware solves the hash problem. It redirects to the pseudover so the only entry in the storage is related to the pseudoversion. What's missing is the persitence of the map between the commit hash and the vcs in order to be complete.

@marwan-at-work's looked far than us (me) and raised the problem of master / named branches which is something I think needs to be discussed carefully. When a user asks for master, wants the head of the master, that could move during the time. If we translate master to a pseudoversion, and remember that mapping, we freeze "master" forever, even if another user from another codebase comes two years later and ask the proxy to resolve "master" again, not expecting a two years old master.

For this reason, my opinion about master / branches is that they should not be persisted at all and that we should refuse the requests pointing to them (or serve them from the vcs every time). If a user wants to use athens, he needs to point specific commits (or tags) of a depndency because he wants the build to be reproducible.

@marpio
Copy link
Member

marpio commented Oct 8, 2018

@fedepaol master/branch would have to go through the middleware every time. This way it would resolve to a new pseudo version each time it's requested.
Having said that, we would have to copy a bunch of code from cmd go to know if the version is a semver/pseudo version/branch or not. Maybe it's not worth it and at least for beta it's ok to just say "if you want a hash/branch you have to bypass the proxy" like @marwan-at-work proposed. We should just carefully document that, otherwise it could be very confusing.

@fedepaol
Copy link
Member

fedepaol commented Oct 8, 2018

@marpio agree. What I mean here is that we just can't store a map between master/branch and a pseudoversion because the pseudoversion is going to change every time master/branch changes. For this reason it would make no sense to me to store it.

Do agree with let's postpone this and think about it carefully .

@arschles
Copy link
Member

For beta (v0.1.0), should we add some docs saying that go get mymod@sha123 doesn't work? If we decide to do those docs, we can move this out of v0.1.0, otherwise we should keep this in here

cc/ @marwan-at-work

@arschles
Copy link
Member

We decided in the Oct. 25 meeting that we're going to write docs explaining the limitation, and fix this for v0.2.0

@marpio
Copy link
Member

marpio commented Nov 6, 2018

The go issue for ref: golang/go#27947

@arschles
Copy link
Member

I'm going to take this out of the v0.3.0 milestone so that we don't couple it with the release of Go 1.12 (this is labelled with fix after go 1.12)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working proxy Work to do on the module proxy
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants