Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prototype: Implement new storage backend http #1131

Closed
wants to merge 13 commits into from

Conversation

tylerchr
Copy link

Per my needs in #1130 I started hacking on a generic http backend. It started as a prototype but turned out not to be all that complex.

The specific decisions I made in this implementation may or may not be ideal. The whole premise of a generic http backend is just an idea. I'm looking for feedback on that and assume that some serious iteration would be necessary before considering a merge.

What is the problem I am trying to address?

At Qualtrics we use an artifact repository that isn't already supported by Athens. We want to deploy Athens internally. but strong corporate incentives exist to use our internal repository for storage (rather than, say, Athens' S3 support).

I could have just used its API and implemented another platform-specific storage backend, but after working through our requirements and talking with @arschles I saw an opportunity to achieve our goals with a more open-ended solution.

How is the fix applied?

What resulted was a reasonably generic http storage type. In general, it maps the methods of storage.Backend to HTTP calls in the following manner (I'm using the golang.org/x/net module as an example):

  • Lister
    • List requests GET /golang.org/x/net/@v/ and scrapes versions (actually, links to *.mod files) from the directory listing
  • Getter
    • Info requests GET /golang.org/x/net/@v/v0.0.0-20180724234803-3673e40ba225.info and returns the contents
    • GoMod requests GET /golang.org/x/net/@v/v0.0.0-20180724234803-3673e40ba225.mod and returns the contents
    • Zip requests GET /golang.org/x/net/@v/v0.0.0-20180724234803-3673e40ba225.zip and returns the contents
  • Checker
    • Exists requests GET /golang.org/x/net/@v/v0.0.0-20180724234803-3673e40ba225.mod and returns false if it receives a 404 Not Found response

The above calls are largely identical to the usual API defined in go help goproxy.

The big exception is the handling of List due the different semantics used by Athens' storage interface. I went through a couple ideas here:

  • Perhaps Athens itself could manage a real /list file, but that's a relatively larger change and also raises concurrency questions—I opted to just steer clear of that. Admittedly this decision does have the unfortunate consequence that the HTTP backend is not itself a compliant Go module proxy.
  • Since raw file servers traditionally show full directory listings when no index file exists, I thought perhaps this directory could be parsed to collect all the available versions. It seemed a little silly at first but in practice it turned out to be pretty concise and straightforward, so I went for this. I found that directory listing support is pretty universal among file servers (and easy to implement too).

The following extensions to the standard protocol support write operations expected by the Storage interface:

  • Saver
    • Save uses module.Upload to do the following things, potentially using BasicAuth credentials from the Athens config file:
      • PUT /golang.org/x/net/@v/v0.0.0-20180724234803-3673e40ba225.info
      • PUT /golang.org/x/net/@v/v0.0.0-20180724234803-3673e40ba225.mod
      • PUT /golang.org/x/net/@v/v0.0.0-20180724234803-3673e40ba225.zip
  • Deleter
    • Delete uses module.Delete to do the following things, again potentially using BasicAuth credentials:
      • DELETE /golang.org/x/net/@v/v0.0.0-20180724234803-3673e40ba225.info
      • DELETE /golang.org/x/net/@v/v0.0.0-20180724234803-3673e40ba225.mod
      • DELETE /golang.org/x/net/@v/v0.0.0-20180724234803-3673e40ba225.zip

Mention the issue number it fixes or add the details of the changes if it doesn't have a specific issue.

Addresses #1130

@arschles
Copy link
Member

Hey @tylerchr I saw this is still a draft PR. Are you ready for reviews on it?

@arschles
Copy link
Member

Also @tylerchr would this close both #1110 and #1130?

@tylerchr tylerchr marked this pull request as ready for review March 19, 2019 05:51
@tylerchr tylerchr requested a review from a team as a code owner March 19, 2019 05:51
@tylerchr
Copy link
Author

Yeah—I didn't really intend that. Clearly I don't know how Draft PRs work.

It'd obviously close #1130. I suppose whether it addresses #1110 is a question for @marpio, but it seems to me that it would. I see no reason why arbitrary storage backends couldn't be pretty simply implemented as little HTTP translation layers.

I haven't put a whole lot of thought into this, but deploying a couple of containers (Athens + B2 storage layer, for example) does seem a little more cloud-friendly than attempting something with package plugin, what with pods and task groups and the like. Plus, as I think you pointed out elsewhere, there's the extra flexibility of being able to run them from separate places on the network.

@marpio
Copy link
Member

marpio commented Mar 19, 2019

@tylerchr I like the http backend idea very much. It definitely has advantages to the plugin package. so yeah that would close #1110

@arschles
Copy link
Member

Sounds like we agree that this will close #1110, that's great to hear!

@tylerchr I'll review this now 😁

Copy link
Member

@arschles arschles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tylerchr this is looking great so far 🚀 Once it's done, I feel like Athens will gain a ton of flexibility

I left a few nits and a few requests, and I left a few bigger requests below to hopefully save some code. If you'd like to do the nits in a follow-up PR I'm totally cool with that.

  • I noticed that you wrote a lot of HTTP request parsing code. What do you think about using a higher level client library? gorequest comes to mind for that
  • I also noticed that in tests you wrote a big switch statement to do HTTP server multiplexing. What do you think about using a multiplexer library instead?
  • You do some HTML tokenization in collectLinks. Do you think that goquery could help you remove some of that code?

config.dev.toml Outdated Show resolved Hide resolved

This driver stores files to an HTTP server via standard GET and PUT requests. The files are laid out in a manner identical to the proxy URL used to access them, and the requests are optionally (but hopefully!) authenticated using Basic Auth.

The HTTP storage driver can be used to integrate with systems like Artifactory that offer blob storage over HTTP.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can you mention that this is different from other blob storage systems like S3, Azure blob storage, or Google Cloud Storage that also implement (proprietary) HTTP interfaces?

pkg/storage/http/checker.go Outdated Show resolved Hide resolved
func (s *ModuleStore) connect() error {
const op errors.Op = "http.connect"

// I guess just GET the base URL and see if it 401's?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me! Maybe we should require that it returns a 204. What do you think?

Copy link
Author

@tylerchr tylerchr Mar 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the following facts:

  • I use the base URL for this check
  • The base URL will necessarily refer to a directory
  • Elsewhere, GETs on directories are expected to yield a directory listing

...it seems a little inconsistent to hope for 204 No Content here. For example, my backend of interest (Artifactory) responds with 200 OK and a directory listing.

You do make a good point though: right now it fails on status codes >= 400 but I could tighten that to >= 200 && < 300 to more clearly reflect the expectation of this check. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Then I have a more generic question - do you intend to build a generic HTTP storage driver in this PR, or one that can use Artifactory as a storage driver?

The two goals overlap, but I think there will need to be a spec for the driver (it looks like you started on that in here 😁) if you want to do the former. With the latter, I think it is fine to make everything in this PR work with Artifactory, without worrying about other possible backends.

Personally, I think it would be best to go with the latter (Artifactory-specific) since that achieves your immediate needs. Later on, we could of course always build a more generic HTTP backend with a spec and all...

pkg/storage/http/lister.go Outdated Show resolved Hide resolved
// Ok, so admittedly this scheme for listing versions is a little harebrained.
// But that said it _is_ pretty standard for directory indexes to be formatted
// very closely to what we expect here. I'm not settled on this but it does
// work surprisingly well.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't seem harebrained to me! If this is what you want to go with, let's make sure to add it to our docs (on the hugo site) and explain some options for what you can do if your storage system doesn't spit this format out

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't initially approached this from the "storage plugin" angle, but I see now that it may be inconvenient/impossible to render a directory listing for certain backends.

Perhaps List could support two strategies for obtaining a listing, and attempt them one at a time in order:

  1. Fetch a listing in the documented format from GET /<module>/@v/list. Athens itself won't update this file or otherwise manage it, but this endpoint may be simpler/cleaner to implement for some storage plugins than a directory listing.
  2. Scrape a version list from the directory listing at GET /<module>/@v/ as currently implemented. This option caters to storage servers based on filesystem semantics like Artifactory or http.FileServer.

This approach would make implementing custom storage plugins a bit nicer at the cost of slowing down List for cases like mine, where the server only supports the second strategy.

@tylerchr
Copy link
Author

@arschles I addressed most of your inline comments, some with follow-up questions. I haven't tackled your top-level comments yet.

@tylerchr
Copy link
Author

I tried using goquery to do the link parsing and personally didn't find it all that compelling. It essentially trades

mods, err := collectLinks(resp.Body, func(ref string) bool {
	return strings.HasPrefix(absolute(baseURL, href), baseURL) && strings.HasSuffix(href, ".mod")
})

// convert mod file references to versions
versions := make([]string, len(mods))
for i, m := range mods {
	versions[i] = strings.TrimSuffix(path.Base(m), ".mod")
}

and the collectLinks and getAttrs helpers (47 lines) for

var versions []string
doc.Find("a").Each(func(i int, s *goquery.Selection) {
	href, ok := s.Attr("href")
	if !ok {
		return
	}
	if !strings.HasPrefix(absolute(baseURL, href), baseURL) || !strings.HasSuffix(href, ".mod") {
		return
	}
	versions = append(versions, strings.TrimSuffix(path.Base(href), ".mod"))
})

and the entirety of the goquery library, which is a dependency of nontrivial size. It's not the sort of tradeoff I'd accept in my own code, but I'm happy to do that here if it's what you'd prefer.

@arschles
Copy link
Member

@tylerchr regarding goquery, all good with me to go with what you have! Thanks for explaining your logic 😁

@arschles
Copy link
Member

@tylerchr did you want to pick this up again? I'm happy to work with you to finish this up if you're up for it. I think it's really cool and would love to see it go in.

@tylerchr
Copy link
Author

@arschles Yes, in fact I was just getting back up to speed over the weekend. I’m thinking the best way forward is for me to implement the more generic form of HTTP support here (e.g., with the 204 you proposed earlier, etc.) with an eye toward the pluggable HTTP Storage API we’ve discussed in #1110.

In the intervening weeks I’ve become more comfortable with implementing my Artifactory support externally via that new API rather than take that leaning here—having two similar-but-incompatible HTTP APIs in Athens (Artifactory HTTP + Extensible HTTP) itself doesn’t seem like the best thing for the project.

With that said, what guidance do you have in the design of said HTTP plugin API? I suspect we’re fairly close here already but if it’s going to be used as an extension point I’d think you’ll want to help me get it right. Is there any discussion about it other than #1110, and if not, what would you like to be changed in this PR to achieve that?

@arschles
Copy link
Member

@tylerchr

Yes, in fact I was just getting back up to speed over the weekend.

Awesome! Welcome back 😀

having two similar-but-incompatible HTTP APIs in Athens (Artifactory HTTP + Extensible HTTP) itself doesn’t seem like the best thing for the project.

I agree. It'll be nice to have Athens storages be extensible and have Artifactory as one of the extensions

I suspect we’re fairly close here already

Absolutely! 😀

what guidance do you have in the design of said HTTP plugin API?

Looking back at the history of this PR, I agree with @marwan-at-work said in #1110 (comment) that we should use prior art if possible.

What do you think about using the standard download HTTP API, plus an extension for catalogers to use?

@arschles
Copy link
Member

@tylerchr just checking in, would you be interested in continuing with this PR? If not, absolutely no worries - I'll continue on with it.

@jpreese
Copy link
Contributor

jpreese commented Aug 23, 2019

@tylerchr similarly, I'm curious about what functionality you felt Artifactory lacked that you also wanted to deploy Athens? We use Artifactory here, and I always saw Athens as the open and free approach.

@arschles
Copy link
Member

@jpreese @tylerchr are either of you interested in continuing this conversation? We have had conversations about HTTP backends in #1488.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants