Prototype: Implement new storage backend http #1131
Hey @tylerchr, I saw this is still a draft PR. Are you ready for reviews on it?
Yeah, I didn't really intend that. Clearly I don't know how Draft PRs work. It'd obviously close #1130. I suppose whether it addresses #1110 is a question for @marpio, but it seems to me that it would. I see no reason why arbitrary storage backends couldn't be implemented pretty simply as little HTTP translation layers. I haven't put a whole lot of thought into this, but deploying a couple of containers (Athens + B2 storage layer, for example) does seem a little more cloud-friendly than attempting something with package …
@tylerchr this is looking great so far 🚀 Once it's done, I feel like Athens will gain a ton of flexibility
I left a few nits and a few requests, and I left a few bigger requests below to hopefully save some code. If you'd like to do the nits in a follow-up PR I'm totally cool with that.
- I noticed that you wrote a lot of HTTP request parsing code. What do you think about using a higher-level client library? gorequest comes to mind for that.
- I also noticed that in tests you wrote a big switch statement to do HTTP server multiplexing. What do you think about using a multiplexer library instead?
- You do some HTML tokenization in collectLinks. Do you think that goquery could help you remove some of that code?
This driver stores files to an HTTP server via standard GET and PUT requests. The files are laid out in a manner identical to the proxy URL used to access them, and the requests are optionally (but hopefully!) authenticated using Basic Auth.
The HTTP storage driver can be used to integrate with systems like Artifactory that offer blob storage over HTTP.
nit: can you mention that this is different from other blob storage systems like S3, Azure blob storage, or Google Cloud Storage that also implement (proprietary) HTTP interfaces?
func (s *ModuleStore) connect() error {
	const op errors.Op = "http.connect"

	// I guess just GET the base URL and see if it 401's?
Sounds good to me! Maybe we should require that it returns a 204. What do you think?
Given the following facts:
- I use the base URL for this check
- The base URL will necessarily refer to a directory
- Elsewhere, GETs on directories are expected to yield a directory listing
...it seems a little inconsistent to hope for 204 No Content here. For example, my backend of interest (Artifactory) responds with 200 OK and a directory listing.
You do make a good point, though: right now it fails on status codes >= 400, but I could tighten that to >= 200 && < 300 to more clearly reflect the expectation of this check. Thoughts?
Got it. Then I have a more generic question - do you intend to build a generic HTTP storage driver in this PR, or one that can use Artifactory as a storage driver?
The two goals overlap, but I think there will need to be a spec for the driver (it looks like you started on that in here 😁) if you want to do the former. With the latter, I think it is fine to make everything in this PR work with Artifactory, without worrying about other possible backends.
Personally, I think it would be best to go with the latter (Artifactory-specific) since that achieves your immediate needs. Later on, we could of course always build a more generic HTTP backend with a spec and all...
// Ok, so admittedly this scheme for listing versions is a little harebrained.
// But that said it _is_ pretty standard for directory indexes to be formatted
// very closely to what we expect here. I'm not settled on this but it does
// work surprisingly well.
Doesn't seem harebrained to me! If this is what you want to go with, let's make sure to add it to our docs (on the Hugo site) and explain some options for what you can do if your storage system doesn't produce this format.
I hadn't initially approached this from the "storage plugin" angle, but I see now that it may be inconvenient/impossible to render a directory listing for certain backends.
Perhaps List could support two strategies for obtaining a listing, and attempt them one at a time in order:
- Fetch a listing in the documented format from GET /<module>/@v/list. Athens itself won't update this file or otherwise manage it, but this endpoint may be simpler/cleaner to implement for some storage plugins than a directory listing.
- Scrape a version list from the directory listing at GET /<module>/@v/ as currently implemented. This option caters to storage servers based on filesystem semantics, like Artifactory or http.FileServer.
This approach would make implementing custom storage plugins a bit nicer at the cost of slowing down List for cases like mine, where the server only supports the second strategy.
@arschles I addressed most of your inline comments, some with follow-up questions. I haven't tackled your top-level comments yet.
I tried using goquery to do the link parsing and personally didn't find it all that compelling. It essentially trades … and the … and the entirety of the …
@tylerchr regarding goquery, all good with me to go with what you have! Thanks for explaining your logic 😁
@tylerchr did you want to pick this up again? I'm happy to work with you to finish this up if you're up for it. I think it's really cool and would love to see it go in.
@arschles Yes, in fact I was just getting back up to speed over the weekend. I'm thinking the best way forward is for me to implement the more generic form of HTTP support here (e.g., with the 204 you proposed earlier) with an eye toward the pluggable HTTP Storage API we've discussed in #1110. In the intervening weeks I've become more comfortable with implementing my Artifactory support externally via that new API rather than building it in here: having two similar-but-incompatible HTTP APIs in Athens itself (Artifactory HTTP + Extensible HTTP) doesn't seem like the best thing for the project. With that said, what guidance do you have for the design of said HTTP plugin API? I suspect we're fairly close here already, but if it's going to be used as an extension point I'd think you'll want to help me get it right. Is there any discussion about it other than #1110, and if not, what would you like to be changed in this PR to achieve that?
Awesome! Welcome back 😀
I agree. It'll be nice to have Athens storages be extensible and have Artifactory as one of the extensions
Absolutely! 😀
Looking back at the history of this PR, I agree with what @marwan-at-work said in #1110 (comment): we should use prior art if possible. What do you think about using the standard download HTTP API, plus an extension for catalogers to use?
@tylerchr just checking in, would you be interested in continuing with this PR? If not, absolutely no worries; I'll continue on with it.
@tylerchr similarly, I'm curious: what functionality did you feel Artifactory lacked that made you also want to deploy Athens? We use Artifactory here, and I always saw Athens as the open and free approach.
Per my needs in #1130, I started hacking on a generic http backend. It started as a prototype but turned out not to be all that complex.
The specific decisions I made in this implementation may or may not be ideal. The whole premise of a generic http backend is just an idea. I'm looking for feedback on that and assume that some serious iteration would be necessary before considering a merge.
What is the problem I am trying to address?
At Qualtrics we use an artifact repository that isn't already supported by Athens. We want to deploy Athens internally, but strong corporate incentives exist to use our internal repository for storage (rather than, say, Athens' S3 support).
I could have just used its API and implemented another platform-specific storage backend, but after working through our requirements and talking with @arschles I saw an opportunity to achieve our goals with a more open-ended solution.
How is the fix applied?
What resulted was a reasonably generic http storage type. In general, it maps the methods of storage.Backend to HTTP calls in the following manner (I'm using the golang.org/x/net module as an example):
- Lister
  - List requests GET /golang.org/x/net/@v/ and scrapes versions (actually, links to *.mod files) from the directory listing
- Getter
  - Info requests GET /golang.org/x/net/@v/v0.0.0-20180724234803-3673e40ba225.info and returns the contents
  - GoMod requests GET /golang.org/x/net/@v/v0.0.0-20180724234803-3673e40ba225.mod and returns the contents
  - Zip requests GET /golang.org/x/net/@v/v0.0.0-20180724234803-3673e40ba225.zip and returns the contents
- Checker
  - Exists requests GET /golang.org/x/net/@v/v0.0.0-20180724234803-3673e40ba225.mod and returns false if it receives a 404 Not Found response
The above calls are largely identical to the usual API defined in go help goproxy. The big exception is the handling of List, due to the different semantics used by Athens' storage interface. I went through a couple ideas here: … /list file, but that's a relatively larger change and also raises concurrency questions, so I opted to just steer clear of that. Admittedly, this decision does have the unfortunate consequence that the HTTP backend is not itself a compliant Go module proxy.
The following extensions to the standard protocol support write operations expected by the Storage interface:
- Saver
  - Save uses module.Upload to do the following things, potentially using BasicAuth credentials from the Athens config file:
    - PUT /golang.org/x/net/@v/v0.0.0-20180724234803-3673e40ba225.info
    - PUT /golang.org/x/net/@v/v0.0.0-20180724234803-3673e40ba225.mod
    - PUT /golang.org/x/net/@v/v0.0.0-20180724234803-3673e40ba225.zip
- Deleter
  - Delete uses module.Delete to do the following things, again potentially using BasicAuth credentials:
    - DELETE /golang.org/x/net/@v/v0.0.0-20180724234803-3673e40ba225.info
    - DELETE /golang.org/x/net/@v/v0.0.0-20180724234803-3673e40ba225.mod
    - DELETE /golang.org/x/net/@v/v0.0.0-20180724234803-3673e40ba225.zip
Mention the issue number it fixes or add the details of the changes if it doesn't have a specific issue.
Addresses #1130