Make the proxy redirect to olympus on cache misses #349

Closed
arschles opened this issue Jul 28, 2018 · 19 comments
Labels
proxy Work to do on the module proxy

Comments

@arschles
Member

If module@version is on neither the proxy's exclude list nor its private module list, and the proxy doesn't have it in its cache, the proxy should redirect the go CLI to Olympus.

Continuation of #241

@arschles added the registry and proxy labels Jul 28, 2018
@rohancme
Contributor

Mind if I take a stab at this? Or is there already some work underway to get this working?

@ghost

ghost commented Jul 31, 2018

AFAIK it is up for grabs. The private list is still in development (#309), so it can either be added before merge or in another PR, depending on which finishes first.

@arschles
Member Author

arschles commented Jul 31, 2018 via email

@marwan-at-work
Contributor

@rchakra3
Off the top of my head, this should be the behavior:

  1. Proxy gets a request (on any of the download protocol endpoints)
  2. Proxy checks blacklist (returns status forbidden if module is flagged)
  3. Proxy checks its own storage first. If the module exists there, return it and we're done.
  4. If module does not exist in Proxy Storage, then:
    4a. redirect to Olympus (and asynchronously fill proxy cache)
    4b. Or, get from storage, fill your cache, and return the results to the client. (sync, slow)
  5. Olympus checks whether the version is a pseudo-version (sha-based) or a valid tag. It returns 404 if the version is a pseudo-version, or if it is a tag that Olympus can't find.
  6. If Olympus returned 404, then let's do the lookup ourselves (it must be a private repo or a pseudo-version)
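
Steps 2 through 4a can be sketched as a small decision function. This is only an illustration: the `Action` type, the `decide` function, and the two map-based lookups are hypothetical stand-ins, not code from the proxy.

```go
package main

import "fmt"

// Action is what the proxy does with an incoming download-protocol request.
type Action int

const (
	Forbidden         Action = iota // module is on the exclude list (step 2)
	ServeFromStorage                // module@version is already in proxy storage (step 3)
	RedirectToOlympus               // cache miss: send the client to Olympus (step 4a)
)

func (a Action) String() string {
	return [...]string{"Forbidden", "ServeFromStorage", "RedirectToOlympus"}[a]
}

// decide applies steps 2-4a in order. excluded is keyed by module path and
// cached by "path@version"; both maps are hypothetical stand-ins for the
// real exclude list and storage lookups.
func decide(path, version string, excluded, cached map[string]bool) Action {
	if excluded[path] {
		return Forbidden
	}
	if cached[path+"@"+version] {
		return ServeFromStorage
	}
	return RedirectToOlympus
}

func main() {
	excluded := map[string]bool{"example.com/bad": true}
	cached := map[string]bool{"example.com/hit@v1.0.0": true}
	for _, p := range []string{"example.com/bad", "example.com/hit", "example.com/miss"} {
		fmt.Println(p, "->", decide(p, "v1.0.0", excluded, cached))
	}
}
```

Keeping the decision separate from the HTTP layer makes it easy to add the private module check from #309 as one more branch later.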

Things to consider:

  1. Having an option to not store Olympus modules in Proxy modules. But still store private modules.
  2. Having an option to not store Sha-based-modules in Proxy but tagged ones are okay. Proxy can still fetch Sha-based-modules and return them, that's cool. But just not store them because if you're in a big company with many teams constantly pushing a lot of commits with no tags for a while, your bills are gonna be high.
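
The two options above could be modelled as a storage policy, roughly as follows. The `StorePolicy` type and its field names are hypothetical, and the regular expression is a simplified approximation of the pseudo-version format used only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// pseudoVersion is a simplified pattern for sha-based versions such as
// v0.0.0-20180728120000-abcdef123456. It is an approximation for this sketch.
var pseudoVersion = regexp.MustCompile(`^v\d+\.\d+\.\d+-(.*\.)?\d{14}-[0-9a-f]{12}$`)

// StorePolicy holds the two hypothetical options discussed above.
type StorePolicy struct {
	StoreOlympusModules bool // option 1: keep modules that came from Olympus
	StorePseudoVersions bool // option 2: keep sha-based versions
}

// ShouldStore reports whether the proxy should persist a fetched module.
// Private modules are always eligible; sha-based versions are skipped unless
// StorePseudoVersions is set.
func (p StorePolicy) ShouldStore(version string, private bool) bool {
	if !private && !p.StoreOlympusModules {
		return false
	}
	if pseudoVersion.MatchString(version) && !p.StorePseudoVersions {
		return false
	}
	return true
}

func main() {
	p := StorePolicy{StoreOlympusModules: true}
	fmt.Println(p.ShouldStore("v1.2.3", false))
	fmt.Println(p.ShouldStore("v0.0.0-20180728120000-abcdef123456", false))
}
```

In either case the proxy would still fetch and return the module; the policy only decides whether it is written to storage.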

CC: @arschles @robjloranger @michalpristas @marpio (let me know if this hits the right architecture/business-requirements/etc)

@marpio
Member

marpio commented Jul 31, 2018

There will be a configurable list of private modules #309 which should be checked as well together with the blacklist.

@marpio
Member

marpio commented Jul 31, 2018

Would it make sense for Olympus to redirect go once again directly to the CDN if the module is cached there?

@marwan-at-work
Contributor

@marpio I keep forgetting, what's the difference between Olympus and CDN? Is CDN the backend storage for Olympus?

@marpio
Member

marpio commented Jul 31, 2018

@marwan-at-work yes, that is my understanding. The CDN exposes all the endpoints required by the download protocol, though.
@arschles @michalpristas please correct me if I'm wrong.

@marwan-at-work
Contributor

@marpio how can we make, say, GCP storage expose DP endpoints? I might not be getting this :)

@marpio
Member

marpio commented Jul 31, 2018

@marwan-at-work
Contributor

doesn't that make the CDN exactly the same as Olympus?

@marpio
Member

marpio commented Jul 31, 2018

I guess if the module@version isn't in the CDN then Olympus would have to fetch it and save it there.

@marwan-at-work
Contributor

I think it still makes Olympus/CDN exactly the same, at least the way I see it. In any case, this issue is about redirecting from Proxy -> Olympus, which is already a considerable amount of work. So we can safely put the Olympus -> CDN stuff to the side, but I would like to discuss it more separately.
Thanks!

@michalpristas
Member

Olympus stores its data in storage, and the CDN should work on top of that automatically if configured.
The flow makes sense. We need to keep in mind the option to disable communication with Olympus entirely; we have an issue for this as well.

@rohancme
Contributor

rohancme commented Aug 5, 2018

Totally underestimated my workload this past week. I'm going to get started on this now.

Based on the expected behavior described by @marwan-at-work above, the additional behavior to be implemented for this issue is:

2. Proxy checks blacklist (returns status forbidden if module is flagged)

and once #309 is done also incorporate the private modules filter

and

4. If module does not exist in Proxy Storage, then:
4a. redirect to Olympus (and asynchronously fill proxy cache)
4b. Or, get from storage, fill your cache, and return to client results. (sync, slow)

4b seems to be the current behavior (I'm assuming by storage you meant the goget fetcher but I might be wrong?).

From reading through the current code it looks like this will involve registering Handlers for the proxy that (in the case of the module not existing in the proxy cache/storage):

  1. Return an HTTP 301 redirect to the appropriate olympus path
  2. Add work items to the Olympus Fetcher and Reporter Workers so we have the module in storage/cache in the future
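
A minimal sketch of such a handler, assuming a hypothetical `missHandler` type, a placeholder Olympus base URL, and a plain channel standing in for the worker queue. None of these names come from the existing code.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// missHandler handles requests for modules that are not in proxy storage.
type missHandler struct {
	olympusBase string      // hypothetical Olympus base URL
	queue       chan string // hypothetical work queue read by background workers
}

// target builds the Olympus URL for the same download-protocol path.
func (h missHandler) target(path string) string {
	return strings.TrimRight(h.olympusBase, "/") + path
}

func (h missHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	// Enqueue without blocking the request. If the queue is full the item is
	// dropped; a later miss for the same path will enqueue it again.
	select {
	case h.queue <- r.URL.Path:
	default:
	}
	// 301 follows the proposal above. It is a permanent redirect that clients
	// may cache, so a temporary status could be a better fit once the proxy
	// is expected to serve the module itself later.
	http.Redirect(w, r, h.target(r.URL.Path), http.StatusMovedPermanently)
}

func main() {
	h := missHandler{olympusBase: "https://olympus.example.com/", queue: make(chan string, 8)}
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/github.com/pkg/errors/@v/v0.8.0.info", nil)
	h.ServeHTTP(rec, req)
	fmt.Println(rec.Code, rec.Header().Get("Location"))
	fmt.Println("queued:", <-h.queue)
}
```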

At first glance it would seem we'd probably not use code in pkg/download as the download protocol but there might be a way to implement an upstream protocol that just redirects and plug that in as well. I'll have to spend some more time to figure out the exact implementation here. Open to any ideas people may already have!

Just wanted to make sure all of that makes sense?

@michalpristas
Member

Check #406 to see how upstream download protocol is passed (not directly solved in this PR but it's visible there)
Right now upstream in proxy is set to goget:

p := download.New(gg, s)

with the cache filled synchronously. The same goes for Olympus.
What needs to be done is to make the proxy talk to Olympus, in a few steps:

  1. On missing module on proxy, redirect to olympus, not VCS fallback
    1. Schedule a job on a Proxy to be triggered in T+X (X configurable) to pull from olympus.
  2. On Olympus check storage
    1. Has module -> serve
    2. Cache miss: GoGet as a fallback with synchronous cache fill. Serve filled data
  3. Client now has package from O.
  4. T+X: job is triggered to pull from O, next request will be served from Proxy
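
The delayed pull in steps 1.1 and 4 could look roughly like this. The `scheduler` type and the `pull` callback are hypothetical; the sketch only shows scheduling a job at T+X and ignoring duplicate misses for the same module while a job is pending.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// scheduler de-duplicates delayed pulls so repeated misses create one job.
type scheduler struct {
	mu      sync.Mutex
	pending map[string]bool
	delay   time.Duration       // the configurable X
	pull    func(module string) // hypothetical: copy the module from Olympus into proxy storage
}

func newScheduler(delay time.Duration, pull func(string)) *scheduler {
	return &scheduler{pending: map[string]bool{}, delay: delay, pull: pull}
}

// Schedule reports whether a new job was created for module.
func (s *scheduler) Schedule(module string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.pending[module] {
		return false
	}
	s.pending[module] = true
	time.AfterFunc(s.delay, func() {
		s.pull(module)
		s.mu.Lock()
		delete(s.pending, module)
		s.mu.Unlock()
	})
	return true
}

func main() {
	done := make(chan string, 1)
	s := newScheduler(50*time.Millisecond, func(m string) { done <- m })
	fmt.Println(s.Schedule("github.com/pkg/errors@v0.8.0")) // first miss schedules a job
	fmt.Println(s.Schedule("github.com/pkg/errors@v0.8.0")) // duplicate while pending
	fmt.Println("pulled:", <-done)
}
```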

We might think of two different strategies in step 1:
Private repo: fallback to VCS (no Olympus interaction)
!Private repo: redirect to Olympus

But that's just me speculating and will be most likely done in a followup PR
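
As an illustration of the two strategies, the choice could be a prefix match against the private list. The format of that list is being worked out in #309, so the prefix-based matching and the names `isPrivate` and `chooseUpstream` here are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// Upstream names where the proxy goes on a cache miss.
type Upstream string

const (
	VCS     Upstream = "vcs"     // private repo: fetch directly, no Olympus interaction
	Olympus Upstream = "olympus" // everything else: redirect to Olympus
)

// isPrivate reports whether module equals or sits under one of the prefixes.
// Matching is done on path segments, so "corp.example.com" does not match
// "corp.example.community/x".
func isPrivate(module string, privatePrefixes []string) bool {
	for _, p := range privatePrefixes {
		p = strings.TrimSuffix(p, "/")
		if module == p || strings.HasPrefix(module, p+"/") {
			return true
		}
	}
	return false
}

// chooseUpstream picks the strategy for step 1.
func chooseUpstream(module string, privatePrefixes []string) Upstream {
	if isPrivate(module, privatePrefixes) {
		return VCS
	}
	return Olympus
}

func main() {
	prefixes := []string{"git.corp.example.com"}
	fmt.Println(chooseUpstream("git.corp.example.com/team/lib", prefixes))
	fmt.Println(chooseUpstream("github.com/pkg/errors", prefixes))
}
```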

@michalpristas
Member

One more option would be to use the Olympus download protocol as an upstream for the proxy and do everything synchronously.
I'm not sure about the performance in this case, or whether the go client would time out. I'm more in favor of a sync/async combo, but folks might see it differently.

@rohancme
Contributor

rohancme commented Aug 5, 2018

Waiting on a discussion of #416 before settling on an implementation for now, but just wanted to throw some ideas out there to discuss.

The long-term implementation will need some logic to decide where to asynchronously download from (VCS or Olympus) and what tags to store in the proxy cache (everything or just tagged releases like we discussed on the dev call last Thursday)

It might make sense to actually have an implementation in place that adds items to the proxy CacheMiss work queue if modules don't exist in storage and the worker on the proxy could just do noops initially (essentially resulting in redirects every time).
The remaining functionality (exclude lists/private repos/populating the proxy cache) would be a function of fleshing out the worker logic
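
A rough sketch of that idea, with a hypothetical `CacheMiss` item and a pluggable `Worker` that starts out as a no-op. The real work queue in the proxy may look quite different.

```go
package main

import "fmt"

// CacheMiss describes a module that was requested but not found in proxy storage.
type CacheMiss struct {
	Module, Version string
}

// Worker processes one cache miss. Real logic (exclude lists, private repos,
// filling the proxy cache) would replace the no-op later.
type Worker func(CacheMiss) error

// noop accepts every item and does nothing, so every request keeps redirecting.
func noop(CacheMiss) error { return nil }

// drain runs worker over the queue until it is closed and returns the number
// of items processed without error.
func drain(queue <-chan CacheMiss, worker Worker) int {
	n := 0
	for m := range queue {
		if err := worker(m); err != nil {
			fmt.Println("cache miss job failed:", m.Module, m.Version, err)
			continue
		}
		n++
	}
	return n
}

func main() {
	queue := make(chan CacheMiss, 2)
	queue <- CacheMiss{"github.com/pkg/errors", "v0.8.0"}
	queue <- CacheMiss{"github.com/gorilla/mux", "v1.6.2"}
	close(queue)
	fmt.Println("processed:", drain(queue, noop))
}
```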

@arschles
Member Author

As of #772, we're not going to try and build a registry for the time being, so I'm closing this issue

5 participants