Add OpenVSX proxy component #6007
Conversation
Force-pushed from cd6f787 to be18703.
I tried it and it seems to work well; the code changes make sense. It would be good to see how it should be hooked up with metrics and then start a gradual deployment in production.
Codecov Report

```diff
@@            Coverage Diff             @@
##             main    #6007       +/-   ##
===========================================
+ Coverage   19.04%   45.81%   +26.77%
===========================================
  Files           2        8        +6
  Lines         168      550      +382
===========================================
+ Hits           32      252      +220
- Misses        134      261      +127
- Partials        2       37       +35
```

Flags with carried forward coverage won't be shown.
Continue to review full report at Codecov.
Force-pushed from d8b21cd to 7a7e49a.
I see we have a counter for cache hits, one for cache misses, and another for the total. Do you think it makes sense to remove either hits or misses? From my understanding, we can calculate one from the other two metrics, e.g. hits = total - misses, or misses = total - hits. By removing one, there is less code to maintain and less storage required for extra metrics.
I see a couple of histogram metrics being added; that's very handy! One thing I'd ask here is to get more aligned with metric naming standards. OpenMetrics, which specifies good standards for Prometheus metrics, asks that the unit be added to the metric name, so, for example, a duration metric should carry a unit suffix such as `_seconds`.
Looking at this metric,
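As a hypothetical illustration of that convention (the metric name below is invented for this sketch, not taken from the PR), a duration histogram in the Go Prometheus client would carry the unit in its name:

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// Hypothetical metric: the unit ("seconds") is a suffix of the name,
// as the OpenMetrics convention recommends.
var requestDuration = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "gitpod_openvsx_proxy_duration_seconds",
		Help:    "Duration of requests handled by the OpenVSX proxy, in seconds.",
		Buckets: prometheus.DefBuckets,
	},
	[]string{"method", "path"},
)
```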
If you add this snippet to your PR description, you'll get a Grafana instance with your preview environment, and you should be able to see metrics there:
The one thing that might not work here is that this PR introduces a brand-new component. Since it is a new target for Prometheus, we probably need to create a new branch in gitpod-io/observability with the extra target configuration. Once we create that branch on the observability repository, we can add this snippet to create our preview with a custom branch:
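(Presumably the snippets referred to here are the werft annotations that now appear in the Meta section at the end of this PR description:)

```
/werft with-observability
/werft withObservabilityBranch=clu/openvsx-proxy
```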
I'd be happy to help, but first I need a review on this PR: #5843 😬
Yes, indeed. I'm always a big advocate of using existing solutions before building something ourselves. I had a look at different existing caching proxies, but with the experience we have made with the Caddy cache plugin, we have come to the conclusion that we want more control over how the caching for OpenVSX works. E.g. the "grace mode" of Varnish comes somewhat close to what we need, but it is still mostly about what the upstream server thinks should be cached for how long, and how long a cached response should be used as a fallback. However, we have another focus:
Since we use an existing OOTB caching solution and the Go reverse proxy, the only logic we had to implement is when to store responses in the cache, when to serve responses from the cache, and what to replace in the response body. That is what we would have to teach OOTB proxies as well somehow, which usually requires extensive configuration and/or custom plugins. A good portion of the Go code in this PR actually provides proper metrics and logs, which we are able to define because we have more control. For the actual LFU cache we use an OOTB solution and don't implement it ourselves. That's why I still think that trying to teach existing proxies what we want to achieve is harder and/or brings some limitations.
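A minimal sketch of that shape, assuming Go's standard `httputil.ReverseProxy` and a plain map in place of the real LFU cache (the upstream URL and the "cache every 200" policy are invented for illustration):

```go
package main

import (
	"bytes"
	"io"
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync"
)

type entry struct {
	header http.Header
	body   []byte
}

var (
	mu    sync.RWMutex
	store = map[string]entry{} // keyed by method + path; a real LFU cache would evict
)

func main() {
	upstream, err := url.Parse("https://open-vsx.org")
	if err != nil {
		panic(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)

	// Decide what to store: here every 200 response. The body could also
	// be rewritten at this point (e.g. replacing upstream URLs).
	proxy.ModifyResponse = func(resp *http.Response) error {
		if resp.StatusCode != http.StatusOK {
			return nil
		}
		body, err := io.ReadAll(resp.Body)
		if err != nil {
			return err
		}
		resp.Body = io.NopCloser(bytes.NewReader(body))
		mu.Lock()
		store[resp.Request.Method+" "+resp.Request.URL.Path] = entry{resp.Header.Clone(), body}
		mu.Unlock()
		return nil
	}

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Decide when to serve from the cache: this sketch serves any hit.
		mu.RLock()
		e, ok := store[r.Method+" "+r.URL.Path]
		mu.RUnlock()
		if ok {
			for k, v := range e.header {
				w.Header()[k] = v
			}
			w.Write(e.body)
			return
		}
		proxy.ServeHTTP(w, r)
	})
	http.ListenAndServe(":8080", nil)
}
```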
Force-pushed from db03de4 to 511cae5.
Thanks, Arthur, for your input!
Actually, we don't have an explicit counter for the total, only one for hits and one for misses. The histogram metrics bring total counters out of the box, which I cannot prevent.
Thanks for the pointer! Changed this! ✔️
Interesting idea. However, I don't think that this makes much sense or brings more insights. For me, the method and the path belong together.
Force-pushed from 9e3114d to ccf3822.
Is it possible to run this against the preview and see how metrics are shown in Grafana? :)
Force-pushed from ccf3822 to e43d256.
@akosyakov Yesterday, I implemented the changes we discussed with Chris. The OpenVSX proxy is now a StatefulSet with a persistent volume claim. I also added Grafana to the preview environment with the help of Arthur: https://grafana-clu-openvsx-proxy-comp.preview.gitpod-dev.com/ Today, I would like to create dashboards and alerts in Grafana.
Besides #6007 (comment), it works well in the preview environment.
Force-pushed from e43d256 to 79aaf76.
/lgtm
LGTM label has been added. Git tree hash: e92cc92dbfcb3717be716a0c03990e71cf67cd61
/assign @svenefftinge
/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: akosyakov, JanKoehnlein

Associated issue: #5881

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
Description
This PR removes the OpenVSX proxy implementation from the Caddy server and adds a dedicated OpenVSX proxy component under the IDE team's responsibility.
With this change, we don't use Caddy for this anymore. That has the following advantages:
This change still uses the proxy to route traffic from open-vsx.gitpod.io. This is useful for self-hosted setups. For our SaaS solution, we would probably configure our load balancer to route traffic directly to the OpenVSX proxy rather than over the ingress proxy.
Stress Testing
I tested the OpenVSX proxy with a script, run from an external server, that makes 100'000 unique calls (200 in parallel) to the proxy. 35 of the 100'000 calls failed (closed client connection). Most calls took no longer than 250ms (78'311 of 100'000), and 94'722 of 100'000 took no longer than 500ms. None took longer than 5 seconds. The upstream call is by far the longest part.
(The bash script and detailed results were attached as collapsed sections.)
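As a rough, hypothetical Go equivalent of such a test (the original used a bash script; the target URL below is invented, and a unique query string makes every call distinct):

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

func main() {
	const total, parallel = 100000, 200
	sem := make(chan struct{}, parallel) // cap in-flight requests at 200
	var wg sync.WaitGroup
	var mu sync.Mutex
	var failed, under250, under500 int

	for i := 0; i < total; i++ {
		wg.Add(1)
		sem <- struct{}{}
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }()
			start := time.Now()
			// Placeholder host; the query parameter makes each call unique.
			resp, err := http.Get(fmt.Sprintf("https://openvsx-proxy.example.com/?n=%d", i))
			elapsed := time.Since(start)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				failed++
				return
			}
			resp.Body.Close()
			if elapsed <= 250*time.Millisecond {
				under250++
			}
			if elapsed <= 500*time.Millisecond {
				under500++
			}
		}(i)
	}
	wg.Wait()
	fmt.Printf("failed=%d  <=250ms=%d  <=500ms=%d\n", failed, under250, under500)
}
```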
Metrics
@ArthurSens Could you have a look at the Prometheus metrics? Is that feasible? Do you have any suggestions for improvement? The number of possible `path` values in `gitpod_openvsx_proxy_requests_total` should be stable and not more than what you see in this example:

(Example Prometheus metrics were attached as a collapsed section.)
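A single series of that counter might look like the following (label values invented for illustration):

```
gitpod_openvsx_proxy_requests_total{method="GET",path="/api/-/query"} 42
```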
Could you also help me to add this to Grafana and create alerts?
SaaS Deployment
`values.yaml` files (follow-up PR in the ops repo).

Related Issue(s)
#5881
How to test
Release Notes
Meta
/uncc
/werft with-observability
/werft withObservabilityBranch=clu/openvsx-proxy