Provide pressure stall information for workspaces #13703
started the job as gitpod-build-fo-workspace-psi.12 because the annotations in the pull request description changed
// Licensed under the GNU Affero General Public License (AGPL).
// See License-AGPL.txt in the project root for license information.

package v2
I think you'll want to name it cgroups_v2 because otherwise you'd import it as v2.IO which isn't very descriptive
Why? Shouldn't the other package import it as common-go/cgroups/v2 and call either v2.NewIOControllerWithMount or v2.NewIOController?
/werft run with-integration-tests=workspace with-large-vm=true
👍 started the job as gitpod-build-fo-workspace-psi.13
/werft run with-integration-tests=workspace with-large-vm=true
👍 started the job as gitpod-build-fo-workspace-psi.14
Running tests again, the prior job failed setup.
@Furisto you might want to rebase with main and run another build? The integration tests are timing out after 2 hours.
Might be due to this: https://gitpod.slack.com/archives/C032A46PWR0/p1665414537023909
d979b73 to 337ce0c
/werft run with-integration-tests=workspace with-large-vm=true with-clean-slate-deployment=true
👍 started the job as gitpod-build-fo-workspace-psi.16
/werft run with-integration-tests=workspace with-large-vm=true with-clean-slate-deployment=true
👍 started the job as gitpod-build-fo-workspace-psi.17
337ce0c to 568aa5d
/werft run with-integration-tests=workspace with-large-vm=true with-clean-slate-deployment=true
👍 started the job as gitpod-build-fo-workspace-psi.23
Oh, I misunderstood the implementation, but unfortunately this doesn't solve the problem 😕. I understood it was exposed by the workspace pods, so when Prometheus scrapes workspaces, a label called
What you need to do here is let go of the premise that you need to monitor every single workspace individually; at least, metrics won't work for this use case 😬. I don't have knowledge about PSI, but would it be possible to aggregate those values when exposing those metrics?
The idea behind this is not to aggregate it, because keeping the values per workspace would allow us to investigate why a workspace exhibited a certain behavior. I am struggling to understand why this would introduce so much additional load that it could break our metrics. Are we not already collecting a significant number of metrics for every pod on the cluster, e.g. everything in here? In comparison to that, this PR does not introduce that many additional metrics from my point of view, but I am happy to discuss this.
Hmm I see, would it be possible to expose these metrics only for workspaces from paying customers? Thinking of self-hosted, maybe this metric could be turned on/off in the admin panel (Gitpod admins can choose teams that will have this metric exposed), so we can also choose our own customers there. The problem with the current implementation is that every single workspace will introduce a new metric and we're definitely not analyzing PSI metrics for all of them, this means that we're wasting a huge amount of resources on something that we'll never use.
Indeed, compared to the container metrics that are exposed by cAdvisor and the kubelet, the costs are probably the same. Those are the type of metric that we're really looking forward to removing 😅; they are indeed expensive. The difference here is that we don't have much control over the kubelet or cAdvisor metrics, while we do have control over our own metrics and we can be more conscious about them 🙂
Yes, that is certainly possible! I expect that this will be used more for paying customers anyway.
Since the main purpose here is to collect workspace-specific metrics, this does not apply to ide-metrics (because in ide-metrics, we mainly collect aggregated metrics)
3871511 to 4d1b476
Metrics are now only retrieved for workspaces of paying users.
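For readers following the cardinality discussion, here is a minimal Go sketch of the gating idea: only workspaces that pass a predicate get a per-workspace PSI series, so the number of series scales with the selected subset rather than with every workspace. The `Workspace` type, its `Paying` field, and `psiTargets` are hypothetical names for illustration; how the PR actually decides that a workspace belongs to a paying user is not shown in this conversation.

```go
package main

import "fmt"

// Workspace is a hypothetical, simplified view of a running workspace.
// The Paying flag is an assumption made for this sketch only.
type Workspace struct {
	ID     string
	Paying bool
}

// psiTargets returns the IDs of the workspaces for which per-workspace
// PSI metrics would be collected. Everything else is skipped, which
// bounds the number of exposed time series.
func psiTargets(all []Workspace) []string {
	var ids []string
	for _, ws := range all {
		if ws.Paying {
			ids = append(ids, ws.ID)
		}
	}
	return ids
}

func main() {
	workspaces := []Workspace{
		{ID: "ws-a", Paying: false},
		{ID: "ws-b", Paying: true},
		{ID: "ws-c", Paying: false},
	}
	fmt.Println(psiTargets(workspaces))
}
```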
/unhold
Hey @Furisto! 👋 Removed release notes as this does not affect the Gitpod end user (developer).
@atduarte I am actually unsure why I wrote a release note 😆 As you said, it does not affect the end user.
I guess release notes are used for creating the monthly Changelog
@utam0k yap, that's their sole purpose 😁 Thank you both!
Description
Retrieves pressure stall information for workspaces. Followup to #13539 which retrieved PSI on node level.
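As background, the kernel exposes PSI for a cgroup v2 group through files such as cpu.pressure, memory.pressure and io.pressure, each containing lines of the form `some avg10=0.00 avg60=0.00 avg300=0.00 total=0`. The following is a minimal Go sketch of parsing that format; `PSILine` and `ParsePSI` are hypothetical names and this is not the code from this PR.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// PSILine holds the values of one line ("some" or "full") of a pressure file.
type PSILine struct {
	Avg10, Avg60, Avg300 float64 // share of time stalled over each window, in percent
	Total                uint64  // cumulative stall time in microseconds
}

// ParsePSI parses the contents of a pressure file and returns the
// parsed values keyed by the line prefix ("some" or "full").
func ParsePSI(content string) (map[string]PSILine, error) {
	result := make(map[string]PSILine)
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) == 0 {
			continue // skip blank lines
		}
		var line PSILine
		for _, kv := range fields[1:] {
			key, value, ok := strings.Cut(kv, "=")
			if !ok {
				return nil, fmt.Errorf("malformed field %q", kv)
			}
			var err error
			switch key {
			case "avg10":
				line.Avg10, err = strconv.ParseFloat(value, 64)
			case "avg60":
				line.Avg60, err = strconv.ParseFloat(value, 64)
			case "avg300":
				line.Avg300, err = strconv.ParseFloat(value, 64)
			case "total":
				line.Total, err = strconv.ParseUint(value, 10, 64)
			}
			if err != nil {
				return nil, fmt.Errorf("parsing %q: %w", kv, err)
			}
		}
		result[fields[0]] = line
	}
	return result, scanner.Err()
}

func main() {
	// Sample content; on a real node this would be read from the
	// workspace's cgroup directory.
	sample := "some avg10=0.12 avg60=0.05 avg300=0.01 total=123456\n" +
		"full avg10=0.00 avg60=0.00 avg300=0.00 total=0\n"
	psi, err := ParsePSI(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", psi["some"])
}
```

Unknown keys are ignored in this sketch so that extra fields in the file would not break parsing.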
Related Issue(s)
n.a.
How to test
kubectl port-forward ds/ws-daemon 9500
curl -XGET localhost:9500/metrics
Release Notes
Werft options:
If enabled this will build install/preview
Valid options are all, workspace, webapp, ide