[Metrics] Observability requirements for Hosted Che #13270

Closed
4 of 11 tasks
l0rd opened this issue May 2, 2019 · 7 comments
Labels
area/che-server kind/enhancement A feature request - must adhere to the feature request template. kind/epic A long-lived, PM-driven feature request. Must include a checklist of items that must be completed. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. severity/P1 Has a major impact to usage or development of the system.

Comments

@l0rd
Contributor

l0rd commented May 2, 2019

Description

We have discussed which metrics we would like to observe on hosted Che (e.g. che.openshift.io), based on the issues we had in the past. These are the most relevant ones:

Che Server metrics

  • The % of workspaces started successfully
  • The % of workspaces stopped successfully
  • The % of workspaces started in under N seconds
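
As a rough illustration of how these three ratios could be derived from raw counts, here is a minimal sketch. All names (`WorkspaceStats`, `summarize`, the field names) are hypothetical and are not Che's actual metric names. It also assumes that "started in under N seconds" is measured against successful starts only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class WorkspaceStats:
    # Hypothetical raw counts; these are not actual Che metric names.
    start_attempts: int
    start_successes: int
    stop_attempts: int
    stop_successes: int
    # Durations (in seconds) of successful starts only (an assumption).
    start_durations_s: Sequence[float]


def percent(part: int, total: int) -> float:
    """Return part/total as a percentage, or 0.0 when there is no data."""
    return 100.0 * part / total if total else 0.0


def summarize(stats: WorkspaceStats, threshold_s: float) -> dict:
    """Compute the three Che Server ratios listed above."""
    fast = sum(1 for d in stats.start_durations_s if d < threshold_s)
    return {
        "started_ok_pct": percent(stats.start_successes, stats.start_attempts),
        "stopped_ok_pct": percent(stats.stop_successes, stats.stop_attempts),
        "started_under_n_pct": percent(fast, len(stats.start_durations_s)),
    }
```

In practice the server would more likely expose the raw counters and leave the division to the monitoring side, so that the ratios can be computed over any time window.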

Workspace metrics

cc @skabashnyuk @ibuziuk

@l0rd l0rd added kind/enhancement A feature request - must adhere to the feature request template. team/platform labels May 2, 2019
@yarivlifchuk yarivlifchuk mentioned this issue May 2, 2019
21 tasks
@l0rd
Contributor Author

l0rd commented May 2, 2019

I have added this issue to epic #10329

@ibuziuk
Member

ibuziuk commented May 2, 2019

I believe the first step would still be adding what we already have available on dsaas - redhat-developer/rh-che#1336

@l0rd
Contributor Author

l0rd commented May 2, 2019

@ibuziuk are you talking about the monitoring infrastructure (Prometheus, Grafana, etc.)? This issue is about implementing the Prometheus endpoints.
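
To make "Prometheus endpoint" concrete, here is a minimal standard-library sketch of a `/metrics` handler that serves counters in the Prometheus text exposition format. The metric names, the `result` label and the `COUNTERS` structure are assumptions for illustration only. This is not how Che Server implements or names its metrics.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical counters; a real server would update these on workspace events.
COUNTERS = {
    "workspace_starts_total": {"success": 0, "failure": 0},
    "workspace_stops_total": {"success": 0, "failure": 0},
}


def render_metrics(counters: dict) -> str:
    """Render counters in the Prometheus text exposition format."""
    lines = []
    for name in sorted(counters):
        lines.append(f"# TYPE {name} counter")
        for result, value in sorted(counters[name].items()):
            lines.append(f'{name}{{result="{result}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the current counters on GET /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(COUNTERS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To try it locally (the port is an arbitrary choice for this sketch):
#   from http.server import HTTPServer
#   HTTPServer(("", 8087), MetricsHandler).serve_forever()
```

A production implementation would normally use a metrics library rather than hand-rendering the format; the sketch only shows what a scraper would see.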

@ibuziuk
Member

ibuziuk commented May 2, 2019

@l0rd Of course, this is not a blocker for implementing those endpoints upstream, but until we have a proper setup in downstream we will not be able to take full advantage of those metrics for Hosted Che

@skabashnyuk skabashnyuk added team/ide2 severity/P1 Has a major impact to usage or development of the system. labels May 15, 2019
@mkuznyetsov
Contributor

mkuznyetsov commented May 15, 2019

We already have a "Workspace Detailed" section in Grafana with heatmaps, which could be used to get "The % of workspaces started in under N seconds" (if we want exactly that number, we have metric endpoints specifically for it).
[Screenshot from 2019-04-18 10-52-09]

Grafana also displays "The % of workspaces started successfully", but we don't yet have "The % of workspaces stopped successfully".
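
For reference, a minimal sketch of how "started in under N seconds" could be read off histogram data, assuming the heatmap is backed by a Prometheus-style cumulative histogram (each bucket counts all starts with duration less than or equal to its upper bound, plus a final `+Inf` bucket). The function name and the bucket bounds are hypothetical.

```python
import math


def percent_started_under(cumulative_buckets: dict, threshold_s: float) -> float:
    """Percentage of starts at or below threshold_s.

    cumulative_buckets maps an upper bound in seconds to the cumulative
    count of starts with duration <= that bound; math.inf holds the total.
    """
    if threshold_s not in cumulative_buckets:
        # Without interpolation, only exact bucket bounds can be answered.
        raise ValueError(f"{threshold_s} is not a bucket boundary")
    total = cumulative_buckets[math.inf]
    if total == 0:
        return 0.0
    return 100.0 * cumulative_buckets[threshold_s] / total
```

The answer is only exact when N coincides with a bucket boundary, so the bucket bounds would need to include whichever N we care about.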

@aditya-konarde

/cc @skryzhny can you please provide some feedback on the metrics and provide suggestions here?

Beyond the classic USE and RED metrics, we can look at some application-specific metrics, other than the current ones, that either:

  • Provide business insights to someone running Eclipse Che (number of workspaces, number of users, number of signups)
  • Provide feedback into development (Workspace start time, average runtime of a workspace, workspace aggregate errors)

Judging by the dashboard, I believe we're already well covered, but there may be scope for some more :)

@ibuziuk ibuziuk changed the title [Metrics] Observability requirements for hosted Che [Metrics] Observability requirements for Hosted Che Jun 23, 2020
@che-bot
Contributor

che-bot commented Feb 4, 2021

Issues go stale after 180 days of inactivity. lifecycle/stale issues rot after an additional 7 days of inactivity and eventually close.

Mark the issue as fresh with /remove-lifecycle stale in a new comment.

If this issue is safe to close now please do so.

Moderators: Add lifecycle/frozen label to avoid stale mode.

@che-bot che-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 4, 2021
@skabashnyuk skabashnyuk added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 8, 2021

9 participants