[ws-manager] observe when backup or content restore fails #10330
Comments
@jenting Pavel is out on vacation. Can you think of an alternative for this to help us observe backup and restore success and failure?
Roger that.
For context, this issue arose from observing this case. The main goal is to create a metric that represents data loss. Would these metrics catch the following data loss scenario?
Implementing counter metrics for backup and restore success and failure should allow us to compare:
Yes, wrong link: 1, as in your first use case (I removed the link).
I mean a numbered list of all the steps leading to the outcome you want to measure: the initial workspace start, some number of restarts, and then a failure. In other words, it looked like you were combining the three use cases above, but I was having trouble following.
Is your feature request related to a problem? Please describe
We lack the ability to track failures of backups and content restores.
Describe the behaviour you'd like
Add metrics so that we can observe trends in backup and content restore success and failure. Perhaps counters? Four in total: backup success, backup failure, restore success, and restore failure.
Describe alternatives you've considered
Consult with @sagor999 or @jenting; they are working on the durability epic (PVC) and may have alternative ideas.
Additional context
We added metrics to time content init and finalize via #9355.
We lack metrics to track content init or finalize failures. At a high level we can see that a workspace start or stop failed, but not necessarily whether the failure was related to content.