Future Design of Metrics #1469
Comments
If we want to analyze performance of a specific sled, for example, would we need to record as part of the metrics which sled a disk was in (and which slot)?
Yes. If you want to look at stats for physical hardware and don't want to look up and translate the physical disk to a server dynamically via (potentially historical) records in the database, then the UUID of the server a disk was in at the time would need to be recorded, the same way the instance is for a crucible volume.
There are a bunch of things in here that I appreciate us calling out. I think we'll really want a full, proper RFD on this before we redesign in earnest.
Starting on an RFD to gather some of these considerations now. I'll link when it's ready to discuss.
My first pass at this RFD exists here: https://rfd.shared.oxide.computer/rfd/0304 - feedback is welcome
#1348 provides an initial implementation of metrics, but there are a couple of areas where we'd like to improve in future iterations. This issue documents those improvements.
Although the current design is resource-centric (to query for metrics on a disk, an endpoint filtering by org/project/disk_name is used), it may make sense to migrate to a metric-centric approach where filters can be applied. Prior art.
Route
Concretely, where the current route is:
We should consider a route like the following:
Where filters like instance_id and/or disk_id may be supplied as query parameters.
An important use case is an "instance-centric flow", where a user can query for information about their particular instance. This becomes feasible by being able to filter directly on instance_id. This is not yet feasible today without oxidecomputer/crucible#375, but is a worthwhile goal.
Org/Project Scoping
Additionally, it's worth considering whether we'd like to add an endpoint to view metrics "globally", i.e., outside the context of an organization / project. This view may be useful for operators who wish to analyze performance across a sled / rack / AZ, as opposed to a user aiming for a more instance-centric flow.
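To make the metric-centric idea concrete, here is a minimal sketch of optional filters applied to a chosen metric, where leaving the project filter unset models the "global" operator view. Everything in it is an assumption for illustration: the type and function names (Sample, MetricFilter, query), the metric names "disk_reads" and "disk_writes", and the use of u64 in place of UUIDs do not come from the existing API.

```rust
/// Stand-in for a UUID, to keep the sketch free of external crates.
type Id = u64;

/// One stored measurement, tagged with the resources it relates to.
/// (Hypothetical shape; not the real schema.)
#[derive(Debug, Clone, PartialEq)]
struct Sample {
    metric: &'static str,
    project_id: Id,
    disk_id: Id,
    /// `None` while the disk is not attached to an instance.
    instance_id: Option<Id>,
    value: f64,
}

/// Optional filters, as they might arrive in query parameters.
/// Leaving `project_id` unset models the "global" view.
#[derive(Debug, Default, Clone)]
struct MetricFilter {
    project_id: Option<Id>,
    instance_id: Option<Id>,
    disk_id: Option<Id>,
}

impl MetricFilter {
    /// A sample matches when every filter that is set agrees with it.
    fn matches(&self, s: &Sample) -> bool {
        self.project_id.map_or(true, |p| p == s.project_id)
            && self.disk_id.map_or(true, |d| d == s.disk_id)
            && self.instance_id.map_or(true, |i| s.instance_id == Some(i))
    }
}

/// Metric-centric query: pick the metric first, then narrow by filters.
fn query<'a>(samples: &'a [Sample], metric: &str, filter: &MetricFilter) -> Vec<&'a Sample> {
    samples
        .iter()
        .filter(|s| s.metric == metric && filter.matches(s))
        .collect()
}

/// Made-up data: two projects, two instances, one unattached disk.
fn example_samples() -> Vec<Sample> {
    vec![
        Sample { metric: "disk_reads", project_id: 1, disk_id: 10, instance_id: Some(7), value: 100.0 },
        Sample { metric: "disk_reads", project_id: 1, disk_id: 11, instance_id: Some(7), value: 50.0 },
        Sample { metric: "disk_reads", project_id: 2, disk_id: 20, instance_id: Some(8), value: 5.0 },
        Sample { metric: "disk_writes", project_id: 1, disk_id: 10, instance_id: Some(7), value: 9.0 },
        Sample { metric: "disk_reads", project_id: 2, disk_id: 21, instance_id: None, value: 1.0 },
    ]
}
```

With this shape, the instance-centric flow is a query for "disk_reads" with only instance_id set, while an operator's cross-project view is the same query with an empty filter.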
Lifetimes
It's worth considering how we'd like to enable users to query for metrics of objects that have been deleted. Use cases like a "short-lived instance" are still valid, and their measurements remain stored within ClickHouse.
If we enable "query-by-name", this is more complicated, as names may be re-used after deletion of resources. However, if we provide "query-by-ID", this seems like less of an issue.
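The name-reuse concern can be illustrated with a small sketch. It assumes a hypothetical history in which a disk named "scratch" was deleted and a new disk later took the same name; the types (DiskRecord, DiskSample), the helper functions, and the u64 stand-in for UUIDs are all invented for illustration.

```rust
/// Stand-in for a UUID, to keep the sketch free of external crates.
type ResourceId = u64;

/// A disk as it might appear in the control-plane database,
/// including soft-deleted rows. (Hypothetical shape.)
#[derive(Debug, Clone, PartialEq)]
struct DiskRecord {
    id: ResourceId,
    name: &'static str,
    deleted: bool,
}

/// A stored measurement, keyed by the disk's ID rather than its name.
#[derive(Debug, Clone, PartialEq)]
struct DiskSample {
    disk_id: ResourceId,
    value: f64,
}

/// Query-by-name: may resolve to several IDs once a name is reused.
fn ids_for_name(records: &[DiskRecord], name: &str) -> Vec<ResourceId> {
    records.iter().filter(|r| r.name == name).map(|r| r.id).collect()
}

/// Query-by-ID: unambiguous, and works whether or not the disk still exists.
fn values_for_id(samples: &[DiskSample], id: ResourceId) -> Vec<f64> {
    samples.iter().filter(|s| s.disk_id == id).map(|s| s.value).collect()
}

/// Made-up history: disk 1 was deleted, then disk 2 reused its name.
fn example_history() -> (Vec<DiskRecord>, Vec<DiskSample>) {
    let records = vec![
        DiskRecord { id: 1, name: "scratch", deleted: true },
        DiskRecord { id: 2, name: "scratch", deleted: false },
    ];
    let samples = vec![
        DiskSample { disk_id: 1, value: 3.0 },
        DiskSample { disk_id: 1, value: 4.0 },
        DiskSample { disk_id: 2, value: 10.0 },
    ];
    (records, samples)
}
```

Here a lookup of "scratch" returns both IDs, so a name-based query would need some rule (for example a time range) to pick one, whereas a lookup by ID returns exactly the samples of the deleted or the live disk.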