Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add stat framework to Propolis #147

Merged
merged 21 commits into from
Jul 12, 2022
Merged

Add stat framework to Propolis #147

merged 21 commits into from
Jul 12, 2022

Conversation

leftwo
Copy link
Contributor

@leftwo leftwo commented Jun 23, 2022

This is an initial pass at putting some Oximeter stats into propolis, dragging Crucible along for the ride.
Feedback welcome about this strategy for metrics, and how might we improve it.

On instance create, a ProducerRegistry can be created that can be passed to libraries that wish to record stats. This allows Propolis to not have to know what stats a library wishes to register.

A new struct, InstanceMetrics, has been added to Context. This is used for tracking stats for an instance as well as containing the ProducerRegistry that libraries can use to register with Oximeter/Nexus for their own stats.

A new optional metrics field in InstanceEnsureRequest that, if is is present, tells propolis server if it should setup metrics at the provided registration address. The propolis CLI also has this same optional argument.

The CrucibleBackend Volume::construct() now takes a ProducerRegistry that if present can be
used to register metrics with Oximeter.

This requires changes in both crucible and omicron, and will have to be merged accordingly.

In this PR, we are creating a standalone metrics endpoint different from the propolis API.

Things that need completion or additional issues before we can merge this:

  • Decision on what to do if ensure_instance can't contact Nexus/Oximeter for registration.
  • Use DNS to determine registration address (updated this to just expect it from sled-agent).
  • Required changes in Crucible merged to main
  • More comments on how other libraries can add metrics.
  • Removal of the cargo patch path to crucible, update crucible git rev.

@leftwo
Copy link
Contributor Author

leftwo commented Jun 23, 2022

This won't build in CI because it still has a custom path to Crucible.

@leftwo
Copy link
Contributor Author

leftwo commented Jun 23, 2022

From hypervisor huddle:
Would it be possible to make this an optional dependency?

How are the metrics that are recorded being stored in the database? Specifically, would it be "easy" for someone who wanted to look at IO over time (where perhaps there were several power cycles) to be able to read out data and construct a graph. The concern here is to be sure that we don't store our data in the database in a way that causes a common use of the data to be difficult to extract.

@bnaecker
Copy link
Contributor

How are the metrics that are recorded being stored in the database?

This is covered in RFD 161. It's a bit complicated, since there are tables for tracking different parts of the timeseries (e.g., fields versus measurements). But the raw data is all stored contiguously in one of the measurement tables. For example, each measurement of the Nexus latency histograms, for one endpoint and response code, are stored ordered by time.

Specifically, would it be "easy" for someone who wanted to look at IO over time (where perhaps there were several power cycles) to be able to read out data and construct a graph. The concern here is to be sure that we don't store our data in the database in a way that causes a common use of the data to be difficult to extract.

Easy is appropriately in scare-quotes there. The short answer is that you can get the raw data now, but there are no tools for anything beyond that. That work has just been scheduled behind other stuff.

You've seen the oxdb tool -- that's basically a CLI to the existing querying infrastructure in oximeter_db::Client. One selects a timeseries (named as target:metric), and then an optional time range and optional sequence of filters on the field values. You'll get back zero or more timeseries, with each timestamped sample. All the processing (filtering, aggregating, etc) would have to be done by the client.

@leftwo leftwo marked this pull request as ready for review June 29, 2022 19:28
@leftwo
Copy link
Contributor Author

leftwo commented Jun 29, 2022

If the DNS names are working for Oximeter, I'll add that in place of the hard coded address that is here now, but other than that, I think the general shape of this is ready for any comments.
The crucible side still needs to land and once that does I'll update this PR

@leftwo leftwo requested review from luqmana and pfmooney June 29, 2022 19:30
@leftwo leftwo changed the title WIP: Add stat framework to Propolis Add stat framework to Propolis Jun 29, 2022
server/src/lib/stats.rs Outdated Show resolved Hide resolved
client/src/api.rs Outdated Show resolved Hide resolved
@@ -21,7 +21,8 @@ viona_api = { path = "../viona-api" }
usdt = { version = "0.3.2", default-features = false }
tokio = { version = "1", features = ["full"] }
futures = "0.3"
crucible = { git = "https://github.com/oxidecomputer/crucible", rev = "fed3e8ca7762130ee146fc516a4ef6eed2b91629", optional = true }
crucible = { git = "https://github.com/oxidecomputer/crucible", branch = "producemetric", optional = true }
oximeter = { git = "https://github.com/oxidecomputer/omicron", branch = "main", optional = true }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure I'm not misreading: it looks like oximeter is only optional if crucible is disabled; otherwise the Crucible module will be missing the ProducerRegistry type. Is that right? (I'm trying to reason through how we might keep this module optional when we extend metrics collection to more than just Crucible.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually not 100% sure, but my intent here was trying to allow propolis standalone to build and not have to pull in both crucible and then oximeter.

Crucible needs to know about ProducerRegistry, and now propolis-server will also need to know about it, even if Crucible is not present.

server/src/lib/stats.rs Outdated Show resolved Hide resolved
server/src/lib/stats.rs Outdated Show resolved Hide resolved
server/src/lib/server.rs Outdated Show resolved Hide resolved
server/src/lib/server.rs Outdated Show resolved Hide resolved
propolis/src/block/crucible.rs Outdated Show resolved Hide resolved
Alan Hanson added 2 commits July 1, 2022 20:22
Control of metric server is now determined at propolis-server start
time, though the actual metrics endpoint won't start until the
instance itself is started.  An optional metric_addr arg if provided
will indicate that a metrics endpoint will later be started.

Remove metric field from InstanceProperties
Updated smf manifest to support metric_addr config option.
No need to wrap things in Arc Mutex, and I don't have to keep track
of producer_register outside of instance_ensure().
Copy link
Contributor

@gjcolombo gjcolombo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great from my end--thanks for the changes!

@leftwo
Copy link
Contributor Author

leftwo commented Jul 2, 2022

This now points to the latest crucible rev in main, where all required changes are.

@@ -27,3 +27,4 @@ slog-term = "2.7"
default = ["dtrace-probes"]
dtrace-probes = ["propolis/dtrace-probes"]
crucible = ["propolis/crucible"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that the crucible backend is using oximeter, shouldn't the feature list propolis/oximeter itself here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Comment on lines 24 to 25
crucible = { git = "https://github.com/oxidecomputer/crucible", rev = "2add0de8489f1d4de901bfe98fc28b0a6efcc3ea", optional = true }
oximeter = { git = "https://github.com/oxidecomputer/omicron", branch = "main", optional = true }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these are getting long, maybe define them as [dependencies.crucible] style sections?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@leftwo
Copy link
Contributor Author

leftwo commented Jul 12, 2022

@gjcolombo I merged with your changes like we discussed in chat. Let me know if that looks good.

I tested these changes and stats are still produced and oximeter collects them.

@gjcolombo
Copy link
Contributor

Looks good to me!

@leftwo leftwo merged commit 13e23ff into master Jul 12, 2022
@pfmooney pfmooney deleted the metrics branch May 9, 2023 21:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants