Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add disk metrics endpoint #1348

Merged
merged 21 commits into from
Jul 26, 2022
Merged

add disk metrics endpoint #1348

merged 21 commits into from
Jul 26, 2022

Conversation

iliana
Copy link
Contributor

@iliana iliana commented Jul 2, 2022

Co-authored-by: Sean Klein sean@oxide.computer

part of #1131
requires oxidecomputer/propolis#147 #1347
cc @leftwo

This adds the /organizations/{organization_name}/projects/{project_name}/disks/{disk_name}/metrics/{metric_name} endpoint, where {metric_name} can be one of six metrics that I pulled out of Crucible upstairs.

My goal was to make a generic interface over metrics related to a particular resource, then use that interface to make this endpoint work, to make it trivial to add further resource-related metrics later on.

Before merge:

@@ -6226,6 +6332,194 @@
}
},
"schemas": {
"BinRangedouble": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comes out perfectly reasonable from the TS generator. Where does the janky name come from though?

export type BinRangedouble =
  | { end: number; type: 'range_to' }
  | { end: number; start: number; type: 'range' }
  | { start: number; type: 'range_from' }

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a generic: BinRange<T>. BinRange<f64> -> BinRangedouble apparently.

Copy link
Contributor

@zephraph zephraph Jul 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// A type storing a range over `T`.
///
/// This type supports ranges similar to the `RangeTo`, `Range` and `RangeFrom` types in the
/// standard library. Those cover `(..end)`, `(start..end)`, and `(start..)` respectively.
#[derive(Debug, Clone, Copy, PartialEq, Deserialize, Serialize, JsonSchema)]
#[schemars(rename = "BinRange{T}")]
#[serde(tag = "type", rename_all = "snake_case")]
pub enum BinRange<T> {

The name comes from the schemars rename happening here. It's weird that the capitalization is wrong though.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #1455

@@ -6195,6 +6301,194 @@
}
},
"schemas": {
"BinRangedouble": {
"description": "A type storing a range over `T`.\n\nThis type supports ranges similar to the `RangeTo`, `Range` and `RangeFrom` types in the standard library. Those cover `(..end)`, `(start..end)`, and `(start..)` respectively.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not specifically partof this PR, but the description here could also use some work. We probably shouldn't reference the stdlib in API docs given it might not be clear which stdlib we're referring to. I don't consider this blocking for the PR.

@smklein smklein added api Related to the API. nexus Related to nexus Remote Access Preview labels Jul 12, 2022
@smklein smklein self-assigned this Jul 13, 2022
@smklein
Copy link
Collaborator

smklein commented Jul 13, 2022

As a PSA, I'm working on adding tests to this PR

nexus/src/app/oximeter.rs Outdated Show resolved Hide resolved
nexus/src/app/oximeter.rs Outdated Show resolved Hide resolved
@smklein smklein marked this pull request as ready for review July 18, 2022 17:22
@smklein
Copy link
Collaborator

smklein commented Jul 18, 2022

For a first revision, I think this is ready for review.

I expanded the simulated sled agent to emit some metrics, which are being tested in the integration tests for metrics.
Additionally, since I'm plumbing the LIMIT argument down through oximeter to support pagination, I added tests there too.

@smklein
Copy link
Collaborator

smklein commented Jul 20, 2022

I filed #1469 with some follow-ups on the design. LMK if there's any additional info I can provide to help with the review.

nexus/src/app/oximeter.rs Outdated Show resolved Hide resolved
nexus/src/app/oximeter.rs Outdated Show resolved Hide resolved
oximeter/db/src/client.rs Outdated Show resolved Hide resolved
pub upstairs_uuid: Uuid,
}

// TODO: It would be a lot nicer if we could just depend on the Crucible
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this seems like a bummer. The crucible repo depends on oximeter from the main branch of omicron. I don't think there's been an update to oximeter since that dependency was put in, so I'm not sure if we'll always have this problem or not.

Copy link
Collaborator

@bnaecker bnaecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice work pulling all this together. It's great to see oximeter getting some actual use! I've a few comments and questions, mostly around specifying time ranges and tests, but overall looks great.

@bnaecker bnaecker self-requested a review July 21, 2022 16:25
oximeter/db/src/client.rs Outdated Show resolved Hide resolved
@smklein
Copy link
Collaborator

smklein commented Jul 26, 2022

Still iterating on this in #1469 , but going to enable auto-merge on this PR to make it available for consumption:

image

@smklein smklein enabled auto-merge (squash) July 26, 2022 17:50
@david-crespo
Copy link
Contributor

I think that's good. Me trying to use it in the console is probably more useful than reviewing the code in detail.

@smklein smklein merged commit e9554ad into main Jul 26, 2022
@smklein smklein deleted the metrics-endpoints branch July 26, 2022 20:07
leftwo pushed a commit that referenced this pull request Jun 26, 2024
Added a new package, crucible-dtrace that pulls from buildomat a package
that contains a set of DTrace scripts.  These scripts are extracted into
the global zone at /opt/oxide/crucible_dtrace/

Update Crucible to latest includes these updates:
Clean up dependency checking, fixing space leak (#1372)
Make a DTrace package (#1367)
Use a single context in all messages (#1363)
Remove `DownstairsWork`, because it's redundant (#1371)
Remove `WorkState`, because it's implicit (#1370)
Do work immediately upon receipt of a job, if possible (#1366)
Move 'do work for one job' into a helper function (#1365)
Remove `DownstairsWork` from map when handling it (#1361)
Using `block_in_place` for IO operations (#1357)
update omicron deps; use re-exported dropshot types in oximeter-producer configuration (#1369)
Parameterize more tests (#1364)
Misc cleanup, remove sqlite references. (#1360)
Fix `Extent::close` docstring (#1359)
Make many `Region` functions synchronous (#1356)
Remove `Workstate::Done` (unused) (#1355)
Return a sorted `VecDeque` directly (#1354)
Combine `proc_frame` and `do_work_for` (#1351)
Move `do_work_for` and `do_work` into `ActiveConnection` (#1350)
Support arbitrary Volumes during replace compare (#1349)
Remove the SQLite backend (#1352)
Add a custom timeout for buildomat tests (#1344)
Move `proc_frame` into `ActiveConnection` (#1348)
Remove `UpstairsConnection` from `DownstairsWork` (#1341)
Move Work into ConnectionState (#1340)
Make `ConnectionState` an enum type (#1339)
Parameterize `test_repair.sh` directories (#1345)
Remove `Arc<Mutex<Downstairs>>` (#1338)
Send message to Downstairs directly (#1336)
Consolidate `on_disconnected` and `remove_connection` (#1333)
Move disconnect logic to the Downstairs (#1332)
Remove invalid DTrace probes. (#1335)
Fix outdated comments (#1331)
Use message passing when a new connection starts (#1330)
Move cancellation into Downstairs, using a token to kill IO tasks (#1329)
Make the Downstairs own per-connection state (#1328)
Move remaining local state into a `struct ConnectionState` (#1327)
Consolidate negotiation + IO operations into one loop (#1322)
Allow replacement of a target in a read_only_parent (#1281)
Do all IO through IO tasks (#1321)
Make `reqwest_client` only present if it's used (#1326)
Move negotiation into Downstairs as well (#1320)
Update Rust crate clap to v4.5.4 (#1301)
Reuse a reqwest client when creating Nexus clients (#1317)
Reuse a reqwest client when creating repair client (#1324)
Add % to keep buildomat happy (#1323)
Downstairs task cleanup (#1313)
Update crutest replace test, and mismatch printing. (#1314)
Added more DTrace scripts. (#1309)
Update Rust crate async-trait to 0.1.80 (#1298)
leftwo added a commit that referenced this pull request Jun 26, 2024
Update Crucible and Propolis to the latest

Added a new package, crucible-dtrace that pulls from buildomat a package
that contains a set of DTrace scripts. These scripts are extracted into the 
global zone at /opt/oxide/crucible_dtrace/

Crucible latest includes these updates:
Clean up dependency checking, fixing space leak (#1372) Make a DTrace
package (#1367)
Use a single context in all messages (#1363)
Remove `DownstairsWork`, because it's redundant (#1371) Remove
`WorkState`, because it's implicit (#1370)
Do work immediately upon receipt of a job, if possible (#1366) Move 'do
work for one job' into a helper function (#1365) Remove `DownstairsWork`
from map when handling it (#1361) Using `block_in_place` for IO
operations (#1357)
update omicron deps; use re-exported dropshot types in oximeter-producer
configuration (#1369) Parameterize more tests (#1364)
Misc cleanup, remove sqlite references. (#1360)
Fix `Extent::close` docstring (#1359)
Make many `Region` functions synchronous (#1356)
Remove `Workstate::Done` (unused) (#1355)
Return a sorted `VecDeque` directly (#1354)
Combine `proc_frame` and `do_work_for` (#1351)
Move `do_work_for` and `do_work` into `ActiveConnection` (#1350) Support
arbitrary Volumes during replace compare (#1349) Remove the SQLite
backend (#1352)
Add a custom timeout for buildomat tests (#1344)
Move `proc_frame` into `ActiveConnection` (#1348)
Remove `UpstairsConnection` from `DownstairsWork` (#1341) Move Work into
ConnectionState (#1340)
Make `ConnectionState` an enum type (#1339)
Parameterize `test_repair.sh` directories (#1345)
Remove `Arc<Mutex<Downstairs>>` (#1338)
Send message to Downstairs directly (#1336)
Consolidate `on_disconnected` and `remove_connection` (#1333) Move
disconnect logic to the Downstairs (#1332)
Remove invalid DTrace probes. (#1335)
Fix outdated comments (#1331)
Use message passing when a new connection starts (#1330) Move
cancellation into Downstairs, using a token to kill IO tasks (#1329)
Make the Downstairs own per-connection state (#1328) Move remaining
local state into a `struct ConnectionState` (#1327) Consolidate
negotiation + IO operations into one loop (#1322) Allow replacement of a
target in a read_only_parent (#1281) Do all IO through IO tasks (#1321)
Make `reqwest_client` only present if it's used (#1326) Move negotiation
into Downstairs as well (#1320)
Update Rust crate clap to v4.5.4 (#1301)
Reuse a reqwest client when creating Nexus clients (#1317) Reuse a
reqwest client when creating repair client (#1324) Add % to keep
buildomat happy (#1323)
Downstairs task cleanup (#1313)
Update crutest replace test, and mismatch printing. (#1314) Added more
DTrace scripts. (#1309)
Update Rust crate async-trait to 0.1.80 (#1298)

Propolis just has this one update:
Allow boot order config in propolis-standalone
---------

Co-authored-by: Alan Hanson <alan@oxide.computer>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api Related to the API. nexus Related to nexus Remote Access Preview
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants