-
-
Notifications
You must be signed in to change notification settings - Fork 644
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add counter metrics for remote execution #11155
Conversation
context | ||
.workunit_store | ||
.increment_counter(Metric::RemoteExecutionErrors, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RpcFailure
won't increment this one... is that a lower level error, or should this increment move up between Err(err) =>
and match err {..}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The counter will be incremented in the code that follows that match
. Note that the RpcFailure
match arm just deconstructs the error and turns it into a OperationOrStatus::Status
instance.
@@ -35,7 +35,12 @@ pub enum Metric { | |||
LocalCacheReadErrors, | |||
LocalCacheWriteErrors, | |||
LocalExecutionRequests, | |||
RemoteExecutionErrors, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is incremented for every RPC error, rather than necessarily for every corresponding RemoteExecutionRequest
, which means it's sortof at a different level of abstraction. It might be good to namespace them differently and have these both be paired: "RPC attempt, success/fail" vs "execution attempt, success/fail"?
One thing that helps to encourage this in a lot of metrics APIs is to have a context-manager/wrapper that records both the attempt, and the success/failure as a pair. That's a little bit more challenging to use here since some of the "errors" are actually recoverable, and thus not counted that way, but just a thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. I already increment a counter per RPC (RemoteExecutionMethodExecute
and RemoteExecutionMethodWaitExecution
depending on which method it is). I can rename those s/Method/RPC/ and have the error be "RPCError."
Then I can move the overall execution request counters (with its own error counter) up into remote::CommandRunner
.
78a61eb
to
f2b2e11
Compare
[ci skip-build-wheels]
[ci skip-build-wheels]
[ci skip-build-wheels]
[ci skip-build-wheels]
[ci skip-build-wheels]
f2b2e11
to
dd46836
Compare
Internal only changes: * Rewrite changelog helper to Python and use markdown for changelog ([#11224](#11224)) * Update to nails `0.8.0`. ([#11226](#11226)) * upgrade to Rust v1.48.0 ([#11210](#11210)) * Move Session to its own file. ([#11222](#11222)) * Port InteractiveProcess to Tokio to unblock async interruption ([#11219](#11219)) * Add `./cargo` script to facilitate Rust development ([#11218](#11218)) * Add partition metadata ([#11212](#11212)) * Hotfix unbound `$PY` variable breaking cargo script ([#11209](#11209)) * Remove dataclasses backport ([#11208](#11208)) * Stop using Python 3.6 internally and in CI ([#11175](#11175)) * Update to tokio 0.2.23. ([#11200](#11200)) * Add test address metadata ([#11193](#11193)) * Port more tests from `TestBase` to `RuleRunner` ([#11183](#11183)) * acquire GIL during work unit conversion ([#11186](#11186)) * Add counter metrics for remote execution ([#11155](#11155)) * add metrics to local cache code paths ([#11146](#11146)) * Deprecate the `sources` field for `python_awslambda` ([#11176](#11176)) * Make S3 bucket expiry deletion handling robust. ([#11156](#11156)) * Detect delete markers in our s3 cache of native_engine.so. ([#11140](#11140)) * Fix target globs for deleted file `pants_run_integration_test.py`. ([#11127](#11127)) * Deduplicate `Platform` and `PlatformConstraint` ([#11157](#11157)) * move counter increment into check_action_cache ([#11154](#11154)) * Replace the "base target" concept with "BUILD targets" ([#11136](#11136)) * Re-land the port of Pants' nailgun client to Rust ([#11147](#11147)) * Flatten execution and target PlatformConstraints in MultiPlatformProcess ([#11145](#11145)) * Prepare 2.1.0rc2 ([#11180](#11180)) * Prepare 2.1.0rc1 ([#11144](#11144)) * Prepare 2.0.1rc0 ([#11141](#11141)) * Prepare 2.0.1rc2 ([#11220](#11220)) * Prepare 2.1.1rc0 ([#11221](#11221)) * Prepare 2.1.0 ([#11198](#11198)) * Prepare 2.0.1rc1 ([#11192](#11192)) * Prepare 2.1.0rc3 ([#11191](#11191)) [ci skip-rust]
Add counters to the remote execution code to provide greater visibility into its operation.
[ci skip-build-wheels]