-
Notifications
You must be signed in to change notification settings - Fork 11.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Monitor most tasks/futures that might be consuming memory. #5761
Conversation
6bae88c
to
0c36722
Compare
async move { Self::handle_certificate(state, certificate, fault_config).await }, | ||
) | ||
spawn_monitored_task!(async move { | ||
Self::handle_certificate(state, certificate, fault_config).await |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: here (and in many similar places) we don't really need async move
block.
For example this line is equivalent:
spawn_monitored_task!(Self::handle_certificate(state, certificate, fault_config))
I know this seem to be a pattern across all the code in sui, but since you are touching those places perhaps you can fix it as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed a bunch of these cases.
crates/sui-metrics/src/lib.rs
Outdated
.replace("-", "_") | ||
.replace(".", "_"); | ||
|
||
let name = format!("monitored_future_{}_{}", file, ::std::line!()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be more reasonable to use prometheus labels, rather then compose a name.
E.g. have a monitored_future
vec_gauge, and have labels for file and line.
This would simplify filtering in the prometheus (and is equivalent as far as performance concerned)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks. I don't understand prometheus very well but this sounds like the right thing to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed, this is much much better now.
crates/sui-metrics/src/lib.rs
Outdated
.expect("sui_metrics::init_metrics duplicate init") | ||
} | ||
|
||
pub fn get_metric(name: String, description: String) -> Option<IntGauge> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this entire function could go away if you use metrics labels
0c36722
to
99dd678
Compare
@@ -361,7 +360,7 @@ where | |||
let node_sync_store = self.state.node_sync_store.clone(); | |||
|
|||
info!("spawning node sync task"); | |||
let join_handle = tokio::task::spawn(node_sync_process( | |||
let join_handle = spawn_monitored_task!(node_sync_process( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this introduce significant overhead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. It adds a string format, and a gauge increment/decrement around the wrapped future.
@@ -205,7 +206,7 @@ async fn follower_process<A, Handler: DigestHandler<A> + Clone>( | |||
peer_names.insert(name); | |||
let local_active_ref_copy = local_active.clone(); | |||
let handler_clone = handler.clone(); | |||
gossip_tasks.push(async move { | |||
gossip_tasks.push(monitored_future!(async move { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is monitored here exactly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its keeping track of the number of live futures spawned at each callsite. See sui-metrics/src/lib.rs
99dd678
to
613cbec
Compare
crates/sui-metrics/src/lib.rs
Outdated
pub fn init_metrics(registry: &Registry) { | ||
METRICS | ||
.set(Metrics::new(registry)) | ||
.expect("sui_metrics::init_metrics duplicate init") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this expect fail in our tests which spawn multiple nodes in the same process?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, fixed.
crates/sui-metrics/src/lib.rs
Outdated
}}; | ||
|
||
($metric: ident, $fut: expr) => {{ | ||
let name = format!("{}_{}", file!(), line!()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do the following and eliminate the overhead of the format each time we spawn something.
let name = format!("{}_{}", file!(), line!()); | |
const LOCATION: &str = concat!(file!(), ':', line!()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
295479a
to
12ae866
Compare
@andll I dismissed your review so I can merge, but if you have any additional comments please let me know and I'll fix them in a subsequent PR. |
No description provided.