Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Monitor most tasks/futures that might be consuming memory. #5761

Merged
merged 7 commits into from
Nov 2, 2022

Conversation

mystenmark
Copy link
Contributor

No description provided.

andll
andll previously requested changes Nov 1, 2022
async move { Self::handle_certificate(state, certificate, fault_config).await },
)
spawn_monitored_task!(async move {
Self::handle_certificate(state, certificate, fault_config).await
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: here (and in many similar places) we don't really need async move block.
For example this line is equivalent:

spawn_monitored_task!(Self::handle_certificate(state, certificate, fault_config))

I know this seem to be a pattern across all the code in sui, but since you are touching those places perhaps you can fix it as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed a bunch of these cases.

.replace("-", "_")
.replace(".", "_");

let name = format!("monitored_future_{}_{}", file, ::std::line!());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be more reasonable to use prometheus labels, rather then compose a name.

E.g. have a monitored_future vec_gauge, and have labels for file and line.
This would simplify filtering in the prometheus (and is equivalent as far as performance concerned)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks. I don't understand prometheus very well but this sounds like the right thing to do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, this is much much better now.

.expect("sui_metrics::init_metrics duplicate init")
}

pub fn get_metric(name: String, description: String) -> Option<IntGauge> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this entire function could go away if you use metrics labels

@@ -361,7 +360,7 @@ where
let node_sync_store = self.state.node_sync_store.clone();

info!("spawning node sync task");
let join_handle = tokio::task::spawn(node_sync_process(
let join_handle = spawn_monitored_task!(node_sync_process(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this introduce significant overhead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. It adds a string format, and a gauge increment/decrement around the wrapped future.

@@ -205,7 +206,7 @@ async fn follower_process<A, Handler: DigestHandler<A> + Clone>(
peer_names.insert(name);
let local_active_ref_copy = local_active.clone();
let handler_clone = handler.clone();
gossip_tasks.push(async move {
gossip_tasks.push(monitored_future!(async move {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is monitored here exactly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its keeping track of the number of live futures spawned at each callsite. See sui-metrics/src/lib.rs

Comment on lines 38 to 46
pub fn init_metrics(registry: &Registry) {
METRICS
.set(Metrics::new(registry))
.expect("sui_metrics::init_metrics duplicate init")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this expect fail in our tests which spawn multiple nodes in the same process?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, fixed.

}};

($metric: ident, $fut: expr) => {{
let name = format!("{}_{}", file!(), line!());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do the following and eliminate the overhead of the format each time we spawn something.

Suggested change
let name = format!("{}_{}", file!(), line!());
const LOCATION: &str = concat!(file!(), ':', line!());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@mystenmark mystenmark dismissed andll’s stale review November 2, 2022 11:27

addressed all comments

@mystenmark
Copy link
Contributor Author

@andll I dismissed your review so I can merge, but if you have any additional comments please let me know and I'll fix them in a subsequent PR.

@mystenmark mystenmark enabled auto-merge (squash) November 2, 2022 11:27
@mystenmark mystenmark merged commit 82a6440 into main Nov 2, 2022
@mystenmark mystenmark deleted the mlogan-monitor-futures branch November 2, 2022 11:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants