async backing: potential memory leak #1425

Closed
sandreim opened this issue Sep 6, 2023 · 7 comments
Labels
I2-bug The node fails to follow expected behavior.

Comments

sandreim commented Sep 6, 2023

This has been observed during a load test on Versi @ 300 validators and 60 parachains with async backing enabled.

Configuration:

asyncBackingParams: {
  maxCandidateDepth: 3
  allowedAncestryLen: 2
}

I suspect statement-distribution or prospective-parachains to be the culprit.

At the same time, some nodes appear to stop running. Their last log message is:

2023-09-06 12:13:34.377 ERROR tokio-runtime-worker polkadot_overseer: subsystem exited with error subsystem="statement-distribution-subsystem" err=FromOrigin { origin: "statement-distribution", source: SubsystemReceive(Generated(Context("Signal channel is terminated and empty."))) }

Possibly killed due to OOM, but still investigating.

Screenshot 2023-09-06 at 16 13 14

sandreim added the I2-bug and T8-parachains_engineering labels Sep 6, 2023

sandreim commented Sep 6, 2023

image

Nodes are indeed being evicted due to excessive memory usage, which explains the "subsystem exited with error" log message.

sandreim commented Sep 6, 2023

Screenshot 2023-09-06 at 17 00 04

statement-distribution is the CPU usage champion.

sandreim commented Sep 7, 2023

@rphmeier the new logs you pushed to #1410 show that we are tracking an ever-increasing number of leaves.

https://grafana.teleport.parity.io/goto/Et7Z5lzIR?orgId=1

This explains both unbounded memory and CPU usage growth.
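
To illustrate why an ever-growing set of tracked leaves causes both symptoms, here is a minimal sketch. It is not the statement-distribution code and not the fix in 67d1bc0; `LeafTracker`, `on_leaf_update`, and `simulate` are hypothetical names, and block hashes are modelled as plain integers. It assumes per-leaf state lives in a map keyed by leaf, and compares keeping deactivated leaves forever with removing them on each update.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a relay-chain block hash.
type Hash = u64;

/// Toy tracker: one entry of per-leaf state for every leaf it knows about.
/// The `Vec<u32>` is a placeholder for whatever state is kept per leaf.
#[derive(Default)]
struct LeafTracker {
    per_leaf: HashMap<Hash, Vec<u32>>,
}

impl LeafTracker {
    /// Handle one leaf update: a newly activated leaf plus the leaves that
    /// were deactivated. With `prune == false`, deactivated leaves are never
    /// removed, so the map grows by one entry per block (the leak pattern).
    fn on_leaf_update(&mut self, activated: Hash, deactivated: &[Hash], prune: bool) {
        self.per_leaf.entry(activated).or_default();
        if prune {
            for hash in deactivated {
                self.per_leaf.remove(hash);
            }
        }
    }

    /// Number of leaves currently tracked.
    fn tracked(&self) -> usize {
        self.per_leaf.len()
    }
}

/// Simulate `blocks` consecutive leaf updates, each one deactivating the
/// previous leaf, and return how many leaves are still tracked at the end.
fn simulate(blocks: u64, prune: bool) -> usize {
    let mut tracker = LeafTracker::default();
    for n in 1..=blocks {
        let deactivated: Vec<Hash> = if n > 1 { vec![n - 1] } else { Vec::new() };
        tracker.on_leaf_update(n, &deactivated, prune);
    }
    tracker.tracked()
}

fn main() {
    println!("without pruning: {} leaves tracked", simulate(1_000, false));
    println!("with pruning:    {} leaves tracked", simulate(1_000, true));
}
```

In this toy model the unpruned tracker ends up with 1000 entries after 1000 blocks, while the pruned one holds 1. Any work that iterates over all tracked leaves then scales with chain length rather than with the number of active leaves, which would match memory and CPU growing together.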

rphmeier commented Sep 7, 2023

67d1bc0

eskimor moved this from To do to In progress in Parachains-core Oct 8, 2023
the-right-joyce moved this to In Progress in parachains team board Oct 12, 2023

alexggh commented Oct 20, 2023

67d1bc0

@rphmeier This fix doesn't seem to have made it into master. I can still see high CPU usage when testing on Versi, and the same thing seems to have started happening on Rococo once we updated to a runtime supporting async backing.

Any idea whether it was left out intentionally or by accident?

Screenshot 2023-10-20 at 13 44 42

sandreim commented

This seems to be an oversight: the fix is part of #1436, which was not merged 🤦🏼

sandreim commented Feb 2, 2024

Stale issue, already fixed.

sandreim closed this as completed Feb 2, 2024
github-project-automation bot moved this from Review in progress to Completed in parachains team board Feb 2, 2024