
Memory consumption with large Transaction Pool #11503

Open · 2 tasks done
shunsukew opened this issue May 23, 2022 · 9 comments
Labels: J2-unconfirmed (Issue might be valid, but it's not yet known.)

@shunsukew (Contributor) commented May 23, 2022

Is there an existing issue?

Experiencing problems? Have you tried our Stack Exchange first?

  • This is not a support question.

Description of bug

A Substrate node configured with a large transaction pool limit (e.g. --pool-limit 65536, larger than the default) consumes the machine's whole 32 GB of memory once the pooled transaction count reaches around 20k. Memory usage grows rapidly and hits 100% of the 32 GB.

Are there any potential issues around the transaction pool, such as a memory leak?

Case 1. Transaction Pool 20k (2022-05-22 21:50:00 ~ 2022-05-22 23:00:00 UTC +8)
Transaction pool
[Screenshot 2022-05-22 23:07:17]
Mem
[Screenshot 2022-05-22 23:06:58]
CPU
[Screenshot 2022-05-22 23:06:47]
Once memory usage hits 100%, the machine becomes unreachable.

Case 2. Default Transaction Pool Limit (2022-05-22 23:20:00 ~ 2022-05-22 23:40:00 UTC +8)
Transaction Pool
[Screenshot 2022-05-22 23:40:10]
Mem
[Screenshot 2022-05-22 23:40:01]
CPU
[Screenshot 2022-05-22 23:39:51]

(Machine Spec)
CPU optimized machine (Fast CPU)
16 vCPU
32GB Mem
General Purpose SSD - 16KiB IOPS & throughput 250 MiB/s

Steps to reproduce

Set the pool limit --pool-limit to more than 20k and get 19k+ transactions into the pool. (I did this by running an Astar node and syncing blocks with peers as of 2022/05/23.)
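For illustration only (binary name, chain name, and paths are taken from later comments in this thread; adjust to your own setup), a launch command along these lines raises the pool limit well above the default:

    astar-collator --chain astar --base-path /var/lib/astar --pool-limit 65536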

The github-actions bot added the J2-unconfirmed label (Issue might be valid, but it's not yet known.) on May 23, 2022
@bkchr (Member) commented May 23, 2022

Did you also change --pool-kbytes?

@shunsukew (Contributor, Author) commented May 23, 2022

@bkchr Thank you for the comment.
No, I didn't. That means the default value is used?

--pool-kbytes <COUNT>
            Maximum number of kilobytes of all transactions stored in the pool [default: 20480]
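(For reference, a rough bound implied by these defaults, assuming the kilobyte limit applies to the same transactions the count limit does:

    20480 KiB = 20 MiB of raw transaction data at most
    20 MiB / ~20,000 pooled transactions ≈ 1 KiB per transaction on average

so the pooled transaction bytes themselves are capped far below the 32 GB of memory observed above.)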

@bkchr (Member) commented May 30, 2022

@koute could you maybe look into this?

@koute (Contributor) commented May 31, 2022

> @koute could you maybe look into this?

Sure; I'm on it.

@koute self-assigned this on May 31, 2022
@koute (Contributor) commented May 31, 2022

The issue doesn't seem to reproduce on a normal Kusama node (or maybe it just needs to be synced from scratch; I haven't checked yet). However, I think I've managed to reproduce it on the newest astar-collator: I haven't let it run until memory exhaustion, but the memory does look like it's growing. I'm profiling it to see why.
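(The comment doesn't say which profiler is being used; purely as a generic illustration, one way to record heap growth of a running node on Linux is heaptrack:

    heaptrack ./astar-collator --chain astar --pool-limit 65536   # stop with Ctrl+C once memory has grown
    heaptrack --analyze heaptrack.astar-collator.12345.zst        # actual file name and pid will differ

heaptrack then reports which call stacks are responsible for the growth.)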

@koute (Contributor) commented May 31, 2022

@shunsukew For reference, can you provide the exact command line you've used to launch your node?

@koute (Contributor) commented May 31, 2022

So I think I can see the memory usage increasing, but nowhere near as fast as in the screenshots posted by @shunsukew. I'll leave it running overnight (and if it doesn't reproduce I'll try spamming it with fake transactions). It would be nice if there was a way to reproduce the behaviour from the original issue, as that would make it a lot easier to investigate.

In the meantime I've also noticed that the Astar node uses the system allocator and doesn't use jemalloc like Polkadot does; this is not good, and it might contribute to the problem. (I could check if I knew how to reproduce it exactly.) I've put up a PR enabling jemalloc for your node: AstarNetwork/Astar#653
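(A rough heuristic, not something from this thread: a statically linked jemalloc leaves recognizable strings in the binary, so one way to check which allocator a given build carries is:)

    strings ./astar-collator | grep -iq jemalloc \
        && echo "jemalloc strings present" \
        || echo "no jemalloc strings found (likely the system allocator)"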

@bLd75 commented Jun 1, 2022

Hi @koute thank you very much for the PR!

Below are tests made on a collator node with this simple command (before and after the change, made at ~19:15):
/usr/local/bin/astar-collator --collator --rpc-cors all --name collator --base-path /var/lib/astar --state-cache-size 0 --prometheus-external --pool-limit 65536 --port 30333 --chain astar --parachain-id 2006 --telemetry-url 'wss://telemetry.polkadot.io/submit/ 0'
I think the node has to be fully synced to reproduce.
The previously reported data was from a public node (archive mode).

Metrics on the same time frame:

Transaction queue
[screenshot]

RAM (32 GB total) increases fast but doesn't fill up completely from the start:
[screenshot]

CPU consumption doesn't change much but trends higher:
[screenshot]

Peer count becomes unstable:
[screenshot]

Network traffic increases enormously; the node is sending an incredible amount of data:
[screenshot]

I will test your PR as the next step.

@shunsukew (Contributor, Author) commented Jun 1, 2022

@koute @bLd75
Thank you for the PR and the additional information.

@shunsukew changed the title from "Memory consumption by Transaction Pool" to "Memory consumption with large Transaction Pool" on Jun 3, 2022