Reth killed by OOM killer w/ 64GB RAM #5408
Comments
Can you please show the "Transaction Pool" section of the dashboard for the same time range?
Also, I took a quick try at running the jemalloc heap profiler in reth, but I didn't manage to get any output. I just changed the default feature in `bin/reth/Cargo.toml` to `default = ["jemalloc-prof"]` and then tried running various commands like `MALLOC_CONF="prof:true,lg_prof_interval:25" reth node`. Has anybody gotten heap profiles using this? I'm sure I'm missing something simple, but I'd appreciate a recipe for getting this working if anybody has one.
I should be able to finish #3819 in the next few days, but in the meantime you'll need to use `_RJEM_MALLOC_CONF` instead of `MALLOC_CONF`.
Thanks @Rjected, that was super helpful. I've now been running reth with heap profiling & symbols for about a day using `_RJEM_MALLOC_CONF=prof:true,prof_gdump:true,lg_prof_sample:19`. I'm hoping the high-water-mark profiling will be helpful when I hit the OOM condition, but we'll see. So far it has generated 160GB of heap profiles, so hopefully that doesn't grow too fast to be useful. I did look at the output for one of the profiles after an hour or two of running, when reth seemed to be in a pretty steady state memory-wise. I'm attaching the svg in case anybody wants to have a look. I'll admit I can't make much sense of it with all the runtime stuff going on, but it almost looks like it is saying that `revm_interpreter::interpreter::stack::Stack::new` has 33GB allocated at that point in time. That seems crazy, so I'll assume I'm wrong about it, but I'd love to hear if anybody can make sense of it.
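For anyone else trying to reproduce this setup, the workflow described in the last two comments looks roughly like the sketch below. The build command, paths, and dump file names are illustrative, and it assumes jemalloc's bundled `jeprof` tool is installed for turning the `.heap` dumps into an SVG:

```sh
# Rebuild reth with the profiling feature enabled, as described above
# (default = ["jemalloc-prof"] in bin/reth/Cargo.toml).
cargo build --release --bin reth

# Run with profiling enabled. Note the _RJEM_ prefix: reth's jemalloc is built
# with a symbol prefix, so a plain MALLOC_CONF is not picked up. prof_gdump
# writes a dump each time total virtual memory hits a new high-water mark, and
# lg_prof_sample:19 samples roughly every 512 KiB of allocation.
_RJEM_MALLOC_CONF=prof:true,prof_gdump:true,lg_prof_sample:19 ./target/release/reth node

# Render one of the resulting dumps as an SVG call graph.
jeprof --svg ./target/release/reth jeprof.*.heap > heap-profile.svg
```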
Hmm, are you tracing a lot with `debug_traceCall`?
I agree that does look like the allocation site the profile is pointing to. I make a lot of `debug_traceCall` RPC calls using the `CallTracer`. I don't know the reth internals, but that would certainly involve allocating an EVM stack. Given the `STACK_CAPACITY`, it looks like each stack should be 32KB, and the profile indicates 33GB, so that would mean reth is keeping about 1 million stacks alive? Also, because I'm not seeing linear memory growth, it's not just a plain memory leak. The only thing I can come up with which would be consistent with this is a caching mechanism keeping all these EVM stacks alive. Or the profile data I got is just bogus.
do you have any example requests?
I don't have any RPC logs handy, but I'm making the calls from ethers-rs with the following code, using pending transactions from the mempool. So to reproduce, you could just subscribe to pending transactions and use them in something like the call below.
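The exact snippet from the original comment isn't reproduced here; the following is a reconstruction sketch of the kind of call described, not the original code. The WebSocket endpoint, error handling, and variable names are assumptions:

```rust
use ethers::prelude::*;
use ethers::types::{
    GethDebugBuiltInTracerType, GethDebugTracerType, GethDebugTracingCallOptions,
    GethDebugTracingOptions,
};

#[tokio::main]
async fn main() -> eyre::Result<()> {
    // Assumed local reth WebSocket endpoint.
    let provider = Provider::<Ws>::connect("ws://127.0.0.1:8546").await?;

    // Subscribe to pending transactions and debug_traceCall each one with the
    // built-in CallTracer, which is the kind of load described above.
    let mut pending = provider.subscribe_pending_txs().await?;
    while let Some(hash) = pending.next().await {
        let Some(tx) = provider.get_transaction(hash).await? else { continue };

        let opts = GethDebugTracingCallOptions {
            tracing_options: GethDebugTracingOptions {
                tracer: Some(GethDebugTracerType::BuiltInTracer(
                    GethDebugBuiltInTracerType::CallTracer,
                )),
                ..Default::default()
            },
            ..Default::default()
        };

        // Trace the pending transaction's call against the latest block.
        let trace = provider.debug_trace_call(&tx, None, opts).await?;
        println!("{hash:?}: {trace:?}");
    }
    Ok(())
}
```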
I suspect it doesn't matter what the transaction is, but if there is an issue and it does turn out to be caching-related, you may need a bunch of unique ones.
Looks like the jemalloc profile was right on the money. Having looked at the code, `TracingInspector::start_step()` does allocate and save a full stack even if stacks are turned off, as the `unwrap_or_default` will call `Stack::new()`.
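The code excerpt that followed this comment isn't reproduced here; the pattern being described is roughly the following self-contained sketch. The `Stack` type below is a stand-in with approximated names, not the actual reth/revm source:

```rust
// Capacity matching the EVM's 1024-slot stack limit.
const STACK_CAPACITY: usize = 1024;

#[derive(Clone)]
struct Stack {
    data: Vec<[u8; 32]>, // 1024 slots of 32-byte words ≈ 32 KiB reserved
}

impl Stack {
    fn new() -> Self {
        // Pre-allocates the full capacity up front.
        Self { data: Vec::with_capacity(STACK_CAPACITY) }
    }
}

impl Default for Stack {
    fn default() -> Self {
        Self::new() // Default goes through new(), so it also reserves ~32 KiB
    }
}

// Rough shape of what the inspector does per executed EVM instruction: even
// when stack recording is disabled, unwrap_or_default() still builds a full
// pre-allocated Stack, so every trace step costs ~32 KiB.
fn recorded_stack(record_stack: bool, current: &Stack) -> Stack {
    record_stack.then(|| current.clone()).unwrap_or_default()
}

fn main() {
    let live = Stack::new();
    // A trace with ~1 million steps therefore reserves on the order of 32 GiB,
    // which matches the jemalloc profile discussed above.
    let steps: Vec<Stack> = (0..1_000).map(|_| recorded_stack(false, &live)).collect();
    println!("reserved ≈ {} KiB", steps.len() * STACK_CAPACITY * 32 / 1024);
}
```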
If we're allocating a full 32KB stack per EVM instruction executed, then I think tracing a single txn could reasonably end up with ~1 million stacks allocated at once and hit the 32GB of memory that I was seeing in the jemalloc profile. This also lines up with my observation of getting an occasional huge memory spike and OOM, as I imagine those were just from transactions with a very large number of entries in the trace.

I'm testing this right now by changing `Stack::new()` in revm to allocate a zero-capacity vec. So far memory usage reported by jemalloc looks greatly improved: resident & mapped are more like 800MB than the previous steady state of around 8GB. Still a bit early to say, but I'll run for a few days to confirm.

I'll see if I can come up with a nice way to fix this. It'd be easy to fix just the case where stacks are not being used in `TracingInspector` by using an `Option`, but I'm hoping there's a way that will fix the case where the stack is actually needed as well.
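For reference, the `Option`-based idea mentioned here would look something like the sketch below, reusing the stand-in `Stack` type from the earlier sketch; the fix that actually landed may differ:

```rust
// When recording is off, keep None and allocate nothing, instead of a
// defaulted, fully pre-allocated stack.
fn recorded_stack_fixed(record_stack: bool, current: &Stack) -> Option<Stack> {
    record_stack.then(|| current.clone())
}
```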
thanks for digging! fixed in #5521 by not using default, hehe
No problem, thanks @mattsse! You guys made the digging pretty easy with all the jemalloc tooling. I just read the PR and it looks to me like it will work, but I'll leave the review to somebody who knows the codebase better. The case where stacks are enabled will still be really bad, but that can be a separate issue. I can open an issue for that if it would be helpful. Here's the before and after with the relevant jemalloc stats. Pretty dramatic!
yeah that would be helpful, pretty sure we can make this a bit smarter if we double-check what we actually need from the stack and maybe add some additional configs
Cool, just opened #5522 for this.
@mattsse I'll close this unless there's some reason to keep it open. I've verified that the changes above have greatly decreased memory consumption when tracing.
Describe the bug
reth syncs and tracks the chain head with no issues for days, but every week or two it gets killed by the Linux OOM killer. This happens during normal operation, not sync, so I've opened a new issue for it, as the other OOM issues look to be during sync. The machine is running Ubuntu 22.04 and has 64GB RAM. There are several other processes using a decent amount of memory, but there should be plenty left over for reth. You can see the detailed breakdown of process memory usage in the attached dmesg.txt.
The machine is running reth, Lighthouse, an Arbitrum node which is using reth, and an RPC client program which is primarily calling eth_call and eth_subscribe on reth. It's not a particularly heavy RPC load, but there may be some bursts of activity. I don't see signs of anything crazy happening right before the OOM in any of the client logs.
I've had this happen twice now. You can see the kernel logs for both OOM events in the attached logs. In the most recent one, reth is using 39612664 kB (~39.6 GB) of anon-rss at the time it is killed. I've also pasted a screenshot from Grafana of the memory stats before the latest OOM. Eyeballing the rest of the Grafana stats, I don't see anything concerning in that time period. There are 10 in-flight requests and 14 read-only transactions open right before the crash, but this is not unusual for the prior days where no issues were observed. I'd be happy to send more data from Grafana if you have a nice way for me to export it. The reth log before the crash is attached as well, but nothing in it caught my eye.
From the Grafana jemalloc stats, there are two quick spikes in memory usage about 20 minutes before the crash which look like they might be similar in nature to the crash, but didn't quite trigger the OOM killer. At the crash, the RSS goes from 20GB to 33GB in one tick of the graph, and appears to hit about 40GB at the crash itself. At the same time, the jemalloc stats show "active", "allocated", and "mapped" all jumping from <30GB to >400GB.
reth.log
reth-oom-dashboard (Grafana memory dashboard screenshot)
dmesg.txt
Let me know if there's anything else I can do to help track this down. Do you have any guesses as to what would be using ~40GB of anon-rss?
Steps to reproduce
Node logs
Platform(s)
Linux (x86)
What version/commit are you on?
reth Version: 0.1.0-alpha.10
Commit SHA: a9fa281
Build Timestamp: 2023-10-28T03:44:10.194397854Z
Build Features: default,jemalloc
Build Profile: release
What database version are you on?
1
What type of node are you running?
Archive (default)
What prune config do you use, if any?
None
If you've built Reth from source, provide the full command you used
cargo install --locked --path bin/reth --bin reth
Code of Conduct