Reth killed by OOM killer w/ 64GB RAM #5408

Closed
gbrew opened this issue Nov 12, 2023 · 16 comments

Labels
C-bug An unexpected or incorrect behavior
S-needs-investigation This issue requires detective work to figure out what's going wrong

Comments

gbrew (Contributor) commented Nov 12, 2023

Describe the bug

Reth syncs and tracks the chain head with no issues for days, but every week or two it gets killed by the Linux OOM killer. This happens during normal operation, not during sync, so I've opened a new issue since the other OOM issues look to be sync-related. The machine is running Ubuntu 22.04 and has 64GB RAM. There are several other processes using a decent amount of memory, but there should be plenty left over for reth. You can see the detailed breakdown of per-process memory usage in the attached dmesg.txt.

The machine is running reth, Lighthouse, an Arbitrum node which uses reth, and an RPC client program which primarily calls eth_call and eth_subscribe on reth. It's not a particularly heavy RPC load, but there may be some bursts of activity. I don't see signs of anything crazy happening right before the OOM in any of the client logs.

I've had this happen twice now. You can see the kernel logs for both OOM events in the attached logs. In the most recent one, reth is using 39612664kB of anon-rss at the time it is killed. I've also pasted a screenshot from Grafana of the memory stats before the latest OOM. Eyeballing the rest of the Grafana stats, I don't see anything concerning in that time period. There are 10 in-flight requests and 14 read-only transactions open right before the crash, but this is not unusual compared to the days prior, where no issues were observed. I'd be happy to send more data from Grafana if you have a convenient way for me to export it. The reth log before the crash is attached as well, but nothing in it caught my eye.

From the Grafana jemalloc stats, there are two quick spikes in memory usage about 20 minutes before the crash which look like they might be similar in nature to the crash, but didn't quite trigger the OOM killer. At the crash, the RSS goes from 20GB to 33GB in one tick of the graph, and appears to hit about 40GB when reth is killed. At the same time, the jemalloc stats show "active", "allocated", and "mapped" all jumping from <30GB to >400GB.

reth.log
dmesg.txt
reth-oom-dashboard

Let me know if there's anything else I can do to help track this down. Do you have any guesses as to what would be using ~40GB of anon-rss?

Steps to reproduce

  1. Sync reth
  2. Run some load against it
  3. Wait
  4. Boom!

Node logs

attached above

Platform(s)

Linux (x86)

What version/commit are you on?

reth Version: 0.1.0-alpha.10
Commit SHA: a9fa281
Build Timestamp: 2023-10-28T03:44:10.194397854Z
Build Features: default,jemalloc
Build Profile: release

What database version are you on?

1

What type of node are you running?

Archive (default)

What prune config do you use, if any?

None

If you've built Reth from source, provide the full command you used

cargo install --locked --path bin/reth --bin reth

Code of Conduct

  • I agree to follow the Code of Conduct
gbrew added the C-bug (An unexpected or incorrect behavior) and S-needs-triage (This issue needs to be labelled) labels on Nov 12, 2023
shekhirin added the S-needs-investigation (This issue requires detective work to figure out what's going wrong) label and removed the S-needs-triage label on Nov 14, 2023
shekhirin (Collaborator) commented Nov 14, 2023

Can you please show the "Transaction Pool" section of dashboard for the same time range?

gbrew (Contributor, Author) commented Nov 14, 2023

> Can you please show the "Transaction Pool" section of dashboard for the same time range?

Of course, here it is:
Screenshot 2023-11-14 at 7 52 50 AM
Screenshot 2023-11-14 at 7 53 24 AM

gbrew (Contributor, Author) commented Nov 14, 2023

Also, I took a quick try at running the jemalloc heap profiler in reth, but I didn't manage to get any output. I just changed the default feature in bin/reth/Cargo.toml to be:

default = ["jemalloc-prof"]

And then tried running various commands like:

MALLOC_CONF="prof:true,lg_prof_interval:25" reth node

Has anybody gotten heap profiles using this? I'm sure I'm missing something simple, but would appreciate a recipe for getting this working if anybody has one.

Rjected (Member) commented Nov 14, 2023

> Has anybody gotten heap profiles using this? I'm sure I'm missing something simple, but would appreciate a recipe for getting this working if anybody has one.

I should be able to finish this #3819 in the next few days, but in the meantime you'll need to use _RJEM_MALLOC_CONF="prof:true,lg_prof_interval:25" instead
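
For anyone trying this, here is a minimal sketch (assuming a tikv-jemallocator dependency with its "profiling" cargo feature; not necessarily reth's exact setup) of how a Rust binary is wired up so that the _RJEM_MALLOC_CONF settings above apply to every allocation in the process:

    // Sketch only: install jemalloc as the global allocator so that heap profiling
    // driven by _RJEM_MALLOC_CONF (e.g. prof:true,lg_prof_interval:25) sees every
    // allocation. Assumes tikv-jemallocator with the "profiling" feature enabled;
    // reth's own wiring may differ.
    use tikv_jemallocator::Jemalloc;

    #[global_allocator]
    static GLOBAL: Jemalloc = Jemalloc;

    fn main() {
        // Allocations below are served by jemalloc and show up in the sampled
        // heap profiles that jemalloc periodically dumps to disk.
        let buf: Vec<u8> = vec![0; 16 * 1024 * 1024];
        println!("allocated {} bytes via jemalloc", buf.len());
    }

The resulting .heap dumps can then be rendered with the jeprof tool that ships with jemalloc.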

gbrew (Contributor, Author) commented Nov 20, 2023

> I should be able to finish this #3819 in the next few days, but in the meantime you'll need to use _RJEM_MALLOC_CONF="prof:true,lg_prof_interval:25" instead

Thanks @Rjected, that was super helpful. I've now been running reth with heap profiling & symbols for about a day using _RJEM_MALLOC_CONF=prof:true,prof_gdump:true,lg_prof_sample:19. I'm hoping the high-water-mark profiling will be helpful when I hit the OOM condition, but we'll see. So far it has generated 160GB of heap profiles, so hopefully that doesn't grow too fast to be useful.

I did look at the output for one of the profiles after an hour or two of running when reth seemed to be in a pretty steady-state memory-wise. I'm attaching the svg just in case anybody wants to have a look. I'll admit I can't make much sense of it with all the runtime stuff going on, but it almost looks like it is saying that revm_interpreter::interpreter::stack::Stack::new has 33GB allocated at the time. Seems crazy, so I'll assume I'm wrong about that, but I'd love to hear if anybody can make some sense of it:

reth_memory_profile

Rjected (Member) commented Nov 20, 2023

> I did look at the output for one of the profiles after an hour or two of running when reth seemed to be in a pretty steady-state memory-wise. I'm attaching the svg just in case anybody wants to have a look. I'll admit I can't make much sense of it with all the runtime stuff going on, but it almost looks like it is saying that revm_interpreter::interpreter::stack::Stack::new has 33GB allocated at the time. Seems crazy, so I'll assume I'm wrong about that, but I'd love to hear if anybody can make some sense of it:

Hmm, are you tracing a lot with record_stack_snapshots? It could be that a stack is being initialized, with STACK_CAPACITY, many times, on each step:
https://github.com/bluealloy/revm/blob/5aff2f1146d3650f59fdd8e148143013b1ecd7c2/crates/interpreter/src/interpreter/stack.rs#L43-L44
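
To make that concern concrete, here is a stand-in sketch (not revm's actual code, though STACK_CAPACITY really is 1024 words of 32 bytes) showing how a fresh stack per traced step reserves 32 KiB each time, regardless of whether anything is ever pushed onto it:

    // Stand-in for the revm pattern linked above: every Stack::new() reserves
    // STACK_CAPACITY * 32 bytes = 32 KiB up front.
    const STACK_CAPACITY: usize = 1024;

    struct Stack {
        data: Vec<[u8; 32]>, // stand-in for Vec<U256>
    }

    impl Stack {
        fn new() -> Self {
            Stack { data: Vec::with_capacity(STACK_CAPACITY) }
        }
    }

    fn main() {
        // One snapshot per traced EVM step adds up quickly for large traces.
        let snapshots: Vec<Stack> = (0..1_000).map(|_| Stack::new()).collect();
        let reserved_kib: usize = snapshots.iter().map(|s| s.data.capacity() * 32 / 1024).sum();
        println!("{} snapshots reserve ~{} KiB", snapshots.len(), reserved_kib);
    }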

gbrew (Contributor, Author) commented Nov 21, 2023

> https://github.com/bluealloy/revm/blob/5aff2f1146d3650f59fdd8e148143013b1ecd7c2/crates/interpreter/src/interpreter/stack.rs#L43-L44

I agree that does look like the allocation site that the profile is pointing to. I make a lot of debug_traceCall RPC calls using the CallTracer. I don't know the reth internals, but that would certainly involve allocating an EVM stack. Given the STACK_CAPACITY, it looks like each stack should be 32KB, and the profile indicates 33GB, so that would indicate reth keeping 1 million stacks alive? Also, because I'm not seeing linear memory growth, it's not just a plain memory leak. The only thing I can come up with which would be consistent with this would be if there was a caching mechanism which was keeping all these EVM stacks alive. Or that the profile data I got is just bogus.
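
For reference, the arithmetic checks out (assuming 1024-word stacks of 32-byte values, as in revm):

    // Back-of-the-envelope check: 1024 words * 32 bytes = 32 KiB per stack, and
    // roughly a million live stacks lands at ~32-33 GB, which matches what the
    // jemalloc profile attributes to Stack::new.
    fn main() {
        let bytes_per_stack: u64 = 1024 * 32;
        let live_stacks: u64 = 1_000_000;
        let total_gib = (bytes_per_stack * live_stacks) as f64 / (1u64 << 30) as f64;
        println!("{} B per stack, ~{:.1} GiB across {} stacks", bytes_per_stack, total_gib, live_stacks);
    }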

Rjected (Member) commented Nov 21, 2023

> I agree that does look like the allocation site that the profile is pointing to. I make a lot of debug_traceCall RPC calls using the CallTracer. I don't know the reth internals, but that would certainly involve allocating an EVM stack. Given the STACK_CAPACITY, it looks like each stack should be 32KB, and the profile indicates 33GB, so that would indicate reth keeping 1 million stacks alive? Also, because I'm not seeing linear memory growth, it's not just a plain memory leak. The only thing I can come up with which would be consistent with this would be if there was a caching mechanism which was keeping all these EVM stacks alive. Or that the profile data I got is just bogus.

do you have any example debug_traceCall RPC that we can use to try to reproduce?

gbrew (Contributor, Author) commented Nov 21, 2023

> do you have any example debug_traceCall RPC that we can use to try to reproduce?

I don't have any RPC logs handy, but I'm making the calls from ethers-rs using the following code, with pending transactions from the mempool. So to reproduce, you could subscribe to pending transactions and use them in something like this:

    // Note: these geth-style tracing types come from ethers-rs (ethers::types),
    // and `client` below is an ethers Middleware (e.g. a Provider) pointed at reth.
    let tracing_options = GethDebugTracingOptions {
        enable_return_data: Some(false),
        enable_memory: Some(false),
        disable_stack: Some(true),
        disable_storage: Some(true),
        tracer: Some(GethDebugTracerType::BuiltInTracer(GethDebugBuiltInTracerType::CallTracer)),
        tracer_config: Some(GethDebugTracerConfig::BuiltInTracer(GethDebugBuiltInTracerConfig::CallTracer(CallConfig{only_top_call: Some(false), with_log: Some(true)}))),
        ..Default::default()
    };
    let tracing_options = GethDebugTracingCallOptions { tracing_options };

    let mut tx_req: TransactionRequest = tx.into();
    tx_req.nonce = None;
    tx_req.gas_price = None;

    let debug_trace = client.debug_trace_call(tx_req, block, tracing_options).await?;

I suspect it doesn't matter what the transaction is, but if there is an issue and it does turn out to be caching related you may need a bunch of unique ones.

gbrew (Contributor, Author) commented Nov 21, 2023

Looks like the jemalloc profile was right on the money. Having looked at the code, TracingInspector::start_step() does allocate and save a full stack, even if stacks are turned off, as the unwrap_or_default will call Stack::new():

self.config.record_stack_snapshots.then(|| interp.stack.clone()).unwrap_or_default();

If we're allocating a full 32KB stack per EVM instruction executed, then I think tracing a single txn could reasonably end up with ~1 million stacks allocated at once, and hit the 32GB of memory that I was seeing in the jemalloc profile. This lines up with my observation of getting an occasional huge memory spike and OOM too, as I imagine those were just from transactions with a very large number of entries in the trace.

I'm testing this right now by changing Stack::new() in revm to allocate a zero-capacity vec. So far memory usage reported by jemalloc looks greatly improved: resident & mapped are more like 800MB than the previous steady state around 8GB. Still a bit early to say, but I'll run for a few days to confirm.

I'll see if I can come up with a nice way to fix this. It'd be easy to fix just the case where stacks are not being used in TracingInspector by using an Option, but I'm hoping there's a way that will fix the case where the stack is actually needed as well.
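
In case it helps the discussion, here is a minimal sketch of the Option-based variant mentioned above (illustrative names only, not the actual TracingInspector/revm types): the snapshot only clones, and therefore only allocates, when stack recording was requested, instead of falling back to a full-capacity default stack.

    // Illustrative sketch, not the actual reth code. `Stack` stands in for the
    // revm type whose Default would otherwise reserve a full 32 KiB.
    #[derive(Clone)]
    struct Stack(Vec<[u8; 32]>);

    struct StepSnapshot {
        // None when stack recording is disabled, so nothing is cloned or reserved.
        stack: Option<Stack>,
    }

    fn snapshot_step(record_stack_snapshots: bool, live_stack: &Stack) -> StepSnapshot {
        StepSnapshot {
            // Compare with the original pattern quoted above:
            //   record_stack_snapshots.then(|| stack.clone()).unwrap_or_default()
            // which builds a full default stack even when recording is off.
            stack: record_stack_snapshots.then(|| live_stack.clone()),
        }
    }

    fn main() {
        let live = Stack(vec![[0u8; 32]; 8]);
        assert!(snapshot_step(false, &live).stack.is_none());
        assert!(snapshot_step(true, &live).stack.is_some());
        println!("no per-step stack allocation when recording is disabled");
    }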

mattsse (Collaborator) commented Nov 21, 2023

thanks for digging!

fixed in #5521 by not using default, hehe

gbrew (Contributor, Author) commented Nov 21, 2023

> thanks for digging!
>
> fixed in #5521 by not using default, hehe

No problem, thanks @mattsse! You guys made the digging pretty easy with all the jemalloc tooling. I just read the PR and it looks to me like it will work, but I'll leave the review to somebody who knows the codebase better. The case where stacks are enabled will still be really bad, but that can be a separate issue. I can open an issue for that if it would be helpful.

Here's the before and after with the relevant jemalloc stats. Pretty dramatic!

Screenshot 2023-11-21 at 11 57 42 AM

mattsse (Collaborator) commented Nov 21, 2023

> I can open an issue for that if it would be helpful.

yeah that would be helpful, pretty sure we can make this a bit smarter if we can double check what we actually need from the stack and maybe add some additional configs

gbrew (Contributor, Author) commented Nov 21, 2023

> yeah that would be helpful, pretty sure we can make this a bit smarter if we can double check what we actually need from the stack and maybe add some additional configs

Cool, just opened #5522 for this.

gbrew (Contributor, Author) commented Nov 21, 2023

Here's the jemalloc profile after my basic fix, just in case anybody's curious:

jemalloc_reth_profile_fixed

gbrew (Contributor, Author) commented Nov 25, 2023

@mattsse I'll close this unless there's some reason to keep it open. I've verified that the changes above have greatly decreased memory consumption when tracing.

gbrew closed this as completed on Nov 27, 2023
github-project-automation bot moved this from Todo to Done in Reth Tracker on Nov 27, 2023