
RPCDaemon Out of Memory Error #1458

Closed
relyt29 opened this issue Jan 27, 2021 · 11 comments

@relyt29

relyt29 commented Jan 27, 2021

System information

Turbo-Geth version: tg version 2021.01.2-alpha
OS & Version: Debian GNU/Linux 10 (buster)
Commit hash: 58c4e03

Expected behaviour

RPCDaemon probably should not try to use 2234 GB of virtual memory lol

Actual behaviour

RPCDaemon is killed by the OOM manager on Linux

Steps to reproduce the behaviour

./tg running in another terminal, tmux pane, etc.

./rpcdaemon --chaindata /path/to/turbogeth/datadir/tg/chaindata --http.api=eth,debug,net --http.addr="127.0.0.1" --http.port=8646 --http.vhosts="*" --http.corsdomain="*"

Open another terminal, tmux pane, etc., and run:

curl -X POST  -H 'Content-Type: application/json' --data '{"jsonrpc":"2.0","method":"debug_traceTransaction","params":["0xd80a10106f2a8fb76cf6db6b9f7a64f08257b4952c94c888e1ac41ceb80733c9"],"id":1}' "http://127.0.0.1:8646"

Backtrace

pretty easy to reproduce, let me know if you really need a backtrace from me

OOM Killer output in Dmesg

[Wed Jan 27 19:26:46 2021] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Wed Jan 27 19:26:46 2021] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Wed Jan 27 19:26:46 2021] 3281 total pagecache pages
[Wed Jan 27 19:26:46 2021] 586 pages in swap cache
[Wed Jan 27 19:26:46 2021] Swap cache stats: add 32727081, delete 32744125, find 783569039/785482007
[Wed Jan 27 19:26:46 2021] Free swap  = 4kB
[Wed Jan 27 19:26:46 2021] Total swap = 33520636kB
[Wed Jan 27 19:26:46 2021] 16753828 pages RAM
[Wed Jan 27 19:26:46 2021] 0 pages HighMem/MovableOnly
[Wed Jan 27 19:26:46 2021] 284482 pages reserved
[Wed Jan 27 19:26:46 2021] 0 pages hwpoisoned
[Wed Jan 27 19:26:46 2021] Tasks state (memory values in pages):
[Wed Jan 27 19:26:46 2021] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[Wed Jan 27 19:26:46 2021] [    441]     0   441    21749        0   167936      319             0 systemd-journal
... redacted for privacy and brevity
[Wed Jan 27 19:26:46 2021] [  22318]  1000 22318 539467873   348959 2110885888   235305             0 tg
...
[Wed Jan 27 19:56:39 2021] [  30572]  1000 30572 558542091 13423083 186310656  7681362             0 rpcdaemon

[Wed Jan 27 19:56:39 2021] Out of memory: Kill process 30572 (rpcdaemon) score 851 or sacrifice child
[Wed Jan 27 19:56:39 2021] Killed process 30572 (rpcdaemon) total-vm:2234168364kB, anon-rss:53692332kB, file-rss:0kB, shmem-rss:0kB
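As a sanity check on the numbers above: the kernel reports total_vm in 4 KiB pages, so rpcdaemon's figure from the task list can be converted by hand and matches the kill message.

```shell
# The OOM killer's task list reports total_vm in 4 KiB pages.
# Converting rpcdaemon's 558542091 pages reproduces the kill message.
pages=558542091
kb=$((pages * 4))          # 2234168364 kB, matching "total-vm:2234168364kB"
gb=$((kb / 1000 / 1000))   # ~2234 GB (decimal), i.e. over 2 TB of virtual memory
echo "${kb} kB ~= ${gb} GB"
```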

I think this is separate from #1119

I had @MysticRyuujin repro the bug on his machine and he can confirm it is a bug as well:

fwiw I have logging enabled at debug on the rpcdaemon and I didn't get any crash
It crashed
but it produced no logs
@MysticRyuujin
Contributor

This debug also kills Geth with the same behavior. There are no logs, just straight killed 😄

@Haaroon

Haaroon commented Jan 27, 2021

I concur with tyler that this tx indeed causes a crash

@AlexeyAkhunov
Contributor

Got you, thank you for reporting, will try to figure out what is going on

@MysticRyuujin
Contributor

It doesn't sound like the Geth team will fix this. Their answer is "don't expose the debug API". 🤷‍♂️

Nethermind won't fix the 2GB limit on their end either; they just say "disable memory dump" lol

@relyt29
Author

relyt29 commented Feb 4, 2021

well, in ethereum/go-ethereum#22244 (comment) @holiman suggests that debug_standardTracetransaction is an option for complex VM traces (granted, if you have filesystem support). I am not familiar with that specific call, but an attempt to call it on turbo-geth fails:

(venv) user@Debian-101-buster-64-minimal:~/folder$ cat tester.sh
curl -X POST  -H 'Content-Type: application/json' --data '{"jsonrpc":"2.0","method":"debug_standardTracetransaction","params":["0xa009fb5190b8dd3c14ecf33c9f081d02e62fd8d81399e679b86c4701aede0b42"],"id":1}' "http://127.0.0.1:8646"                         
(venv) user@Debian-101-buster-64-minimal:~/folder$ bash tester.sh
{"jsonrpc":"2.0","id":1,"error":{"code":-32601,"message":"the method debug_standardTracetransaction does not exist/is not available"}}  

I found several more of these crashing transactions in the course of the work I was doing, e.g.

"skip_list": ["0xf70ba0055e36842991735d92990d6a3cd3a312604848310b755d31d6ccc8a9b5", "0xd80a10106f2a8fb76cf6db6b9f7a64f08257b4952c94c888e1ac41ceb80733c9", "0x98db24ba4df916ef3b1f152bf32bd885277c80c1bf70923906a902e0d68ea6be",
"0xc8223b985ac63c69cfe1bdf70a61707e0e5ffa37534fc3582b6059fe5cc46253", "0xcc9602725fea678892c7a6b65dedac9e29b64c39679553b992fd3891f6334abf"]

Given that the average turbo-geth user is probably running it as an archive node, perhaps for analysis of some kind, I think it makes sense to expose at least some API, like the Geth developers are doing, that lets you get at the VM traces for these complex transactions, especially since the most complex transactions are likely the most interesting ones to analyze.

@AlexeyAkhunov
Contributor

AlexeyAkhunov commented Feb 10, 2021

As one may suspect, the issue is that this transaction has a very large trace, and the internal representation of such traces is probably quite inefficient. The screenshot below was captured while processing was still ongoing.
[image: memory profile screenshot]

@AlexeyAkhunov
Contributor

I suspect that the problem is that the Geth-style tracing repeats the memory and stack contents at each step, and for transactions like this one that really blows up out of proportion. Even if we could stream it and still squeeze out that enormous JSON response, would it be super useful? It will probably be a hundred gigabytes in size or something.
Maybe the way to go is to expose DisableMemory, DisableStack, DisableStorage and DisableReturnData to the API?
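For illustration, if rpcdaemon did expose those flags in the Geth style, as an options object in the second params slot, a request might look like the sketch below. This is hypothetical: rpcdaemon did not accept such an options object at the time of this thread, and the field names merely mirror Geth's tracer config.

```shell
# Hypothetical request: assumes rpcdaemon accepted a Geth-style tracer
# config object as a second element of "params" (it did not at the time).
payload='{"jsonrpc":"2.0","method":"debug_traceTransaction","params":["0xd80a10106f2a8fb76cf6db6b9f7a64f08257b4952c94c888e1ac41ceb80733c9",{"disableMemory":true,"disableStack":true,"disableStorage":true,"disableReturnData":true}],"id":1}'
echo "$payload"
# To send it:
#   curl -X POST -H 'Content-Type: application/json' --data "$payload" "http://127.0.0.1:8646"
```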

@relyt29
Author

relyt29 commented Feb 10, 2021

Maybe the way to go is to expose DisableMemory, DisableStack, DisableStorage and DisableReturnData to the API?

I think this is useful in that it lets users fine-tune the debug traces, and it might also make pulling out many traces at once faster (as I was doing, iterating over every transaction in specific blocks, for example). I also think it's a useful short-term solution to the problem.

But it does concern me, if that's all that's done, that there can be transactions where we just can't find out what those TXs did post facto without crashing the node. Maybe you just have to turn to another piece of software (e.g. parity, whatever) to get traces for those transactions, and maybe that's fine... But if you want to know, using TurboGeth, which contracts a transaction triggered (e.g. contract A.functionA() calls B.functionB()), I don't know how you can get that without a successful, non-crashing debug_traceTransaction.

(If you have the memory and stack, the second item on the stack at a CALL will be the address of the contract called (i.e. B), and the input parameters are stored in memory at offsets pointed to by the stack.) So that's an example of where you might need/want debug_traceTransaction and really can't disable memory or stack, which are both going to be the largest sources of the blowup in data size.
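To make that stack layout concrete, here is a fabricated structLogs-style snapshot (not real trace output). Geth's struct logger serializes the stack bottom-to-top, so the top of the stack is the last array element; CALL expects, from the top: gas, address, value, argsOffset, argsLength, retOffset, retLength.

```shell
# Fabricated stack at a CALL step, bottom-to-top as in Geth structLogs:
# (retLength retOffset argsLength argsOffset value address gas)
stack=(0x0 0x0 0x24 0x80 0x0 0xb0b0000000000000000000000000000000c0ffee 0x5208)
addr=${stack[-2]}       # second item from the top: the callee address (contract B)
args_off=${stack[-4]}   # argsOffset: where in memory the calldata starts
echo "callee=${addr} calldata at memory offset ${args_off}"
```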

And of course the biggest most complicated transactions are likely to be the most interesting and juicy to look at for analysis...

@holiman
Contributor

holiman commented Feb 11, 2021

But it does concern me if that's all that's done, that it is possible to have transactions that we just can't find out what those TXs did post-facto without crashing the node

That's the exact reason why Geth has debug_standardTraceBlockToFile. Has that been removed from turbo-geth?

https://geth.ethereum.org/docs/rpc/ns-debug#debug_standardtraceblocktofile
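Per the linked docs, debug_standardTraceBlockToFile takes a block hash plus an optional config (a txHash field restricts tracing to a single transaction) and writes the trace to files on the node's filesystem rather than buffering a giant JSON response in memory. A sketch of the request; "0x<blockHash>" is a placeholder, and the txHash is the crashing transaction from this issue:

```shell
# Sketch only, following the Geth docs linked above. "0x<blockHash>" is a
# placeholder for the hash of the block containing the transaction; traces
# are written to files on the node instead of streamed back over RPC.
payload='{"jsonrpc":"2.0","method":"debug_standardTraceBlockToFile","params":["0x<blockHash>",{"txHash":"0xd80a10106f2a8fb76cf6db6b9f7a64f08257b4952c94c888e1ac41ceb80733c9"}],"id":1}'
echo "$payload"
# To send it:
#   curl -X POST -H 'Content-Type: application/json' --data "$payload" "http://127.0.0.1:8646"
```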

@AlexeyAkhunov
Contributor

That's the exact reason why Geth has debug_standardTraceBlockToFile. Has that been removed from turbo-geth?

No, we just need to specifically port every new method when it is added to Geth. Because our API lives in cmd/rpcdaemon now. Will try to port for the next release, thank you for the pointer!

@AlexeyAkhunov
Contributor

Will be fixed by this: #1880
It works in my tests: I can download the result via curl, and the RPC daemon's memory stays stable.
