Memory leak on go-ethereum 1.8.7 #16728
Comments
Note: geth is basically the only thing running on this box, and we have made sure the memory increases on the box are all geth. The sawtooth shape in the memory graphs comes from geth restarts. |
We are experiencing a similar issue: Geth started in docker container on Ubuntu with:
We introduced the 8GB memory limit as a workaround so that docker auto-restarts the container. Output of memStats (time in UTC+2) |
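For readers who want to track similar numbers over time, here is a minimal sketch of sampling the Go runtime's own counters from inside a Go program. It is not taken from geth: the `snapshot` type and `takeSnapshot` function are hypothetical names chosen for illustration, and only `runtime.ReadMemStats` and `runtime.NumGoroutine` are standard library calls.

```go
package main

import (
	"fmt"
	"runtime"
)

// snapshot holds a few runtime counters worth tracking over time.
// The type and its field selection are illustrative, not geth's.
type snapshot struct {
	HeapAllocMiB float64 // allocated heap objects
	SysMiB       float64 // total memory obtained from the OS
	NumGC        uint32  // completed GC cycles
	Goroutines   int     // goroutines that currently exist
}

// takeSnapshot reads the runtime statistics once.
// ReadMemStats briefly stops the world, so sample sparingly.
func takeSnapshot() snapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	const mib = 1 << 20
	return snapshot{
		HeapAllocMiB: float64(m.HeapAlloc) / mib,
		SysMiB:       float64(m.Sys) / mib,
		NumGC:        m.NumGC,
		Goroutines:   runtime.NumGoroutine(),
	}
}

func main() {
	s := takeSnapshot()
	fmt.Printf("heap=%.1f MiB sys=%.1f MiB gc=%d goroutines=%d\n",
		s.HeapAllocMiB, s.SysMiB, s.NumGC, s.Goroutines)
}
```

Logging a line like this every few minutes makes it easier to see whether heap growth and goroutine count rise together, which is the pattern discussed later in this thread.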
We just downgraded to v1.8.6 with PR #16576. Will keep you posted |
+1, we are also experiencing this issue. |
Confirmed that 1.8.6 with PR #16576 also sees this issue. We're downgrading to 1.8.3 which is the last stable version we ran. @ryanschneider suspects that the memory issues were always there (at least in 1.8.x) but have been exacerbated by the recent uptick in txs. |
@vogelito Just wondering, are you using filter queries? (experimental feature) |
@mykolasmith - no we are most definitely not. |
@karalabe - I saw some memory-management related improvements on the latest v1.8.8 release |
Unfortunately not. |
@karalabe - happy to. However, I was planning on waiting a bit more to see if the problem is indeed gone in 1.8.3. We ran 1.8.6 without problems for a while until there was an uptick in network transactions. |
Another interesting experiment might be to run 1.8.7 or 1.8.8 with |
Do note however that 'scanning all objects and measuring their memory use' will probably hit the node hard for 30+ seconds. So, take care with doing this on a production system. |
Happy to do whichever one you find more useful, just let me know. (and thanks for the heads up!) |
Last 24 hours running 1.8.8. Process killed by the OS:
|
We're running a few tests ourselves too now. A PR from Gary, #16720, could help with the tons of goroutines by creating fewer of them, and it also fixes some pending block generation issues that made tx posting slow. I can't give you meaningful numbers since the syncs only just finished, but let's see how the two behave in comparison after a day or so. After this is in, I also have an update lined up that removes a blockage from block and transaction propagation into the network. That should avoid any goroutine leak caused by slow remote peers. Will set up a benchmark for it on Monday (want to run the above PR until then). |
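To illustrate the general failure mode mentioned here (goroutines piling up behind a slow consumer), below is a small self-contained sketch. It is not geth's code and not what the PRs do: `leakyBroadcast` and `boundedBroadcast` are hypothetical names, the never-read channel merely stands in for a stalled peer, and dropping events when a queue is full is only one possible policy.

```go
package main

import (
	"fmt"
	"runtime"
)

// leakyBroadcast starts one goroutine per event, each doing a blocking send
// to a subscriber that never reads (standing in for a stalled peer).
// It returns how many goroutines are left parked and a func to release them.
func leakyBroadcast(events int) (parked int, release func()) {
	sub := make(chan int)       // unbuffered and never drained
	quit := make(chan struct{}) // lets the demo clean up after itself
	before := runtime.NumGoroutine()
	for i := 0; i < events; i++ {
		go func(ev int) {
			select {
			case sub <- ev: // blocks: nobody receives
			case <-quit:
			}
		}(i)
	}
	return runtime.NumGoroutine() - before, func() { close(quit) }
}

// boundedBroadcast sends from the caller's goroutine into a buffered queue
// and drops events once the queue is full, so nothing accumulates.
func boundedBroadcast(events, queue int) (delivered, dropped int) {
	sub := make(chan int, queue)
	for i := 0; i < events; i++ {
		select {
		case sub <- i:
			delivered++
		default:
			dropped++
		}
	}
	return delivered, dropped
}

func main() {
	parked, release := leakyBroadcast(1000)
	fmt.Println("goroutines parked on a stalled subscriber:", parked)
	release()

	delivered, dropped := boundedBroadcast(1000, 16)
	fmt.Println("bounded queue delivered/dropped:", delivered, dropped)
}
```

In the first variant every event costs a goroutine (and its stack) until the subscriber catches up, so memory grows with traffic; in the second, memory is capped by the queue size at the price of losing events.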
Thanks. I will try to provide you with a memory report of the node later today. If there's anything else I can do to help you track this problem, please let me know. |
@karalabe We're going to try this out as well. Will report back results in a few hours! |
Still seeing memory leak as of #16720. |
Per @holiman's post in #16673, I'm attaching the output of This was with 1.8.8 at over 24GB. The node had 2.3M goroutines on the event feed (99.9934% of all goroutines).
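As a rough illustration of how a goroutine count like this can be obtained in any Go program, the sketch below dumps the aggregated goroutine profile through the standard `runtime/pprof` package and reads the total from its header line. It is an assumption-laden example, not the command used in this thread: `goroutineProfile` and `goroutineTotal` are hypothetical helper names.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
)

// goroutineProfile returns the text goroutine profile, in which identical
// stacks are grouped together with a count in front of each group.
func goroutineProfile() (string, error) {
	var buf bytes.Buffer
	// debug=1 selects the human-readable, aggregated text format.
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 1); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// goroutineTotal parses the total from the profile's header line,
// which has the form "goroutine profile: total N".
func goroutineTotal(profile string) (int, error) {
	var total int
	_, err := fmt.Sscanf(profile, "goroutine profile: total %d", &total)
	return total, err
}

func main() {
	profile, err := goroutineProfile()
	if err != nil {
		fmt.Println("profile failed:", err)
		return
	}
	total, err := goroutineTotal(profile)
	if err != nil {
		fmt.Println("could not parse header:", err)
		return
	}
	fmt.Println("total goroutines:", total)
}
```

Because identical stacks are grouped with a count, a single group that accounts for nearly all goroutines stands out immediately in the text output.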
|
Master currently contains some optimizations that should help avoid one potential bottleneck from the transaction feed. Trying that might be a good data point for us. On top of master, I have a PR, #16769, which removes another potential congestion point from the transaction feed. Trying this out might also be a valuable data point. No promise that these will fix the issue, but it should most definitely fix some issue that could lead to this scenario on insanely busy nodes. Quick question: could you describe your setup in a bit more detail? How many peers do you run with? What is your API usage? What kind of requests are you serving? Anything else that might be helpful to find congested pathways? |
I see that #16769 has been merged into master. I'll try a new build off of master and report back. Regarding our setup, here are the details:
I hope this is useful. We're restarting our node now with a build from master and will report back. |
For reference, this is what we're running now
|
I've been having this issue on both 1.8.1 and 1.8.8, but it only started appearing recently (in the last couple of weeks, I think) and has gotten to a point where I have to restart geth on a daily basis to prevent it from crashing unexpectedly. FWIW, I don't think 1.8.x should be considered 'stable' until this is addressed. |
@crypt1d I don't think it's 1.8.x being unstable, rather the network usage pattern is changing and surfacing a dormant bug. |
Things are looking better but perhaps still too early to tell. Reminder:
|
Daily update: Memory still looking ok here. |
Daily update: Memory still looking ok here. |
Update: memory seems to have been stable as of #16769; however, we experienced several spikes in memory usage prior to 18:00 EST yesterday, after which all peers began to time out and the node never recovered.
|
Closing this, as the latest code fixes it. |
@mykolasmith We may have just found the hang issue you experienced: #16840. |
System information
Geth version:
OS & Version: Linux
Expected behaviour
Memory usage of go-ethereum should be stable.
Actual behaviour
Something happens which causes memory usage to increase substantially.
Steps to reproduce the behaviour
Unknown
pprof top
This wasn't hugely useful to us, but hoping it means something to the folks in here. Happy to provide additional information. This is currently happening ~1/day
Metrics
Last month:
Last 2 months:
Additional info
For reference, this host was migrated to run:
v1.8.2 on March 5
v1.8.3 on March 27
v1.8.6 on Apr 26
v1.8.6 with PR #16576 on May 1
v1.8.7 on May 4
We run our node with the following cmd line arguments
--rpc --rpcapi "debug,personal,eth,web3" --targetgaslimit 1000000 --cache 2048 --gpopercentile 30 --maxpeers 25
On May 1st we added
--metrics --pprof --pprofaddr=127.0.0.1
and started tracking various metrics. We thought this was the source of the problem, so earlier today we restarted the node without the additional flags, but unfortunately the memory leak happened again. This is the memory breakdown for the last 48 hours:
And since upgrading to 1.8.7: