
Geth 1.8.15 - Memory Leak? #17646

Closed
zulhfreelancer opened this issue Sep 11, 2018 · 15 comments
Comments

@zulhfreelancer

zulhfreelancer commented Sep 11, 2018

System information

Geth version:

Version: 1.8.15-stable
Git Commit: 89451f7c382ad2185987ee369f16416f89c28a7d
Architecture: amd64
Protocol Versions: [63 62]
Network Id: 1
Go Version: go1.10
Operating System: linux
GOPATH=
GOROOT=/usr/lib/go-1.10

OS & Version:

----------------------------------------------------------------------
CPU model            : Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
Number of cores      : 2
CPU frequency        : 2500.000 MHz
Total size of Disk   : 49.0 GB 
Total amount of Mem  : 3891 MB 
Total amount of Swap : 0 MB 
OS                   : Ubuntu 16.04.5 LTS
Arch                 : x86_64 (64 Bit)
Kernel               : 4.4.0-1067-aws
----------------------------------------------------------------------

Expected behaviour

Geth runs smoothly with normal and stable RAM usage.

Actual behaviour

It started out at around 30% RAM usage, then slowly climbed until Geth crashed at around 90% RAM usage.

Steps to reproduce the behaviour

Command:

$ /usr/bin/geth --nodiscover --syncmode 'fast' --cache=512 --rpc --rpcaddr=0.0.0.0 --rpcapi='db,eth,net,web3,personal,admin' --rpccorsdomain='*' --ws --wsaddr=0.0.0.0 --wsapi='db,eth,net,web3,personal,admin' --wsorigins='*' --mine --minerthreads='1'

FYI, I'm running a 2-node private blockchain. Both machines have the same specs as above. Each node has a 50GB EBS volume and 4GB RAM. They are 't3.medium' EC2 instances on AWS.

I didn't do anything to the node during the recording below. No extra load was sent to the node (e.g. HTTP RPC calls, geth attach, etc.) - just mining, syncing with the second node, and htop in another terminal.

I also tried running the same command above in background mode and the same issue happened. I noticed that Geth stopped after ~10 minutes, and my SSH session got stuck when RAM usage was at its peak.
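
If it helps, Geth's memory usage over time can also be logged with something like this instead of watching htop (just a sketch; it assumes a single geth process on the box):

$ while true; do ps -o rss=,etime= -C geth >> geth-mem.log; sleep 10; done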

Backtrace

Is this issue related to #16728 and #16859?

Can someone suggest the most stable version for me?

@pschlump
Contributor

pschlump commented Sep 12, 2018

My testing indicates that 1.8.13 is stable and that 1.8.15 has some sort of problem.
I have a test system with 1.8.15 that has run as high as 45GB of memory with only a few thousand transactions. I am running 1.8.15-unstable.

@zulhfreelancer
Author

@pschlump thank you for the tip. A few questions for you:

  1. What are the specs of that 1.8.13 node (CPU, RAM, and disk)?
  2. Do you have any guide/Gist for the Geth uninstallation process?
  3. Are you syncing the main net, a test net, or a private net?

Thanks.

@pschlump
Contributor

I have a private test net - the 2 machines running Geth have 96GB of memory, quad Xeons, and 2 x 2TB hard drives each. They are isolated from the main net by a hardware firewall. Purely test systems.

My process for downgrading the test systems: I used Docker to bring up 1.8.13 nodes - one on each system - and let them sync. Then I just shut down the 1.8.15-unstable nodes. Then I brought up 2 new nodes with 1.8.13 and shut down the Docker containers.

I can confirm that the 1.8.13 version is stable - and not leaking. When I bring up a 1.8.15 node in Docker, it grows until I kill it.
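
Roughly, each temporary Docker node was started with something like this (the versioned ethereum/client-go image tag, data directory, ports and flags here are illustrative, not my exact command):

$ docker run -d --name geth-1.8.13 \
      -v /srv/geth-data:/root/.ethereum \
      -p 30303:30303 \
      ethereum/client-go:v1.8.13 --nodiscover --syncmode 'fast' --cache=512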

@pschlump
Contributor

I have not tried 1.8.14 - I will try that today in a docker container.

@pschlump
Contributor

I have run our distributed key generation application (Keep/thesis*) on 1.8.13 - geth grows by 1.1MB of memory and then goes back down within a few minutes (good behavior). On 1.8.15 it grew by 2.3GB! I am setting up a 1.8.14 version now.

@pschlump
Contributor

My tests indicate that 1.8.14 is ok - the problem is with 1.8.15.

@pschlump
Contributor

pschlump commented Sep 12, 2018 via email

@zulhfreelancer
Author

Thank you @pschlump for the pointers. I will give it a try.

@holiman
Contributor

holiman commented Sep 14, 2018

In those logfiles, the first one had 10 simultaneous Unlock operations going on, and the second had 7. The memory on the machine is 3891 MB. I have run into issues unlocking a single key on a USB armory, which has 500MB. So I would suspect it's the decryption that's causing it to crash.

You could try using the --lightkdf setting for the keystore; that will make key decryption take a lot less memory. See

// StandardScryptP is the P parameter of Scrypt encryption algorithm, using 256MB

and

LightKDFFlag = cli.BoolFlag{

I have no idea why it would differ between versions, though. But I can guarantee that Unlock takes hundreds of MB of memory even on the older builds; maybe for some reason they finished faster there and didn't pile up in parallel, which is what causes the crash.
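
Back-of-the-envelope (a sketch; the parameter values are how I read the keystore defaults, roughly N=2^18, r=8 for the standard KDF and N=2^12, r=8 with --lightkdf), scrypt needs about 128*N*r bytes of memory per decryption:

$ echo "standard KDF: $(( 128 * (1 << 18) * 8 / 1024 / 1024 )) MB per Unlock"   # 256 MB
$ echo "light KDF:    $(( 128 * (1 << 12) * 8 / 1024 / 1024 )) MB per Unlock"   # 4 MB
$ echo "10 concurrent Unlocks: $(( 10 * 256 )) MB"                              # well over half of the 3891 MB on that machine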

@holiman
Contributor

holiman commented Sep 14, 2018

Oh, and if it wasn't you calling Unlock, then it's some attacker trying to do brute-force password guessing against your node.

@holiman
Contributor

holiman commented Sep 14, 2018

--rpc --rpcaddr=0.0.0.0 --rpcapi='db,eth,net,web3,personal,admin' should be illegal or something

On a test-network? Behind a firewall? Why?

I guess your firewall is not properly configured, so this ticket demonstrates a pretty good reason :)
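
Something along these lines would be much safer on that box (just a sketch derived from the command in the issue: bind HTTP/WS to localhost and drop db, personal and admin from the exposed modules, so nobody outside the machine can call Unlock):

$ /usr/bin/geth --nodiscover --syncmode 'fast' --cache=512 \
      --rpc --rpcaddr=127.0.0.1 --rpcapi='eth,net,web3' \
      --ws --wsaddr=127.0.0.1 --wsapi='eth,net,web3' \
      --mine --minerthreads='1'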

@pschlump
Contributor

I think I was the source of the unlocks on my system. I have looked at my firewall logs and I see no evidence that any unexpected outside activity took place. I am now looking into the possibility that somebody unwanted has penetrated our security and has malicious code running inside our firewall. I don't see any unexpected pending transactions, and I am monitoring for pending transactions once a second. I take your comment very seriously.
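
The once-a-second check is essentially this (the IPC path is illustrative; the txpool module is available over the local attach endpoint):

$ watch -n 1 "geth attach /data/geth/geth.ipc --exec 'txpool.status'"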

@francofs

francofs commented Oct 1, 2018

I managed to resolve this problem by disabling the 'db' RPC API. Not sure if it has the same root cause as yours, but the behavior seems similar to mine.
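
Concretely, that just means the same flags as in the issue but with db removed from the API lists (a sketch):

--rpcapi='eth,net,web3,personal,admin' --wsapi='eth,net,web3,personal,admin'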

@bencagri

bencagri commented Oct 23, 2018

I'm running 1.8.17-stable on the same environment and have the same problem. I don't have db in my RPC APIs. Running --syncmode "light" --rinkeby.

edit: I downgraded geth to 1.8.14 as @pschlump mentioned. I see no problems after syncing. 👍

@holiman
Contributor

holiman commented Nov 19, 2018

This is already solved, I'm closing
