
Geth 1.8.15 - Memory Leak? #17646

Closed
zulhfreelancer opened this issue Sep 11, 2018 · 15 comments
Comments

@zulhfreelancer

zulhfreelancer commented Sep 11, 2018

System information

Geth version:

Version: 1.8.15-stable
Git Commit: 89451f7c382ad2185987ee369f16416f89c28a7d
Architecture: amd64
Protocol Versions: [63 62]
Network Id: 1
Go Version: go1.10
Operating System: linux
GOPATH=
GOROOT=/usr/lib/go-1.10

OS & Version:

----------------------------------------------------------------------
CPU model            : Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
Number of cores      : 2
CPU frequency        : 2500.000 MHz
Total size of Disk   : 49.0 GB 
Total amount of Mem  : 3891 MB 
Total amount of Swap : 0 MB 
OS                   : Ubuntu 16.04.5 LTS
Arch                 : x86_64 (64 Bit)
Kernel               : 4.4.0-1067-aws
----------------------------------------------------------------------

Expected behaviour

Geth runs smoothly with normal and stable RAM usage.

Actual behaviour

It started out at around 30% RAM usage, then slowly climbed until Geth crashed at around 90% RAM usage.

Steps to reproduce the behaviour

Command:

$ /usr/bin/geth --nodiscover --syncmode 'fast' --cache=512 --rpc --rpcaddr=0.0.0.0 --rpcapi='db,eth,net,web3,personal,admin' --rpccorsdomain='*' --ws --wsaddr=0.0.0.0 --wsapi='db,eth,net,web3,personal,admin' --wsorigins='*' --mine --minerthreads='1'

FYI, I'm running a 2-node private blockchain. Both machines have the same specs as above. Each node has a 50GB EBS volume and 4GB RAM. They are 't3.medium' EC2 instances on AWS.

I didn't do anything to the node during the recording below. No extra load was sent to the node (e.g. HTTP RPC calls, geth attach, etc.) - just mining, syncing with the second node, and htop in another terminal.

I also tried running the same command above in background mode and the same issue happened. I noticed that Geth stopped after ~10 minutes, and my SSH session got stuck when RAM usage was at its peak.
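
If it helps, Geth's memory usage over time can also be logged with something like this instead of watching htop (just a sketch; it assumes a single geth process on the box):

$ while true; do ps -o rss=,etime= -C geth >> geth-mem.log; sleep 10; done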

Backtrace

Is this issue related to #16728 and #16859?

Can someone suggest the most stable version for me?

@pschlump
Contributor

pschlump commented Sep 12, 2018

My testing indicates that 1.8.13 is stable and that 1.8.15 has some sort of problem.
I have a test system with 1.8.15 that has run as high as 45GB of memory with only a few thousand transactions. I am running 1.8.15-unstable.

@zulhfreelancer
Author

@pschlump thank you for the tip. A few questions for you:

  1. What are the specs of that 1.8.13 node (CPU, RAM, and disk)?
  2. Do you have any guide/Gist for the Geth uninstallation process?
  3. Are you syncing the main net, a test net, or a private net?

Thanks.

@pschlump
Contributor

I have a private test net - the 2 machines running Geth have 96GB of memory, quad Xeons, and 2 x 2TB hard drives each. They are isolated from the main net by a hardware firewall. Purely test systems.

My process for downgrading the test systems: I used Docker to bring up 1.8.13 nodes - one on each system - and let them sync. Then I just shut down the 1.8.15-unstable nodes. Then I brought up 2 new nodes with 1.8.13 and shut down the Docker containers.

I can confirm that the 1.8.13 version is stable - and not leaking. When I bring up a 1.8.15 node in Docker, it grows until I kill it.
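
Roughly, each temporary Docker node was started with something like this (the versioned ethereum/client-go image tag, data directory, ports and flags here are illustrative, not my exact command):

$ docker run -d --name geth-1.8.13 \
      -v /srv/geth-data:/root/.ethereum \
      -p 30303:30303 \
      ethereum/client-go:v1.8.13 --nodiscover --syncmode 'fast' --cache=512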

@pschlump
Contributor

I have not tried 1.8.14 - I will try that today in a docker container.

@pschlump
Contributor

I have run our distributed key generation application (Keep/thesis*) on 1.8.13 - geth grows by 1.1MB of memory and then goes back down within a few minutes (good behavior). On 1.8.15 it grew by 2.3GB! I am setting up a 1.8.14 version now.

@pschlump
Contributor

My tests indicate that 1.8.14 is ok - the problem is with 1.8.15.

@pschlump
Contributor

pschlump commented Sep 12, 2018 via email

@zulhfreelancer
Author

Thank you @pschlump for the pointers. I will give it a try.

@holiman
Contributor

holiman commented Sep 14, 2018

In those logfiles, the first one had 10 simultaneous Unlock operations going on, and the second had 7. The memory on the machine is 3891 MB. I have run into issues unlocking a single key on a USB armory, which has 500MB. So I would suspect it's the decryption that's causing it to crash.

You could try using the --lightkdf setting for the keystore; that will make key decryption take a lot less memory. See

// StandardScryptP is the P parameter of Scrypt encryption algorithm, using 256MB

and

LightKDFFlag = cli.BoolFlag{

I have no idea why it would differ between versions, though. But I can guarantee that Unlock takes hundreds of MB of memory even on the older builds; maybe for some reason they finished faster there and didn't pile up in parallel, which is what causes the crash.
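
Back-of-the-envelope (a sketch; the parameter values are how I read the keystore defaults, roughly N=2^18, r=8 for the standard KDF and N=2^12, r=8 with --lightkdf), scrypt needs about 128*N*r bytes of memory per decryption:

$ echo "standard KDF: $(( 128 * (1 << 18) * 8 / 1024 / 1024 )) MB per Unlock"   # 256 MB
$ echo "light KDF:    $(( 128 * (1 << 12) * 8 / 1024 / 1024 )) MB per Unlock"   # 4 MB
$ echo "10 concurrent Unlocks: $(( 10 * 256 )) MB"                              # well over half of the 3891 MB on that machine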

@holiman
Contributor

holiman commented Sep 14, 2018

Oh, and if it wasn't you calling Unlock, then it's some attacker trying to do brute-force password guessing against your node.

@holiman
Contributor

holiman commented Sep 14, 2018

--rpc --rpcaddr=0.0.0.0 --rpcapi='db,eth,net,web3,personal,admin' should be illegal or something

On a test-network? Behind a firewall? Why?

I guess your firewall is not properly configured, so this ticket demonstrates a pretty good reason :)
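
Something along these lines would be much safer on that box (just a sketch derived from the command in the issue: bind HTTP/WS to localhost and drop db, personal and admin from the exposed modules, so nobody outside the machine can call Unlock):

$ /usr/bin/geth --nodiscover --syncmode 'fast' --cache=512 \
      --rpc --rpcaddr=127.0.0.1 --rpcapi='eth,net,web3' \
      --ws --wsaddr=127.0.0.1 --wsapi='eth,net,web3' \
      --mine --minerthreads='1'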

@pschlump
Contributor

I think I was the source of the unlocks on my system. I have looked at my firewall logs and I see no evidence that any unexpected outside activity took place. I am now looking into the possibility that somebody unwanted has penetrated our security and has malicious code running inside our firewall. I don't see any unexpected pending transactions, and I am monitoring for pending transactions once a second. I take your comment very seriously.
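
The once-a-second check is essentially this (the IPC path is illustrative; the txpool module is available over the local attach endpoint):

$ watch -n 1 "geth attach /data/geth/geth.ipc --exec 'txpool.status'"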

@francofs

francofs commented Oct 1, 2018

I managed to resolve this problem by disabling the 'db' RPC API. Not sure if it has the same root cause as yours, but the behavior seems similar to mine.
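
Concretely, that just means the same flags as in the issue but with db removed from the API lists (a sketch):

--rpcapi='eth,net,web3,personal,admin' --wsapi='eth,net,web3,personal,admin'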

@bencagri

bencagri commented Oct 23, 2018

I'm running 1.8.17-stable on the same environment and have the same problem. I don't have db in my RPC APIs. Running --syncmode "light" --rinkeby.

edit: I downgraded geth to 1.8.14 as @pschlump mentioned. I see no problems after syncing. 👍

@holiman
Contributor

holiman commented Nov 19, 2018

This is already solved, I'm closing
