geth 1.8.6 sync fails to complete #16603

Closed
onepremise opened this issue Apr 29, 2018 · 23 comments

@onepremise

onepremise commented Apr 29, 2018

I've tried several clean resyncs, with the chain database removed, in an attempt to test the new release of geth, 1.8.6. Every attempt results in the sync stalling at the very end, about 79-100 blocks from the head:

Using the command line: geth --cache=1024 --syncmode "fast"

eth.syncing
{
currentBlock: 5516185,
highestBlock: 5516264,
knownStates: 72045381,
pulledStates: 72029865,
startingBlock: 5516011
}

Again, 15 minutes later:

eth.syncing
{
currentBlock: 5516254,
highestBlock: 5516326,
knownStates: 74102284,
pulledStates: 74087290,
startingBlock: 5516011
}

eth.syncing never returns false. The process above has been running for 3 days.
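
A rough sketch of how the two snapshots above compare, assuming the eth.syncing fields are copied by hand into plain Python dicts. The helper name `sync_gaps` is made up for illustration and is not part of geth:

```python
def sync_gaps(status):
    """Return (blocks_behind, states_pending) for an eth.syncing-style dict."""
    blocks_behind = status["highestBlock"] - status["currentBlock"]
    states_pending = status["knownStates"] - status["pulledStates"]
    return blocks_behind, states_pending

# The two snapshots quoted above, taken about 15 minutes apart.
first = {"currentBlock": 5516185, "highestBlock": 5516264,
         "knownStates": 72045381, "pulledStates": 72029865}
second = {"currentBlock": 5516254, "highestBlock": 5516326,
          "knownStates": 74102284, "pulledStates": 74087290}

for snap in (first, second):
    blocks_behind, states_pending = sync_gaps(snap)
    print(f"blocks behind: {blocks_behind}, states pending: {states_pending}")

# How much state work happened between the two snapshots.
pulled_delta = second["pulledStates"] - first["pulledStates"]
known_delta = second["knownStates"] - first["knownStates"]
print(f"states pulled between snapshots: {pulled_delta}")
print(f"states discovered between snapshots: {known_delta}")
```

On these numbers the block gap barely moves (79, then 72) while roughly two million states are pulled and about as many new ones are discovered, so the node appears to be busy downloading state rather than idle.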

System information

Geth version: 1.8.6-stable-12683fec
OS & Version: OSX High Sierra 10.13.3
Ethereum wallet: 0.10.0

Expected behaviour

sync should complete and final balance should show in wallet

Actual behaviour

sync never completes, wallet never shows balance

Steps to reproduce the behaviour

brew install latest geth, 1.8.6

There are several open tickets, some even list more tickets, for this same issue on other related projects. For reference:

Ethereum wallet will not sync past the last 65 blocks
ethereum/mist#3738

Ethereum Wallet doesn't sync
ethereum/mist#3836

Issues related with Syncing
ethereum/mist#3097

After syncing, Mist doesn't update info (but stays syncd)
ethereum/mist#3837

Is it possible this change got left out of the latest release?

add 'ps.lock.Unlock()' before return #16360
#16360

After going back to geth 1.8.2, the release packaged with wallet 0.10.0, and removing the chain data, a full sync completes in a few hours, even without the fast mode setting. There may be a regression in the 1.8.6 release, or at least it may be worth looking into. Let me know if you need any further info or would like me to try something else. Thanks.

@jmozah
Contributor

jmozah commented Apr 30, 2018

Yes, I face the same issue too. My node has been running for 5 days now. The sync is close but not over.

> eth.syncing
{
  currentBlock: 5532100,
  highestBlock: 5532178,
  knownStates: 81786484,
  pulledStates: 81783885,
  startingBlock: 5531829
}

Geth
Version: 1.8.6-stable
Architecture: amd64
Protocol Versions: [63 62]
Network Id: 1
Go Version: go1.9.2
Operating System: linux
GOPATH=/home/ubuntu/go
GOROOT=/usr/local/go

The sync fails because of dropped peers.
I tried adding close to 100 peers from ethernodes.org over 5 days, and the peer count always drops to 2 or 3 after a few minutes.

INFO [04-30|18:07:24] Regenerated local transaction journal    transactions=0 accounts=0
INFO [04-30|18:07:33] Imported new block headers               count=1    elapsed=26.440ms  number=5532115 hash=8d0ee4…35d30f ignored=0
INFO [04-30|18:07:46] Imported new block headers               count=1    elapsed=26.342ms  number=5532116 hash=af6a53…a3091c ignored=0
INFO [04-30|18:07:52] Imported new block headers               count=1    elapsed=27.388ms  number=5532117 hash=b604c7…a37e21 ignored=0
INFO [04-30|18:08:00] Imported new state entries               count=1475 elapsed=6.205ms   processed=81774745 pending=6254 retry=0  duplicate=0 unexpected=61
INFO [04-30|18:08:13] Imported new block headers               count=1    elapsed=22.278ms  number=5532118 hash=275f17…b60b95 ignored=0
INFO [04-30|18:08:21] Imported new state entries               count=1528 elapsed=9.912ms   processed=81776273 pending=5025 retry=24 duplicate=0 unexpected=61
INFO [04-30|18:08:28] Imported new block headers               count=1    elapsed=23.825ms  number=5532119 hash=fc5616…02161f ignored=0
INFO [04-30|18:08:39] Imported new block headers               count=1    elapsed=22.802ms  number=5532120 hash=af3075…d6d301 ignored=0
INFO [04-30|18:08:48] Imported new state entries               count=1179 elapsed=7.204ms   processed=81777452 pending=3920 retry=0  duplicate=0 unexpected=61
INFO [04-30|18:08:52] Imported new block headers               count=1    elapsed=26.005ms  number=5532121 hash=1cdb7a…996dd8 ignored=0
INFO [04-30|18:09:02] Imported new block headers               count=1    elapsed=27.043ms  number=5532122 hash=a41607…5ef8dc ignored=0
INFO [04-30|18:09:36] Imported new state entries               count=443  elapsed=1.992ms   processed=81777895 pending=5450 retry=0  duplicate=0 unexpected=61
WARN [04-30|18:10:18] Rolled back headers                      count=109  header=5532122->5532013 fast=5532020->5532013 block=0->0
INFO [04-30|18:10:28] Imported new state entries               count=384  elapsed=2.287ms   processed=81778279 pending=6191 retry=0  duplicate=0 unexpected=61
WARN [04-30|18:10:28] Synchronisation failed, dropping peer    peer=7869970b968f9b25 err="retrieved hash chain is invalid"
INFO [04-30|18:10:37] Imported new block headers               count=38   elapsed=653.925ms number=5532159 hash=a7c513…d410d8 ignored=108
INFO [04-30|18:10:37] Imported new block receipts              count=0    elapsed=1.200ms   number=5532015 hash=9a23fe…3523dc size=0.00B    ignored=2
INFO [04-30|18:10:38] Imported new block receipts              count=0    elapsed=1.206ms   number=5532017 hash=526b7d…274386 size=0.00B    ignored=2
INFO [04-30|18:10:38] Imported new block receipts              count=24   elapsed=187.425ms number=5532044 hash=b6a71c…471ce7 size=2.48mB   ignored=3
INFO [04-30|18:10:43] Imported new block receipts              count=48   elapsed=348.205ms number=5532092 hash=07fa18…172fb6 size=4.81mB   ignored=0
INFO [04-30|18:10:56] Imported new state entries               count=0    elapsed=334.027µs processed=81778279 pending=1    retry=1  duplicate=0 unexpected=117
INFO [04-30|18:10:58] Imported new block headers               count=1    elapsed=22.653ms  number=5532160 hash=deb7b8…c689d2 ignored=0
INFO [04-30|18:11:13] Imported new block headers               count=1    elapsed=21.921ms  number=5532161 hash=4ea954…185ebb ignored=0
WARN [04-30|18:11:22] Rolled back headers                      count=40   header=5532161->5532121 fast=5532092->5532092 block=0->0
WARN [04-30|18:11:40] Synchronisation failed, dropping peer    peer=24f4fcfd84c6200a err="retrieved hash chain is invalid"
INFO [04-30|18:11:46] Imported new block headers               count=4    elapsed=89.321ms  number=5532164 hash=13b090…cfe468 ignored=68
INFO [04-30|18:11:47] Imported new block receipts              count=7    elapsed=55.955ms  number=5532099 hash=80af97…c62d82 size=756.37kB ignored=0
INFO [04-30|18:12:25] Imported new state entries               count=233  elapsed=1.240ms   processed=81778512 pending=2388 retry=0  duplicate=0 unexpected=117
WARN [04-30|18:12:29] Rolled back headers                      count=4    header=5532164->5532160 fast=5532099->5532099 block=0->0
INFO [04-30|18:12:29] Imported new state entries               count=394  elapsed=2.442ms   processed=81778906 pending=2466 retry=0  duplicate=0 unexpected=117
WARN [04-30|18:12:29] Synchronisation failed, dropping peer    peer=24f4fcfd84c6200a err="retrieved hash chain is invalid"
INFO [04-30|18:12:33] Imported new block headers               count=3    elapsed=60.214ms  number=5532166 hash=2d1580…201c04 ignored=64
INFO [04-30|18:12:33] Imported new block receipts              count=1    elapsed=9.668ms   number=5532100 hash=120851…b1c0cd size=125.55kB ignored=0
INFO [04-30|18:12:49] Imported new block headers               count=1    elapsed=26.097ms  number=5532167 hash=60a582…e586d0 ignored=0
INFO [04-30|18:13:15] Imported new state entries               count=253  elapsed=1.258ms   processed=81779159 pending=2208 retry=0  duplicate=0 unexpected=117
INFO [04-30|18:13:28] Imported new block headers               count=1    elapsed=30.884ms  number=5532168 hash=ede494…d55649 ignored=0
INFO [04-30|18:13:31] Imported new block headers               count=1    elapsed=27.889ms  number=5532169 hash=4165c9…2b8e56 ignored=0
INFO [04-30|18:13:41] Imported new block headers               count=1    elapsed=22.428ms  number=5532170 hash=a64aaa…45a09c ignored=0
INFO [04-30|18:13:51] Imported new block headers               count=1    elapsed=24.445ms  number=5532171 hash=c45379…8d18df ignored=0
INFO [04-30|18:13:59] Imported new state entries               count=598  elapsed=2.946ms   processed=81779757 pending=2716 retry=9  duplicate=0 unexpected=117
INFO [04-30|18:14:01] Imported new block headers               count=1    elapsed=30.124ms  number=5532172 hash=18a96c…b4afba ignored=0
INFO [04-30|18:14:13] Imported new block headers               count=1    elapsed=21.910ms  number=5532173 hash=3b6f1b…fca6ba ignored=0
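
For anyone wanting to summarise a log like the one above, here is a minimal sketch. It assumes the log lines keep the same shape as in this comment (level, timestamp in brackets, message, then key=value pairs); the function names are made up for illustration, and the sample lines are copied from the log above:

```python
import re
from collections import Counter

# key=value pairs such as pending=6254 or err="retrieved hash chain is invalid"
FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')

def parse_line(line):
    """Split one log line into (level, message, fields)."""
    level = line.split(None, 1)[0]
    # The message sits between the "[date|time]" stamp and the first key=value pair.
    after_stamp = line.split("] ", 1)[1]
    first_field = FIELD.search(after_stamp)
    end = first_field.start() if first_field else len(after_stamp)
    message = after_stamp[:end].strip()
    fields = {key: value.strip('"') for key, value in FIELD.findall(after_stamp)}
    return level, message, fields

def summarise(lines):
    """Count dropped peers and header rollbacks, and collect pending state counts."""
    dropped = Counter()
    rollbacks = 0
    pending = []
    for line in lines:
        _, message, fields = parse_line(line)
        if message == "Synchronisation failed, dropping peer":
            dropped[fields["peer"]] += 1
        elif message == "Rolled back headers":
            rollbacks += 1
        elif message == "Imported new state entries":
            pending.append(int(fields["pending"]))
    return dropped, rollbacks, pending

sample = [
    'INFO [04-30|18:09:36] Imported new state entries               count=443  elapsed=1.992ms   processed=81777895 pending=5450 retry=0  duplicate=0 unexpected=61',
    'WARN [04-30|18:10:18] Rolled back headers                      count=109  header=5532122->5532013 fast=5532020->5532013 block=0->0',
    'WARN [04-30|18:10:28] Synchronisation failed, dropping peer    peer=7869970b968f9b25 err="retrieved hash chain is invalid"',
    'WARN [04-30|18:11:40] Synchronisation failed, dropping peer    peer=24f4fcfd84c6200a err="retrieved hash chain is invalid"',
    'WARN [04-30|18:12:29] Synchronisation failed, dropping peer    peer=24f4fcfd84c6200a err="retrieved hash chain is invalid"',
]

dropped, rollbacks, pending = summarise(sample)
print(dict(dropped), rollbacks, pending)
```

Run over a full log file, this kind of tally would show how often the same peer is dropped and whether the pending state count is trending down.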

@jmozah
Contributor

jmozah commented May 2, 2018

I still have the same issue. The sync is not catching up; it is always ~100 blocks behind.

{
  currentBlock: 5541929,
  highestBlock: 5542003,
  knownStates: 34618575,
  pulledStates: 34604275,
  startingBlock: 0
}

@LiorRabin

Same here...

@quasisamurai

Mate, my geth has been syncing for about 35 days since I started it. It now has about 158 million pulled states and still hasn't finished. 💩
So I started it on 3 PCs, and the state is almost the same on each.
Dunno how to get onto the actual live net.

@veox
Contributor

veox commented May 7, 2018

This is expected behaviour, and not an issue.

For an explanation of what is happening, along with a small FAQ, see this comment.


There are several open tickets, some even list more tickets, for this same issue on other related projects.

There are also many identical issues open on this bug tracker, too. ;)


Suggest closing as not a bug.

@ghost

ghost commented May 8, 2018

Hi, all. If I understood correctly, at the end it's mostly syncing 'states', not 'blocks', because the blocks are almost synced (1.8.6-stable-12683fec). Geth has been syncing for about 7 days (Ubuntu 16, 4 vCPUs and 16 GB RAM).
{
currentBlock: 5578049,
highestBlock: 5578116,
knownStates: 144052459,
pulledStates: 144052458,
startingBlock: 0
}
I have the same logs as @jmozah posted earlier.
Maybe somebody knows: how many states are there in the chain? Or where can I find this number?

@veox
Contributor

veox commented May 13, 2018

Maybe somebody knows: how many states are there in the chain? Or where can I find this number?

@dpredkel A node that's synced today shows me 135141789 states right before it's done. I think there was an issue opened recently for the second question, but I can't find it.

Your knownStates number is higher. I'd guess that after so many days of syncing, many of those states are already stale, but not cleared.

When the node seems to stall on initial sync, I'd generally recommend restarting the node - without shredding the database, of course! - so it can set a newer block as a pivot.
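
To put a number on that guess, here is a small sketch comparing the knownStates value from the earlier comment against the count from the freshly synced node. The function name is hypothetical, and treating the excess as stale entries is only the guess stated above; the reference count also changes as the chain grows:

```python
# State count reported above for a node that finished syncing that day.
REFERENCE_STATES = 135141789

def stale_state_estimate(known_states, reference=REFERENCE_STATES):
    """Return how many known states exceed the reference count (0 if none)."""
    return max(0, known_states - reference)

# knownStates from the earlier comment, after about 7 days of syncing.
excess = stale_state_estimate(144052459)
print(excess)
```

An excess of several million entries would fit the suggestion that a restart, which lets the node pick a newer pivot block, is worth trying before anything more drastic.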


The machine that I did the sync on has less memory than yours, but very good network connectivity (no-NAT IPv4, 1 GiB/s), and more-or-less dedicated SSD storage.

From start to finish, the sync took ~25 hours (with no restarts). Same v1.8.6-stable-12683fec (from Ubuntu PPA).

@ghost

ghost commented May 14, 2018

@veox Thanks for the answer. Can you say how to correctly restart the node without shredding the database?

@veox
Contributor

veox commented May 14, 2018

@dpredkel What I meant is don't do a removedb. :)

@ghost

ghost commented May 14, 2018

@veox Oh, ok, I will just restart the client. Thanks a lot :)

@jebek29

jebek29 commented May 14, 2018

Markdown_1.0.1.zip

@CryptoKiddies

@veox hello again, I am facing the Synchronisation failed, dropping peer issue once more. I had 3 weeks of stability after I rolled back to v1.8.3. However, starting yesterday, I am having the same issue where the geth client falls out of sync for long periods of time. Occasionally it will resync, but it is out of sync 90% of the time.

Do you have any thoughts on how to prevent this issue? I have 16GB RAM, 4 cores, and 0.5GiB/s IOPS on Ubuntu 16.04. Does restarting really help? I have a suspicion that some stochastic process corrupts the data and prevents the node from behaving properly. I was hoping, after 3 weeks, that this would be a stable Geth version. Thanks for your help!

@veox
Contributor

veox commented May 17, 2018

@GeeeCoin I think you meant to comment in #16539. :)

EDIT: In short, though: no, I have no useful thoughts on that. I'm experiencing something similar on a high-load node, but I expect it's being dropped by peers due to its high message round-trip time.

@daggerhashimoto

daggerhashimoto commented May 17, 2018

Same here, constantly around 90 blocks behind the actual chain. I've tried with higher server specs but to no avail.

@CryptoKiddies

CryptoKiddies commented May 17, 2018

@veox ahh, I remember that comment well :) It still seems to be unresolved. Have you had success with static-nodes or admin.addPeer? I've tried both approaches numerous times, but the requested peers don't show up in my peer list. I grabbed fairly recent peers from the eth network site.

I think a good contingency would be to use Infura as a backup switch. The only problem is how to use my geth wallet with an Infura provider without having to recode my project around the extracted private key (which leads to low-level calls in web3). There's a new library similar to HDWalletProvider, but for a private key as opposed to a mnemonic: https://github.com/rhlsthrm/truffle-hdwallet-provider-privkey. Looks promising, any thoughts? ...Or perhaps Infura has a public enode ID they're willing to share ;)

@ghost

ghost commented May 23, 2018

So did anybody end up resolving this? I'm facing the same issue too, using geth v1.8.3-stable. I have plenty of system resources available and am using an SSD with only 400/3000 IOPS used. Sync to Ropsten works fine, but neither fast sync nor full sync to mainnet works. It has been stuck at the last 100 blocks for the past 2 days. Is there any point in continuing to run it in the hope that it will sync, or do we need to look into another method? Could this be resolved by running an older or different version of geth? Thanks

@daggerhashimoto

@troowala if you're working on something, you can continue working on Infura until this gets resolved.

@ghost

ghost commented May 24, 2018

@CD0x23 I wish it was an option. It's not just publishing contracts from Remix.

The application we use requires communicating with a geth node that holds the wallet file; the application unlocks the file and then executes a multitude of contracts and scripts. So if we want to go the Infura route, we will have to significantly modify this third-party application in an unsupported manner to accommodate transferring the private key to Infura or myetherapi nodes. Hence we need the geth node to sync with the mainnet.

So are you involved with resolving this issue by any chance? And does that mean no newly provisioned geth nodes are syncing with the mainnet? Thanks

@ghost

ghost commented May 29, 2018

Okay, the solution is to ensure that the SSD has at least 1000 IOPS.

In the case of AWS EC2 instances, the cheapest solution is to use GP2 volumes of 350 GB or more, as Amazon gives you 3 IOPS per GB.

Smaller EBS volumes will not sync, as they do not have enough IOPS, unless you choose IO1 Provisioned IOPS and set it to 1000 IOPS or more, which is a much more expensive way of achieving the same result.
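
The arithmetic behind those numbers, as a small sketch. It uses only the figures quoted in this comment (3 IOPS per GB and a 1000 IOPS target) and ignores anything else AWS may apply, such as burst credits or minimum baselines; the function names are made up for illustration:

```python
import math

IOPS_PER_GB = 3      # gp2 baseline ratio quoted in the comment above
TARGET_IOPS = 1000   # threshold reported in the comment above

def gp2_baseline_iops(size_gb):
    """Baseline IOPS for a gp2 volume of the given size, at 3 IOPS per GB."""
    return IOPS_PER_GB * size_gb

def min_size_for(target_iops):
    """Smallest whole-GB volume whose baseline reaches the target."""
    return math.ceil(target_iops / IOPS_PER_GB)

print(gp2_baseline_iops(350))     # 1050
print(min_size_for(TARGET_IOPS))  # 334
```

So a 350 GB volume clears the 1000 IOPS mark with a little headroom, and anything under about 334 GB falls short of it on baseline alone.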

@veox
Contributor

veox commented May 29, 2018

@troowala That number might be AWS-specific. I've been re-syncing monthly within half a day on a VPS with 350 IOPS (according to the provider's spec sheet).

@ghost

ghost commented May 29, 2018

@veox Darn! I feel a little ripped off if AWS is hustling like that ...

Changing disk IOPS is the only configuration change I had to make to get the eth mainnet to sync. If you don't mind posting your disk speeds and geth startup parameters, that would be helpful insight.

Mine for the blockchain storage volume are

Read Speeds
$ sudo hdparm -Tt /dev/xvdf

/dev/xvdf:
Timing cached reads: 16330 MB in 1.99 seconds = 8192.66 MB/sec
Timing buffered disk reads: 234 MB in 3.03 seconds = 77.23 MB/sec

Write Speeds

$ sync; dd if=/dev/zero of=/store/testspeed bs=1M count=1024; sync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 14.4353 s, 74.4 MB/s

Geth Startup parameters

./geth --datadir=/store/mainnet --rpcapi "personal,web3,eth,net,db,debug" --rpc --rpcaddr "0.0.0.0" --rpccorsdomain "10.10.0.0/16,172.21.0.0/24"

Thanks

@veox
Contributor

veox commented May 30, 2018

@troowala Actually, the provider's spec sheet no longer shows "IOPS" anywhere. :/ I've also re-purposed the "350 IOPS" machine to Ropsten earlier this month, and downgraded the plan, so even if I took the measurements, they'd be for something else.

The best I can do is show these graphs from the provider's panel:

[Two graph images from the provider's panel; not reproduced here.]

Since the start of December 2017 'till mid-May 2018, it's been running with some variant of:

/usr/bin/geth --syncmode fast --datadir /home/geth/.ethereum --cache 2048 --txpool.globalslots 65535 --lightserv 50 --lightpeers 1000 --maxpeers 1025

The peaks on the "write" side represent the occasions of removedb followed by a re-sync.

Note that most of the read activity is from serving light peers. It's very unrepresentative regarding a do-nothing geth node.


A different (KVM-virtualised) machine I'm currently running seems to measure ~ 800 IOPS with iostat, and performs no better than the other one.

@fjl
Contributor

fjl commented Feb 19, 2019

Closing this because it's an old issue. Many improvements to blockchain sync have been implemented since this issue was opened. If you are still having sync issues, please open a new issue.

@fjl fjl closed this as completed Feb 19, 2019
@fjl fjl removed the status:triage label Mar 5, 2019