eth.getBlock("latest").number is always 0 #16147

Closed
danmerino opened this issue Feb 21, 2018 · 26 comments

Comments

@danmerino

Geth version: 1.8.1
OS & Version: Docker

Expected behaviour

eth.syncing
false
eth.getBlock("latest").number
5128088

Actual behaviour

eth.syncing
false
eth.getBlock("latest").number
0

Steps to reproduce the behaviour

Start a fresh Docker container without any data. Several people are having this problem on different platforms:

Backtrace

No errors

@danellis

I have the same problem. --syncmode fast --cache 2048. When eth.syncing is false, both eth.blockNumber and eth.getBlock('latest').number are 0.

@karalabe
Member

Do you see logs that Geth is importing data from the network? If yes, you need to wait until sync finishes. Some people have reported that eth.syncing is sometimes flaky; I'll need to take a look at why that might be.

@dotneet

dotneet commented Feb 22, 2018

I had the same problem when connecting to the Ropsten testnet.
I resolved it by changing the boot nodes.

@focyde

focyde commented Feb 22, 2018

Most of these problems occur when the sync has not finished or the node is not connected to the network.

@danmerino
Author

All the logs appear to say that it is fully in sync. Just to make sure I was not crazy, I left the node running for a few days. Still no luck. I didn't experience this on 1.7.

I am not able to query the latest block by number, so I do agree that it might be eth.syncing being flaky. @karalabe

Let's close this ticket and I will dig into the code, because I know a lot of people are having problems where they believe their node is in sync when it's probably not.

@KevinIsSun

I have the same problem; please reopen this issue. Although the log suggests the node is fully in sync, I can only get a block that is about 100 behind the latest block.

@ElectricBill

ElectricBill commented Mar 9, 2018

I've got this issue lately. I used fast sync to catch up, then started running in full sync mode. But the latest block number never budges, even though eth.syncing indicates that it has run almost to the current time. Worse, geth (1.8.2) crashes repeatedly with indications of data block corruption, but restarts OK and runs past the failure point, only to crash again on the same block.

I speculate that it wrote corrupt data while doing the fast sync, and cannot revert to full sync due to the corruption. But on each restart, it accumulates more current blocks before attempting the full sync verification process.

Now I am trying to start clean with only a full sync, while taking periodic ZFS snapshots for rollback in case corruption is encountered. I hope that gets me to a point where I can do transactions with my ether. Very frustrating.

@karalabe
Member

karalabe commented Mar 9, 2018

Syncing Ethereum is a pain point for many people, so I'll try to detail what's happening behind the scenes so there might be a bit less confusion.

The current default mode of sync for Geth is called fast sync. Instead of starting from the genesis block and reprocessing all the transactions that ever occurred (which could take weeks), fast sync downloads the blocks, and only verifies the associated proof-of-works. Downloading all the blocks is a straightforward and fast procedure and will relatively quickly reassemble the entire chain.

Many people falsely assume that because they have the blocks, they are in sync. Unfortunately this is not the case: since no transaction was executed, we do not have any account state available (i.e. balances, nonces, smart contract code and data). These need to be downloaded separately and cross-checked with the latest blocks. This phase is called the state trie download and it actually runs concurrently with the block downloads; alas, it takes a lot longer nowadays than downloading the blocks.

So, what's the state trie? In the Ethereum mainnet, there are a ton of accounts already, which track the balance, nonce, etc. of each user/contract. The accounts themselves are, however, insufficient to run a node; they need to be cryptographically linked to each block so that nodes can actually verify that the accounts have not been tampered with. This cryptographic linking is done by creating a tree data structure above the accounts, each level aggregating the layer below it into an ever smaller layer, until you reach the single root. This gigantic data structure containing all the accounts and the intermediate cryptographic proofs is called the state trie.
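
To make the "each level aggregating the layer below it" idea concrete, here is a minimal sketch that hashes a handful of accounts up to a single root. It is a deliberate simplification and not how Geth stores state: it builds a plain binary hash tree with SHA-256 from Node's crypto module, whereas the real state trie is a 16-way Merkle Patricia trie. The account records and the merkleRoot helper are hypothetical.

```javascript
const { createHash } = require("node:crypto");

// Hash a string with SHA-256 (a stand-in for the hash function Ethereum uses).
function sha256(text) {
  return createHash("sha256").update(text).digest("hex");
}

// Fold a list of leaf hashes into a single root, pairing neighbours level by level.
function merkleRoot(leafHashes) {
  if (leafHashes.length === 0) return sha256("");
  let level = leafHashes;
  while (level.length > 1) {
    const next = [];
    for (let i = 0; i < level.length; i += 2) {
      // An unpaired last node is hashed together with itself.
      const right = i + 1 < level.length ? level[i + 1] : level[i];
      next.push(sha256(level[i] + right));
    }
    level = next;
  }
  return level[0];
}

// Hypothetical account records (address, balance, nonce).
const accounts = [
  { address: "0xaa", balance: 10, nonce: 0 },
  { address: "0xbb", balance: 5, nonce: 2 },
  { address: "0xcc", balance: 0, nonce: 1 },
];
const rootBefore = merkleRoot(accounts.map((a) => sha256(JSON.stringify(a))));

// Tampering with a single balance changes the root hash.
const tampered = accounts.map((a) => ({ ...a }));
tampered[1].balance = 500;
const rootAfter = merkleRoot(tampered.map((a) => sha256(JSON.stringify(a))));
console.log(rootBefore !== rootAfter); // true
```

Because the root depends on every account below it, a node that holds only the root can detect any altered account once it has the intermediate hashes, which is why those intermediate nodes have to be downloaded as well.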

Ok, so why does this pose a problem? This trie data structure is an intricate interlink of hundreds of millions of tiny cryptographic proofs (trie nodes). To truly have a synchronized node, you need to download all the account data, as well as all the tiny cryptographic proofs to verify that no one in the network is trying to cheat you. This itself is already a crazy number of data items. The part where it gets even messier is that this data is constantly morphing: at every block (15s), about 1000 nodes are deleted from this trie and about 2000 new ones are added. This means your node needs to synchronize a dataset that is changing 200 times per second. The worst part is that while you are synchronizing, the network is moving forward, and state that you began to download might disappear while you're downloading, so your node needs to constantly follow the network while trying to gather all the recent data. But until you actually do gather all the data, your local node is not usable since it cannot cryptographically prove anything about any accounts.
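
The "200 times per second" figure follows directly from the numbers quoted above. A small sketch of the arithmetic, using only those rough figures (the function name is hypothetical):

```javascript
// Rough figures quoted above: per 15 s block, about 1000 trie nodes deleted and 2000 added.
const blockTimeSeconds = 15;
const deletedPerBlock = 1000;
const addedPerBlock = 2000;

// Total trie-node changes per second.
function changesPerSecond(deleted, added, blockTime) {
  return (deleted + added) / blockTime;
}

const rate = changesPerSecond(deletedPerBlock, addedPerBlock, blockTimeSeconds);
console.log(rate); // 200
console.log(rate * 3600); // 720000 changes per hour
```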

If you see that you are 64 blocks behind mainnet, you aren't yet synchronized, not even close. You are just done with the block download phase and still running the state downloads. You can see this yourself via the seemingly endless Imported state entries [...] stream of logs. You'll need to wait that out too before your node comes truly online.


Q: The node just hangs on importing state entries?!

A: The node doesn't hang, it just doesn't know how large the state trie is in advance so it keeps on going and going and going until it discovers and downloads the entire thing.

The reason is that a block in Ethereum only contains the state root, a single hash of the root node. When the node begins synchronizing, it knows about exactly 1 node and tries to download it. That node can refer to up to 16 new nodes, so in the next step, we'll know about 16 new nodes and try to download those. As we go along the download, most of the nodes will reference new ones that we didn't know about until then. This is why you might be tempted to think it's stuck on the same numbers. It is not, rather it's discovering and downloading the trie as it goes along.
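
The discovery process described above can be sketched as a breadth-first walk that starts from the root hash alone. This is only an illustration under stated assumptions: the toy trie, the fetchNode callback (standing in for a network request) and the syncTrie function are hypothetical and far simpler than Geth's actual downloader.

```javascript
// Hypothetical toy trie: each key is a node "hash", each value lists the child hashes it references.
const toyTrie = {
  root: ["a", "b"],
  a: ["a1", "a2", "a3"],
  b: ["b1"],
  a1: [],
  a2: ["a2x"],
  a3: [],
  b1: [],
  a2x: [],
};

// Walk the trie breadth-first, knowing only the root hash at the start.
// Records how many nodes were pulled and how many were known after each fetch.
function syncTrie(rootHash, fetchNode) {
  const known = new Set([rootHash]);
  const queue = [rootHash];
  const progress = [];
  let pulled = 0;
  while (queue.length > 0) {
    const hash = queue.shift();
    const children = fetchNode(hash); // stand-in for a network round trip
    pulled += 1;
    for (const child of children) {
      if (!known.has(child)) {
        known.add(child);
        queue.push(child);
      }
    }
    progress.push({ pulled, known: known.size });
  }
  return progress;
}

const progress = syncTrie("root", (hash) => toyTrie[hash]);
for (const step of progress) {
  console.log(`pulled=${step.pulled} known=${step.known}`);
}
```

In this toy run the known count stays ahead of the pulled count until the very last fetch, so the total size is only certain once nothing new is discovered. That is why no meaningful percentage can be shown while the download is in flight.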

Q: I'm stuck at 64 blocks behind mainnet?!

A: As explained above, you are not stuck, just finished with the block download phase, waiting for the state download phase to complete too. This latter phase nowadays takes a lot longer than just getting the blocks.

Q: Why does downloading the state take so long, I have good bandwidth?

A: State sync is mostly limited by disk IO, not bandwidth.

The state trie in Ethereum contains hundreds of millions of nodes, most of which take the form of a single hash referencing up to 16 other hashes. This is a horrible way to store data on a disk, because there's almost no structure in it, just random numbers referencing even more random numbers. This makes any underlying database weep, as it cannot optimize storing and looking up the data in any meaningful way.

Not only is storing the data very suboptimal, but due to the 200 modifications / second and pruning of past data, we cannot even download it in a properly pre-processed way to make it import faster without the underlying database shuffling it around too much. The end result is that even a fast sync nowadays incurs a huge disk IO cost, which is too much for a mechanical hard drive.

Q: Wait, so I can't run a full node on an HDD?

A: Unfortunately not. Doing a fast sync on an HDD will take more time than you're willing to wait with the current data schema. Even if you do wait it out, an HDD will not be able to keep up with the read/write requirements of transaction processing on mainnet.

You should, however, be able to run a light client on an HDD with minimal impact on system resources. If you wish to run a full node, however, an SSD is your only option.

@tyramisoux

tyramisoux commented Mar 13, 2018

I also have this issue. geth is running with the parameters "--cache=1024 --fast --rpc" and is syncing.
Re-installed and/or cleared chaindata a few times.
Attached the console using geth attach http://127.0.0.1:8545
(I also tried giving the direct path of "datadir", since I have hardlinked it to a different drive. This does not change anything.)
Whether synced or not, eth.getBlock("latest"), eth.getBlock("earliest") and eth.getBlock("pending") all return block 0 (not only the number field is trashed).

Btw, SSD is NOT recommended because of IO traffic. It would not last very long. Better to increase the cache, and then it works fine from an HDD.

@karalabe
Member

Please read my above comment.

@tyramisoux

tyramisoux commented Mar 13, 2018

I did, and you are wrong about the SSD. And this is not a sync issue.
(I am just looking for the article where it was explained.)

@tyramisoux

tyramisoux commented Mar 13, 2018

Hahaha - it is great :-)
"use SSD" - "use NO SSD" (I link to the thread below)
In fact I also had good success using one, but the high access rate might destroy it very early.
I also had good luck by increasing the cache.
ethereum/mist#2595

@karalabe
Member

if synced or not, eth.getBlock("latest"), eth.getBlock("earliest") and eth.getBlock("pending") all return block 0

That's because it's not yet synced, it's still downloading the state trie, as I explained above.

Re-installed and/or cleared chaindata a few times.

Clearing the chaindata just makes it start over, why would you do that?

SSD is NOT recommended because of IO traffic

Yes, unfortunately the IO traffic is very high. However, exactly because of that, an HDD will take weeks to sync.

@tyramisoux

tyramisoux commented Mar 13, 2018

"Clearing chaindata" was also one of the ideas I read here. Sometimes it simply hangs, and sometimes it seems to clear it by itself.
When it says "synced", the size of "chaindata" varies between about 50GB and 160GB.
Currently at 4GB... (still syncing of course; the log looks nice). I do it without a wallet, running geth in the console only.
Looks like there is some strange shit going on.
In fact I have enough peers, so that's not the point (no provider issue; other swarm-based stuff like IPFS runs fine). The internet is sometimes slow (5 - 25 MBit) here in the countryside. Slower, but it should work anyway.

Btw, I have used BTC from the beginning without any issues. I tried about 50 other wallets with strange stuff, from "AsicCoin" through "Mazacoin" to "ZetaCoin", but never ran into that trouble.

Geth looks like quite solid stuff and should do its job. I like the idea and do not want to give up.

The getBlock issue did not change when it was synced. But maybe chaindata was trashed? Will try again ....

@tyramisoux

The --cache=1024 seems to do the trick nicely. I checked I/O and disk access (using procexp), which are "normal". No high load while geth is running!

@karalabe
Member

--cache=1024 is the default since Geth 1.8.0.

@tyramisoux

tyramisoux commented Mar 13, 2018

Hm, so I guess it is also the default in my 1.8.2. But that's all fine here. I would even let this run on an SSD without concern, but it looks like there is no need. The HDD is bored.

If "latest" works on a synced chain only, I guess it never was really synced!
"eth.syncing.highestBlock" or even getBlock(eth.syncing.currentBlock) works great.

In fact "eth.getBlock(eth.syncing.highestBlock)" also returns "null". This means it loads the block from the downloaded data and does not trigger fetching it from the network.

getBlock("latest") should return an error and not "Block 0". This looks wrong to me in any case (it is neither the "latest" nor the "latest downloaded").
So it is necessary to check whether the node is synced before using this.
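
A minimal sketch of such a check, assuming only the two console values already used in this thread (eth.syncing and eth.blockNumber). The function name and the labels it returns are hypothetical and not part of geth or web3; as the thread shows, a false from eth.syncing alone is not proof of a finished sync, so the best case is labelled only "possibly-synced".

```javascript
// Classify a node's sync status from the values of eth.syncing and eth.blockNumber.
// The labels are hypothetical helpers for illustration, not a geth or web3 API.
function describeSyncState(syncing, blockNumber) {
  if (syncing !== false) {
    // eth.syncing returned a progress object, so a sync is in flight.
    return "syncing";
  }
  if (blockNumber === 0) {
    // Reported as "not syncing" but the head is still block 0:
    // the node has not finished (or has not started) syncing.
    return "not-synced";
  }
  return "possibly-synced";
}

console.log(describeSyncState(false, 0)); // not-synced
console.log(describeSyncState({ currentBlock: 3175461, highestBlock: 5249624 }, 0)); // syncing
console.log(describeSyncState(false, 5128088)); // possibly-synced
```

In an attached console this would be called as describeSyncState(eth.syncing, eth.blockNumber) before trusting the result of eth.getBlock("latest").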

@ElectricBill

Thanks for that thorough write-up, karalabe. I'm going to have to take some time to read it carefully. I am currently running a full sync with SSD with periodic ZFS snapshots for rollback in case of failure. The log is littered with periodic reports of checksum mismatches, broken chain ancestry, synchronization failures, canceled downloads, dangling trie nodes, empty head header sets, actions from "bad peers" that are ignored, receipt downloads canceled, and probably many more messages of which I am unsure of the precise meaning, provenance or severity. I don't know what is merely a warning that I can safely ignore and what is a problem that should raise alarm. Very simply, the reports don't provide me with strong assurance that it is functioning properly and that I may trust the integrity of its operation accordingly. Compounding that problem are the periodic crashes that also shake my confidence. I am craving unambiguous reports from the process that indicate in clear simple language that it is getting on with the process and making progress, having verified cryptographic integrity, regardless of whether it can tell me how long it has left to run.

Beyond that, if I may, could you indulge me by possibly answering the following question succinctly?

What is the minimum requirement for me to operate geth in such a way that it will show my account balances and enable me to perform transactions with my ethereum ledger balances? Is --light operation sufficient? If such is the case, what are the risks?

@tyramisoux

tyramisoux commented Mar 13, 2018

Runs quite smoothly today with the usual errors (not too many)

eth.syncing
{
currentBlock: 3175461,
highestBlock: 5249624,
knownStates: 6376435,
pulledStates: 6362365,
startingBlock: 2993542 (yes, I aborted and restarted as a test. It continues smoothly)
}
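
For readers trying to interpret such output, here is a small sketch that derives a few numbers from it. The field values are copied from the output above; the summarizeSync helper is hypothetical. Note that knownStates only counts trie nodes discovered so far, so the difference to pulledStates is a backlog, not the total amount of state left to download.

```javascript
// Values copied from the eth.syncing output above.
const syncing = {
  currentBlock: 3175461,
  highestBlock: 5249624,
  knownStates: 6376435,
  pulledStates: 6362365,
  startingBlock: 2993542,
};

// Summarise how far the block download and the state download still have to go.
function summarizeSync(s) {
  const blocksRemaining = s.highestBlock - s.currentBlock;
  const span = s.highestBlock - s.startingBlock;
  // Share of this session's block range that has been processed so far.
  const blockPercent = span > 0 ? (100 * (s.currentBlock - s.startingBlock)) / span : 100;
  // Discovered-but-not-yet-downloaded state entries (a backlog, not a total).
  const stateBacklog = s.knownStates - s.pulledStates;
  return { blocksRemaining, blockPercent, stateBacklog };
}

const summary = summarizeSync(syncing);
console.log(summary.blocksRemaining); // 2074163
console.log(summary.stateBacklog); // 14070
console.log(summary.blockPercent.toFixed(1) + "% of this session's block range");
```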

I/O and disk access are very low, so @karalabe is right and this stuff is NOT an SSD killer as described by several references.
This should even run on an old floppy disk! I have the cache set to 2048 (not 1024; I was wrong above).
It really might depend on the cache, so with enough cache it does not even matter whether it runs on an SSD or an HDD.

Unfortunately it does not seem to use all the resources I have. Network traffic is about 40 to 60KB/s and the peer count is between 1 and a maximum of 5 (IPFS or BitTorrent is able to connect to >100).
The current "geth 1.8.2" binary does not have admin, debug and other stuff enabled. I will try to build my own version from GitHub.

@cies

cies commented Mar 21, 2018

@karalabe I think this is a bug. To a user/developer "syncing" is not only block syncing, but also trie (states) syncing. When eth.syncing returns false it should be done syncing, all syncing. Especially since the output of eth.syncing, when it returns a non-false value, also shows the knownStates and pulledStates values, thereby suggesting that trie (states) syncing is also part of it.

Currently, eth.syncing returning false means you should check whether a balance you know to be non-zero is actually reported as non-zero, and if that is not the case you can use the logs to see if there are still Imported new state entries messages, which means the trie is still syncing. You cannot even use eth.syncing to see the absolute progress, because that now returns false. Also, the progress is hard to interpret as (as far as I know) the total number of states is not shown. This may be worth a feature request, because having the rough total number of states allows one to show progress as a percentage.

I would opt for filing a bug specifically for eth.syncing's misleading behaviour, so that it can be fixed (there are MANY questions on this topic, obviously leading to much user frustration, and it is not often well explained). In the meantime we can explain a workaround: tail the logs for Imported new state entries messages, and wait at least 3x longer than the block sync took for the trie to be synced as well.
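
The log-based workaround could be automated roughly as follows. This is a sketch under assumptions: only the message text "Imported new state entries" comes from this thread, while the function name, the window size and the sample lines are hypothetical (real geth log lines carry timestamps and more fields).

```javascript
// Message text taken from the comment above; everything else here is hypothetical.
const STATE_IMPORT_MARKER = "Imported new state entries";

// Return true if any of the most recent log lines shows state entries still being imported.
function stateSyncStillRunning(logLines, recentCount = 100) {
  return logLines
    .slice(-recentCount)
    .some((line) => line.includes(STATE_IMPORT_MARKER));
}

// Simplified, made-up log lines for illustration only.
const sampleLog = [
  "INFO some other message",
  "INFO Imported new state entries",
];
console.log(stateSyncStillRunning(sampleLog)); // true
console.log(stateSyncStillRunning(["INFO some other message"])); // false
```

Restricting the check to the most recent lines matters, since older lines will contain the message long after the state download has finished.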

@cies

cies commented Mar 21, 2018

Here's a comment explaining how long each step took on what type of hardware:

#14647 (comment)

Apparently there is another step in the syncing process. First blocks, then states, then chain segments (taking longest by far).

@tyramisoux

tyramisoux commented Mar 21, 2018

Interesting thread. Thanks! 220GB.. WOW!
Yes, it looks like eth.syncing is really confusing. A "false" is no guarantee at all that it is synced. It just indicates there are new state entries to import.

Will post the result later: I am currently defragmenting the partition. I will add a dedicated partition for the chaindata.
Yes, it is still importing states.
My chaindata is at about 70GB and now increases very slowly in size. So it seems there are some "days" to go...
At one time I had more than 150GB, but I deleted it all due to a recommendation concerning the subject.

Another question is: how many peers are you all connected to? What is the typical number of peers? I used to have up to about 10 peers (after resyncing the system time on a regular basis using an NTP client as a scheduled service), but it has dropped to 0 to 3 now (using net.peerCount).

Bandwidth usage was never really high and there were never more than 10 nodes. My ISP (cosmote/OTE) won't block that at all. When connecting to other swarms like BitTorrent or IPFS I have hundreds of nodes immediately.

Are there known firewall issues? I did not find anything about that. I even forwarded 30303 from my "Mikrotik" and see incoming connections in the log with some traffic.

@tyramisoux

tyramisoux commented Mar 23, 2018

For the first time (I have been trying for months now) I have 16 peers connected. Nothing changed!
Are there known DDoS attacks on the system?

Progress is slow anyway. Disk access, I/O and network traffic are quite harmless, and there is almost no CPU load.

@yangwao

yangwao commented Apr 17, 2018

I came across this issue because I am seeing numerous warnings like this:

WARN [04-17|07:56:14] Synchronisation failed, dropping peer    peer=d6d626bdab767a4a err="retrieved hash chain is invalid"

Even after a few days it still keeps giving me this warning.
I'm running the most recent stable version.
I tried both the --fast and --syncmode="full" options.

admin.nodeInfo
...
  name: "Geth/v1.8.3-stable/linux-amd64/go1.10",
...

@ankita-p17

I had the same problem when connecting to the Ropsten testnet.
I resolved it by changing the boot nodes.

I am having the same problem while connecting to the Ropsten testnet. How did you change the boot nodes, and which ones did you use?

@jeff-nasseri

jeff-nasseri commented May 14, 2021

I am running geth on the Ropsten network. I created an account with the address "0x57d46fd9648e9d182100a3bdec61edf4a662cbf3"
and then deposited some ether into it using MetaMask.
When I check my address on the Ropsten explorer, the deposited amount shows up correctly,
but when I use geth attach http://localhost:8545 and call eth.getBalance("0x57d46fd9648e9d182100a3bdec61edf4a662cbf3"), it always returns 0.
