Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failing to unregister sync peer (which not registered) #22077

Closed
Tracked by #22139
holiman opened this issue Dec 26, 2020 · 11 comments · Fixed by #22235
Closed
Tracked by #22139

Failing to unregister sync peer (which not registered) #22077

holiman opened this issue Dec 26, 2020 · 11 comments · Fixed by #22235
Assignees

Comments

@holiman
Copy link
Contributor

holiman commented Dec 26, 2020

version instance=Geth/v1.9.26-unstable-b9012a03-20201223/linux-amd64/go1.15.3

It's running latest develop, --syncmode=snap --snapshot. I just restarted it after having done a pruning. It keeps saying complaining about failing to unregister a peer that it for some reason thinks it's connected to, while it apparently is not.

INFO [12-26|18:50:25.988] Loaded most recent local fast block      number=11463108 hash="89843c…4d8041" td=19437194618159153682305 age=1w3d9h
INFO [12-26|18:50:25.989] Loaded last fast-sync pivot marker       number=11418122
INFO [12-26|18:50:25.996] Loaded local transaction journal         transactions=0 dropped=0
INFO [12-26|18:50:25.996] Regenerated local transaction journal    transactions=0 accounts=0
WARN [12-26|18:50:25.996] Switch sync mode from fast sync to full sync 
INFO [12-26|18:50:25.997] Starting peer-to-peer node               instance=Geth/v1.9.26-unstable-b9012a03-20201223/linux-amd64/go1.15.3
INFO [12-26|18:50:26.028] New local node record                    seq=8 id=61673c8231752e60 ip=127.0.0.1 udp=30303 tcp=30303
INFO [12-26|18:50:26.028] Started P2P networking                   self=enode://cc44695159f742595bf2fd9a9da13535733836c71cd5ca1179d76c11b95565f37c8014db177e90c9d7907b3a17439b3f51949c2144adbe4fef0e615d11900ca8@127.0.0.1:30303
INFO [12-26|18:50:26.029] IPC endpoint opened                      url=/home/martin/.ethereum/geth.ipc
ERROR[12-26|18:50:26.961] Failed to unregister sync peer           peer=8d1da4c5bc682f8f err="peer is not registered"
ERROR[12-26|18:50:26.961] Peer removal failed                      peer=8d1da4c5bc682f8f8ad141f60d2889fa4232144c7d2b29b88f81ca33e586ef9d err="peer not registered"
INFO [12-26|18:50:28.747] Mapped network port                      proto=tcp extport=30303 intport=30303 interface="UPNP IGDv2-IP1"
INFO [12-26|18:50:28.969] New local node record                    seq=9 id=61673c8231752e60 ip=<nil>     udp=30303 tcp=30303
INFO [12-26|18:50:29.158] Mapped network port                      proto=udp extport=30303 intport=30303 interface="UPNP IGDv2-IP1"
INFO [12-26|18:50:36.190] Looking for peers                        peercount=1 tried=26 static=0
INFO [12-26|18:50:46.191] Looking for peers                        peercount=0 tried=44 static=0
INFO [12-26|18:50:56.256] Looking for peers                        peercount=0 tried=48 static=0
INFO [12-26|18:51:03.609] Block synchronisation started 
WARN [12-26|18:51:03.609] Synchronisation failed, retrying         err="peer is unknown or unhealthy"
ERROR[12-26|18:51:03.609] Failed to unregister sync peer           peer=8d1da4c5bc682f8f err="peer is not registered"
ERROR[12-26|18:51:03.609] Peer removal failed                      peer=8d1da4c5bc682f8f8ad141f60d2889fa4232144c7d2b29b88f81ca33e586ef9d err="peer not registered"
INFO [12-26|18:51:06.258] Looking for peers                        peercount=0 tried=46 static=0
INFO [12-26|18:51:16.283] Looking for peers                        peercount=1 tried=51 static=0
INFO [12-26|18:51:26.297] Looking for peers                        peercount=2 tried=27 static=0
INFO [12-26|18:51:36.302] Looking for peers                        peercount=2 tried=73 static=0
ERROR[12-26|18:51:41.178] Failed to unregister sync peer           peer=8d1da4c5bc682f8f err="peer is not registered"
ERROR[12-26|18:51:41.178] Peer removal failed                      peer=8d1da4c5bc682f8f8ad141f60d2889fa4232144c7d2b29b88f81ca33e586ef9d err="peer not registered"
@holiman
Copy link
Contributor Author

holiman commented Dec 28, 2020

This is funky! I restarted a totally different machine, bench03, and it's complaining about the exact same peer 8d1da4c5bc682f8f:

 Dec 28 14:23:33 bench03.ethdevops.io geth INFO [12-28|13:23:33.834] HTTP server started endpoint=[::]:8545 cors=* vhosts=*
Dec 28 14:23:33 bench03.ethdevops.io geth INFO [12-28|13:23:33.834] Stats daemon started
Dec 28 14:23:40 bench03.ethdevops.io geth INFO [12-28|13:23:40.121] Block synchronisation started
Dec 28 14:23:43 bench03.ethdevops.io geth INFO [12-28|13:23:43.868] Looking for peers peercount=10 tried=49 static=0
Dec 28 14:23:53 bench03.ethdevops.io geth INFO [12-28|13:23:53.880] Looking for peers peercount=13 tried=51 static=0
Dec 28 14:24:04 bench03.ethdevops.io geth INFO [12-28|13:24:03.971] Looking for peers peercount=19 tried=48 static=0
Dec 28 14:24:10 bench03.ethdevops.io geth ERROR[12-28|13:24:10.492] Snap peer not registered id=8d1da4c5bc682f8f8ad141f60d2889fa4232144c7d2b29b88f81ca33e586ef9d
Dec 28 14:24:10 bench03.ethdevops.io geth ERROR[12-28|13:24:10.493] Peer removal failed peer=8d1da4c5bc682f8f8ad141f60d2889fa4232144c7d2b29b88f81ca33e586ef9d err="peer not registered"
Dec 28 14:24:14 bench03.ethdevops.io geth INFO [12-28|13:24:14.144] Looking for peers peercount=23 tried=32 static=0
Dec 28 14:24:18 bench03.ethdevops.io geth WARN [12-28|13:24:18.199] Header broke chain ordering number=11542707 hash="2573a0…4e06e5" expected=11542699
Dec 28 14:24:18 bench03.ethdevops.io geth INFO [12-28|13:24:18.199] Downloader queue stats receiptTasks=0 blockTasks=0 itemSize=0.00B throttle=8192
Dec 28 14:24:18 bench03.ethdevops.io geth WARN [12-28|13:24:18.200] Synchronisation failed, dropping peer peer=a00ddd1b3c27c131f855373f2154939ee27326ca13ec062d696e3caebd49db28 err="action from bad peer ignored: stale headers"
Dec 28 14:24:24 bench03.ethdevops.io geth INFO [12-28|13:24:24.290] Looking for peers peercount=27 tried=54 static=0
Dec 28 14:24:34 bench03.ethdevops.io geth WARN [12-28|13:24:34.290] Checkpoint challenge timed out, dropping id=45dabcff84b0bdb2 conn=inbound addr=47.242.166.155:53518 type=Geth/v1.9.24-stable-...
Dec 28 14:24:34 bench03.ethdevops.io geth INFO [12-28|13:24:34.290] Looking for peers peercount=30 tried=52 static=0
Dec 28 14:24:44 bench03.ethdevops.io geth INFO [12-28|13:24:44.476] Looking for peers peercount=35 tried=49 static=0
Dec 28 14:24:54 bench03.ethdevops.io geth INFO [12-28|13:24:54.652] Looking for peers peercount=36 tried=43 static=0
Dec 28 14:25:04 bench03.ethdevops.io geth INFO [12-28|13:25:04.654] Looking for peers peercount=36 tried=53 static=0
Dec 28 14:25:23 bench03.ethdevops.io geth ERROR[12-28|13:25:23.023] Failed to unregister sync peer peer=8d1da4c5bc682f8f err="peer is not registered"
Dec 28 14:25:23 bench03.ethdevops.io geth ERROR[12-28|13:25:23.023] Peer removal failed peer=8d1da4c5bc682f8f8ad141f60d2889fa4232144c7d2b29b88f81ca33e586ef9d err="peer not registered"
Dec 28 14:26:38 bench03.ethdevops.io geth ERROR[12-28|13:26:37.944] Failed to unregister sync peer peer=8d1da4c5bc682f8f err="peer is not registered"
Dec 28 14:26:38 bench03.ethdevops.io geth ERROR[12-28|13:26:37.944] Peer removal failed peer=8d1da4c5bc682f8f8ad141f60d2889fa4232144c7d2b29b88f81ca33e586ef9d err="peer not registered" 

@holiman
Copy link
Contributor Author

holiman commented Dec 28, 2020

A third machine is flipping between "is it registered or not?"

 Dec 28 15:03:28 bench04.ethdevops.io geth INFO [12-28|14:03:28.007] Imported new chain segment blocks=1 txs=206 mgas=12.253 elapsed=107.732ms mgasps=113.732 number=11542877 hash="2199ff…9f43b5" dirty=36.82MiB
Dec 28 15:03:35 bench04.ethdevops.io geth INFO [12-28|14:03:34.964] Imported new chain segment blocks=1 txs=212 mgas=12.480 elapsed=213.800ms mgasps=58.373 number=11542878 hash="d7ed89…ee3c72" dirty=38.48MiB
Dec 28 15:03:38 bench04.ethdevops.io geth ERROR[12-28|14:03:38.567] Failed to unregister sync peer peer=8d1da4c5bc682f8f err="peer is not registered"
Dec 28 15:03:49 bench04.ethdevops.io geth INFO [12-28|14:03:49.940] Imported new chain segment blocks=1 txs=242 mgas=12.505 elapsed=121.610ms mgasps=102.832 number=11542879 hash="c0aa0d…edbd97" dirty=40.27MiB
Dec 28 15:04:19 bench04.ethdevops.io geth INFO [12-28|14:04:19.403] Deep froze chain segment blocks=4 elapsed=5.049ms number=11452879 hash="804cce…533236"
Dec 28 15:04:20 bench04.ethdevops.io geth ERROR[12-28|14:04:20.047] Failed to register sync peer peer=8d1da4c5bc682f8f err="peer is already registered"
Dec 28 15:04:35 bench04.ethdevops.io geth INFO [12-28|14:04:35.100] Imported new chain segment blocks=1 txs=221 mgas=12.491 elapsed=121.899ms mgasps=102.468 nu

@ligi ligi mentioned this issue Jan 7, 2021
5 tasks
@holiman
Copy link
Contributor Author

holiman commented Jan 7, 2021

The eth/downloader/peer.go has a peerSet like this:

type peerSet struct {
	peers        map[string]*peerConnection
	newPeerFeed  event.Feed
	peerDropFeed event.Feed
	lock         sync.RWMutex
}

Whereas teh eth/peerset.go has it a bit differently, where the two are
split up into two different types of peers:

type peerSet struct {
	ethPeers  map[string]*ethPeer  // Peers connected on the `eth` protocol
	snapPeers map[string]*snapPeer // Peers connected on the `snap` protocol

The method in the eth/handler to remove a peer does this,

  1. Check if it's an eth peer, if so, tells the downloader to remove it.
  2. Checks if it's a snap peers, if so, tells the downloader.SnapSyncer to remove it
func (h *handler) removePeer(id string) {
	// Remove the eth peer if it exists
	eth := h.peers.ethPeer(id)
	if eth != nil {
		log.Debug("Removing Ethereum peer", "peer", id)
		h.downloader.UnregisterPeer(id)
		h.txFetcher.Drop(id)

		if err := h.peers.unregisterEthPeer(id); err != nil {
			log.Error("Peer removal failed", "peer", id, "err", err)
		}
	}
	// Remove the snap peer if it exists
	snap := h.peers.snapPeer(id)
	if snap != nil {
		log.Debug("Removing Snapshot peer", "peer", id)
		h.downloader.SnapSyncer.Unregister(id)
		if err := h.peers.unregisterSnapPeer(id); err != nil {
			log.Error("Peer removal failed", "peer", id, "err", err)
		}
	}

Something isn't quite right with the accounting of peers here, and the special
snowflake 8d1da4c5bc682f8f is triggering something wonky

@holiman
Copy link
Contributor Author

holiman commented Jan 11, 2021

Interesting.. From #22151 :

WARN [01-10|19:19:35.325] Synchronisation failed, dropping peer    peer=8d1da4c5bc682f8f8ad141f60d2889fa4232144c7d2b29b88f81ca33e586ef9d err=timeout

What's with that 8d1da4c5bc682f8f8ad141f60d2889fa4232144c7d2b29b88f81ca33e586ef9d node ?

@ligi
Copy link
Member

ligi commented Jan 14, 2021

related #21257

@fjl
Copy link
Contributor

fjl commented Jan 14, 2021

This issue happens because removePeer in handler.go is racy:

If two goroutines call removePeer, they will both check the eth peer set and find the peer. Both goroutines will then call downloader.UnregisterPeer, and one of them will print this log message. The actual removal of the peer happens in a different goroutine.

We can fix this by refactoring the peer unregistration a bit. At the moment, we pass the removePeer method into the downloader as the dropPeer callback. The downloader should just call Disconnect on the peer instead of removePeer.
The only place that should unregister peers on downloader is the eth handler loop, when it exits.

@holiman
Copy link
Contributor Author

holiman commented Jan 20, 2021

@fjl want to handle this?

@holiman holiman linked a pull request Jan 26, 2021 that will close this issue
@michaelsproul
Copy link

michaelsproul commented Mar 4, 2021

I'm seeing Geth 1.10.0 throw a few similar errors (after upgrading from 1.9.25)

WARN [03-04|00:50:44.955] Checkpoint challenge timed out, dropping id=9993b5647ff6a1c0 conn=dyndial addr=X.X.X.X:30303 type
=Geth/v1.9.23-stable-...
ERROR[03-04|00:50:44.956] Ethereum peer removal failed             peer=9993b564 err="peer not registered"
WARN [03-04|00:51:29.956] Synchronisation failed, dropping peer    peer=9993b5647ff6a1c00a44118e9030e2339812f58505e906a0d7c3dd7091308158 err=timeout
ERROR[03-04|00:51:29.956] Ethereum peer removal failed             peer=9993b564 err="peer not registered"

^For the same peer, a few seconds apart.

For a different peer:

WARN [03-04|00:51:06.927] Checkpoint challenge timed out, dropping id=d597cf322d40ad8c conn=dyndial addr=X.X.X.X:30303 typ
e=Geth/v1.9.25-stable-...
ERROR[03-04|00:51:06.927] Ethereum peer removal failed             peer=d597cf32 err="peer not registered"

The only slightly naughty thing I did was run the new geth db inspect command before starting properly with 1.10.0, which tried to start constructing a snapshot, so I Ctrl-C'd it (although I suspect that's not related to this issue).

@superbeanmr
Copy link

微信图片_20210603103352

I was synchronizing the ropsten node, and when there were dozens of blocks, I checked the synchronization log and found that it was easy to insert the error in the figure below every few logs. Can someone tell me how to deal with the change?

@superbeanmr
Copy link

WARN [06-03|02:38:16.714] Ancestor below allowance peer=465c4021 number=10,263,739 hash=000000..000000 allowance=10,263,739
WARN [06-03|02:38:16.714] Synchronisation failed, dropping peer peer=465c4021eb888a6ef93709761a3a502f6d200cc94e3e810467b5a96f5bb4fa8a err="retrieved ancestor is invalid"
ERROR[06-03|02:38:16.715] Ethereum peer removal failed peer=465c4021 err="peer not registered"

@superbeanmr
Copy link

The above error occurs when Generating state snapshot

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants