-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BTCD miss a block and subsequent blocks are all orphan #229
Comments
I've seen this issue a few times as well. FWIW, I'm also on a dynamic IP, behind NAT, and using --externalip. My node is running on a Windows 8.1 desktop. The proper behavior would be for the node which sent the orphan to also send the parent. If btcd didn't receive the parent for some reason, it should getdata the parent blocks until the chain attaches, but this code doesn't seem to be working as intended. When I have noticed it, the missing block can still be accepted over RPC using submitblock, so the block manager isn't totally locked up (just missing the needed blocks). However, one time when I was fumbling through btcctl and submitblock (I had to hardcode the block hex into btcctl because otherwise it would segfault, presumably due to the btcctl arguments being too long), it rejected the block because it was already downloaded and validated. So I don't think this is a permanent error. After submitting blocks manually so that orphans can also be validated, while still downloading from the same syncpeer, the orphan problems continue to occur. If I change my syncpeer, the problems seem to go away or become less frequent. This indicates that the other node seems to be doing something we don't expect, and if we can narrow that down it should be easier to reproduce and fix our bug. |
Thanks a lot, next time I'll try to use the submitblock, this time I restarted and now it is syncing well. |
Great. Could you also run with profiling enabled (--profile=portnum)? If there is a synchronization bug between goroutines, it should be noticeable in the stack dumps. |
Can you tell me the port num for profiling? |
It can be anything (that you have permissions for). It runs a profiling server that you can then access from http://localhost:####/ I typically use 6060 for my godoc instance, 6061 for btcd, and 6062 for wallet, but you can be creative if you want. :) |
Ahh.. ok! :) I'll keep you updated. |
Again, this time stuck at block 339748. |
Can you post the logs of when it started and the contents of the "full goroutine stack dump" from the profiling server? |
Dump: https://www.dropbox.com/s/e1d1o4prpbpadrc/dump.txt?dl=0 I set debug just after, I hope this could help. If you need more just ask. |
I hit this as well earlier today and have been poking at it. Something interesting that I noticed is that the reported height of my sync peer (the height when it connected) is after my current node's last processed block, so it must have not have correctly synced to the sync peer right after it connected. Can you check your node to see if you're in the same situation? Look at the output of |
Well, according to your logs, you got a new sync peer that connected with a height of 339752, processed one block and made it to 339748, and then started receiving orphans. Likely the same issue I'm seeing. |
Now this is my situation: btcctl getpeerinfo btcctl getblockcount |
Is it right to have two entries for the same peer ip address? One with syncnode=true and the other with syncnode=false. { I also have a couple of peers with height less than mine but they are not syncnodes: "startingheight": 283676, |
I tried to add and remove nodes but nothing happens, it seems to be stick on this node even if it is not able to sync correctly. |
That is a bitcoind compatibility "feature". You can't disconnect from nodes using |
Ah.. ok. :) |
I don't much care for that functionality either, but out goal is to be compatible with BC as much as possible to more easily accommodate existing infrastructures. That said, I suspect allowing it to be used to also disconnect a node that wasn't added via addnode would probably not break any callers since it's additional functionality and thus deviating here might make sense. |
As an additional data point, after seeing this issue again and forcibly killing the connection to my syncnode with |
We've narrowed this down to btcd filtering out getblocks messages that have already been sent, where no reply has yet been received. If the reply is never received, we will never re-request the blocks, and because we remain connected to the same sync node until they disconnect, this can cause us to never process the missing parent blocks. |
Great, I'm happy you've found the issue. |
Any new developments or ways we can help track this down in the wild? For whatever reason, my node seems prone to doing this lately. |
So I've had a request to test this next time it occurs (I added the awk bits to make it simple): |
This commit fixes a long standing bug which previously, would cause a syncing node to get "stuck", halting all forward progress with respect to keeping up with the main chain. There are many nodes on the network which claim to be a fully functioning full-node, but don't actually properly implement the entire p2p protocol. For this particular case, after receiving a "getdata" requesting blocks, these nodes fail to respond with a "block" message. This would cause btcd to stall block download indefinitely, waiting for a response which never arrives. Since btcd currently syncs new blocks via a single syncPeer, it would never attempt to seek to new, active peers to download the next block from. To fix this issue, we now disconnect peers that we deem are stalling new block download after a specified timeout. This fixes btcsuite#486 and btcsuite#229.
This commit fixes a long standing bug which previously, would cause a syncing node to get "stuck", halting all forward progress with respect to keeping up with the main chain. There are many nodes on the network which claim to be a fully functioning full-node, but don't actually properly implement the entire p2p protocol. For this particular case, after receiving a "getdata" requesting blocks, these nodes fail to respond with a "block" message. This would cause btcd to stall block download indefinitely, waiting for a response which never arrives. Since btcd currently syncs new blocks via a single syncPeer, it would never attempt to seek to new, active peers to download the next block from. To fix this issue, we now disconnect peers that we deem are stalling new block download after a specified timeout. This fixes btcsuite#486 and btcsuite#229.
This commit implements stall detection logic at the peer level to detect and disconnect peers that are either not following the protocol in regards to expected response messages or have otherwise stalled. This is accomplished by setting deadlines for each message type which expects a response and periodically checking them while properly taking into account processing time. There are an increasing number of nodes on the network which claim to be full nodes, but don't actually properly implement the entire p2p protocol even though they implement it enough to cause properly implemented nodes to make data requests to which they never respond. Since btcd currently only syncs new blocks via single sync peer and, prior to this commit only had very basic stall detection, this could lead to a situation where the block download became stalled indefinitely due to one of these misbehaving peers. This commit fixes that issue since the stalled peer will now be detected and disconnected which leads to a new sync peer being selected. This logic will also fit nicely with the future multi-peer sync model which is on the roadmap and for which infrastructure work is underway. Closes btcsuite#486.
This commit implements stall detection logic at the peer level to detect and disconnect peers that are either not following the protocol in regards to expected response messages or have otherwise stalled. This is accomplished by setting deadlines for each message type which expects a response and periodically checking them while properly taking into account processing time. There are an increasing number of nodes on the network which claim to be full nodes, but don't actually properly implement the entire p2p protocol even though they implement it enough to cause properly implemented nodes to make data requests to which they never respond. Since btcd currently only syncs new blocks via single sync peer and, prior to this commit only had very basic stall detection, this could lead to a situation where the block download became stalled indefinitely due to one of these misbehaving peers. This commit fixes that issue since the stalled peer will now be detected and disconnected which leads to a new sync peer being selected. This logic will also fit nicely with the future multi-peer sync model which is on the roadmap and for which infrastructure work is underway. Closes btcsuite#486.
This commit implements stall detection logic at the peer level to detect and disconnect peers that are either not following the protocol in regards to expected response messages or have otherwise stalled. This is accomplished by setting deadlines for each message type which expects a response and periodically checking them while properly taking into account processing time. There are an increasing number of nodes on the network which claim to be full nodes, but don't actually properly implement the entire p2p protocol even though they implement it enough to cause properly implemented nodes to make data requests to which they never respond. Since btcd currently only syncs new blocks via single sync peer and, prior to this commit only had very basic stall detection, this could lead to a situation where the block download became stalled indefinitely due to one of these misbehaving peers. This commit fixes that issue since the stalled peer will now be detected and disconnected which leads to a new sync peer being selected. This logic will also fit nicely with the future multi-peer sync model which is on the roadmap and for which infrastructure work is underway. Fixes btcsuite#486 and fixes btcsuite#229.
This commit implements stall detection logic at the peer level to detect and disconnect peers that are either not following the protocol in regards to expected response messages or have otherwise stalled. This is accomplished by setting deadlines for each message type which expects a response and periodically checking them while properly taking into account processing time. There are an increasing number of nodes on the network which claim to be full nodes, but don't actually properly implement the entire p2p protocol even though they implement it enough to cause properly implemented nodes to make data requests to which they never respond. Since btcd currently only syncs new blocks via single sync peer and, prior to this commit only had very basic stall detection, this could lead to a situation where the block download became stalled indefinitely due to one of these misbehaving peers. This commit fixes that issue since the stalled peer will now be detected and disconnected which leads to a new sync peer being selected. This logic will also fit nicely with the future multi-peer sync model which is on the roadmap and for which infrastructure work is underway. Fixes btcsuite#486 and fixes btcsuite#229.
New installation of btcd 0.9 on a linux debian. Today my node (depsi.zapto.org) has block count 339153 since last night. From the log it looks like it missed the 339154 block, and since the 339155 every block is added as orphan, even if the hash it logs correspond to the right current block (checked on blockchain.info).
Sometimes it seems to start requesting some old block but never the 339154. There's no other error.
The same thing happened yesterday, and after a restart everything went fine.
The node is at my home, and I have a dynamic ip... but other than this problem everything seems to work well.
Thanks.
04:54:26 2015-01-16 [INF] BMGR: Processed 1 block in the last 1m22.57s (375 transactions, height 339153, 2015-01-16 03:55:40 +0100 CET)
04:54:26 2015-01-16 [INF] BMGR: Lost peer 85.175.5.129:21291 (inbound)
04:55:18 2015-01-16 [INF] BMGR: New valid peer 148.251.238.178:60281 (inbound) (/getaddr.bitnodes.io:0.1/)
04:55:18 2015-01-16 [INF] BMGR: Lost peer 148.251.238.178:60281 (inbound)
04:55:39 2015-01-16 [INF] BMGR: New valid peer 112.124.22.80:53714 (inbound) (/bitcoin-seeder:0.01/)
04:55:41 2015-01-16 [INF] BMGR: Lost peer 112.124.22.80:53714 (inbound)
04:55:46 2015-01-16 [INF] BMGR: New valid peer 82.130.102.226:33830 (inbound) (/Snoopy:0.1/)
04:55:46 2015-01-16 [INF] BMGR: Lost peer 82.130.102.226:33830 (inbound)
04:55:48 2015-01-16 [INF] BMGR: New valid peer 27.204.8.63:49756 (inbound) (/Satoshi:0.9.3/)
04:56:30 2015-01-16 [INF] BMGR: New valid peer 88.198.131.156:39072 (inbound) (/bitcoin-seeder:0.01/)
04:56:31 2015-01-16 [INF] BMGR: Lost peer 88.198.131.156:39072 (inbound)
04:57:35 2015-01-16 [INF] BMGR: New valid peer 199.127.225.195:43501 (inbound) (/bitcoin-seeder:0.01/)
04:58:42 2015-01-16 [INF] BMGR: New valid peer 172.245.62.102:36818 (inbound) (/bitcoin-seeder:0.01/)
04:58:42 2015-01-16 [INF] BMGR: Lost peer 172.245.62.102:36818 (inbound)
05:00:18 2015-01-16 [INF] BMGR: New valid peer 148.251.238.178:49893 (inbound) (/getaddr.bitnodes.io:0.1/)
05:00:18 2015-01-16 [INF] BMGR: Lost peer 148.251.238.178:49893 (inbound)
05:01:03 2015-01-16 [INF] BMGR: New valid peer 82.130.102.226:37309 (inbound) (/Snoopy:0.1/)
05:01:03 2015-01-16 [INF] BMGR: Lost peer 82.130.102.226:37309 (inbound)
05:02:35 2015-01-16 [WRN] PEER: Peer 199.127.225.195:43501 (inbound) no answer for 5 minutes, disconnecting
05:02:35 2015-01-16 [INF] BMGR: Lost peer 199.127.225.195:43501 (inbound)
05:03:31 2015-01-16 [INF] BMGR: New valid peer 50.7.47.93:48617 (inbound) (/bitcoin-seeder:0.01/)
05:03:33 2015-01-16 [INF] BMGR: Lost peer 50.7.47.93:48617 (inbound)
05:05:39 2015-01-16 [INF] CHAN: Adding orphan block 000000000000000001e2aae6be846835bbebddbf01baadecebf6df532a37468b with parent 0000000000000000148c1ae2b62db079f8cc21501141e4bd582bebd1aa9b1c7f
05:06:30 2015-01-16 [INF] BMGR: New valid peer 82.130.102.226:47264 (inbound) (/Snoopy:0.1/)
05:06:30 2015-01-16 [INF] BMGR: Lost peer 82.130.102.226:47264 (inbound)
05:06:45 2015-01-16 [INF] BMGR: New valid peer 148.251.238.178:19837 (inbound) (/getaddr.bitnodes.io:0.1/)
05:06:45 2015-01-16 [INF] BMGR: Lost peer 148.251.238.178:19837 (inbound)
05:09:31 2015-01-16 [INF] BMGR: New valid peer 50.244.13.28:44584 (inbound) (/bitcoin-seeder:0.01/)
05:09:32 2015-01-16 [INF] BMGR: Lost peer 50.244.13.28:44584 (inbound)
05:10:34 2015-01-16 [INF] BMGR: New valid peer 178.62.20.190:46658 (inbound) (/bitcoin-seeder:0.01/)
05:10:41 2015-01-16 [INF] BMGR: Lost peer 178.62.20.190:46658 (inbound)
05:11:36 2015-01-16 [INF] BMGR: New valid peer 82.130.102.226:52825 (inbound) (/Snoopy:0.1/)
05:11:36 2015-01-16 [INF] BMGR: Lost peer 82.130.102.226:52825 (inbound)
05:13:29 2015-01-16 [INF] BMGR: New valid peer 106.187.90.222:52862 (inbound) (/bitcoin-seeder:0.01/)
05:13:30 2015-01-16 [INF] BMGR: Lost peer 106.187.90.222:52862 (inbound)
05:14:16 2015-01-16 [WRN] PEER: Peer 27.204.8.63:49756 (inbound) no answer for 5 minutes, disconnecting
05:14:16 2015-01-16 [INF] BMGR: Lost peer 27.204.8.63:49756 (inbound)
05:15:26 2015-01-16 [INF] BMGR: New valid peer 148.251.238.178:10378 (inbound) (/getaddr.bitnodes.io:0.1/)
05:15:27 2015-01-16 [INF] BMGR: Lost peer 148.251.238.178:10378 (inbound)
05:16:18 2015-01-16 [INF] BMGR: New valid peer 178.62.187.91:54127 (inbound) (/mining.bitcoinaffiliatenetwork.com amsterdam:0.10.0/)
05:16:58 2015-01-16 [INF] BMGR: New valid peer 82.130.102.226:35860 (inbound) (/Snoopy:0.1/)
05:16:58 2015-01-16 [INF] BMGR: Lost peer 82.130.102.226:35860 (inbound)
05:17:41 2015-01-16 [INF] CHAN: Adding orphan block 0000000000000000097b41645adfb23cd199e3ba0862867c13c65b44d9269bd5 with parent 000000000000000001e2aae6be846835bbebddbf01baadecebf6df532a37468b
05:18:18 2015-01-16 [INF] BMGR: New valid peer 112.124.22.80:60503 (inbound) (/bitcoin-seeder:0.01/)
05:18:19 2015-01-16 [INF] BMGR: Lost peer 112.124.22.80:60503 (inbound)
05:18:54 2015-01-16 [WRN] PEER: Peer 5.138.146.164:10987 (inbound) no answer for 5 minutes, disconnecting
05:18:54 2015-01-16 [INF] BMGR: Lost peer 5.138.146.164:10987 (inbound)
05:21:22 2015-01-16 [INF] CHAN: Adding orphan block 00000000000000000cd0c6701ebfa6f14fede7db8a8d037e458f44cd70c32ebf with parent 0000000000000000097b41645adfb23cd199e3ba0862867c13c65b44d9269bd5
05:21:39 2015-01-16 [INF] BMGR: New valid peer 148.251.238.178:41738 (inbound) (/getaddr.bitnodes.io:0.1/)
05:21:39 2015-01-16 [INF] BMGR: Lost peer 148.251.238.178:41738 (inbound)
05:21:59 2015-01-16 [INF] BMGR: New valid peer 82.130.102.226:39289 (inbound) (/Snoopy:0.1/)
05:22:01 2015-01-16 [INF] BMGR: Lost peer 82.130.102.226:39289 (inbound)
05:22:36 2015-01-16 [INF] BMGR: New valid peer 72.188.5.125:49571 (inbound) (/BitCoinJ:0.11.2/MultiBit:0.5.18/)
05:22:41 2015-01-16 [INF] BMGR: New valid peer 216.87.0.198:52453 (inbound) (/bitcoinj:0.12.2/Bitcoin Wallet:4.16/)
05:22:49 2015-01-16 [INF] CHAN: Adding orphan block 0000000000000000162155d22e03e2760e02c6748566adfda3671ed9b29a97ec with parent 00000000000000000cd0c6701ebfa6f14fede7db8a8d037e458f44cd70c32ebf
The text was updated successfully, but these errors were encountered: