Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node stuck on a block #1639

Closed
quentinlesceller opened this issue Oct 2, 2018 · 12 comments
Closed

Node stuck on a block #1639

quentinlesceller opened this issue Oct 2, 2018 · 12 comments
Labels

Comments

@quentinlesceller
Copy link
Member

quentinlesceller commented Oct 2, 2018

Non mining node stuck on block 120,766. Running 4d2cbe6.
Attaching the grin-server.log: grin-crash.log

Oct 02 13:48:45.574 DEBG Block header 15d28cd1 refused by chain: Unfit("header already known")
Oct 02 13:48:45.574 DEBG pipe: process_block 15d28cd1 at 120766 with 0 inputs, 1 outputs, 1 kernels
Oct 02 13:48:45.574 DEBG Send header 15d28cd1 to 109.74.202.16:13414
Oct 02 13:48:45.574 DEBG Send header 15d28cd1 to 94.130.229.193:13414
Oct 02 13:48:45.729 DEBG Received block header 15d28cd1 at 120766 from 54.241.140.103:13414, going to process.
Oct 02 13:48:46.178 DEBG monitor_peers: on 0.0.0.0:13414, 19 connected (14 most_work). all 136 = 64 healthy + 0 banned + 72 defunct
Oct 02 13:48:46.178 DEBG clean_peers V4(104.209.194.130:13414), not connected
Oct 02 13:48:54.638 DEBG Saving newly connected peer 104.209.194.130:13414.
Oct 02 13:48:54.639 DEBG save_peer: V4(104.209.194.130:13414) marked Healthy
Oct 02 13:48:55.144 DEBG Received block header 0b730629 at 120767 from 139.162.168.18:13414, going to process.
Oct 02 13:48:55.324 DEBG Received block header 0b730629 at 120767 from 223.17.252.92:13414, going to process.
Oct 02 13:48:55.595 DEBG Received block header 0b730629 at 120767 from 82.23.106.56:13414, going to process.
Oct 02 13:49:00.021 DEBG Received block header 0b730629 at 120767 from 45.61.156.122:13414, going to process.
Oct 02 13:49:06.180 DEBG monitor_peers: on 0.0.0.0:13414, 19 connected (5 most_work). all 136 = 64 healthy + 0 banned + 72 defunct
Oct 02 13:49:26.181 DEBG monitor_peers: on 0.0.0.0:13414, 19 connected (5 most_work). all 136 = 64 healthy + 0 banned + 72 defunct
Oct 02 13:49:46.183 DEBG monitor_peers: on 0.0.0.0:13414, 19 connected (5 most_work). all 136 = 64 healthy + 0 banned + 72 defunct
Oct 02 13:50:06.184 DEBG monitor_peers: on 0.0.0.0:13414, 19 connected (5 most_work). all 136 = 64 healthy + 0 banned + 72 defunct
Oct 02 13:50:14.356 DEBG Received block header 036f42b1 at 120764 from 204.48.26.36:13414, going to process.
Oct 02 13:50:17.239 DEBG handle_payload: received compact block: msg_len: 1268
Oct 02 13:50:17.239 DEBG Received compact_block 036f42b1 at 120764 from 167.99.87.54:13414, outputs: 1, kernels: 1, kern_ids: 0, going to process.
Oct 02 13:50:17.597 DEBG handle_payload: received compact block: msg_len: 1268
Oct 02 13:50:17.597 DEBG Received compact_block 036f42b1 at 120764 from 167.99.53.226:13414, outputs: 1, kernels: 1, kern_ids: 0, going to process.
Oct 02 13:50:17.996 DEBG Received block header 036f42b1 at 120764 from 94.130.229.193:13414, going to process.
Oct 02 13:50:26.186 DEBG monitor_peers: on 0.0.0.0:13414, 19 connected (5 most_work). all 136 = 64 healthy + 0 banned + 72 defunct
Oct 02 13:50:46.188 DEBG monitor_peers: on 0.0.0.0:13414, 19 connected (5 most_work). all 136 = 64 healthy + 0 banned + 72 defunct
Oct 02 13:51:06.189 DEBG monitor_peers: on 0.0.0.0:13414, 19 connected (5 most_work). all 136 = 64 healthy + 0 banned + 72 defunct
Oct 02 13:51:10.719 DEBG Received block header 11116234 at 120765 from 195.201.38.140:13414, going to process.
Oct 02 13:51:26.191 DEBG monitor_peers: on 0.0.0.0:13414, 19 connected (5 most_work). all 136 = 64 healthy + 0 banned + 72 defunct
Oct 02 13:51:46.192 DEBG monitor_peers: on 0.0.0.0:13414, 19 connected (5 most_work). all 136 = 64 healthy + 0 banned + 72 defunct

After that only the monitor peers thread appears to be running.

@antiochp
Copy link
Member

antiochp commented Oct 2, 2018

I wonder if there is a deadlock involved there like @garyyu identified yesterday.
Possibly the chain.head() is locked up and no threads can process blocks or headers.

@garyyu
Copy link
Contributor

garyyu commented Oct 3, 2018

I highly suspect this is an old build. @quentinlesceller Could you share me the whole log file? just paste a link here or send from Gitter 😄

@quentinlesceller
Copy link
Member Author

Okay it just happened again and here is the full log. it's a 154 mb file.
http://lesceller.com/misc/grin-server.log

@garyyu
Copy link
Contributor

garyyu commented Oct 4, 2018

Indeed this could be another dead lock case remaining.

Oct 02 14:30:19.299 INFO This is Grin version 0.3.0 (git 0.0.0-pre.0.3.0-23-g85d5fea), built for x86_64-unknown-linux-gnu by rustc 1.28.0 (9634041f0 2018-07-30).
...
Oct 02 14:43:10.398 DEBG Received block header 0354e2dc at 120821 from 204.48.26.36:13414, going to process.
Oct 02 14:43:10.423 DEBG handle_payload: received compact block: msg_len: 1268
Oct 02 14:43:10.423 DEBG Received compact_block 0354e2dc at 120821 from 94.130.229.193:13414, outputs: 1, kernels: 1, kern_ids: 0, going to process.
Oct 02 14:43:10.449 DEBG handle_payload: received compact block: msg_len: 1268
Oct 02 14:43:10.449 DEBG Received compact_block 0354e2dc at 120821 from 173.230.154.243:13414, outputs: 1, kernels: 1, kern_ids: 0, going to process.
Oct 02 14:43:10.459 DEBG Received block header 0354e2dc at 120821 from 167.99.87.54:13414, going to process.
Oct 02 14:43:29.413 DEBG handle_payload: received compact block: msg_len: 1268
Oct 02 14:43:29.413 DEBG Received compact_block 0a0b9a15 at 120819 from 45.61.156.122:13414, outputs: 1, kernels: 1, kern_ids: 0, going to process.
Oct 02 14:44:02.457 DEBG Received block header 01db1203 at 120820 from 95.216.163.175:13414, going to process.
Oct 02 14:44:03.063 DEBG Received block header 01db1203 at 120820 from 139.162.168.18:13414, going to process.
Oct 02 14:44:55.986 DEBG Received block header 133ebd13 at 120823 from 185.80.129.70:13414, going to process.
Oct 02 14:44:59.575 DEBG Received block header 0ffc28e9 at 120821 from 195.201.38.140:13414, going to process.
Oct 02 14:45:02.788 DEBG handle_payload: received compact block: msg_len: 1268
Oct 02 14:45:02.789 DEBG Received compact_block 0ffc28e9 at 120821 from 167.99.53.226:13414, outputs: 1, kernels: 1, kern_ids: 0, going to process.
Oct 02 14:46:00.609 DEBG Received block header 09538452 at 120825 from 64.119.200.147:13414, going to process.
Oct 02 14:46:39.629 DEBG Received block header 09538452 at 120825 from 175.137.182.97:13414, going to process.

< and then no more receiving handling, I guess it's also blocked>

@quentinlesceller
Copy link
Member Author

Let me know if you want more logs. It is happening to me fairly regularly (every 16 hours or so) on a non mining node.

@garyyu
Copy link
Contributor

garyyu commented Oct 4, 2018

This case is quite difficult, more logs will not help anymore.

That fixed dead lock case is relatively easy because it can 100% reproduce, but this one not easy to reproduce, I observe on 2 nodes for whole day and no happening.

I'm working on a tool now, but still need some time, and once it's ready, I need your help to test on your this node.

@quentinlesceller
Copy link
Member Author

Sure.

@tromp
Copy link
Contributor

tromp commented Oct 5, 2018

Please check if this is fixed by #1657

I was stuck at height 95045 before and just got synced with latest build...

@garyyu
Copy link
Contributor

garyyu commented Oct 5, 2018

Same ask as @tromp, I feel @antiochp 's major refactor would kill a bunch of potential sync bugs 😄

If indeed this dead lock issue also fixed, then I'm free to other tasks.

@quentinlesceller
Copy link
Member Author

Perfect. I'm updating my nodes and I'll let you know! Thank you @antiochp and @garyyu!

@quentinlesceller
Copy link
Member Author

quentinlesceller commented Oct 5, 2018

Unfortunately still stuck 60 blocks behind on archival node.
grin-server.log

@ignopeverell
Copy link
Contributor

Hopefully fixed, or at least a similar issue will be different enough to warrant its own issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants