[BUG] miner can't receive head change any more after "window post scheduler notifs channel closed". #5813

Closed
ricktian1226 opened this issue Mar 15, 2021 · 4 comments

Comments

@ricktian1226

ricktian1226 commented Mar 15, 2021

Describe the bug
Once "window post scheduler notifs channel closed" shows up in the log, the miner no longer receives head changes.
Other connection errors appear at the same time.
Restarting the miner resolves it.

2021-03-15T10:59:35.287+0800 WARN advmgr sector-storage/sched_worker.go:261 failed to check worker session {"worker": "0d688d6e-7235-4d32-89c9-2a74a6739e86", "error": "RPC client error: sendRequest failed: Post \"http://192.168.0.231:7890/rpc/v0\": context deadline exceeded"}
2021-03-15T10:59:34.615+0800 WARN advmgr sector-storage/sched_worker.go:261 failed to check worker session {"worker": "e7a3028e-8bd2-4f29-813c-84ece2787c41", "error": "RPC client error: sendRequest failed: Post \"http://192.168.0.245:7890/rpc/v0\": context deadline exceeded"}
2021-03-15T10:59:35.471+0800 WARN storageminer storage/wdpost_sched.go:120 window post scheduler notifs channel closed

Version (run lotus version):
v1.5.0

To Reproduce
Steps to reproduce the behavior:
No known way to reproduce it yet.

Expected behavior
When "notifs channel closed" occurs, the scheduler should reopen the channel immediately rather than requiring a miner restart.

Logs
N/A

@ricktian1226 ricktian1226 changed the title [BUG] [BUG] miner can't receive head change any more after "window post scheduler notifs channel closed". Mar 16, 2021
@Aloxaf
Contributor

Aloxaf commented Nov 12, 2021

I just hit the same error. The miner stopped doing WindowPoSt after this error, and restarting the miner fixed it.

2021-11-12T02:16:07.752+0800    WARN    storageminer    storage/wdpost_sched.go:123     window post scheduler notifs channel closed
2021-11-12T02:16:07.752+0800    ERROR   events  events/events_called.go:478     messagesForTs MessagesForBlock failed (ts.H=1278270, Bcid:bafy2bzaceaddi5fuwk5zcrgig45jo376rsufdoptp2rtja5ajple7cpou6tfq, B.Mcid:bafy2bzacebgcfiw7lse3mt25eeq4htptl7r47istmcz3n4b6v6yic6sbserle): handler: websocket connection closed

@potato1992

Same here. Restarting the miner solved it. The miner was not doing any sealing work.

@rjan90
Contributor

rjan90 commented Apr 21, 2022

Hey! Just wanted to update everybody in this thread that this issue will be tracked in #8362 going forward. It's an issue that is on our radar, and one that we really want to find a fix for, but it's also one where we will certainly need additional help from everybody.

  1. Ideally we are looking for a detailed, easy way to get a good repro (reliable/easy/fast) of this issue, so if anybody here has insights into how we can easily repro it, please post the steps in "wdPoSt scheduler notifs channel closes when lotus-miner is under heavy load" #8362.
  2. Goroutine dumps from both sides (lotus daemon & lotus-miner) could be helpful, but ideally we are looking for item 1.
