Fix muxer that can get stuck in an infinite loop #2681
Could you explain the problem and how this fixes it?
When creating the daemon [...] The root cause of the problem lies in the order in which those tasks are cancelled during tokio runtime shutdown. If the [...] My fix consists of catching this error and returning [...]
So this won't loop at all now and will exit? Or will it continue running, just without the error logs?
This hands control back to the tokio runtime. Since the task has been cancelled, it will then exit.
I still have a hard time understanding both the problem and the solution. Could you explain it in a simpler way and then add that explanation to the code?
This seems rather fragile to me. What if something fails with a different message? Can't we fix the underlying issue? Why can any error throw us into an infinite loop?
I've added a comment, as well as a test for the specific error string. I'm not sure it's the best approach, and I'm open to a better fix.
It looks to me that the root of the problem is that we cannot presently tell the difference between incorrect input (such as malformed tipsets) and fatal errors (such as a DB error or a closed channel). Right now, we treat all errors as recoverable and enter a tight, infinite loop if the errors are persistent. I propose we bail on errors by default and then keep a white-list of recoverable errors (such as receiving bad data from peers). This would be a band-aid solution until we can re-write the muxer code.
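To make the proposal above concrete, here is a minimal sketch of "bail by default, white-list the recoverable errors". It is not the muxer's actual code: `MuxerError`, its variants, `is_recoverable`, and `run_loop` are all hypothetical names invented for illustration, and the real error types may look quite different.

```rust
/// Hypothetical error type; the real muxer's errors are not shown in this thread.
#[derive(Debug, Clone, PartialEq)]
enum MuxerError {
    /// Bad data from a peer (e.g. a malformed tipset): skip it and keep going.
    BadPeerData(String),
    /// A database failure: retrying in a tight loop will not help.
    Database(String),
    /// A channel was closed, e.g. because the runtime is shutting down.
    ChannelClosed,
}

impl MuxerError {
    /// The white-list. Anything not matched here is treated as fatal.
    fn is_recoverable(&self) -> bool {
        matches!(self, MuxerError::BadPeerData(_))
    }
}

/// Calls `step` until it reports completion (`Ok(false)`) or fails with a
/// non-recoverable error. Returns how many recoverable errors were skipped.
fn run_loop<F>(mut step: F) -> Result<usize, MuxerError>
where
    F: FnMut() -> Result<bool, MuxerError>,
{
    let mut skipped = 0;
    loop {
        match step() {
            // More work to do.
            Ok(true) => continue,
            // Finished cleanly.
            Ok(false) => return Ok(skipped),
            // White-listed error: count it and carry on.
            Err(e) if e.is_recoverable() => skipped += 1,
            // Default: bail instead of spinning on a persistent error.
            Err(e) => return Err(e),
        }
    }
}
```

With this shape, a step that keeps failing with `ChannelClosed` or `Database` ends the loop on the first occurrence, while `BadPeerData` is skipped. Matching on enum variants rather than on an error message string also addresses the fragility concern raised earlier in the thread, assuming the errors can be given distinct types.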
I need this.
Let's see if it causes any problems. It might bail too eagerly on transient errors.
Summary of changes
Changes introduced in this pull request:
Reference issue to close (if applicable)
Closes #2672
82ca23c does run the Muxer future separately to reproduce the issue.
Other information and links
Change checklist
(if possible),
CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.