bug: group_manager crashes with unhandled exception #2204

Closed
vpavlin opened this issue Nov 9, 2023 · 5 comments · Fixed by #2250
Labels
bug Something isn't working

Comments

@vpavlin
Member

vpavlin commented Nov 9, 2023

Problem

wakunode2 crashes in group_manager.nim if the RPC node fails to respond or times out:

/app/apps/wakunode2/wakunode2.nim(144) wakunode2
/app/vendor/nim-chronos/chronos/asyncloop.nim(1539) runForever
/app/vendor/nim-chronos/chronos/asyncloop.nim(268) poll
/app/vendor/nim-chronos/chronos/asyncfutures2.nim(525) cb
Error: unhandled exception: Asynchronous task [getAndHandleEvents() at group_manager.nim:305] finished with an exception "ValueError"!
Message: {"code":-32603,"message":"request failed or timed out"}
Stack trace: /app/apps/wakunode2/wakunode2.nim(144) wakunode2
/app/vendor/nim-chronos/chronos/asyncloop.nim(1539) runForever
/app/vendor/nim-chronos/chronos/asyncloop.nim(268) poll
/app/vendor/nim-chronos/chronos/asyncfutures2.nim(323) futureContinue
/app/vendor/nim-chronos/chronos/asyncmacro2.nim(305) call
[[reraised from:
/app/apps/wakunode2/wakunode2.nim(144) wakunode2
/app/vendor/nim-chronos/chronos/asyncloop.nim(1539) runForever
/app/vendor/nim-chronos/chronos/asyncloop.nim(268) poll
/app/vendor/nim-chronos/chronos/asyncfutures2.nim(323) futureContinue
/app/vendor/nim-chronos/chronos/asyncmacro2.nim(305) eth_getLogs
]]
 [FutureDefect]

It should probably either fail gracefully or, better, retry a few times before failing.

The culprit is probably getJsonLogs (https://github.com/status-im/nim-web3/blob/428b931e7c4f1284b4272bc2c11fca2bd70991cd/web3.nim#L535): it seems to raise without declaring the exception in a raises annotation, so we never handle it?
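
To make the "retry a few times, then fail gracefully" idea concrete, here is a minimal sketch. It is written in Python with asyncio purely for illustration and is not nwaku or nim-web3 code. The names RpcError, call_with_retry and make_flaky_get_logs are hypothetical, and the fake fetcher only simulates an RPC node that fails or times out.

```python
import asyncio


class RpcError(Exception):
    """Hypothetical stand-in for the error an RPC client raises on failure or timeout."""


async def call_with_retry(op, attempts=3, base_delay=0.01):
    """Await op() up to `attempts` times; return (ok, value_or_last_error).

    The error is returned instead of raised, so the caller decides how to
    shut down rather than letting the exception escape the background task.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return True, await op()
        except RpcError as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
                await asyncio.sleep(base_delay * 2 ** attempt)
    return False, last_error


def make_flaky_get_logs(failures):
    """Build a fake log fetcher (assumption) that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    async def get_logs():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise RpcError("request failed or timed out")
        return ["event-1", "event-2"]

    return get_logs, state


if __name__ == "__main__":
    flaky, _ = make_flaky_get_logs(failures=2)
    ok, result = asyncio.run(call_with_retry(flaky))
    print(ok, result)
```

Returning a success flag and the last error keeps the failure visible to the caller, which can then log it and stop cleanly once the retries are exhausted.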

Impact

critical

To reproduce

Run nwaku-compose and observe the error when your Ethereum RPC node fails or times out.

Expected behavior

Fail gracefully, or retry instead of crashing.

nwaku version/commit hash

docker.io/wakuorg/nwaku:v0.21.2-rc.0

@vpavlin vpavlin added the bug Something isn't working label Nov 9, 2023
@vpavlin
Member Author

vpavlin commented Nov 9, 2023

@alrevuelta @rymnc Any thoughts on this?

@rymnc
Contributor

rymnc commented Nov 9, 2023

Yeah, our error handling and retry policy is not great. Agreed on a graceful shutdown in case the eth node stops responding.

@alrevuelta
Contributor

Unsure if related: #1958

Re "retry without failing":

I would retry, yep.

@chair28980 chair28980 moved this to Priority in Waku Nov 14, 2023
@chair28980
Contributor

@rymnc are you able to pick up this issue?

@rymnc
Contributor

rymnc commented Dec 6, 2023

Hi, this can be closed by #2250

@rymnc rymnc linked a pull request Dec 6, 2023 that will close this issue
@github-project-automation github-project-automation bot moved this from Priority to Done in Waku Dec 11, 2023