Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: expose underlying nim-ethers errors to logs #985

Merged
merged 5 commits into from
Dec 3, 2024

Conversation

AuHau
Copy link
Member

@AuHau AuHau commented Nov 4, 2024

Brings in a version of nim-ethers where subscription callbacks are returned Result, which will give an error when there is some underlying problem in the subscription mechanisms. This PR exposes these errors through logging.

@AuHau AuHau requested a review from emizzle November 4, 2024 07:31
@AuHau AuHau mentioned this pull request Nov 28, 2024
Copy link
Contributor

@emizzle emizzle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. My main question is more around should we do more than just log these errors? Are they bad enough to raise a Defect (and crash the node)? Or should we add metrics and monitor how often they happen?

I'll approve now since the suggestions are not really blocked.

proc onBlock(_: Block) =
proc onBlock(blckResult: ?!Block) =
if eventError =? blckResult.errorOption:
error "There was an error in block subscription", msg=eventError.msg
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of me thinks if the clock fails to update due to fetching a block error, then we should raise a Defect, because a correct clock is critical for proper node function. However, we don't know if it's a temporary error or not.

We could add an error count and trigger a Defect after a few errors, but that sounds potentially like an over-optimisation without data yet. Maybe for now we should add a metric and monitor it? Wdyt?

Comment on lines +281 to +283
without event =? eventResult, eventErr:
error "There was an error in Request subscription", msg = eventErr.msg
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment for most of these event subscriptions in the Market abstraction -- they are critical to node functionality, so should we log and continue or are they severe enough that we cannot assume normal node operation afterwards (thus warranting a Defect)?

@AuHau AuHau force-pushed the feat/nim-ethers-subscription-result branch from 63fee49 to 8a0e3ac Compare November 29, 2024 14:23
@AuHau
Copy link
Member Author

AuHau commented Dec 3, 2024

@emizzle I am unsure about raising Defects. I would for now leave it simply on Error log level and we should be able to detect those if something is fishy. Depends if and what we will detect, then I would act accordingly.

@AuHau AuHau enabled auto-merge December 3, 2024 09:51
@AuHau AuHau added this pull request to the merge queue Dec 3, 2024
Merged via the queue into master with commit 0707446 Dec 3, 2024
17 checks passed
@AuHau AuHau deleted the feat/nim-ethers-subscription-result branch December 3, 2024 12:38
@emizzle
Copy link
Contributor

emizzle commented Dec 3, 2024

@emizzle I am unsure about raising Defects. I would for now leave it simply on Error log level and we should be able to detect those if something is fishy. Depends if and what we will detect, then I would act accordingly.

I have to disagree that monitoring logs is sufficient. Manually combing through logs is tedious at best, especially when all log levels are enabled. We'd need to do some of regular search on kibana, but that's also not really practical. At the very least, a metric would be a better way to keep an eye this.

@AuHau
Copy link
Member Author

AuHau commented Dec 4, 2024

@emizzle I am unsure about raising Defects. I would for now leave it simply on Error log level and we should be able to detect those if something is fishy. Depends if and what we will detect, then I would act accordingly.

I have to disagree that monitoring logs is sufficient. Manually combing through logs is tedious at best, especially when all log levels are enabled. We'd need to do some of regular search on kibana, but that's also not really practical. At the very least, a metric would be a better way to keep an eye this.

I have to disagree with your disagreeing 🤪 These errors will be logged as error level. IMHO we should keep a tight eye on any errors popping up in logs (and generally, we should do logs monitoring) . You can easily set up a dashboard in Kibana that will show error logs occurrence (even related to these issues based on logs messages). IMHO this leads to the same result as metrics, but I agree that to some degree metrics give you an easier way to monitor these things as for example logs messages can change in the future (but then somebody can change the metrics key name as well).

I am not solely opposing this, I just wanted to wrap this because this work was (again) going forever. Feel free to create an issue and we can tweak this later on.

The main goal of this work was to expose in one way or other errors from nim-ether subscriptions which it does. So when we will have some unexpected behavior we can check for errors surfacing from the subscription mechanism...

@emizzle
Copy link
Contributor

emizzle commented Dec 5, 2024

The main goal of this work was to expose in one way or other errors from nim-ether subscriptions which it does.

Yep, good point and it achieved the goal very well!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants