-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: expose underlying nim-ethers errors to logs #985
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work. My main question is more around should we do more than just log these errors? Are they bad enough to raise a Defect (and crash the node)? Or should we add metrics and monitor how often they happen?
I'll approve now since the suggestions are not really blocked.
proc onBlock(_: Block) = | ||
proc onBlock(blckResult: ?!Block) = | ||
if eventError =? blckResult.errorOption: | ||
error "There was an error in block subscription", msg=eventError.msg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part of me thinks if the clock fails to update due to fetching a block error, then we should raise a Defect, because a correct clock is critical for proper node function. However, we don't know if it's a temporary error or not.
We could add an error count and trigger a Defect after a few errors, but that sounds potentially like an over-optimisation without data yet. Maybe for now we should add a metric and monitor it? Wdyt?
without event =? eventResult, eventErr: | ||
error "There was an error in Request subscription", msg = eventErr.msg | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment for most of these event subscriptions in the Market abstraction -- they are critical to node functionality, so should we log and continue or are they severe enough that we cannot assume normal node operation afterwards (thus warranting a Defect)?
63fee49
to
8a0e3ac
Compare
@emizzle I am unsure about raising Defects. I would for now leave it simply on Error log level and we should be able to detect those if something is fishy. Depends if and what we will detect, then I would act accordingly. |
I have to disagree that monitoring logs is sufficient. Manually combing through logs is tedious at best, especially when all log levels are enabled. We'd need to do some of regular search on kibana, but that's also not really practical. At the very least, a metric would be a better way to keep an eye this. |
I have to disagree with your disagreeing 🤪 These errors will be logged as error level. IMHO we should keep a tight eye on any errors popping up in logs (and generally, we should do logs monitoring) . You can easily set up a dashboard in Kibana that will show error logs occurrence (even related to these issues based on logs messages). IMHO this leads to the same result as metrics, but I agree that to some degree metrics give you an easier way to monitor these things as for example logs messages can change in the future (but then somebody can change the metrics key name as well). I am not solely opposing this, I just wanted to wrap this because this work was (again) going forever. Feel free to create an issue and we can tweak this later on. The main goal of this work was to expose in one way or other errors from nim-ether subscriptions which it does. So when we will have some unexpected behavior we can check for errors surfacing from the subscription mechanism... |
Yep, good point and it achieved the goal very well! |
Brings in a version of nim-ethers where subscription callbacks are returned Result, which will give an error when there is some underlying problem in the subscription mechanisms. This PR exposes these errors through logging.