feat: curio: alertManager #11926
Looks really good. Just one nit.
Not in this PR, but it would be neat to reuse the config parsing logic in balanceCheck and gather all miner-ids served by the cluster, then check that winningPoSt is happening correctly based on entries being added to the mining_tasks table (we expect ~2880 entries in the last 24h per minerID based on the base_compute_time column; the check should probably use a smaller window, though).
I think a better check would be to verify that every "win" instance resulted in a block that was accepted by the chain. This would help show whether SPs are losing blocks due to any issues. I have extracted the address function for future use.
That's another good one, but the former does tell you that some machines in the cluster are taking care of winningPoSt at all times.
Related Issues
Proposed Changes
This PR adds a new singleton task for alerting.
The alerting task can create an incident for a previously configured service.
Users should get critical alerts via SMS, email, and phone call.
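As a rough illustration of what "creating an incident" can look like, here is a minimal sketch that builds a trigger event for the PagerDuty Events API v2. It assumes the configured service is a PagerDuty integration reached through that API; the PR text does not spell out the transport. The type and function names, the placeholder integration key, and the alert summary are all hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// pagerDutyEventsURL is the public Events API v2 endpoint.
const pagerDutyEventsURL = "https://events.pagerduty.com/v2/enqueue"

// eventPayload carries the required fields of an Events API v2 payload.
type eventPayload struct {
	Summary  string `json:"summary"`
	Source   string `json:"source"`
	Severity string `json:"severity"`
}

// triggerEvent is the top-level request body for a "trigger" action.
type triggerEvent struct {
	RoutingKey  string       `json:"routing_key"`
	EventAction string       `json:"event_action"`
	Payload     eventPayload `json:"payload"`
}

// buildCriticalEvent serializes a critical-severity trigger event.
// (Hypothetical helper; not taken from the PR.)
func buildCriticalEvent(routingKey, summary, source string) ([]byte, error) {
	return json.Marshal(triggerEvent{
		RoutingKey:  routingKey,
		EventAction: "trigger",
		Payload: eventPayload{
			Summary:  summary,
			Source:   source,
			Severity: "critical",
		},
	})
}

// sendEvent posts the event; the Events API answers 202 Accepted on success.
func sendEvent(body []byte) error {
	resp, err := http.Post(pagerDutyEventsURL, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusAccepted {
		return fmt.Errorf("unexpected status from PagerDuty: %s", resp.Status)
	}
	return nil
}

func main() {
	// The key and summary are placeholders for illustration only.
	body, err := buildCriticalEvent("YOUR_INTEGRATION_KEY", "Curio: wallet balance below threshold", "curio-cluster")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // sendEvent(body) would deliver it
}
```

How the resulting incident reaches the user (SMS, email, or call) is decided by the notification rules on the PagerDuty side, not by the payload.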
Additional Info
A new page about alerting is required in Curio docs. We also need to create a section on how to sign up and configure PagerDuty.
Checklist
Before you mark the PR ready for review, please make sure that:
<PR type>: <area>: <change being made>
fix: mempool: Introduce a cache for valid signatures
PR type: fix, feat, build, chore, ci, docs, perf, refactor, revert, style, test
area, e.g. api, chain, state, market, mempool, multisig, networking, paych, proving, sealing, wallet, deps