-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Added retrying on indexer failures #636
Conversation
chain-signatures/node/src/cli.rs
Outdated
let sign_queue = sign_queue.clone(); | ||
let gcp = gcp.clone(); | ||
|
||
if let Err(err) = indexer::run(options, mpc_contract_id, account_id, sign_queue, gcp) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we need to exit the loop if the ::run() is successful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, there's no cancellation when it comes to indexer, since it's ran on a separate thread. So that indexer::run
will never return anything unless it errors out. The reason why none of our stuff stalls is because the integration tests will directly kill the nodes. I'll add in a break
to make it explicit just in case we do add interrupt handlers in the future, but this is not an issue at all for us
chain-signatures/node/src/cli.rs
Outdated
break; | ||
}; | ||
tracing::error!(%err, "indexer failed"); | ||
std::thread::sleep(std::time::Duration::from_secs(2u64.pow(i))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is sleeping 2**9=512
seconds too long?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that could be too long. Let's just cap it to something like 5 mins instead of 10mins. And instead of 10 loops, we can just loop infinitely
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
Terraform Feature Environment Destroy (dev-636)Terraform Initialization ⚙️
|
* Added retrying on indexer failures * Added notes about indexer erroring out * Max out indexer retry loop delay and make it loop infinitely
* Added retrying on indexer failures * Added notes about indexer erroring out * Max out indexer retry loop delay and make it loop infinitely
Noticed that indexer can potentially fail but won't report the error until we go to join the handle much later. This is due to the fact that panics are per thread here and would require getting the panic hook to truly handle it. Which is quite a hassle to deal with for now.
There's one minor thing this doesn't address is the cancellation of threads, which means that the when we go to join the thread handle at the end of the run call, it will potentially not join due to indexer being alive still. We would need to add some cancellation mechanisms such as sending over a message to kill the indexer loop, but that's too much work right now for this simple fix.