fix: Added retrying on indexer failures #636

Merged
merged 3 commits into from
Jun 10, 2024

ChaoticTempest
Member

Noticed that the indexer can fail without reporting the error until we join its thread handle much later. This is because panics are per thread here, and truly handling them would require installing a panic hook, which is quite a hassle to deal with for now.
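To illustrate the per-thread panic behavior described above, here is a minimal standalone sketch (not code from this PR). The function name `join_panicking_worker` and the panic message are hypothetical; the point is only that the failure becomes visible to the spawning thread at `join()`.

```rust
use std::thread;

/// Spawns a worker that panics right away and returns what `join` reports.
/// Hypothetical stand-in for the indexer thread, not the PR's actual code.
fn join_panicking_worker() -> Result<(), String> {
    let handle = thread::spawn(|| {
        panic!("indexer failed");
    });

    // The spawning thread is not interrupted by the panic above.
    // The failure only surfaces here, as the `Err` variant of `join()`.
    handle.join().map_err(|payload| {
        // Panic payloads are usually a `&str` or a `String`.
        if let Some(msg) = payload.downcast_ref::<&str>() {
            msg.to_string()
        } else if let Some(msg) = payload.downcast_ref::<String>() {
            msg.clone()
        } else {
            "unknown panic payload".to_string()
        }
    })
}
```

If the handle is only joined at the very end of a long-running call, the error stays hidden for that whole time, which is the motivation for retrying inside the thread instead.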

One minor thing this doesn't address is thread cancellation: when we go to join the thread handle at the end of the run call, the join may never complete because the indexer is still alive. We would need to add a cancellation mechanism, such as sending a message to stop the indexer loop, but that's too much work right now for this simple fix.
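For reference, a message-based cancellation mechanism like the one mentioned above could look roughly like the following. This is only a sketch under assumptions: `run_until_cancelled` and `spawn_and_cancel` are hypothetical names, the loop body is a placeholder, and the real indexer loop may not be structured so that it can poll a channel like this.

```rust
use std::sync::mpsc::{self, TryRecvError};
use std::thread;
use std::time::Duration;

/// Runs a stand-in worker loop until a shutdown message arrives.
/// Returns the number of iterations completed before stopping.
fn run_until_cancelled(shutdown: mpsc::Receiver<()>) -> u64 {
    let mut iterations = 0;
    loop {
        match shutdown.try_recv() {
            // An explicit message or a dropped sender both mean "stop".
            Ok(()) | Err(TryRecvError::Disconnected) => return iterations,
            // No message yet, so keep working.
            Err(TryRecvError::Empty) => {}
        }
        // Placeholder for one unit of indexer work.
        thread::sleep(Duration::from_millis(5));
        iterations += 1;
    }
}

/// Spawns the worker, lets it run briefly, then cancels and joins it.
fn spawn_and_cancel() -> u64 {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || run_until_cancelled(rx));
    thread::sleep(Duration::from_millis(50));
    tx.send(()).expect("worker should still be alive");
    // Because the loop observes the message, this join can complete.
    handle.join().expect("worker should not panic")
}
```

With something like this in place, the join at the end of the run call would no longer block on a live indexer, at the cost of threading a receiver through to the loop.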

let sign_queue = sign_queue.clone();
let gcp = gcp.clone();

if let Err(err) = indexer::run(options, mpc_contract_id, account_id, sign_queue, gcp) {
Contributor

I guess we need to exit the loop if the ::run() is successful?

Member Author

So, there's no cancellation when it comes to the indexer, since it runs on a separate thread. That means indexer::run will never return anything unless it errors out. The reason none of our stuff stalls is that the integration tests kill the nodes directly. I'll add a break to make it explicit, just in case we add interrupt handlers in the future, but this is not an issue for us at all.

break;
};
tracing::error!(%err, "indexer failed");
std::thread::sleep(std::time::Duration::from_secs(2u64.pow(i)));
Member

Is sleeping 2**9=512 seconds too long?

Member Author

Yeah, that could be too long. Let's just cap it at something like 5 minutes instead of 10 minutes. And instead of 10 loops, we can just loop infinitely.
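A capped, unbounded retry loop along the lines discussed in this thread could be sketched as follows. This is not the merged code: `MAX_DELAY_SECS`, `backoff_delay`, and `retry_forever` are hypothetical names, the 300-second cap is taken from the "5 minutes" suggestion, and the sleep function is passed in as a parameter only so the sketch can be exercised without actually waiting.

```rust
use std::time::Duration;

/// Hypothetical cap, taken from the "5 minutes" suggested in the review.
const MAX_DELAY_SECS: u64 = 300;

/// Delay before retry number `attempt` (0-based): 2^attempt seconds, capped.
fn backoff_delay(attempt: u32) -> Duration {
    // saturating_pow avoids overflow once `attempt` grows large.
    let secs = 2u64.saturating_pow(attempt);
    Duration::from_secs(secs.min(MAX_DELAY_SECS))
}

/// Calls `run` until it returns Ok, sleeping between failed attempts.
/// Returns how many attempts failed before the first success.
fn retry_forever<E: std::fmt::Display>(
    mut run: impl FnMut() -> Result<(), E>,
    mut sleep: impl FnMut(Duration),
) -> u32 {
    let mut attempt: u32 = 0;
    loop {
        match run() {
            // Explicit exit on success, mirroring the `break` in the diff.
            Ok(()) => return attempt,
            Err(err) => {
                eprintln!("indexer failed: {err}");
                sleep(backoff_delay(attempt));
                attempt = attempt.saturating_add(1);
            }
        }
    }
}
```

In a real caller, `std::thread::sleep` can be passed as the second argument. With this shape the delay grows as 1, 2, 4, ... seconds and stays at the cap from the tenth failure onward, so 2**9=512 seconds is never slept.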

ailisp
ailisp previously approved these changes Jun 7, 2024
Member

@ailisp ailisp left a comment

Looks good to me!

@ChaoticTempest ChaoticTempest merged commit c66d801 into develop Jun 10, 2024
3 checks passed
@ChaoticTempest ChaoticTempest deleted the phuong/fix/retryable-indexer branch June 10, 2024 22:10
Terraform Feature Environment Destroy (dev-636)

Terraform Initialization ⚙️success

Terraform Destroy success

No changes. No objects need to be destroyed.

Either you have not created any objects yet or the existing objects were
already deleted outside of Terraform.

Destroy complete! Resources: 0 destroyed.

Pusher: @ChaoticTempest, Action: pull_request, Working Directory: ``, Workflow: Terraform Feature Env (Destroy)

DavidM-D pushed a commit that referenced this pull request Jul 18, 2024
* Added retrying on indexer failures

* Added notes about indexer erroring out

* Max out indexer retry loop delay and make it loop infinitely
ppca pushed a commit that referenced this pull request Jul 18, 2024
* Added retrying on indexer failures

* Added notes about indexer erroring out

* Max out indexer retry loop delay and make it loop infinitely