fix: Retry check workflow query to be more resilient to backend failures #1740

Merged

swcollard
Contributor

Overview

The check_workflow commands currently assume that every polling request succeeds. If there is a transient issue or blip while the backend is unavailable, the entire check polling fails, which appears to the end user as a failure to run a check workflow even though the check was submitted successfully. The other adverse effect is that the gateway exposes backend/subgraph error messages to the client when it probably shouldn't.

This change augments the loop a bit: if there is an error, it sleeps and retries until hitting the configured checks_timeout_seconds (default 5 min), at which point it returns an E031 timeout. This should make us a little more resilient to backend failures, since this query is retryable and can return the completed check if queried again.
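For readers who want the shape of the loop without opening the diff, here is a minimal sketch of the retry behavior described above. It is written in Python for illustration and is not the code in this PR. The names `poll_check_workflow`, `fetch`, and `CheckTimeout` are hypothetical, the 5-second interval is taken from the testing notes, and catching every exception is a simplifying assumption.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CheckTimeout(Exception):
    """Hypothetical stand-in for the E031 timeout error."""


def poll_check_workflow(
    fetch: Callable[[], Optional[T]],
    timeout_seconds: float = 300.0,   # mirrors checks_timeout_seconds (default 5 min)
    interval_seconds: float = 5.0,    # polling interval assumed from the testing notes
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Poll `fetch` until it returns a result or the timeout elapses.

    `fetch` returns None while the check is still running and may raise
    on a transient backend failure. Both cases lead to a sleep and a retry.
    """
    deadline = clock() + timeout_seconds
    while clock() < deadline:
        try:
            result = fetch()
        except Exception:
            # Assumption: any error is treated as transient and retried,
            # so backend error details are not surfaced to the user.
            result = None
        if result is not None:
            return result
        sleep(interval_seconds)
    raise CheckTimeout("E031: check workflow did not complete before the timeout")
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting. With the defaults, a caller would pass a function that issues the check workflow query and returns the completed result once it is available.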

How this was tested

For testing, I added an artificial sleep after submitting the check request but before polling for the results started. This gave me time to disable my internet connection, simulating a request failure, and the loop continued to poll every 5 seconds until reaching the checks_timeout_seconds value (defaulted to 5 minutes). For a second test, I disabled and then re-enabled my internet connection during the polling, and it was able to fetch the results response once the connection was re-established. From the user's side, the request simply appeared to take longer.

Co-authored-by: Avery Harnish <avery@apollographql.com>
@EverlastingBugstopper EverlastingBugstopper merged commit 5e06d54 into main Sep 19, 2023
@EverlastingBugstopper EverlastingBugstopper deleted the swcollard/check_workflow_loop_retry_on_failure branch September 19, 2023 15:04
@EverlastingBugstopper EverlastingBugstopper added the feature 🎉 new commands, flags, functionality, and improved error messages label Sep 19, 2023
@EverlastingBugstopper EverlastingBugstopper added this to the v0.19.0 milestone Sep 19, 2023