-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gateway: retry logic for requests to GCS #3836
Conversation
be8e5a8
to
4f2fbff
Compare
605f47c
to
7389f98
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all of the implementation here looks good with a couple comments about clarity/intent. However, I'm concerned we're not doing anything explicit/programmatically to ensure that we don't have multiple concurrent retry-able requests being retried at the same time.
For example, the default polling interval for pollingTimer
is 10000
. Could there be more than one at a time, e.g. the second of the five requests to GCS to obtain a composed schema fails and starts retrying with 30 seconds to go, but another invocation of pollingTimer
is kicked off before the retries elapse and starts its own process?
Perhaps the answer here is that the definitive resolution/rejection of the entirety of a fetch pass is what sets the next interval into motion. In other words, setInterval
changes to setTimeout
and the new timer is created by the rejection/resolution of the totality of the multiple fetch
es (or rather, fetchApolloGcs
es), after all retries are fully resolved realized, which all currently happens within updateComposition
here:
await this.updateComposition(); |
Does that make sense?
packages/apollo-gateway/src/__tests__/integration/networkRequests.test.ts
Outdated
Show resolved
Hide resolved
ed6446b
to
b13fdd3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lovely!
// Prevent the Node.js event loop from remaining active (and preventing, | ||
// e.g. process shutdown) by calling `unref` on the `Timeout`. For more | ||
// information, see https://nodejs.org/api/timers.html#timers_timeout_unref. | ||
this.pollingTimer?.unref(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Substantially less critical to have this now that it's not a (forever) interval, but probably worth keeping.
16fef9a
to
ea10d29
Compare
…#3836) Implement gateway retry logic for requests to GCS. Failed requests will retry up to 5 times. Additionally, this PR adjusts how polling is done in order to prevent the possibility of multiple in-flight updates. The next "tick" only begins after a full round of updating is completed rather than on a perfectly regular interval. Thanks to @abernix for suggesting this change. For more details please see the PR description. Apollo-Orig-Commit-AS: apollographql/apollo-server@cdee9d6
This PR utilizes
make-fetch-happen
's built-in retry capabilities in order to retry failed requests to GCS.It's worth noting that retries only occur on certain types of failures. For additional details, please see the docs.
Additionally, now that we've added retries, this PR adjusts how polling is done in order to prevent the possibility of multiple in-flight updates. The next "tick" only begins after a full round of updating is completed rather than on a perfectly regular interval. Thanks to @abernix for suggesting this change.
To elaborate a bit: previously the gateway would fire off a series of fetches to GCS every 10s (unless specified otherwise). It was (pretty safely) assumed that this wouldn't be problematic - though technically a race condition exists if the fetches were to take a number of seconds each. With the introduction of retries, this becomes considerably more likely due to exponential backoff. To prevent this condition, the gateway starts the next 10s "tick" once it's finished its round of requests to GCS.