Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ApolloGateway.stop() is not reliable #4428

Closed
glasser opened this issue Jul 29, 2020 · 1 comment · Fixed by apollographql/federation#452
Closed

ApolloGateway.stop() is not reliable #4428

glasser opened this issue Jul 29, 2020 · 1 comment · Fixed by apollographql/federation#452

Comments

@glasser
Copy link
Member

glasser commented Jul 29, 2020

All that ApolloGateway.stop() does is clear a pollingTimer if it's currently set.

But what if pollServices is currently running (but awaiting some async operation inside updateComposition) when this is happening? Then there are two problems:

  • pollServices may just end up calling pollServices again and schedule another call to itself, even though we thought we stopped it
  • stop returns while stuff is still happening, which is inappropriate

I think the right fix is to have an explicit stopped flag, and also have a "pollServices is running now" promise. Something like

function pollServices() {
  // assert this.pollServicesIsRunning is null
  this.pollServicesIsRunning = actualPollServices()
    .catch(e => console.log("error in actualPollServices"));
    .then(() => {
      this.pollServicesIsRunning = null;
      if (!this.stopped) this.pollTimer = setTimeout(() => this.pollServices(), TIMEOUT);
    });
}

async function actualPollServices() {
  // ... actually poll services; no need to sleep or reschedule
}

async function stop() {
  this.stopped = true;
  if (this.pollTimer) cancelTimeout(this.pollTimer);
  if (this.pollServicesIsRunning) await this.pollServicesIsRunning;
}

With a functional ApolloGateway.stop() invoked from ApolloServer.stop(), I think we can skip the unref too.

@glasser
Copy link
Member Author

glasser commented Jul 29, 2020

It's also worth noting that our docs don't appear to tell you to run ApolloGateway.stop() anywhere, and it might make sense if ApolloServer.stop() did this for you! But that's a separate issue.

glasser added a commit that referenced this issue Feb 8, 2021
Because AS is what invokes ApolloGateway.load, it should be its responsibility
to invoke the matching stop method.

apollographql/federation#452 (aimed at @apollo/gateway
0.23) will change ApolloGateway to no longer "unref" its polling timer: ie, it
makes calling `stop()` actually part of its expected API instead of something
you can ignore if you feel like it without affecting program shutdown. It has
a bit of a hack to still unref the timer if it looks like you're using an
old (pre-2.18) version of Apollo Server, but this PR (which will be released in
v2.20.0) will make ApolloServer stop the gateway for you.

Part of fixing #4428.
glasser added a commit that referenced this issue Feb 8, 2021
Because AS is what invokes ApolloGateway.load, it should be its responsibility
to invoke the matching stop method.

apollographql/federation#452 (aimed at @apollo/gateway
0.23) will change ApolloGateway to no longer "unref" its polling timer: ie, it
makes calling `stop()` actually part of its expected API instead of something
you can ignore if you feel like it without affecting program shutdown. It has
a bit of a hack to still unref the timer if it looks like you're using an
old (pre-2.18) version of Apollo Server, but this PR (which will be released in
v2.20.0) will make ApolloServer stop the gateway for you.

Part of fixing #4428.

Also simplify typings of toDispose set. 
`() => ValueOrPromise<void>` is a weird type because `void` means "I'm not going
to look at the return value" which is sorta incompatible with "but I need to see
if it's a Promise or not". So just make these all async (changing a couple
implementations by adding `async`).
glasser added a commit to apollographql/federation that referenced this issue Feb 9, 2021
The (not particularly documented) ApolloGateway.stop() method didn't reliably
stop the gateway from polling. All it did was cancel a timeout. But if it is
called while pollServices is in the middle of running, it would do nothing, and
then pollServices would carry on and set another timeout.

Better semantics would be for stop() to reliably stop polling: allow the current
poll to complete, ensure that there will be no more polls, and then (async)
return. This PR implements those semantics, by implementing an explicit state
machine for ApolloGateway's polling.

One reason that these bugs were able to survive is that calling stop() is often
unnecessary. In apollographql/apollo-server#3105 we
chose to `unref` the polling timeout to allow the Node process to exit if it's
the only thing left on the event loop, instead of encouraging users of
`ApolloGateway` to be responsible and call `stop()` themselves. While this may
be reasonable when the gateway's lifecycle is merely a schema polling timer, we
may have future changes to the gateway where proper lifecycle handling is more
important. So this PR also moves away from the world where it's fine to not
bother to explicitly stop the gateway.

That said, in the common case, we don't want users to have to write gateway
stopping code. It makes more sense for stopping the `ApolloGateway` to be the
responsibility of `ApolloServer.stop()`, just as `ApolloServer.stop()` can
trigger events in plugins. So in the recently-released Apollo Server v2.20,
`ApolloServer.stop()` calls `ApolloGateway.stop()`. This should mean that in
most cases, the missing `unref` shouldn't keep the process running, as long as
you've run `ApolloServer.stop()` (and if you don't, it's likely that other parts
of the server are keeping your process running).

What if you're still running an old version of Apollo Server? For a bit of
backwards compatibility, `ApolloGateway` detects if it appears to be connected
to Apollo Server older than v2.18. If so, it continues to do the old
unref().  If you're using v2.18 or v2.19, the upgrade to v2.20 should be pretty
painless (it's mostly just minor bugfixes). If you're using ApolloGateway on its
own for some reason, and this change causes your processes to hang on shutdown,
adding a `stop()` call should be pretty straightforward. (If we learn that this
change really is devastating, we can always go back to an unconditional
timer.unref() later.)

Fixes apollographql/apollo-server#4428.
This was referenced Mar 9, 2021
kodiakhq bot added a commit to ProjectXero/dbds that referenced this issue Mar 17, 2021
Bumps apollo-datasource from 0.7.2 to 0.7.3.

Changelog
Sourced from apollo-datasource's changelog.

CHANGELOG
The version headers in this history reflect the versions of Apollo Server itself.  Versions of other packages (e.g., those which are not actual HTTP integrations; packages not prefixed with "apollo-server", or just supporting packages) may use different versions.
🆕 Please Note!: 🆕 The @apollo/federation and @apollo/gateway packages now live in the apollographql/federation repository.

@apollo/gateway
@apollo/federation

vNEXT

The changes noted within this vNEXT section have not been released yet.  New PRs and commits which introduce changes should include an entry in this vNEXT section as part of their development.  With few exceptions, the format of the entry should follow convention (i.e., prefix with package name, use markdown backtick formatting for package names and code, suffix with a link to the change-set à la [PR #YYY](https://link/pull/YYY), etc.).  When a release is being prepared, a new header will be (manually) created below and the appropriate changes within that release will be moved into the new section.


apollo-server-core: The SIGINT and SIGTERM signal handlers installed by default (when not disabled by stopOnTerminationSignals: false) now stay active (preventing process termination) while the server shuts down, instead of letting a second signal terminate the process. The handlers still re-signal the process after this.stop() concludes. Also, if this.stop() throws, the signal handlers will now log and exit 1 instead of throwing an uncaught exception. [Issue #4931](apollographql/apollo-server#4931)
apollo-server-lambda: (UPDATE THIS MESSAGE BEFORE RELEASE; we are not sure if this actually helps nodejs14 compatibility or if it's just a nice refactor.) Support the nodejs14 runtime by changing the handler to be an async handler. (For backwards compatibility, if the handler receives a callback, it still acts like a non-async handler.) [Issue #1989](apollographql/apollo-server#1989) [PR #5004](apollographql/apollo-server#5004)

v2.21.1

apollo-server-lambda: The onHealthCheck option did not previously work. Additionally, health checks (with onHealthCheck or without) didn't work in all Lambda contexts, such as behind Custom Domains; the path check is now more flexible. [Issue #3999](apollographql/apollo-server#3999) [PR #4969](apollographql/apollo-server#4969) [Issue #4891](apollographql/apollo-server#4891) [PR #4892](apollographql/apollo-server#4892)
The debug option to new ApolloServer (which adds stack traces to errors) now affects errors that come from requests executed with server.executeOperation (and its wrapper apollo-server-testing), instead of just errors that come from requests executed over HTTP. [Issue #4107](apollographql/apollo-server#4107) [PR #4948](apollographql/apollo-server#4948)
Bump version of @apollographql/graphql-playground-html to v1.6.27 and @apollographql/graphql-playground-react to v1.7.39 to resolve incorrectly rendered CDN URL when Playground version was false-y.  [PR #4932](apollographql/apollo-server#4932) [PR #4955](apollographql/apollo-server#4955) [Issue #4937](apollographql/apollo-server#4937)

v2.21.0

Apollo Server can now be installed with graphql@15 without causing peer dependency errors or warnings. (Apollo Server has a file upload feature which was implemented as a wrapper around the graphql-upload package. We have been unable to upgrade our dependency on that package due to backwards-incompatible changes in later versions, and the version we were stuck on did not allow graphql@15 as a peer dependency. We have now switched to a fork of that old version called @apollographql/graphql-upload-8-fork that allows graphql@15.) Also bump the graphql-tools dependency from 4.0.0 to 4.0.8 for graphql@15 support. [Issue #4865](apollographql/apollo-server#4865)

v2.20.0

apollo-server: Previously, ApolloServer.stop() functioned like net.Server.close() in that it did not close idle connections or close active connections after a grace period. This meant that trying to await ApolloServer.stop() could hang indefinitely if there are open connections. Now, this method closes idle connections, and closes active connections after 10 seconds. The grace period can be adjusted by passing the new stopGracePeriodMillis option to new ApolloServer, or disabled by passing Infinity (though it will still close idle connections). Note that this only applies to the "batteries-included" ApolloServer in the apollo-server package with its own built-in Express and HTTP servers. [PR #4908](apollographql/apollo-server#4908) [Issue #4097](apollographql/apollo-server#4097)
apollo-server-core: When used with ApolloGateway, ApolloServer.stop now invokes ApolloGateway.stop. (This makes sense because ApolloServer already invokes ApolloGateway.load which is what starts the behavior stopped by ApolloGateway.stop.) Note that @apollo/gateway 0.23 will expect to be stopped in order for natural program shutdown to occur. [PR #4907](apollographql/apollo-server#4907) [Issue #4428](apollographql/apollo-server#4428)
apollo-server-core: Avoid instrumenting schemas for the old graphql-extensions library unless extensions are provided. [PR #4893](apollographql/apollo-server#4893) [Issue #4889](apollographql/apollo-server#4889)
apollo-server-plugin-response-cache@0.6.0: The shouldReadFromCache and shouldWriteToCache hooks were always documented as returning ValueOrPromise<boolean> (ie, that they could be either sync or async), but they actually only worked if they returned a bool. Now they can be either sync or async as intended. [PR #4890](apollographql/apollo-server#4890) [Issue #4886](apollographql/apollo-server#4886)
apollo-datasource-rest@0.10.0: The RESTDataSource.trace method is now protected instead of private to allow more control over logging and metrics. [PR #3940](apollographql/apollo-server#3940)

v2.19.2

apollo-server-express: types: Export ExpressContext from main module. [PR #4821](apollographql/apollo-server#4821) [Issue #3699](apollographql/apollo-server#3699)
apollo-server-env: types: The first parameter to fetch is now marked as required, as intended and in accordance with the Fetch API specification. [PR #4822](apollographql/apollo-server#4822) [Issue #4741](apollographql/apollo-server#4741)
apollo-server-core: Update graphql-tag package to latest, now with its graphql-js peerDependencies expanded to include ^15.0.0 [PR #4833](apollographql/apollo-server#4833)

v2.19.1

apollo-server-core: The debugPrintReports option to ApolloServerPluginUsageReporting now prints traces as well. [PR #4805](apollographql/apollo-server#4805)

v2.19.0

apollo-server-testing: types: Allow generic variables usage of query and mutate functions. [PR #4383](apollograpqh/apollo-server#4383)
apollo-server-express: Export the GetMiddlewareOptions type. [PR #4599](apollograpqh/apollo-server#4599)
apollo-server-lambda: Fix file uploads - ignore base64 decoding for multipart queries. [PR #4506](apollographql/apollo-server#4506)
apollo-server-core: Do not send  operation documents that cannot be executed to Apollo Studio. Instead, information about these operations will be combined into one "operation" for parse failures, one for validation failures, and one for unknown operation names.



... (truncated)


Commits

c212627 Release
See full diff in compare view




Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot merge will merge this PR after your CI passes on it
@dependabot squash and merge will squash and merge this PR after your CI passes on it
@dependabot cancel merge will cancel a previously requested merge and block automerging
@dependabot reopen will reopen this PR if it is closed
@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant