Skip to content

Commit

Permalink
@apollo/gateway: Make ApolloGateway.stop() reliable and required (#452)
Browse files Browse the repository at this point in the history
The (not particularly documented) ApolloGateway.stop() method didn't reliably
stop the gateway from polling. All it did was cancel a timeout. But if it is
called while pollServices is in the middle of running, it would do nothing, and
then pollServices would carry on and set another timeout.

Better semantics would be for stop() to reliably stop polling: allow the current
poll to complete, ensure that there will be no more polls, and then (async)
return. This PR implements those semantics, by implementing an explicit state
machine for ApolloGateway's polling.

One reason that these bugs were able to survive is that calling stop() is often
unnecessary. In apollographql/apollo-server#3105 we
chose to `unref` the polling timeout to allow the Node process to exit if it's
the only thing left on the event loop, instead of encouraging users of
`ApolloGateway` to be responsible and call `stop()` themselves. While this may
be reasonable when the gateway's lifecycle is merely a schema polling timer, we
may have future changes to the gateway where proper lifecycle handling is more
important. So this PR also moves away from the world where it's fine to not
bother to explicitly stop the gateway.

That said, in the common case, we don't want users to have to write gateway
stopping code. It makes more sense for stopping the `ApolloGateway` to be the
responsibility of `ApolloServer.stop()`, just as `ApolloServer.stop()` can
trigger events in plugins. So in the recently-released Apollo Server v2.20,
`ApolloServer.stop()` calls `ApolloGateway.stop()`. This should mean that in
most cases, the missing `unref` shouldn't keep the process running, as long as
you've run `ApolloServer.stop()` (and if you don't, it's likely that other parts
of the server are keeping your process running).

What if you're still running an old version of Apollo Server? For a bit of
backwards compatibility, `ApolloGateway` detects if it appears to be connected
to Apollo Server older than v2.18. If so, it continues to do the old
unref().  If you're using v2.18 or v2.19, the upgrade to v2.20 should be pretty
painless (it's mostly just minor bugfixes). If you're using ApolloGateway on its
own for some reason, and this change causes your processes to hang on shutdown,
adding a `stop()` call should be pretty straightforward. (If we learn that this
change really is devastating, we can always go back to an unconditional
timer.unref() later.)

Fixes apollographql/apollo-server#4428.
  • Loading branch information
glasser authored Feb 9, 2021
1 parent f75fbce commit 8201e0a
Show file tree
Hide file tree
Showing 5 changed files with 231 additions and 48 deletions.
1 change: 1 addition & 0 deletions gateway-js/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@

> The changes noted within this `vNEXT` section have not been released yet. New PRs and commits which introduce changes should include an entry in this `vNEXT` section as part of their development. When a release is being prepared, a new header will be (manually) created below and the appropriate changes within that release will be moved into the new section.
- **If you are on v2.18 or v2.19 of Apollo Server, you should upgrade to Apollo Server v2.20 before upgrading to this version**, or your Node process may not shut down properly after stopping your Apollo Server.. Code that calls `ApolloGateway.load` is now expected to call `ApolloGateway.stop`. If you don't do that and you're using managed federation or `experimental_pollInterval`, the background polling interval will now keep a Node process alive rather than allowing it to exit if it's the only remaining event loop handler. Generally, `ApolloServer` is what calls `ApolloGateway.load`, and if you use at least v2.20.0 of Apollo Server, `ApolloServer.stop()` will invoke `ApolloGateway.stop()`. There's a bit of a hack where ApolloGateway does the old behavior if it believes that it is being called by a version of Apollo Server older than v2.18.0. So if you are manually calling `ApolloGateway.load` from your code, make sure to call `ApolloGateway.stop` when you're done, and don't use this version with Apollo Server v2.18 or v2.19. [PR #452](https://github.com/apollographql/federation/pull/452) [apollo-server Issue #4428](https://github.com/apollographql/apollo-server/issues/4428)
- Simplify startup code paths. This is technically only intended to be an internal restructure, but it's substantial enough to warrant a changelog entry for observability in case of any unexpected behavioral changes. [PR #440](https://github.com/apollographql/federation/pull/440)

## v0.22.0
Expand Down
9 changes: 8 additions & 1 deletion gateway-js/src/__tests__/gateway/lifecycle-hooks.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,7 @@ describe('lifecycle hooks', () => {

expect(experimental_updateServiceDefinitions).toBeCalled();
expect(gateway.schema!.getType('Furniture')).toBeDefined();
await gateway.stop();
});

it('calls experimental_didFailComposition with a bad config', async () => {
Expand Down Expand Up @@ -181,6 +182,8 @@ describe('lifecycle hooks', () => {
// second call should have previous info in the second arg
expect(secondCall[1]!.schema).toBeDefined();
expect(secondCall[1]!.compositionMetadata!.schemaHash).toEqual('hash1');

await gateway.stop();
});

it('uses default service definition updater', async () => {
Expand All @@ -196,6 +199,8 @@ describe('lifecycle hooks', () => {
// updater, it has to use the default. If there's a valid schema, then
// the loader had to have been called.
expect(schema.getType('User')).toBeDefined();

await gateway.stop();
});

it('warns when polling on the default fetcher', async () => {
Expand Down Expand Up @@ -229,11 +234,12 @@ describe('lifecycle hooks', () => {
const schemaChangeCallback = jest.fn(() => resolve());

gateway.onSchemaChange(schemaChangeCallback);
gateway.load();
await gateway.load();

await schemaChangeBlocker;

expect(schemaChangeCallback).toBeCalledTimes(1);
await gateway.stop();
});

it('calls experimental_didResolveQueryPlan when executor is called', async () => {
Expand Down Expand Up @@ -261,5 +267,6 @@ describe('lifecycle hooks', () => {
});

expect(experimental_didResolveQueryPlan).toBeCalled();
await gateway.stop();
});
});
13 changes: 10 additions & 3 deletions gateway-js/src/__tests__/integration/configuration.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -50,8 +50,15 @@ beforeEach(() => {
});

describe('gateway configuration warnings', () => {
let gateway: ApolloGateway | null = null;
afterEach(async () => {
if (gateway) {
await gateway.stop();
gateway = null;
}
});
it('warns when both csdl and studio configuration are provided', async () => {
const gateway = new ApolloGateway({
gateway = new ApolloGateway({
csdl: getTestingCsdl(),
logger,
});
Expand All @@ -69,7 +76,7 @@ describe('gateway configuration warnings', () => {
it('conflicting configurations are warned about when present', async () => {
mockSDLQuerySuccess(service);

const gateway = new ApolloGateway({
gateway = new ApolloGateway({
serviceList: [{ name: 'accounts', url: service.url }],
logger,
});
Expand All @@ -92,7 +99,7 @@ describe('gateway configuration warnings', () => {
mockImplementingServicesSuccess(service);
mockRawPartialSchemaSuccess(service);

const gateway = new ApolloGateway({
gateway = new ApolloGateway({
logger,
});

Expand Down
23 changes: 14 additions & 9 deletions gateway-js/src/__tests__/integration/networkRequests.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,7 @@ const updatedService: MockService = {

let fetcher: typeof fetch;
let logger: Logger;
let gateway: ApolloGateway | null = null;

beforeEach(() => {
if (!nock.isActive()) nock.activate();
Expand All @@ -92,16 +93,20 @@ beforeEach(() => {
};
});

afterEach(() => {
afterEach(async () => {
expect(nock.isDone()).toBeTruthy();
nock.cleanAll();
nock.restore();
if (gateway) {
await gateway.stop();
gateway = null;
}
});

it('Queries remote endpoints for their SDLs', async () => {
mockSDLQuerySuccess(service);

const gateway = new ApolloGateway({
gateway = new ApolloGateway({
serviceList: [{ name: 'accounts', url: service.url }],
logger,
});
Expand All @@ -116,7 +121,7 @@ it('Extracts service definitions from remote storage', async () => {
mockImplementingServicesSuccess(service);
mockRawPartialSchemaSuccess(service);

const gateway = new ApolloGateway({ logger });
gateway = new ApolloGateway({ logger });

await gateway.load({
apollo: { keyHash: apiKeyHash, graphId, graphVariant: 'current' },
Expand Down Expand Up @@ -163,7 +168,7 @@ it.skip('Rollsback to a previous schema when triggered', async () => {
.mockImplementationOnce(() => secondResolve())
.mockImplementationOnce(() => thirdResolve());

const gateway = new ApolloGateway({ logger });
gateway = new ApolloGateway({ logger });
// @ts-ignore for testing purposes, a short pollInterval is ideal so we'll override here
gateway.experimental_pollInterval = 100;

Expand Down Expand Up @@ -204,7 +209,7 @@ it(`Retries GCS (up to ${GCS_RETRY_COUNT} times) on failure for each request and
failNTimes(GCS_RETRY_COUNT, () => mockRawPartialSchema(service));
mockRawPartialSchemaSuccess(service);

const gateway = new ApolloGateway({ fetcher, logger });
gateway = new ApolloGateway({ fetcher, logger });

await gateway.load({
apollo: { keyHash: apiKeyHash, graphId, graphVariant: 'current' },
Expand Down Expand Up @@ -255,7 +260,7 @@ describe('Downstream service health checks', () => {
mockSDLQuerySuccess(service);
mockServiceHealthCheckSuccess(service);

const gateway = new ApolloGateway({
gateway = new ApolloGateway({
logger,
serviceList: [{ name: 'accounts', url: service.url }],
serviceHealthCheck: true,
Expand Down Expand Up @@ -293,7 +298,7 @@ describe('Downstream service health checks', () => {

mockServiceHealthCheckSuccess(service);

const gateway = new ApolloGateway({ serviceHealthCheck: true, logger });
gateway = new ApolloGateway({ serviceHealthCheck: true, logger });

await gateway.load({
apollo: { keyHash: apiKeyHash, graphId, graphVariant: 'current' },
Expand Down Expand Up @@ -352,7 +357,7 @@ describe('Downstream service health checks', () => {
.mockImplementationOnce(() => resolve1())
.mockImplementationOnce(() => resolve2());

const gateway = new ApolloGateway({
gateway = new ApolloGateway({
serviceHealthCheck: true,
logger,
});
Expand Down Expand Up @@ -396,7 +401,7 @@ describe('Downstream service health checks', () => {
let resolve: () => void;
const schemaChangeBlocker = new Promise((res) => (resolve = res));

const gateway = new ApolloGateway({ serviceHealthCheck: true, logger });
gateway = new ApolloGateway({ serviceHealthCheck: true, logger });
// @ts-ignore for testing purposes, a short pollInterval is ideal so we'll override here
gateway.experimental_pollInterval = 100;

Expand Down
Loading

0 comments on commit 8201e0a

Please sign in to comment.