Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(gateway): Prevent inoperable state on initial failure to load configuration #4277

Merged
merged 14 commits into from
Jul 27, 2020

Conversation

trevor-scheer
Copy link
Member

In its current state, if the gateway fails to compose successfully in managed mode, it will continue to exist in a state where polling never kicked off and it can't serve requests successfully.

After these changes, the gateway will:

  1. In managed mode, still kick off polling and continue to try to load config on the poll interval if it fails the first time.
  2. In unmanaged mode and with polling disabled, it will process.exit() if it fails to load.

I'm not sure why I had marked this as a TODO and future
breaking change. This change simplifies the `updateComposition`
API and makes `load` (called once ever) responsible for setting
config.
This commit introduces behavior such that the gateway
will `process.exit(1)` in the event that it isn't polling
and fails to compose a valid schema on startup.
@trevor-scheer trevor-scheer changed the title fix(gateway): Prevent inoperable state on initial failure to load configuration WIP - fix(gateway): Prevent inoperable state on initial failure to load configuration Jun 16, 2020
@glasser
Copy link
Member

glasser commented Jun 16, 2020

(not doing a full review unless requested, just curious)

Does this state integrate with the Apollo Server built in health check at all (by default or opt in)? It really seems like it would be nice to be able to not consider a gateway server healthy until it has loaded its schema.

@abernix
Copy link
Member

abernix commented Jun 17, 2020

@glasser We aren't changing the existing default behavior of the AS server health check, but we can provide a pre-written function that implements it. That would be opt-in via documentation suggestion for now, and then a default mode of operation in AS3.

@glasser
Copy link
Member

glasser commented Jun 17, 2020

Great, I think we definitely want that for our own use. Let me know if that's something your team is working on or if we should write it.

@abernix
Copy link
Member

abernix commented Jun 17, 2020

@glasser Thoughts on this?
I wrote this some months back, but I think it works well, still? https://gist.github.com/abernix/50de4ce87278a3d8c23bd3d174ae0b6c

@glasser
Copy link
Member

glasser commented Jun 17, 2020

That sounds good though I was actually thinking of just starting with the much simpler "if we have never successfully loaded the schema, unhealthy, otherwise healthy" (which honestly seems reasonable to me to be on by default, but I trust that your generally more conservative nature around these sorts of changes is based on far more experience with this project than mine).

Ie base it on "can we even conceivably run any GraphQL query", not "at the moment are backends happy".

Base automatically changed from master to main June 24, 2020 18:18
Copy link
Member

@abernix abernix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this is WIP, but I thought I'd leave some thoughts about it since from what I understand, this has fixed the Gateway startup problem we've seen internally when there is downstream service unavailability! Put another way, I'm excited to ship it!

Comment on lines +33 to +48
let logger: Logger;

beforeEach(() => {
const warn = jest.fn();
const debug = jest.fn();
const error = jest.fn();
const info = jest.fn();

logger = {
warn,
debug,
error,
info,
};
});

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have also been repeating this pattern in many places. Perhaps at some point soon (not now), we should just make a spyableLogger?

packages/apollo-gateway/src/index.ts Outdated Show resolved Hide resolved
packages/apollo-gateway/src/index.ts Outdated Show resolved Hide resolved
@trevor-scheer trevor-scheer changed the title WIP - fix(gateway): Prevent inoperable state on initial failure to load configuration fix(gateway): Prevent inoperable state on initial failure to load configuration Jul 27, 2020
@trevor-scheer trevor-scheer merged commit 16b7884 into main Jul 27, 2020
@trevor-scheer trevor-scheer deleted the trevor/gateway-exit-correctly branch July 27, 2020 20:38

if (isManagedConfig(this.config) || this.experimental_pollInterval) {
if (!this.pollingTimer) this.pollServices();
await this.updateComposition();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that this PR accomplishes its goal. If this call throws, no polling will happen.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that even once this is fixed, there are a couple issues to resolve:

  • We need special logic to ensure that the serverWillStart plugins get called, if they got skipped because the original schemaDerivedData threw
  • The Gateway successfully loaded schema message in load won’t ever show up if the first update failed. It might make sense to move that log line into updateComposition from load, eg putting it between assigning to this.schema and notifying the listeners, if !previousSchema

abernix pushed a commit to apollographql/federation that referenced this pull request Sep 4, 2020
…figuration (apollographql/apollo-server#4277)

In managed mode, kick off polling and continue to try to load config on the
poll interval even if it fails the first time.
Apollo-Orig-Commit-AS: apollographql/apollo-server@16b7884
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 16, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants