
Add Support For Automatic Persisted Queries #187

Merged
merged 41 commits into from
Jul 11, 2020


pscarey
Collaborator

pscarey commented Jun 16, 2020

Adds customisable support for Automatic Persisted Queries, with default settings compatible with apollo-client.

#110 already has discussion on this. We required support internally, so it's up to the maintainers whether support in this form is appropriate. Happy to make tweaks based on feedback if required.

Note this builds on #186.

@mcollina
Collaborator

I'm not convinced by this implementation. How do you plan to scale this horizontally? The queries will be stored only on one peer.

@pscarey
Collaborator Author

pscarey commented Jun 17, 2020

Interesting point. I think this is how Apollo Server implements it, which is why I went down this road before generalising how the query is delivered.

The argument in favour is that, even though each instance pays a one-off cost (a miss, a full resend, and hashing the query) the first time it sees a query, the number of times the full query is sent over the wire is still massively reduced.


Really, this entire interface might be better merged into the existing persisted queries solution in a customisable way, since there is no firm 'best practice'. That would give the following:

Persisted Query Settings:

  • isPersistedQuery: check whether the given GraphQL request/query is persisted or regular.
  • getHash: get the hash from a GraphQL persisted query request, or falsy if not found.
  • getQueryFromHash: return the query for a given hash, or falsy if not found.
  • getHashForQuery: return the hash for a given query, or falsy if it should not be stored.
  • saveQuery: store the query for the given hash.
  • onlyPersisted: whether to allow only persisted queries (rejecting all others).

Example implementations would be:

Current Persisted Queries (Ahead of time):

{
  isPersistedQuery: (r) => r.persisted,
  getHash: (r) => r.query,
  getQueryFromHash: (hash) => persistedQueries[hash], // Load from memory
  // getHashForQuery and saveQuery are not required as we do not save new queries
}

Apollo Client Compatible Automatic Persisted Queries (Runtime):

{
  // Guard each level: regular requests carry no `extensions`
  isPersistedQuery: (r) => Boolean(r.extensions && r.extensions.persistedQuery && r.extensions.persistedQuery.version === 1 && r.extensions.persistedQuery.sha256Hash),
  getHash: (r) => r.extensions.persistedQuery.sha256Hash,
  getQueryFromHash: (hash) => persistedQueries[hash], // Load from memory
  getHashForQuery: (query) => sha256(query), // e.g. a hex SHA-256 digest via Node's crypto module
  saveQuery: (hash, query) => { persistedQueries[hash] = query } // Store in memory
}

Cached Apollo Client Compatible Automatic Persisted Queries (Runtime + Shared Cache):

{
  // Guard each level: regular requests carry no `extensions`
  isPersistedQuery: (r) => Boolean(r.extensions && r.extensions.persistedQuery && r.extensions.persistedQuery.version === 1 && r.extensions.persistedQuery.sha256Hash),
  getHash: (r) => r.extensions.persistedQuery.sha256Hash,
  getQueryFromHash: (hash) => redis.get(hash), // Load from redis
  getHashForQuery: (query) => sha256(query), // e.g. a hex SHA-256 digest via Node's crypto module
  saveQuery: (hash, query) => redis.set(hash, query) // Store in redis
}

@mcollina
Collaborator

My point is that the current interface (ahead of time) cannot be made dynamic without supporting multiple server instances that share the stored queries. Otherwise, subsequent requests that hit a different server would not find the cached query.

In this PR the “get” operation is synchronous; it should be made async (and likely moved into a common interface). Applied to your list of Persisted Query Settings:

  • isPersistedQuery: check whether the given GraphQL request/query is persisted or regular.
  • getHash: get the hash from a GraphQL persisted query request, or falsy if not found.
  • getQueryFromHash: return the query for a given hash, or falsy if not found. (Must be async.)
  • getHashForQuery: return the hash for a given query, or falsy if it should not be stored.
  • saveQuery: store the query for the given hash. (Must be async.)
  • onlyPersisted: whether to allow only persisted queries (rejecting all others).
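To make the async requirement concrete, here is a minimal sketch of the Apollo-compatible in-memory settings with `getQueryFromHash` and `saveQuery` returning promises, so that local memory or a shared store such as Redis could sit behind the same interface. It assumes Node's built-in `crypto` module for the hex SHA-256 digest; the factory name `createInMemoryApqSettings` is hypothetical and not a fastify-gql export.

```javascript
'use strict'
const crypto = require('crypto')

// Hypothetical factory (not a fastify-gql export): Apollo-style APQ settings
// backed by a local Map. Lookups and saves return promises, so a shared store
// such as Redis could sit behind the same interface.
function createInMemoryApqSettings () {
  const store = new Map()

  return {
    // Persisted if the request carries the persistedQuery extension, version 1
    isPersistedQuery: (r) => Boolean(
      r.extensions &&
      r.extensions.persistedQuery &&
      r.extensions.persistedQuery.version === 1 &&
      r.extensions.persistedQuery.sha256Hash
    ),
    getHash: (r) => r.extensions.persistedQuery.sha256Hash,
    // Resolves to the stored query text, or undefined for an unknown hash
    getQueryFromHash: async (hash) => store.get(hash),
    // Hex-encoded SHA-256 of the query text
    getHashForQuery: (query) => crypto.createHash('sha256').update(query).digest('hex'),
    saveQuery: async (hash, query) => { store.set(hash, query) }
  }
}

// Usage: save a query, then resolve it by hash
const settings = createInMemoryApqSettings()
const query = '{ hello }'
const hash = settings.getHashForQuery(query)
settings.saveQuery(hash, query)
  .then(() => settings.getQueryFromHash(hash))
  .then((found) => console.log(found === query)) // true
```

Swapping the `Map` for a Redis client would change only the two async methods, which is the point of making them async up front.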

I’m also skeptical that adding a round trip to Redis is going to improve latency much in practice. We are trading the cost of sending a few hundred bytes from the client for the cost of fetching them from Redis. Given how HTTP/1.1 keep-alive and TCP work, sending a few hundred more bytes should have no noticeable impact.
The risk is head-of-line blocking on the Redis connection, which would make the caching moot.

That said, this does make a lot of sense if the query is massive.

@pscarey
Collaborator Author

pscarey commented Jun 18, 2020

I agree with your reasoning that it'd need to be async.

As a general rule, any persisted query approach only pays off when the time saved by the smaller payload exceeds the lookup delay. How large the lookup delay is depends on the implementation.
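The rule above can be written as back-of-envelope arithmetic: a lookup is worthwhile when the transfer time saved by sending a hash instead of the full query exceeds the lookup delay. This sketch assumes a constant client upload rate and ignores TCP effects such as slow start and congestion windows, which, as argued earlier in the thread, can make a few hundred extra bytes effectively free on a warm keep-alive connection. The function names and all numbers are illustrative assumptions, not measurements.

```javascript
'use strict'

// Milliseconds saved by sending `hashRequestBytes` instead of `fullQueryBytes`,
// assuming a constant upload rate (a deliberate simplification).
function transferSavingMs (fullQueryBytes, hashRequestBytes, bytesPerSecond) {
  const savedBytes = fullQueryBytes - hashRequestBytes
  return (savedBytes * 1000) / bytesPerSecond
}

// A persisted-query lookup is worthwhile only if it costs less than it saves.
function lookupPaysOff (fullQueryBytes, hashRequestBytes, bytesPerSecond, lookupMs) {
  return transferSavingMs(fullQueryBytes, hashRequestBytes, bytesPerSecond) > lookupMs
}

// Illustrative numbers only: a 2,000-byte query vs a ~100-byte hash request
const slowMobile = 50000 // bytes/s, roughly a 400 kbit/s uplink
const datacenter = 125000000 // bytes/s, roughly 1 Gbit/s
console.log(transferSavingMs(2000, 100, slowMobile)) // 38
console.log(lookupPaysOff(2000, 100, slowMobile, 1)) // true
console.log(lookupPaysOff(2000, 100, datacenter, 1)) // false
```

Under these assumptions a 1 ms in-datacenter lookup is easily repaid on a slow mobile uplink but not on a fast link, which matches the "it depends on the client" conclusion below.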

My understanding is that:

Ahead of Time Persisted Queries (AoTPQ):

  • Best performance in every use case.
  • Hardest to set up, as all queries need to be compiled into hashes ahead of time.
  • Maintaining support for old client versions (i.e. old queries) is non-trivial and requires a very specific development workflow.

Automatic Persisted Queries (APQ):

  • With local memory, the first request for each query on each instance costs an extra round trip (a miss, then a full resend), but afterwards performance is equal to AoTPQ.
  • For long-lived servers (i.e. not cloud functions), performance is therefore basically equivalent to AoTPQ.
  • This is also the easiest to set up.
  • Works well for unknown query structures, e.g. public APIs.

Cached APQ:

  • Removes that first-request overhead at the cost of a round trip to the cache on every request.
  • Performance would be worse than APQ for long-lived servers, but better than APQ for short-lived servers (cloud functions).
  • Performance depends heavily on the payload reduction being large enough to outweigh the lookup time. However, it can reasonably be assumed that an in-datacenter Redis connection is much faster than, e.g., a 3G mobile connection. In practice it comes down to the use case and the performance of the client.

I appreciate the responsiveness here, but I also want to flag that, for us to migrate to fastify-gql, we need APQ support; otherwise we'll lose backwards compatibility. Obviously that doesn't mean it's something fastify-gql should support, so if it's a no-go, we can fork.

@mcollina
Collaborator

Could you confirm that you are using "sticky sessions" for APQ? How can you guarantee that the same client goes to the same server otherwise? What I do not understand about APQ is whether it's actually feasible in production. It seems very likely that a client talking to a different server would not find the cached query.

I agree with your reasoning that it'd need to be async.

Then we are good. Can you update the PR?


I appreciate the responsiveness here, but I also want to flag that, for us to migrate to fastify-gql, we need APQ support; otherwise we'll lose backwards compatibility. Obviously that doesn't mean it's something fastify-gql should support, so if it's a no-go, we can fork.

I'm cautious about adding features, but I think this is legit and we should add it in one form or another.

@pscarey
Collaborator Author

pscarey commented Jun 18, 2020

I will update both PRs tomorrow.

Could you confirm that you are using "sticky sessions" for APQ? How can you guarantee that the same client goes to the same server otherwise? What I do not understand about APQ is whether it's actually feasible in production. It seems very likely that a client talking to a different server would not find the cached query.

Our use case is as follows:

Client:

  • Mobile App running apollo-client with approx 20 unique queries.
  • Prior versions of the mobile app are supported, but we haven't tracked legacy queries (i.e. they are no longer in our master branch, but are still valid queries due to no breaking changes).
  • apollo-client first issues the query using only its persisted query hash. Expected API responses are as per apollo-link-persisted-queries.
  • If the client receives the appropriate PersistedQueryNotFound response, then it resends the whole query. This causes the APQ cache to be populated for future requests.

Server:

  • 3-4 replicas of our GraphQL API running in K8s.
  • Queries are persisted in local memory when they are received in full.
  • After a short amount of usage, the most common queries will all have been persisted.
  • After a little while longer, less frequently called queries are persisted.
  • The machine continues its lifetime for a long period after, directly replying to persisted queries.
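The client/server exchange described above can be sketched as a single server-side resolution step. This is a hypothetical helper, not fastify-gql's route code. It assumes Apollo-style requests (full text in `query`, hash in `extensions.persistedQuery.sha256Hash`) and a settings object following the interface proposed earlier in the thread, with async `getQueryFromHash` and `saveQuery`. Following that interface, the server stores the query under the hash it computes itself with `getHashForQuery`. The `MissingQuery` error code is an illustrative placeholder.

```javascript
'use strict'

// Hypothetical helper, not fastify-gql's route code. Assumes Apollo-style requests:
// full text in `query`, hash in `extensions.persistedQuery.sha256Hash`, and a
// `settings` object with the proposed methods (async lookup and save).
async function resolveQuery (request, settings) {
  const persisted = settings.isPersistedQuery(request)

  // Full text present: a regular query, or the client's retry after a miss
  if (typeof request.query === 'string' && request.query.length > 0) {
    if (persisted) {
      // Store under the server-computed hash; a falsy hash means "do not store"
      const hash = settings.getHashForQuery(request.query)
      if (hash) await settings.saveQuery(hash, request.query)
    }
    return { query: request.query }
  }

  // Hash only: look the query up
  if (persisted) {
    const query = await settings.getQueryFromHash(settings.getHash(request))
    if (query) return { query }
    // apollo-client reacts to this error by resending the full query
    return { error: 'PersistedQueryNotFound' }
  }

  return { error: 'MissingQuery' } // hypothetical error code
}
```

On a cold replica the first hash-only request yields `PersistedQueryNotFound`, the client's retry with the full text populates the cache, and later hash-only requests resolve directly.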

I hope this (very rough) diagram conveys the idea slightly better:
[Diagram: Screen Shot 2020-06-18 at 10 01 22 pm]

Sticky sessions would ensure that the machine returning PersistedQueryNotFound is the one whose cache gets populated by the resent query. They are not strictly necessary, though, as the queries will still be persisted within a short time compared to the lifetime of the machine.
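The sticky-session argument can be checked with a small deterministic simulation. Its assumptions: each replica keeps its own in-memory cache with no shared store, a replica stores a query only when the full text lands on it, and requests and retries are routed with a seeded linear congruential generator, chosen only so runs are reproducible. The replica and query counts mirror the rough figures above (4 replicas, about 20 queries); the request count and seed are arbitrary.

```javascript
'use strict'

// Seeded pseudo-random integers in [0, n): a 32-bit LCG, used only so runs repeat
function makeRng (seed) {
  let state = seed >>> 0
  return (n) => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0
    // Scale from the high-order bits; the low bits of this LCG cycle quickly
    return Math.floor((state / 4294967296) * n)
  }
}

// Counts PersistedQueryNotFound responses over `requests` hash-only requests,
// with one in-memory cache per replica and no shared store.
function simulateApq ({ replicas, queries, requests, sticky, seed = 42 }) {
  const rng = makeRng(seed)
  const caches = Array.from({ length: replicas }, () => new Set())
  let misses = 0

  for (let i = 0; i < requests; i++) {
    const query = rng(queries)
    const replica = rng(replicas)
    if (caches[replica].has(query)) continue // answered from the persisted cache

    misses++
    // The client resends the full query; only the replica that receives it stores it
    const retryReplica = sticky ? replica : rng(replicas)
    caches[retryReplica].add(query)
  }

  // warm: every replica has persisted every query by the end of the run
  const warm = caches.every((cache) => cache.size === queries)
  return { misses, warm }
}

console.log(simulateApq({ replicas: 4, queries: 20, requests: 10000, sticky: true }))
console.log(simulateApq({ replicas: 4, queries: 20, requests: 10000, sticky: false }))
```

With sticky routing every miss persists a new (replica, query) pair, so misses can never exceed replicas × queries (80 here). Without it the count can be higher, because a retry may land on a replica that already has the query and leave the one that missed still cold.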

@mcollina
Collaborator

Thanks, it makes sense now. Please add a short version of that in the README.

@pscarey
Collaborator Author

pscarey commented Jun 20, 2020

I've reworked the API & tidied things up a lot based on the conversation above.

  • The proposed API gives defaults which align with the existing options, support for Apollo's APQ, and room for additional customisation.
  • Documentation and types are updated.
  • All persisted query tests have been moved into a separate file.

@mcollina
Collaborator

Can you rebase this on top of master?

@pscarey
Collaborator Author

pscarey commented Jun 24, 2020

Have merged the other changes in, diff should be clean.

Thanks.

README.md (outdated, resolved)
README.md (resolved)
README.md (outdated, resolved)
index.js (outdated, resolved)
lib/routes.js (outdated, resolved)
@pscarey
Collaborator Author

pscarey commented Jun 27, 2020

Minor fixes are done. I don't mind the long process; I'm happy to get the prompt feedback, and I appreciate it's a non-trivial set of changes.

Should other casing also be updated?

  • Prepared -> prepared
  • PreparedOnly -> preparedOnly
  • Automatic -> automatic

README.md (outdated, resolved)
README.md (resolved)
README.md (outdated)
* `onlyPersisted`: Boolean. Flag to control whether to allow graphql queries other than persisted. When `true`, it'll make the server reject any queries that are not present in the `persistedQueries` option above. It will also disable any IDE available (playground/graphiql).
* `persistedQueries`: A hash/query map to resolve the full query text using its unique hash. Overrides `persistedQuerySettings`.
* `onlyPersisted`: Boolean. Flag to control whether to allow graphql queries other than persisted. When `true`, it'll make the server reject any queries that are not present in the `persistedQueries` option above. It will also disable any IDE available (playground/graphiql). Requires `persistedQueries` to be set, and overrides `persistedQuerySettings`.
* `persistedQuerySettings`
Collaborator


persistedQueryProvider maybe?

index.d.ts (outdated, resolved)
lib/persistedQueryDefaults.js (resolved)
lib/routes.js (outdated, resolved)
lib/routes.js (resolved)
@mcollina
Collaborator

mcollina commented Jul 6, 2020

@pscarey any updates on this? It would be great to land!

@pscarey
Collaborator Author

pscarey commented Jul 8, 2020

@mcollina Sorry about the delay - was away for a few days. Will be updating shortly.

@pscarey
Collaborator Author

pscarey commented Jul 8, 2020

All updated. FYI, we've been running this in production for over a week, and have had no issues.

This was referenced Jul 8, 2020
lib/routes.js (resolved)
lib/routes.js (outdated, resolved)
lib/routes.js (outdated, resolved)
@pscarey
Collaborator Author

pscarey commented Jul 9, 2020

Have implemented the feedback, but it looks like some changes from master are breaking CI?

@mcollina
Collaborator

mcollina commented Jul 9, 2020

Yes, we are fixing them today; unfortunately, something broke in #147.

@mcollina
Collaborator

mcollina commented Jul 9, 2020

I have reverted the offending commit on master.

lib/routes.js (outdated, resolved)
Collaborator

mcollina left a comment


lgtm

mcollina merged commit 172ee6d into mercurius-js:master on Jul 11, 2020