chore(rfc): operation cache warmer #1115
Closed
19 commits
115d1b7
chore(rfc): distributed operation cache
StarpTech 0a709eb
chore: add code highlight
StarpTech ac09b1c
chore: add comment
StarpTech 4db8584
chore: improve
StarpTech 1f5a7e1
chore: improve
StarpTech 5eb656f
chore: improve
StarpTech 90d8a1a
chore: improve
StarpTech 675b5d7
chore: improve
StarpTech b970a75
chore: improve
StarpTech b84bf31
chore: improve
StarpTech db443c8
chore: improve
StarpTech cf19728
chore: improve
StarpTech 9960125
chore: correct typo
StarpTech aabc645
chore: improve
StarpTech e890cfe
chore: add example
StarpTech bc988b7
chore: add example
StarpTech 076fb7f
chore: fix typo
StarpTech d3e5984
chore: rename rfc
StarpTech 27a3604
chore: address feedback
StarpTech
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,107 @@ | ||
--- | ||
title: "Operation Cache Warmer" | ||
author: Dustin Deus | ||
date: 2024-08-25 | ||
status: Draft | ||
--- | ||
|
||
# Operation Cache Warmer | ||
|
||
- **Author:** Dustin Deus | ||
- **Date:** 2024-08-25 | ||
- **Status:** Draft | ||
|
||
## Abstract | ||
|
||
This RFC describes a new feature that reduces latency by pre-planning the most expensive and most frequently requested operations before the router accepts traffic. We achieve this by computing the Top-N GraphQL operations and making them available to all router instances before they accept traffic. | ||
|
||
## Motivation | ||
|
||
GraphQL is a powerful tool to query data from a server. However, the flexibility of the query language comes at a cost: the complexity of a query determines how expensive it is to normalize, plan and execute. While execution performance is primarily a concern of the underlying subgraphs, the planning phase can be an unpredictable and significant contributor to latency. The operation cache warmer aims to reduce this latency by pre-planning the most expensive and most requested operations ahead of time, so that planning latency is invisible to the user. | ||
|
||
## Proposal | ||
|
||
The distributed operation cache is semi-automatic: users can push specific operations to the cache, and the platform also automatically computes the most expensive and most requested operations of a configurable time frame. The cache holds a fixed number of operations, e.g. 100 (configurable), and is shared across all router instances. An operation can be a regular query, subscription, mutation or persisted operation. When the cache capacity is reached, manual operations have a higher priority than automatic operations. This allows users to manage the priority of operations in the cache themselves. Operations may become incompatible with future schema changes. In that case, the operation is removed from the cache. | ||
|
||
### Pushing operations to the cache | ||
|
||
Users can push individual operations to the operation cache using the CLI: | ||
|
||
```bash | ||
wgc federated-graph operation-cache add --graph mygraph --file operations.json | ||
|
||
``` | ||
|
||
The CLI command will add the operations from the file `operations.json` to the operation cache of the graph `mygraph`. The file must contain a list of operations in JSON format. The operations can be queries, subscriptions, mutations or persisted operations. | ||
|
||
```json5 | ||
[ | ||
  // Query | ||
  { | ||
    "body": "query { ... }" | ||
  }, | ||
  // Persisted operation | ||
  { | ||
    "sha256Hash": "1234567890", | ||
    "body": "query { ... }", | ||
  } | ||
] | ||
``` | ||
|
||
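The example above uses JSON5 comments and a trailing comma for readability. If the CLI expects strict JSON, as the text suggests, the file could be generated rather than written by hand. The following is a minimal sketch under stated assumptions: the helper `build_operations` and the sample queries are hypothetical, and treating `sha256Hash` as the SHA-256 hex digest of the operation body is an assumption, not something this RFC specifies.

```python
import hashlib
import json


def build_operations(queries, persisted=()):
    """Return a list of entries shaped like the example file above."""
    entries = [{"body": body} for body in queries]
    for body in persisted:
        # Assumption: the hash is the SHA-256 hex digest of the operation body.
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        entries.append({"sha256Hash": digest, "body": body})
    return entries


# Hypothetical operations, used only for illustration.
operations = build_operations(
    queries=["query { employees { id } }"],
    persisted=["query { products { sku } }"],
)

# json.dumps emits strict JSON: no comments and no trailing commas.
operations_json = json.dumps(operations, indent=2)
```

Writing `operations_json` to `operations.json` would produce a file in the shape the CLI command above takes as `--file`.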
The CLI command is idempotent and always updates the cache with the latest operations. It doesn't trigger the computation of the Top-N operations, which is done periodically by the Cosmo Platform. | ||
|
||
### Automatic operation computation | ||
|
||
In parallel, WunderGraph Cosmo analyzes the incoming traffic based on the OpenTelemetry metrics that each router sends. The Cosmo Platform computes the Top-N operations for each graph and combines them with the manually added operations. The Top-N operations are then pushed to the operation cache of the graph. | ||
|
||
### Top-N computation | ||
|
||
The Top-N computation is based on the following metrics: | ||
|
||
- Total operation pre-execution time: Normalization, Validation, Planning | ||
- Total request count | ||
|
||
The Top-N computation is done for a configurable time interval, e.g. between 3 and 72 hours. The operations are sorted by pre-execution time and request count. The Top-N operations are then pushed to the operation cache. Manual operations have a higher priority than automatic operations. This means that when the cache capacity is reached, manual operations are placed in the cache first and automatic operations are evicted to make room. | ||
|
||
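The ranking step could look roughly like the sketch below. The RFC does not say how the two metrics are combined, so this assumes pre-execution time is the primary sort key and request count is the tie-breaker. `OperationStats` and `top_n` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationStats:
    name: str
    # Normalization + validation + planning time, summed over the interval.
    pre_execution_ms: float
    request_count: int


def top_n(stats, n):
    """Return the n highest-ranked operations, most expensive first."""
    # Assumption: pre-execution time first, request count as tie-breaker.
    ranked = sorted(
        stats,
        key=lambda s: (s.pre_execution_ms, s.request_count),
        reverse=True,
    )
    return ranked[:n]


stats = [
    OperationStats("C", 200, 200),
    OperationStats("A", 400, 1000),
    OperationStats("E", 50, 50),
    OperationStats("B", 300, 500),
]
top_two = [s.name for s in top_n(stats, 2)]
```

A weighted score combining both metrics would be an equally plausible design; the tuple ordering here is only the simplest reading of "sorted by the pre-execution time and request count".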
|
||
#### Example | ||
|
||
The following example shows the Top-5 operations of a graph. The cache capacity is 5. The operations are sorted by the total pre-execution time and request count in descending order. There are three slots left in the cache where the Cosmo Platform can add automatic operations based on the Top-N computation. | ||
|
||
``` | ||
Operation A: 400ms, 1000 requests (Manually added) | ||
Operation B: 300ms, 500 requests (Automatic slot) | ||
Operation C: 200ms, 200 requests (Automatic slot) | ||
Operation D: 100ms, 100 requests (Manually added) | ||
Operation E: 50ms, 50 requests (Automatic slot) | ||
``` | ||
|
||
Alternatively, the user can add three more manual operations to the cache until the cache capacity is reached. This has the effect that no automatic operations can be added to the cache. In that case, we assume that the user knows better which operations are important. | ||
|
||
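The slot allocation in this example can be sketched as follows. This is an illustration, not the platform's implementation: `fill_cache` is a hypothetical helper, the operations `F`, `X`, `Y` and `Z` are made up, and truncating the manual list when it exceeds the capacity is an assumption.

```python
def fill_cache(manual, automatic_ranked, capacity):
    """Manual operations take slots first; automatic ones fill what is left."""
    # Assumption: if there are more manual operations than slots, keep the first ones.
    selected = list(manual[:capacity])
    already_selected = set(selected)
    for operation in automatic_ranked:
        if len(selected) >= capacity:
            break
        if operation in already_selected:
            continue  # a manual operation can also appear in the traffic ranking
        selected.append(operation)
        already_selected.add(operation)
    return selected


# Ranking derived from traffic, highest first (A and D are also requested).
ranked = ["A", "B", "C", "D", "E", "F"]

# Two manual operations leave three automatic slots, as in the example.
mixed = fill_cache(["A", "D"], ranked, capacity=5)

# Five manual operations leave no room for automatic ones.
manual_only = fill_cache(["A", "D", "X", "Y", "Z"], ranked, capacity=5)
```

Note that the result lists manual operations first, whereas the example above lists all cached operations by pre-execution time; the set of cached operations is the same.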
### Cache update process | ||
|
||
The router periodically checks for updates of the operation cache, e.g. every 5 minutes. In addition, the cache is checked explicitly when the router starts and when the schema changes. On startup, the cache is loaded and all operations are pre-planned before the router accepts traffic. After startup, the cache is updated in the background and doesn't block the router from accepting traffic. | ||
|
||
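A minimal sketch of this process, assuming the cache source exposes some kind of version marker that lets the router skip unchanged caches. `WarmRouter`, `fetch_operations` and `plan` are hypothetical names, and the fetch is simulated in memory rather than performed against the CDN.

```python
class WarmRouter:
    def __init__(self, fetch_operations, plan):
        self._fetch = fetch_operations  # hypothetical: returns (version, [operation bodies])
        self._plan = plan               # hypothetical: plans a single operation
        self._version = None
        self.plan_cache = {}
        self.accepting_traffic = False

    def start(self):
        # Startup: warm the cache first, then start accepting traffic.
        self.refresh()
        self.accepting_traffic = True

    def refresh(self):
        """Re-plan only when the cache changed. Returns True if it was updated."""
        version, operations = self._fetch()
        if version == self._version:
            return False
        # Build the new plans completely before replacing the old ones.
        new_cache = {op: self._plan(op) for op in operations}
        self.plan_cache = new_cache
        self._version = version
        return True


# Simulated cache source.
source = {"version": "v1", "operations": ["query { a }"]}
planned = []


def fetch_operations():
    return source["version"], list(source["operations"])


def plan(operation):
    planned.append(operation)
    return f"plan({operation})"


router = WarmRouter(fetch_operations, plan)
router.start()
unchanged = router.refresh()  # same version, nothing is re-planned
source.update(version="v2", operations=["query { a }", "query { b }"])
changed = router.refresh()    # new version, all operations are re-planned
```

In a real router, `refresh` would be driven by a timer at the configured interval and by schema-change events.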
### Platform integration | ||
|
||
For containerized environments like Kubernetes, users should use the readiness probe to ensure that the router is ready to accept traffic. Avoid setting the readiness probe timeout too low, so that the router has enough time to prepare the cache. For schema updates after startup, this process is non-blocking because the new graph schema isn't swapped in until the cache is warmed up. | ||
|
||
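The interaction between readiness and schema swaps could be sketched as below. This is a hedged illustration only: `GraphState` and its methods are hypothetical, and mapping "not ready" to HTTP 503 is an assumption about how a readiness endpoint might report it.

```python
import threading


class GraphState:
    def __init__(self):
        self._lock = threading.Lock()
        self._active = None  # (schema_version, plan_cache) once warmed

    def readiness_status(self):
        # Assumption: 503 until the first warm-up finished, 200 afterwards.
        with self._lock:
            return 200 if self._active is not None else 503

    def active_schema(self):
        with self._lock:
            return self._active[0] if self._active is not None else None

    def activate(self, schema_version, operations, plan):
        # Warm up outside the lock so the current schema keeps serving requests.
        warmed = {op: plan(schema_version, op) for op in operations}
        # Swap only after every operation has been planned.
        with self._lock:
            self._active = (schema_version, warmed)


state = GraphState()
status_before = state.readiness_status()

observed = []  # schema that was active while each plan was computed


def plan(schema_version, operation):
    observed.append(state.active_schema())
    return f"plan:{schema_version}:{operation}"


state.activate("v1", ["query { a }"], plan)
state.activate("v2", ["query { a }"], plan)
status_after = state.readiness_status()
```

While `v2` is being warmed, `v1` is still the active schema, which is what makes schema updates after startup non-blocking.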
|
||
### Cosmo UI integration | ||
|
||
Users can disable the operation cache in the Cosmo UI. They can view the operations currently in the cache and remove them if necessary. They can also see the current status of the cache and the time of the last computation. | ||
|
||
|
||
#### Triggering the computation manually | ||
|
||
A user can trigger the computation of the Top-N operations manually in the Cosmo UI. This is useful for debugging purposes. | ||
|
||
## Router configuration | ||
|
||
The operation cache can be enabled or disabled in the router configuration file. It is enabled by default. A valid Graph API key is required to fetch the operation cache from the Cosmo Platform. | ||
|
||
```yaml | ||
version: "1" | ||
|
||
cache_warmup: | ||
  enabled: true | ||
  interval: 5m | ||
``` | ||
|
||
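As an illustration of how such a section might be interpreted, the sketch below reads an already-parsed configuration dictionary and applies defaults. `CacheWarmupConfig`, `parse_duration` and `load_cache_warmup` are hypothetical names. Only the `enabled` default comes from this RFC; the `5m` default interval and the restriction to a single `s`, `m` or `h` unit are assumptions.

```python
import re
from dataclasses import dataclass

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text):
    """Parse a simple duration such as '5m' into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


@dataclass
class CacheWarmupConfig:
    enabled: bool = True          # the RFC states the default is enabled
    interval_seconds: int = 300   # assumption: 5m when no interval is given


def load_cache_warmup(raw_config):
    section = raw_config.get("cache_warmup", {})
    return CacheWarmupConfig(
        enabled=section.get("enabled", True),
        interval_seconds=parse_duration(section.get("interval", "5m")),
    )


# The configuration above, as it would look after YAML parsing.
config = load_cache_warmup(
    {"version": "1", "cache_warmup": {"enabled": True, "interval": "5m"}}
)
defaults = load_cache_warmup({"version": "1"})
```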
_For this RFC, we only consider support for the WunderGraph Cosmo CDN._ |