usage reporting plugin: add fieldLevelInstrumentation option (#5963)

The usage reporting plugin already has an `includeRequest` option which allows you to tell the plugin to completely ignore an operation. One reason you may have wanted to use this is to avoid the overhead of capturing a full field-by-field execution trace. But in this case, `includeRequest: false` is overkill: it removes the operation from aspects of Studio such as the Operations page and schema checks, which don't require full execution traces to operate.

This PR adds a new option, `fieldLevelInstrumentation`, which is invoked after `includeRequest` returns true. If you return false from this hook, you won't incur the overhead of capturing a detailed trace (either directly in this process, or in subgraphs if this process is a Gateway).

Most of Studio's features (including the newly-added "referencing operations" column on the Fields page) will still reflect the existence of this operation. As of the end of 2021, the features that this operation will not contribute to are:

- The "field executions" column on the Fields page
- The per-field timing hints shown in Explorer and in vscode-graphql
- The trace view on the Operations page

Apollo Server now sends both an "observed execution count" and an "estimated execution count" for each field (for each operation and client). The former is literally how many times we saw the field get executed (only during operations where `fieldLevelInstrumentation` returned truthy). If the hook only returns true or false, the latter is the same, but you may also return a number (typically either 0 or at least 1) which represents a weight for that operation; the "estimated execution count" will be incremented by that number instead of by 1.

So for example, with:

    fieldLevelInstrumentation: () => Math.random() < 0.01 ? 1/0.01 : false

Apollo Server will instrument 1% of operations, and the "estimated execution count" will be 100 times more than the observed execution count. (You can imagine more sophisticated implementations of `fieldLevelInstrumentation` which sample more common operations more aggressively than rare operations.)

If you pass a number for `fieldLevelInstrumentation`, it is equivalent to passing a function of the form in the above example; that is, the previous example behaves identically to `fieldLevelInstrumentation: 0.01`.

The `latency_count` sent with field stats (which powers per-field timing hints) is now always scaled by the given factor (i.e., there's no separate "observed histogram").

Note that the semantics of the `requestContext.metrics.captureTraces` field change with this PR. Previously its value matched the value returned by the `includeRequest` hook; now it matches the truthiness of the value returned by the `fieldLevelInstrumentation` hook. This field determines whether Apollo Gateway includes the `apollo-federation-include-trace` header with outgoing requests, so this is an appropriate change.

Since this may provide a common use case for running a GraphQL server without any need for per-field instrumentation, the request pipeline now only instruments the schema if it ever sees a `willResolveField` callback. This *almost* means that if you're running a monolithic server with `fieldLevelInstrumentation` always returning false (or usage reporting disabled), or if you're running a subgraph whose gateway has `fieldLevelInstrumentation: false` (and thus never receives a request with the `apollo-federation-include-trace` header), then execution won't have the small performance impact of instrumentation. In practice you also need to disable the cache control plugin to get this speedup, as it is installed by default and uses `willResolveField` to implement dynamic cache control. If this optimization proves to be important we can provide a mode of the cache control plugin that doesn't allow for dynamic cache control. (Alternatively we may eventually find a way to support the instrumentation of GraphQL execution with lower overhead.)

Part of #5708.
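
To make the relationship between the observed and estimated counts concrete, here is a minimal sketch of the sampling arithmetic described above. It does not use Apollo Server itself: `sampleWeight`, `simulate`, and the injectable `random` parameter are hypothetical names introduced for illustration, and the simulation assumes each operation executes a single field exactly once.

```typescript
// Sketch only: none of these names are part of Apollo Server.

// Weight for one operation, following the documented numeric form
// `Math.random() < x ? 1/x : 0`. A result of 0 means "skip field-level
// instrumentation"; a positive result means "instrument, and count each
// field execution as this many estimated executions".
function sampleWeight(x: number, random: () => number = Math.random): number {
  if (!(x >= 0 && x <= 1)) {
    throw new RangeError(`sampling rate must be in [0, 1], got ${x}`);
  }
  return random() < x ? 1 / x : 0;
}

// Simplifying assumption: every operation executes one field exactly once,
// so each instrumented operation adds 1 to the observed count and its
// weight to the estimated count.
function simulate(x: number, operations: number, random: () => number) {
  let observed = 0;
  let estimated = 0;
  for (let i = 0; i < operations; i++) {
    const weight = sampleWeight(x, random);
    if (weight > 0) {
      observed += 1;
      estimated += weight;
    }
  }
  return { observed, estimated };
}
```

Under these assumptions, a hook of the form `async () => sampleWeight(0.01)` would behave like the example above: roughly 1 in 100 operations is instrumented, and the estimated count stays close to the true number of executions because each instrumented operation is weighted by 100.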
apollographql · Dec 28, 2021 · 294a8ed
1 parent 26030d1
commit 294a8ed
Showing 20 changed files with 839 additions and 177 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -10,7 +10,7 @@ The version headers in this history reflect the versions of Apollo Server itself
 ## vNEXT (minor!)
 
 - `apollo-server-core`: Usage reporting no longer sends a "client reference ID" to Apollo Studio (along with the client name and client version). This little-used feature has not been documented [since 2019](https://github.com/apollographql/apollo-server/pull/3180) and is currently entirely ignored by Apollo Studio. This is technically incompatible as the interface `ClientInfo` no longer has the field `clientReferenceId`; if you were one of the few users who explicitly set this field and you get a TypeScript compilation failure upon upgrading to v3.6.0, just stop using the field. [PR #5890](https://github.com/apollographql/apollo-server/pull/5890)
-- `apollo-server-core`: Preliminary support for referenced field reporting. THIS ENTRY NEEDS TO BE EXPANDED BEFORE THE v3.6.0 RELEASE. [Issue #5708](https://github.com/apollographql/apollo-server/issues/5708) [PR #5956](https://github.com/apollographql/apollo-server/pull/5956)
+- `apollo-server-core`: Preliminary support for referenced field reporting. THIS ENTRY NEEDS TO BE EXPANDED BEFORE THE v3.6.0 RELEASE. Include description of fieldLevelInstrumentation and the ability to avoid instrumentation. Include effect on `metrics` object. [Issue #5708](https://github.com/apollographql/apollo-server/issues/5708) [PR #5956](https://github.com/apollographql/apollo-server/pull/5956)
 - `apollo-server-core`: Remove dependency on `apollo-graphql` package (by inlining the code which generates usage reporting signatures). That package has not yet been published with a `graphql@16` peer dependency, so Apollo Server v3.5 did not fully support `graphql@16` without overriding peer dependencies. [Issue #5941](https://github.com/apollographql/apollo-server/issues/5941) [PR #5955](https://github.com/apollographql/apollo-server/pull/5955)
 
 ## v3.5.0

diff --git a/docs/source/api/plugin/usage-reporting.md b/docs/source/api/plugin/usage-reporting.md
@@ -105,6 +105,38 @@ The only properties of the reported error you can modify are its `message` and i
 <tr>
 <td>
 
+###### `fieldLevelInstrumentation`
+
+`async Function` or `number`
+</td>
+<td>
+
+This option allows you to choose if Apollo Server should calculate detailed per-field statistics for a particular request. It is only called for executable operations: operations which parse and validate properly and which do not have an unknown operation name. It is not called if an [`includeRequest`](#includerequest) hook is provided and returns false.
+
+You can either pass an async function or a number. The function receives a `GraphQLRequestContext`. (The effect of passing a number is described later.) Your function can return a boolean or a number; returning false is equivalent to returning 0 and returning true is equivalent to returning 1.
+
+Returning false (or 0) means that Apollo Server will only pay attention to overall properties of the operation, like what GraphQL operation is executing and how long the entire operation takes to execute, and not anything about field-by-field execution.
+
+If you return false (or 0), this operation *will* still contribute to most features of Studio, such as schema checks, the Operations page, and the "referencing operations" statistic on the Fields page, etc.
+
+If you return false (or 0), this operation will *not* contribute to the "field executions" statistic on the Fields page or to the execution timing hints optionally displayed in Studio Explorer or in vscode-graphql. Additionally, this operation will not produce a trace that can be viewed on the Traces section of the Operations page.
+
+Returning false (or 0) for some or all operations can improve your server's performance, as the overhead of calculating complete traces is not always negligible. This is especially the case if this server is an Apollo Gateway, as captured traces are transmitted from the subgraph to the Gateway in-band inside the actual GraphQL response.
+
+Returning a positive number means that Apollo Server will track each field execution and send Apollo Studio statistics on how many times each field was executed and what the per-field performance was. Apollo Server sends both a precise observed execution count and an estimated execution count. The former is calculated by counting each field execution as 1, and the latter is calculated by counting each field execution as the number returned from this hook, which can be thought of as a weight.
+
+Passing a number `x` (which should be between 0 and 1 inclusive) for `fieldLevelInstrumentation` is equivalent to passing the function `async () => Math.random() < x ? 1/x : 0`. For example, if you pass 0.01, then 99% of the time this function will return 0, and 1% of the time this function will return 100. So 99% of the time Apollo Server will not track field executions, and 1% of the time Apollo Server will track field executions and send them to Apollo Studio both as an exact observed count and as an "estimated" count which is 100 times higher. Generally, the weights you return should be roughly the reciprocal of the probability that the function returns non-zero; however, you're welcome to craft a more sophisticated function, such as one that uses a higher probability for rarer operations and a lower probability for more common operations.
+
+(Note that returning true here does *not* mean that the data derived from field-level instrumentation must be transmitted to Apollo Studio's servers in the form of a trace; it may still be aggregated locally to statistics. But either way this operation will contribute to the "field executions" statistic and timing hints.)
+
+The default `fieldLevelInstrumentation` is a function that always returns true.
+
+</td>
+</tr>
+
+<tr>
+<td>
+
 ###### `includeRequest`
 
 `async Function`
@@ -113,8 +145,12 @@ The only properties of the reported error you can modify are its `message` and i
 
 Specify this asynchronous function to configure which requests are included in usage reports sent to Apollo Studio. For example, you can omit requests that execute a particular operation or requests that include a particular HTTP header.
 
+ Note that returning false here means that the operation will be completely ignored by all Apollo Studio features. If you merely want to improve performance by skipping the field-level execution trace, set the [`fieldLevelInstrumentation`](#fieldlevelinstrumentation) option instead of this one.
+
 This function is called for each received request. It takes a [`GraphQLRequestContext`](https://github.com/apollographql/apollo-server/blob/main/packages/apollo-server-types/src/index.ts#L115-L150) object and must return a `Promise<Boolean>` that indicates whether to include the request. It's called either after the operation is successfully resolved (via [the `didResolveOperation` event](https://www.apollographql.com/docs/apollo-server/integrations/plugins/#didresolveoperation)), or when sending the final error response if the operation was not successfully resolved (via [the `willSendResponse` event](https://www.apollographql.com/docs/apollo-server/integrations/plugins/#willsendresponse)).
 
+If you don't want any usage reporting at all, don't use this option: instead, either avoid specifying an Apollo API key or explicitly [disable the plugin](#disabling-the-plugin).
+
 By default, all requests are included in usage reports.
 
 </td>

diff --git a/package-lock.json b/package-lock.json
diff --git a/package.json b/package.json
@@ -81,6 +81,7 @@
  "@types/koa-router": "7.4.4",
  "@types/lodash": "4.14.178",
  "@types/lodash.sortby": "4.7.6",
+ "@types/lodash.sumby": "4.6.6",
  "@types/lodash.xorby": "4.7.6",
  "@types/lru-cache": "5.1.1",
  "@types/memcached": "2.2.7",
@@ -112,10 +113,12 @@
  "jest": "27.4.5",
  "jest-config": "27.4.5",
  "jest-junit": "13.0.0",
+ "jest-mock-random": "1.1.1",
  "js-sha256": "0.9.0",
  "koa": "2.13.4",
  "koa-router": "10.1.1",
  "lerna": "4.0.0",
+ "lodash.sumby": "4.6.0",
  "log4js": "6.3.0",
  "memcached-mock": "0.1.0",
  "mock-req": "0.2.0",