usage reporting plugin: add captureTraces option

The usage reporting plugin already has an includeRequest option which allows you to tell the plugin to completely ignore an operation. One reason you may have wanted to use this is to avoid the overhead of capturing a full field-by-field execution trace. But in this case, includeRequest=false is overkill: it removes the operation from aspects of Studio such as the Operations page and schema checks, which don't require full execution traces to operate.

This PR adds a new option, captureTraces, which is invoked after includeRequest returns true. If you return false from this function, you won't incur the overhead of capturing a detailed trace (either directly in this process, or in subgraphs if this process is a Gateway). Most of Studio's features (including the newly-added "referencing operations" column on the Fields page) will still reflect the existence of this operation. As of the end of 2021, the features that this operation will not contribute to are:

- The "field executions" column on the Fields page
- The per-field timing hints shown in Explorer and in vscode-graphql
- The trace view on the Operations page

You can also pass a number between 0 and 1, which will be interpreted as the fraction of operations for which traces are randomly captured. (Specifying this as a number rather than a function that invokes Math.random means that a future version of Apollo Server and Studio could include this number in the usage report and use it to let you view an estimate for total field executions based on using the number as a scale; this is however not yet implemented.)

Since this may provide a common use case for running a GraphQL server without any need for per-field instrumentation, the request pipeline now instruments the schema only if it sees a willResolveField callback. This *almost* means that if you're running a monolithic server with captureTraces always returning false (or usage reporting disabled), or if you're running a subgraph whose gateway has captureTraces:false (and thus never receives a request with the `apollo-federation-include-trace` header), then execution won't have the small performance impact of instrumentation. In practice you also need to disable the cache control plugin to get this speedup, as it is installed by default and uses willResolveField to implement dynamic cache control. If this optimization proves to be important we can provide a mode of the cache control plugin that doesn't allow for dynamic cache control. (Alternatively we may eventually find a way to support the instrumentation of GraphQL execution with lower overhead.)

Part of #5708.
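As a rough illustration of the numeric form, the docs change in this commit states that passing a number `n` is equivalent to passing `async () => Math.random() < n ? n : 0`, and that the scaled estimate is the observed count divided by that number. The sketch below restates those two rules in plain TypeScript. The helper names (`decideInstrumentation`, `numberToHook`, `estimatedExecutions`) and the injectable `random` parameter are assumptions made for illustration and testing; they are not part of Apollo Server's API.

```typescript
// Hypothetical helpers for illustration only; not Apollo Server's actual code.

// Source of uniform random numbers in [0, 1); injectable so the decision
// can be exercised deterministically.
type RandomSource = () => number;

// Returns n when this operation is sampled for field-level instrumentation,
// and 0 otherwise. Mirrors `Math.random() < n ? n : 0` from the docs change.
function decideInstrumentation(n: number, random: RandomSource = Math.random): number {
  if (n < 0 || n > 1) {
    throw new RangeError("n should be between 0 and 1 inclusive");
  }
  return random() < n ? n : 0;
}

// Wraps a number in an async hook, matching the documented equivalence.
function numberToHook(n: number, random: RandomSource = Math.random): () => Promise<number> {
  return async () => decideInstrumentation(n, random);
}

// Scaled estimate: observed field executions divided by the sampling fraction.
function estimatedExecutions(observed: number, fraction: number): number {
  if (fraction <= 0 || fraction > 1) {
    throw new RangeError("fraction should be greater than 0 and at most 1");
  }
  return observed / fraction;
}
```

With a fraction of 0.25, for example, 5 observed executions of a field would scale to an estimate of 20. Keeping the fraction as a plain number (rather than hiding it inside a function that calls `Math.random`) is what makes this kind of scaling possible.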
apollographql · Dec 23, 2021 · e466e8d
1 parent 26030d1
commit e466e8d
Showing 20 changed files with 919 additions and 245 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -10,7 +10,7 @@ The version headers in this history reflect the versions of Apollo Server itself
 ## vNEXT (minor!)
 
 - `apollo-server-core`: Usage reporting no longer sends a "client reference ID" to Apollo Studio (along with the client name and client version). This little-used feature has not been documented [since 2019](https://github.com/apollographql/apollo-server/pull/3180) and is currently entirely ignored by Apollo Studio. This is technically incompatible as the interface `ClientInfo` no longer has the field `clientReferenceId`; if you were one of the few users who explicitly set this field and you get a TypeScript compilation failure upon upgrading to v3.6.0, just stop using the field. [PR #5890](https://github.com/apollographql/apollo-server/pull/5890)
-- `apollo-server-core`: Preliminary support for referenced field reporting. THIS ENTRY NEEDS TO BE EXPANDED BEFORE THE v3.6.0 RELEASE. [Issue #5708](https://github.com/apollographql/apollo-server/issues/5708) [PR #5956](https://github.com/apollographql/apollo-server/pull/5956)
+- `apollo-server-core`: Preliminary support for referenced field reporting. THIS ENTRY NEEDS TO BE EXPANDED BEFORE THE v3.6.0 RELEASE. Include description of fieldLevelInstrumentation and the ability to avoid instrumentation. Include effect on `metrics` object. [Issue #5708](https://github.com/apollographql/apollo-server/issues/5708) [PR #5956](https://github.com/apollographql/apollo-server/pull/5956)
 - `apollo-server-core`: Remove dependency on `apollo-graphql` package (by inlining the code which generates usage reporting signatures). That package has not yet been published with a `graphql@16` peer dependency, so Apollo Server v3.5 did not fully support `graphql@16` without overriding peer dependencies. [Issue #5941](https://github.com/apollographql/apollo-server/issues/5941) [PR #5955](https://github.com/apollographql/apollo-server/pull/5955)
 
 ## v3.5.0

diff --git a/docs/source/api/plugin/usage-reporting.md b/docs/source/api/plugin/usage-reporting.md
@@ -105,6 +105,38 @@ The only properties of the reported error you can modify are its `message` and i
 <tr>
 <td>
 
+###### `fieldLevelInstrumentation`
+
+`async Function` or `number`
+</td>
+<td>
+
+This option allows you to choose if Apollo Server should calculate detailed per-field statistics for a particular request. It is only called for executable operations: operations which parse and validate properly and which do not have an unknown operation name. It is not called if an [`includeRequest`](#includerequest) hook is provided and returns false.
+
+You can either pass an async function or a number. The function receives a `GraphQLRequestContext`. (The effect of passing a number is described later.) Your function can return a boolean or a number; returning false is equivalent to returning 0 and returning true is equivalent to returning 1.
+
+Returning false (or 0) means that Apollo Server will only pay attention to overall properties of the operation, like what GraphQL operation is executing and how long the entire operation takes to execute, and not anything about field-by-field execution.
+
+If you return false (or 0), this operation *will* still contribute to most features of Studio, such as schema checks, the Operations page, and the "referencing operations" statistic on the Fields page.
+
+If you return false (or 0), this operation will *not* contribute to the "field executions" statistic on the Fields page or to the execution timing hints optionally displayed in Studio Explorer or in vscode-graphql. Additionally, this operation will not produce a trace that can be viewed on the Traces section of the Operations page.
+
+Returning false (or 0) here for some or all operations can improve your server's performance, as the overhead of calculating complete traces is not always negligible. This is especially the case if this server is an Apollo Gateway, as captured traces are transmitted from the subgraph to the Gateway in-band inside the actual GraphQL response.
+
+Returning a positive number means that Apollo Server will track each field execution and send Apollo Studio statistics on how many times each field was executed and what the per-field performance was. If the number returned is less than 1, Apollo Server will also send a scaled "estimate" count for each field, equal to the number of observed field executions divided by the number returned by the hook.
+
+Passing a number `n` (which should be between 0 and 1 inclusive) for `fieldLevelInstrumentation` is equivalent to passing the function `async () => Math.random() < n ? n : 0`. For example, if you pass 0.01, then 99% of the time Apollo Server will not track field executions, and 1% of the time Apollo Server will track field executions and send them to Apollo Studio both as an exact observed count and as an "estimated" count which is 100 times higher.
+
+(Note that returning true here does *not* mean that the data derived from field-level instrumentation must be transmitted to Apollo Studio's servers in the form of a trace; it may still be aggregated locally to statistics. But either way this operation will contribute to the "field executions" statistic and timing hints.)
+
+The default `fieldLevelInstrumentation` is a function that always returns true.
+
+</td>
+</tr>
+
+<tr>
+<td>
+
 ###### `includeRequest`
 
 `async Function`
@@ -113,8 +145,12 @@ The only properties of the reported error you can modify are its `message` and i
 
 Specify this asynchronous function to configure which requests are included in usage reports sent to Apollo Studio. For example, you can omit requests that execute a particular operation or requests that include a particular HTTP header.
 
+Note that returning false here means that the operation will be completely ignored by all Apollo Studio features. If you merely want to improve performance by skipping the field-level execution trace, set the [`fieldLevelInstrumentation`](#fieldlevelinstrumentation) option instead of this one.
+
 This function is called for each received request. It takes a [`GraphQLRequestContext`](https://github.com/apollographql/apollo-server/blob/main/packages/apollo-server-types/src/index.ts#L115-L150) object and must return a `Promise<Boolean>` that indicates whether to include the request. It's called either after the operation is successfully resolved (via [the `didResolveOperation` event](https://www.apollographql.com/docs/apollo-server/integrations/plugins/#didresolveoperation)), or when sending the final error response if the operation was not successfully resolved (via [the `willSendResponse` event](https://www.apollographql.com/docs/apollo-server/integrations/plugins/#willsendresponse)).
 
+If you don't want any usage reporting at all, don't use this option: instead, either avoid specifying an Apollo API key or explicitly [disable the plugin](#disabling-the-plugin).
+
 By default, all requests are included in usage reports.
 
 </td>

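The `fieldLevelInstrumentation` docs added above say the hook receives a `GraphQLRequestContext` and may return a boolean or a number. A minimal sketch of the function form follows. `MinimalRequestContext` is a hypothetical, trimmed-down stand-in for the real type so that the block is self-contained, and the `HealthCheck` operation name, the 10% sampling fraction, and the helper names are all assumptions chosen for illustration.

```typescript
// Hypothetical stand-in for GraphQLRequestContext; the real type has many
// more fields. Only the operation name is used here.
interface MinimalRequestContext {
  request: { operationName?: string | null };
}

// Illustrative values only.
const UNINSTRUMENTED_OPERATIONS = new Set<string>(["HealthCheck"]);
const SAMPLE_FRACTION = 0.1;

// Synchronous decision logic, separated out so it can be checked with a
// deterministic random source.
function decide(
  ctx: MinimalRequestContext,
  random: () => number = Math.random,
): boolean | number {
  const name = ctx.request.operationName;
  // Never pay for field-level instrumentation on these operations. They can
  // still appear in Studio features that do not need per-field data.
  if (name && UNINSTRUMENTED_OPERATIONS.has(name)) {
    return false;
  }
  // Otherwise sample: return the fraction itself when sampled, so that a
  // scaled estimate can be derived from it, and 0 when not sampled.
  return random() < SAMPLE_FRACTION ? SAMPLE_FRACTION : 0;
}

// Async hook in the shape described by the docs.
const fieldLevelInstrumentation = async (
  ctx: MinimalRequestContext,
): Promise<boolean | number> => decide(ctx);
```

A function like this would be supplied as the `fieldLevelInstrumentation` option of the usage reporting plugin. Unlike returning false from `includeRequest`, returning false here only skips the per-field statistics.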
diff --git a/package-lock.json b/package-lock.json
diff --git a/package.json b/package.json
@@ -81,6 +81,7 @@
  "@types/koa-router": "7.4.4",
  "@types/lodash": "4.14.178",
  "@types/lodash.sortby": "4.7.6",
+ "@types/lodash.sumby": "4.6.6",
  "@types/lodash.xorby": "4.7.6",
  "@types/lru-cache": "5.1.1",
  "@types/memcached": "2.2.7",
@@ -112,10 +113,12 @@
  "jest": "27.4.5",
  "jest-config": "27.4.5",
  "jest-junit": "13.0.0",
+ "jest-mock-random": "1.1.1",
  "js-sha256": "0.9.0",
  "koa": "2.13.4",
  "koa-router": "10.1.1",
  "lerna": "4.0.0",
+ "lodash.sumby": "4.6.0",
  "log4js": "6.3.0",
  "memcached-mock": "0.1.0",
  "mock-req": "0.2.0",