release: v1.16.0 #3033

abernix · 2023-05-03T17:48:28Z

This particular PR should be true-merged to main.

This PR represents the merge to main of the v1.16.0 release and follows up the release prep PR.

This PR is mainly a merge commit, so reviewing every individual commit shown below is not necessary since those have been reviewed in their own PR.

However! Some things to review on this PR:

Does this PR target the right branch? (usually, main)
Are the appropriate version bumps and release note edits in the end of the commit list (or within the last few commits). In other words, did the release prep PR actually land on this branch?

If those things look good, this PR is good to merge.

Router execution results in a heavily fragmented heap, that is hard to manage for the default allocator. jemallocator handles it better, which results in performance improvements, in latency and memory usage

There's a cost in using the `Context` structure throughout a request's lifecycle, due to the JSON serialization and deserialization, so it should be reserved from inter plugin communication between rhai, coprocessor and Rust. But for internal router usage, we can have a more efficient structure that avoids serialization costs, and does not expose data that should not be modified by plugins. That structure is based on a map indexed by type id, which means that if some part of the code can see that type, then it can access it in the map. This means that some context data that was accessible from rhai and all plugins, is now inaccessible

Response formatting generates a lot of temporary allocations to create response paths that end up unused. By making a reference based type to hold these paths, we can prevent those allocations and improve performance.

The in memory cache requires synchronization, and precedently we used a futures aware mutex for that, but those are susceptible to contention. This replaces that mutex with a parking-lot synchronous mutex that is much faster. --------- Co-authored-by: Gary Pennington <gary@apollographql.com>

we do not show span attributes in our logs, but the log formatter still spends some time formatting them to a string, even when there will be no logs written for the trace. This adds the `NullFieldFormatter` that entirely avoids formatting the attributes

Follow-up to the v1.15.1 being officially released, bringing version bumps and changelog updates into the `dev` branch.

This is a small performance improvement. When the `Accept` is not present, the previous code would still traverse the header map three times.

When a batch span exporter is unable to send accept a span because the buffer is full it will emit an error. These errors can be very frequent and could potentially impact performance. Otel errors are not rate limited to one every ten seconds per error type. Fixes #2953  **Checklist** Complete the checklist (and note appropriate exceptions) before a final PR is raised. - [ ] Changes are compatible[^1] - [ ] Documentation[^2] completed - [ ] Performance impact assessed and acceptable - Tests added and passing[^3] - [x] Unit Tests - [ ] Integration Tests - [ ] Manual Tests **Exceptions** *Note any exceptions here* **Notes** [^1]. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this. [^2]. Configuration is an important part of many changes. Where applicable please try to document configuration examples. [^3]. Tick whichever testing boxes are applicable. If you are adding Manual Tests: - please document the manual testing (extensively) in the Exceptions. - please raise a separate issue to automate the test and label it (or ask for it to be labeled) as `manual test` --------- Co-authored-by: bryn <bryn@apollographql.com>

add support of operationCountByType for apollo studio --------- Signed-off-by: Benjamin Coenen <5719034+bnjjj@users.noreply.github.com>

The context requires synchronized access to the busy timer, and precedently we used a futures aware mutex for that, but those are susceptible to contention. This replaces that mutex with a parking-lot synchronous mutex that is much faster.

…ttp2 (#2976) Fix a bug where `experimental_enable_http2` wouldn't properly apply when a global configuration was set.

Currently, if a rhai script errors in rhai, the rhai plugin ignores the error and returns None in the stream of results. This has two unfortunate aspects: - the error is not propagated to the client - the stream is terminated (silently) The fix captures the error and propagates the response to the client. Fixes #2935 #2936  **Checklist** Complete the checklist (and note appropriate exceptions) before a final PR is raised. - [x] Changes are compatible[^1] - [x] Documentation[^2] completed - [x] Performance impact assessed and acceptable - Tests added and passing[^3] - [x] Unit Tests ~- [ ] Integration Tests~ - [x] Manual Tests **Exceptions** *Note any exceptions here* **Notes** [^1]. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this. [^2]. Configuration is an important part of many changes. Where applicable please try to document configuration examples. [^3]. Tick whichever testing boxes are applicable. If you are adding Manual Tests: - please document the manual testing (extensively) in the Exceptions. - please raise a separate issue to automate the test and label it (or ask for it to be labeled) as `manual test`

Partially fixes https://rustsec.org/advisories/RUSTSEC-2021-0139 `clap` is still depending on [ansi_term](https://github.com/ogham/rust-ansi-term). cargo audit before the change: ``` Crate: ansi_term Version: 0.12.1 Warning: unmaintained Title: ansi_term is Unmaintained Date: 2021-08-18 ID: RUSTSEC-2021-0139 URL: https://rustsec.org/advisories/RUSTSEC-2021-0139 Dependency tree: ansi_term 0.12.1 ├── clap 2.34.0 │ ├── structopt 0.3.26 │ │ └── cargo-scaffold 0.8.7 │ │ └── apollo-router-scaffold 1.12.1 │ └── cargo-scaffold 0.8.7 └── apollo-router 1.12.1 ├── throw-error 0.1.0 ├── supergraph_sdl 0.1.0 ├── rhai-surrogate-cache-key 0.1.0 ├── rhai-subgraph-request-log 0.1.0 ├── rhai-logging 0.1.0 ├── rhai-error-response-mutate 0.1.0 ├── rhai-data-response-mutate 0.1.0 ├── propagate-status-code 0.1.0 ├── op-name-to-header 0.1.0 ├── jwt-claims 0.1.0 ├── hello-world 0.1.0 ├── forbid_anonymous_operations_rhai 0.1.0 ├── forbid-anonymous-operations 0.1.0 ├── external-subgraph 0.1.0 ├── cookies-to-headers 0.1.0 ├── context-data 0.1.0 ├── async-allow-client-id 0.1.0 ├── apollo-router-benchmarks 1.12.1 └── add-timestamp-header 0.1.0 ``` cargo audit after the change: ``` Crate: ansi_term Version: 0.12.1 Warning: unmaintained Title: ansi_term is Unmaintained Date: 2021-08-18 ID: RUSTSEC-2021-0139 URL: https://rustsec.org/advisories/RUSTSEC-2021-0139 Dependency tree: ansi_term 0.12.1 └── clap 2.34.0 ├── structopt 0.3.26 │ └── cargo-scaffold 0.8.7 │ └── apollo-router-scaffold 1.12.1 └── cargo-scaffold 0.8.7 ``` Co-authored-by: Coenen Benjamin <benjamin.coenen@hotmail.com>

This reverts commit c0218a3.

Fixes https://rustsec.org/advisories/RUSTSEC-2023-0034

so this will be measured by CI benchmarks and serve as baseline for upcoming changes

This integrates the encoders from https://github.com/Nemo157/async-compression so we can use them other a stream that flushes data regularly

this makes an assumption that async-commpression cannot make: every chunk in the body stream has to be compressed and flushed directly, we should not wait for more data to come and get better compression. This is due to the multipart protocol for defer: we know each chunk represent eitehr the primary or a deferred response, and should be sent as soon as possible

Reverts #2989

…eleases (#3004) This changes our script to be explicit and pass the `--prerelease` version to our increasingly automated release pipeline when the `-` character is detected in the "version" property of the release being published. Typically, GitHub marks "latest" versions "based on semver"[1], but GitHub still marks "pre-release versions" (as defined by Semver 2.0 [2]) as "latest" irregardless of that meaning to semver. This seems a bit unexpected, but it is the way it is, so we'll account for it by making sure we're explicit. [1]: https://docs.github.com/en/repositories/releasing-projects-on-github/managing-releases-in-a-repository#:~:text=the%20latest%20release%20label%20will%20automatically%20be%20assigned%20based%20on%20semantic%20versioning [2]: https://semver.org/#spec-item-9 Follows-up: #3003 (root-cause) --------- Co-authored-by: Jeremy Lempereur <jeremy.lempereur@iomentum.com>

improved error message for missing query string in GET request, to address issue #2941 Fixes #2941 Co-authored-by: Jesse Rosenberger <git@jro.cc>

We replace tower-http's `CompressionLayer` with a custom stream transformation. This is necessary because tower-http uses async-compression, which buffers data until the end of the stream to then write it, ensuring a better compression. This is incompatible with the multipart protocol for `@defer`, which requires chunks to be sent as soon as possible. So we need to compress them independently. This extracts parts of the codec module of async-compression, which so far is not public, and makes a streaming wrapper above it that flushes the compressed data on every response in the stream. This is expected to be temporary, as we have in flight PRs for async-compression: - Nullus157/async-compression#155 - Nullus157/async-compression#178 With Nullus157/async-compression#150 we might be able to at least remove the vendored code

…3006) Signed-off-by: Benjamin Coenen <5719034+bnjjj@users.noreply.github.com>

The Router can now artificially be made to hot-reload (as if the configuration or schema had changed) at a configured time interval. This can help reproduce issues like reload-related memory leaks. The new configuration section for chaos testing is marked as experimental.  **Checklist** Complete the checklist (and note appropriate exceptions) before a final PR is raised. - [ ] Changes are compatible[^1] - [ ] Documentation[^2] completed - [ ] Performance impact assessed and acceptable - Tests added and passing[^3] - [ ] Unit Tests - [ ] Integration Tests - [ ] Manual Tests **Exceptions** *Note any exceptions here* **Notes** [^1]. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this. [^2]. Configuration is an important part of many changes. Where applicable please try to document configuration examples. [^3]. Tick whichever testing boxes are applicable. If you are adding Manual Tests: - please document the manual testing (extensively) in the Exceptions. - please raise a separate issue to automate the test and label it (or ask for it to be labeled) as `manual test` --------- Co-authored-by: Bryn Cooke <BrynCooke@gmail.com>

The Rhai logging output was "standardized" about a year ago. Since that time, significant efforts have been made within the rest of the router to standardize on a slightly different format. This PR brings rhai log output into line with the rest of the router. It also addresses the requirement to make the "message" output specifiable by the script author. Note: The changes are source code compatible, but the format of the output log line is changing. I don't think we make compatibility commitments on this, but it's worth noting. Fixes #2777  **Checklist** Complete the checklist (and note appropriate exceptions) before a final PR is raised. - [ ] Changes are compatible[^1] - [x] Documentation[^2] completed - [x] Performance impact assessed and acceptable - Tests added and passing[^3] - [x] Unit Tests - [x] Integration Tests - [ ] Manual Tests **Exceptions** *Note any exceptions here* **Notes** [^1]. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this. [^2]. Configuration is an important part of many changes. Where applicable please try to document configuration examples. [^3]. Tick whichever testing boxes are applicable. If you are adding Manual Tests: - please document the manual testing (extensively) in the Exceptions. - please raise a separate issue to automate the test and label it (or ask for it to be labeled) as `manual test`

…es (#2960) This expands on the existing mechanism that was originally introduced in #2242, which supports the notion of an "experimental" feature, and make it compatible with the notion of "preview" features. **Checklist** Complete the checklist (and note appropriate exceptions) before a final PR is raised. - [x] Changes are compatible[^1] - [x] Documentation[^2] completed - [-] Performance impact assessed and acceptable - Tests added and passing[^3] - [-] Unit Tests - [x] Integration Tests - [x] Manual Tests --------- Co-authored-by: o0Ignition0o <jeremy.lempereur@gmail.com> Co-authored-by: Simon Sapin <simon@apollographql.com>

This (hopefully) helps clarify that this docset is specific to self-hosting the router as opposed to managing cloud routing via GraphOS.

@bnjjj

…ages to studio (#3011) When using subgraphs which are enabled with [Apollo Federated Tracing](https://www.apollographql.com/docs/router/configuration/apollo-telemetry/#enabling-field-level-instrumentation), the error messages within those traces will be **redacted by default**. New configuration (`tracing.apollo.errors.subgraph.all.redact`, which defaults to `true`) enables and disables the redaction mechanism. Similar configuration (`tracing.apollo.errors.subgraph.all.send`, which also defaults to `true`) enables and disables the entire transmission of the error to Studio. The error messages returned to the clients are **not** changed or redacted from their previous behavior. To enable sending subgraph's federated trace error messages to Studio **without redaction**, you can set the following configuration: ```yaml title="router.yaml" telemetry: apollo: errors: subgraph: all: send: true # (true = Send to Studio, false = Do not send; default: true) redact: false # (true = Redact full error message, false = Do not redact; default: true) ``` It is also possible to configure this **per-subgraph** using a `subgraphs` map at the same level as `all` in the configuration, much like other sections of the configuration which have subgraph-specific capabilities: ```yaml title="router.yaml" telemetry: apollo: errors: subgraph: all: send: true redact: false # Disable redaction as a default. The `accounts` service enables it below. subgraphs: accounts: # Applies to the `accounts` subgraph, overriding the `all` global setting. redact: true # Redact messages from the `accounts` service. ``` By [@bnjjj](https://github.com/bnjjj) in #3011 --------- Signed-off-by: Benjamin Coenen <5719034+bnjjj@users.noreply.github.com> Co-authored-by: Jesse Rosenberger <git@jro.cc> Co-authored-by: Bryn Cooke <BrynCooke@gmail.com>

To bring in Federation v2.4.2 query planner: https://github.com/apollographql/federation/releases/tag/%40apollo%2Fquery-planner%402.4.2

garypen and others added 30 commits April 18, 2023 11:07

linux: replace default allocator with jemalloc (#2882)

dd8ca62

Router execution results in a heavily fragmented heap, that is hard to manage for the default allocator. jemallocator handles it better, which results in performance improvements, in latency and memory usage

lighter path manipulation in response formatting (#2854)

fbc4b4a

Response formatting generates a lot of temporary allocations to create response paths that end up unused. By making a reference based type to hold these paths, we can prevent those allocations and improve performance.

Reconcile dev after merge to main for v1.15.1 (#2972)

1e6bd0d

Follow-up to the v1.15.1 being officially released, bringing version bumps and changelog updates into the `dev` branch.

Parse Accept headers once instead of three times (#2847)

028f6d9

This is a small performance improvement. When the `Accept` is not present, the previous code would still traverse the header map three times.

feat: add support of operationCountByType for apollo studio (#2979)

6cc999b

add support of operationCountByType for apollo studio --------- Signed-off-by: Benjamin Coenen <5719034+bnjjj@users.noreply.github.com>

use a parking-lot mutex in Context (#2885)

e691bde

The context requires synchronized access to the busy timer, and precedently we used a futures aware mutex for that, but those are susceptible to contention. This replaces that mutex with a parking-lot synchronous mutex that is much faster.

Fix: Traffic shaping configuration fallback for experimental_enable_h…

e34906b

…ttp2 (#2976) Fix a bug where `experimental_enable_http2` wouldn't properly apply when a global configuration was set.

Revert "use a parking-lot mutex in cache storage (#2887)" (#2980)

737dc05

This reverts commit c0218a3.

update h2 (#2982)

d50aed1

Fixes https://rustsec.org/advisories/RUSTSEC-2023-0034

activate the compression layer for all requests

abae695

so this will be measured by CI benchmarks and serve as baseline for upcoming changes

Merge branch 'dev' into geal/fix-defer-compression

0c6b312

vendor encoders from async-compression

3cd2e3f

This integrates the encoders from https://github.com/Nemo157/async-compression so we can use them other a stream that flushes data regularly

add more algorithms

cce57c4

generate the compressor from the Accept-Encoding header

0d8cb2d

lint

0f6de99

fmt

e33a9cf

fix the Content-Encoding header

f7f6d78

fix tests

992fb74

Merge branch 'dev' into geal/merge-dev-in-staging-perf

177e837

merge dev in staging perf (#2989)

19a3a66

Revert "merge dev in staging perf"

bc63b32

Revert "merge dev in staging perf" (#2990)

1e433fd

Reverts #2989

fixes

0fb0ba9

Geal and others added 14 commits April 26, 2023 16:27

remove zlib

56b47fd

changeset

2e4cec7

lint

0fbaed1

improved error message to address issue #2941 (#2955)

1addfdc

improved error message for missing query string in GET request, to address issue #2941 Fixes #2941 Co-authored-by: Jesse Rosenberger <git@jro.cc>

fix(telemetry): limit the memory usage of apollo telemetry exporter (#…

dc252df

…3006) Signed-off-by: Benjamin Coenen <5719034+bnjjj@users.noreply.github.com>

docs: Rename router docset (#3021)

207c686

This (hopefully) helps clarify that this docset is specific to self-hosting the router as opposed to managing cloud routing via GraphOS.

fix(deps): update rust crate router-bridge to v0.2.2+v2.4.2 (#2910)

7897f81

To bring in Federation v2.4.2 query planner: https://github.com/apollographql/federation/releases/tag/%40apollo%2Fquery-planner%402.4.2

prep release: v1.16.0

60285d1

apollo-bot2 assigned abernix May 3, 2023

abernix marked this pull request as ready for review May 3, 2023 17:48

abernix requested a review from StephenBarlow as a code owner May 3, 2023 17:48

abernix added the release label May 3, 2023

abernix requested review from a team, Geal, SimonSapin, bnjjj, chandrikas, BrynCooke and garypen May 3, 2023 18:01

abernix enabled auto-merge May 3, 2023 18:11

Geal approved these changes May 3, 2023

View reviewed changes

abernix disabled auto-merge May 3, 2023 18:16

abernix merged commit 2a9e3be into main May 3, 2023

abernix deleted the 1.16.0 branch May 3, 2023 18:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

release: v1.16.0 #3033

release: v1.16.0 #3033

abernix commented May 3, 2023

release: v1.16.0 #3033

release: v1.16.0 #3033

Conversation

abernix commented May 3, 2023