-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable Retries for Transient Errors Connecting to Graphs/Subgraphs #1936
Conversation
self.message_sender.update_subgraph(&subgraph_definition)?; | ||
} | ||
} | ||
None => { | ||
self.message_sender.add_subgraph(&subgraph_definition)?; | ||
} | ||
} | ||
self.subgraph_retry_countdown = self.subgraph_retries; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Should we document the logic on how retry_countdown
works?
For my understanding:
- It is normally at
0
- On success, we re-initialise it to
subgraph_retries
- On every attempt, we decrement until it reaches
0
This means that if the first attempt is a failure, we stay at 0
, thus go into the branch at line 225
src/options/subgraph.rs
Outdated
#[arg(long = "subgraph-retry", short = 'r', default_value = "1")] | ||
#[serde(skip_serializing)] | ||
pub subgraph_retry: u64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: should this be pluralised? There is a bit of naming inconsistency in this PR between retry
and retries
#[arg(long = "subgraph-retry", short = 'r', default_value = "1")] | |
#[serde(skip_serializing)] | |
pub subgraph_retry: u64, | |
#[arg(long = "subgraph-retries", short = 'r', default_value = "1")] | |
#[serde(skip_serializing)] | |
pub subgraph_retries: u64, |
src/utils/client.rs
Outdated
@@ -64,6 +64,7 @@ impl ClientBuilder { | |||
.danger_accept_invalid_hostnames(self.accept_invalid_hostnames) | |||
.timeout(self.timeout) | |||
.user_agent(format!("{}/{}", PKG_NAME, PKG_VERSION)) | |||
.pool_max_idle_per_host(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: should we keep this change in this PR?
There might be unknown side-effects with this change, so I'd prefer to remove it, but re-add it at a later time if we see this is needed.
0f9b427
to
e740f7f
Compare
// | ||
if self.subgraph_retry_countdown > 0 { | ||
self.subgraph_retry_countdown -= 1; | ||
eprintln!("{} error detected communicating with subgraph '{}', schema changes will not be reflected.\nWill retry but subgraph logs should be inspected", Emoji::Warn, &self.subgraph_key.0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to confirm, schema changes won't be saved because introspection couldn't run or because rover couldn't confirm the schema with the subgraph?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So more likely the former, essentially this happens when the introspection query fails for whatever reason (4xx, 5xx and so on), so we don't get back a new schema to check against.
src/command/dev/watcher.rs
Outdated
Some(e.to_string()) | ||
} else { | ||
eprintln!( | ||
"{} retries exhausted for subgraph {}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine. how many retries does the user get?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's configurable with the --subgraph-retries
parameter but I can easily add that into the log message so people know it's being respected?
d7306c4
to
38bf054
Compare
5673f1a
to
3bd20d1
Compare
This way we avoid subgraphs potentially being removed for spurious reasons. In addition enable it for 'graphs` as well so that we don't end up only partially fixing this
This should not be so eager to do so, and is now in fact configurable.
3bd20d1
to
75003aa
Compare
# [0.25.0] - 2024-07-22 ## 🚀 Features - **Enable Retries For Transient Errors Connecting To Graphs/Subgraphs - @jonathanrainer PR #1936** This turns on retries at the HTTP level for connections to graphs/subgraphs to minimize connection resets and cancellations. Also, a new --subgraph-retries flag for rover dev lets you set the number of retries allowed when trying to re-establish a connection. - **Add `--graph-ref` flag to `rover dev` - @dotdat PR #1984** Introduces subgraph mirroring to rover dev. Subgraph mirroring inherits the subgraph routing URLs and schemas from an existing Studio graphref. This makes it easy to spin up a locally running supergraph without maintaining a supergraph config. [See here](https://www.apollographql.com/docs/rover/commands/dev#starting-a-session-from-a-graphos-studio-variant) for more information. ## 🐛 Fixes - **Fixes issues related to passing filenames to `--output` - @jonathanrainer PR #1996** An issue was raised whereby previous versions of Rover supported passing filenames to the `--output` flag but this was broken in v0.24.0. This has now been fixed and the previous functionality restored. ## 🛠 Maintenance - **Expand Smoke Tests To Run On All Supported Platforms - @jonathanrainer PR #1980** - **Fix cron expression, so it runs only once per day - @jonathanrainer PR #1986** - **Ensure we always use the correct version of Federation when testing - @jonathanrainer PR #1987** - **Add manual Smoke test invocation and pin Windows to `npm@9` for testing - @jonathanrainer PR #1989** - **Update apollographql/router to v1.51.0 - @jonathanrainer PR #1988** - **Update node.js packages - @jonathanrainer PR #1979** Includes `@eslint/compat` to v1.1.1, `eslint` to v9.7.0, `node.js` to v20.15.1, `npm` to v10.8.2 and `prettier` to v3.3.3 - **Make sure x86 Mac Tests use 'latest' supergraph plugin version - @jonathanrainer PR #1990** - **Make sure homebrew runs `brew update` when we use it - @jonathanrainer PR #1993** ## 📚 Documentation - **Adds `graph-ref` flag to dev subcommand docs - @jackonawalk PR #1945** - **Update schema proposals capabilities docs - @Meschreiber PR #1949**
This PR fixes two separate issues identified with the way Rover handles retries for communicating with graphs/subgraphs when using
rover dev
.subgraph-retries
option.