-
Notifications
You must be signed in to change notification settings - Fork 492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rework tenant attach code to share the initialization code path with tenant load #3466
Conversation
d8288a1
to
7936be3
Compare
This comment was marked as outdated.
This comment was marked as outdated.
7936be3
to
3c75c0d
Compare
3c75c0d
to
d206174
Compare
fe78e74
to
e275772
Compare
Rebased to get fixes for GitHub workflows from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All in all good changes, smaller nits, rename ideas from me so far. I probably missed some important points so will probably come back to this. Would definitetly like to see the added test cases split at least.
(Re-ran failed steps in an effort to help the work with runners.) |
b63d8f6
to
c92ebd3
Compare
pageserver/src/tenant/mgr.rs
Outdated
|
||
let tenant = Tenant::spawn_attach(conf, tenant_id, remote_storage, ctx); | ||
let tenant = Tenant::spawn_start_attach(conf, tenant_id, remote_storage, ctx) | ||
.map_err(|source| AttachError { tenant_id, source })?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AttachError
from above is used only here, and that gets ?
-ed into TenantMapInsertError
, via more explicit route than anyhow::Error
before.
Am I wrong, or all related changes could be replaced with a match and anyhow::anyhow!(attach tenant {tenant_id}: {source:#})
returned one way or another?
Looks simpler the latter way to me, unless it's a preparation for something bigger here.
We can add another variant into TenantMapInsertError
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AttachError from above is used only here, and that gets ?-ed into TenantMapInsertError, via more explicit route than anyhow::Error before.
Am I wrong, or all related changes could be replaced with a match and anyhow::anyhow!(attach tenant {tenant_id}: {source:#}) returned one way or another?
Looks simpler the latter way to me, unless it's a preparation for something bigger here.
Yeah, right, there is no distinguished treatment.
Will change that.
We can add another variant into TenantMapInsertError.
Nah, if we ever need distinguished treatment of attach errors in the caller, I'd rather make it's Closure
variant generic, i.e., enum TenantMapInsertError<OE> { ..., Other(OE) }
.
pub const TENANT_ATTACHING_LEGACY_MARKER_FILENAME: &str = "attaching"; | ||
pub const TENANT_ATTACHING_MARKER_SUFFIX: &str = "___attaching"; | ||
|
||
/// The error message does not include the tenant ID. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense?
In every function this AttachError
is constructed, there's a tenant_id: TenantId
argument.
I'd add this field in every enum variant instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO a function should not repeat its arguments in the errors it emits.
Rationale:
- avoids unnecessary copies if the caller has distinguished error handling (i.e., a
match
, not simply?
) - DRY: with your proposal, each error variant needs to be enrichted with
tenant_id
. With my approach, the caller is responsible for enrichting the error, in exactly one place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could've wrapped the enum with some struct or enforced the ID presence some other way, but up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can be sure that log message where this error is printed will contain tenant id as a span field this sounds ok to me
…tenant load At the time of writing, the only logical difference between Attach and Load is that Attach learns the list of timelines by querying the remote storage, whereas Load learns it by listing the timelines directory of the tenant. This patch restructures the code such that Attach 1. prepares the on-disk state and then 2. calls into the same `load()` routine that is used by Load. Further, this patch provides the following fixes & improvements to Attach: 1. Make Attach durable before acknowledging it to the management API client. Before this change, we would acknowledge after just creating the in-memory tenant. In the event of a crash before creating the tenant directory and fsyncing the attaching marker, the pageserver would come up without the tenant present (404), even though we acknowledged success to the client. 2. Simplified resume logic if we crash during Attach. Before this patch, if we crashed during Attach with some timelines downloaded and others not downloaded, we would combine existing metadata files with remote ones one-by-one to figure out what's missing. That was necessary before on-demand download because we were downloading layer files as part of Attach. However, with on-demand download, Attach only downloads & writes the timeline metadata files. After this patch, when we crash during Attach, we blow away the tenant's directory while leaving its attach marker file in place. Then, we start over. IMO this is significantly easier to reason about compared to what we had before. Note that we were losing the work for the downloads even before this change, so that's not a regression (the old reconcile_with_remote would still need to download the `IndexPart`s when resuming Attach after a crash). If we want to improve on this in the future, I think the first order of business will be to avoiding re-downloading the `IndexPart`'s and the initial `list_remote_timelines()`. However, given that crashes should be rare, and attach events also, I don't think the number one priority with Attach code should be to make it as simple as possible. For (2), I changed the location of the attach marker file to be outside the tenant directory, so that we can use standard functions for removing the tenant directory. I even wrote a migration function for it, although in retrospect, I think it's quite unlikely that there are any tenants in attaching state deployed. But oh well, now the code is there, and it even has unit tests. We can delete the migration code once we've successfully rolled it out to all regions. The remaining wrinkle with this change is that Attach needs to hint the downloaded `IndexPart`s to `load()` so that it doesn't download them twice during Attach, which would be wasteful. The mechanism for this is the new `TenantLoadReason` and `TimelineLoadReason`. We could eliminate this particular case by on-demand downloading the metadata. However, that might open up another can of worms which I'd like to avoid. If we ever want to go that route, I suggest we start tracking the attachment state of a timeline more formally, e.g., in a `timelines.json` file. This is PR #3466
764b199
to
4af7a82
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for cleaning this up! I like that it became simpler. Left some comments, feel free to just resolve NIT ones. Found nothing serious. If you'd ask I'm +1 for not handling attach marker migration at all. We just need to be sure no one is running attaches during release
pub const TENANT_ATTACHING_LEGACY_MARKER_FILENAME: &str = "attaching"; | ||
pub const TENANT_ATTACHING_MARKER_SUFFIX: &str = "___attaching"; | ||
|
||
/// The error message does not include the tenant ID. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can be sure that log message where this error is printed will contain tenant id as a span field this sounds ok to me
crashsafe::fsync_file_and_parent(&marker_file) | ||
.context("fsync tenant attaching marker file and parent")?; | ||
info!("removing prior attach operation's progress"); | ||
std::fs::remove_dir_all(&tenant_dir).context("remove attaching tenant dir")?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One possible problem with this approach in the future. I expect that control plane will pass tenant config to attach call (I think thats reasonable way to pass it because we dont store it on s3). Previously we could've saved the config as usual, but now we will delete it during resume phase. We can write the config to attach marker. Or do it in some other way. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
. We can write the config to attach marker. Or do it in some other way. WDYT?
Yeah, that's how I'd solve it in the future. Should we prepare for this by writing
{ "format_version": ` }
into the new attach markers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now its simple, the file is empty, so we can check for it and think about format later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The future version won't be able to distinguish a partially written file that it wrote before a crash from a file written by a version with this patch, then.
I'll add this now, this PR is in no big rush anyways, let's get it right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does #3519 look like to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed #3519
Thanks for the review, will address the nits tomorrow.
You think the risk of a bug in the migration code outweighs the risk that there might be some attach markers somewhere? Idea:
|
Realistically I think there wont be any attach markers, and its easy to check for that before the release. So it wont trigger any possible bugs in migration code. So IMO this PR illustrates the need for automated changelog entry generation. I e there should be a possibility to leave a note that will be picked up in release PR so person who will deploy a release can check for additional safety requirements. cc @vadim2404 @shanyp In this case I think its ok to go with simplest approach. Migration code looks ok to me, so may not worth it to spend time hacking something in ansible |
yes, actually we have the /release-notes label for it |
Would be cool to revive this patch |
As discussed moving to draft (for now) |
As of many months ago, we're always attaching on startup, there's no more load path. |
At the time of writing, the only logical difference between Attach and Load is that Attach learns the list of timelines by querying the remote storage, whereas Load learns it by listing the timelines directory of the tenant.
This patch restructures the code such that Attach
load()
routine that is used by Load.Further, this patch provides the following fixes & improvements to Attach:
Make Attach durable before acknowledging it to the management API client.
Before this change, we would acknowledge after just creating the in-memory tenant.
In the event of a crash before creating the tenant directory & fsyncing the attaching marker,
the pageserver would come up without the tenant present (404), even though we acknowledged
success to the client.
Simplified resume logic if we crash during Attach.
Before this patch, if we crashed during Attach with some timelines downloaded and others not downloaded,
we would combine existing metadata files with remote ones one-by-one to figure out what's missing.
That was necessary before on-demand download because we were downloading layer files as part of Attach.
However, with on-demand download, Attach only downloads & writes the timeline metadata files.
After this patch, when we crash during Attach, we blow away the tenant's directory while leaving its attach marker file in place. Then, we start over.
IMO this is significantly easier to reason about compared to what we had before.
Note that we were losing the work for the downloads even before this change, so that's not a regression (the old reconcile_with_remote would still need to download the
IndexPart
s when resuming Attach after a crash).If we want to improve on this in the future, I think the first order of business will be to
avoiding re-downloading the
IndexPart
's and the initiallist_remote_timelines()
.However, given that crashes should be rare, and attach events also, I don't think the number
one priority with Attach code should be to make it as simple as possible.
For (2), I changed the location of the attach marker file to be outside the tenant directory,
so that we can use standard functions for removing the tenant directory.
I even wrote a migration function for it, although in retrospect, I think it's quite unlikely that there are any tenants in attaching state deployed.
But oh well, now the code is there, and it even has unit tests.
We can delete the migration code once we've successfully rolled it out to all regions.
The remaining wrinkle with this change is that Attach needs to hint the downloaded
IndexPart
s toload()
so that it doesn't download them twice during Attach, which would be wasteful.The mechanism for this is the new
TenantLoadReason
andTimelineLoadReason
.We could eliminate this particular case by on-demand downloading the metadata.
However, that might open up another can of worms which I'd like to avoid.
If we ever want to go that route, I suggest we start tracking the attachment state of a timeline more formally, e.g., in a
timelines.json
file.TODOs: