rework tenant attach code to share the initialization code path with tenant load #3466

problame · 2023-01-27T11:25:10Z

At the time of writing, the only logical difference between Attach and Load is that Attach learns the list of timelines by querying the remote storage, whereas Load learns it by listing the timelines directory of the tenant.

This patch restructures the code such that Attach

1. prepares the on-disk state and then
2. calls into the same load() routine that is used by Load.

Further, this patch provides the following fixes & improvements to Attach:

1. Make Attach durable before acknowledging it to the management API client.
   Before this change, we would acknowledge after just creating the in-memory tenant.
   In the event of a crash before creating the tenant directory & fsyncing the attaching marker,
   the pageserver would come up without the tenant present (404), even though we acknowledged
   success to the client.
2. Simplified resume logic if we crash during Attach.
   Before this patch, if we crashed during Attach with some timelines downloaded and others not downloaded,
   we would combine existing metadata files with remote ones one-by-one to figure out what's missing.
   That was necessary before on-demand download because we were downloading layer files as part of Attach.
   However, with on-demand download, Attach only downloads & writes the timeline metadata files.

After this patch, when we crash during Attach, we blow away the tenant's directory while leaving its attach marker file in place. Then, we start over.

IMO this is significantly easier to reason about compared to what we had before.
Note that we were losing the work for the downloads even before this change, so that's not a regression (the old reconcile_with_remote would still need to download the IndexParts when resuming Attach after a crash).
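The durable-marker and start-over behavior described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in this PR: the function names, the string tenant ID, and the way the marker path is composed from the `___attaching` suffix are assumptions, and the directory fsync relies on Unix semantics.

```rust
use std::fs::{self, File};
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical layout: the marker lives next to the tenant directory, not inside it.
fn marker_path(tenants_dir: &Path, tenant_id: &str) -> PathBuf {
    tenants_dir.join(format!("{}___attaching", tenant_id))
}

/// Hypothetical layout: one directory per tenant under `tenants_dir`.
fn tenant_dir(tenants_dir: &Path, tenant_id: &str) -> PathBuf {
    tenants_dir.join(tenant_id)
}

/// Persist the intent to attach. Only after this returns Ok would the
/// management API acknowledge the request (assumption of this sketch).
fn begin_attach(tenants_dir: &Path, tenant_id: &str) -> io::Result<()> {
    // Create the (empty) marker and flush it to disk.
    File::create(marker_path(tenants_dir, tenant_id))?.sync_all()?;
    // fsync the parent directory so the new directory entry survives a crash (Unix).
    File::open(tenants_dir)?.sync_all()
}

/// On startup: if the marker is present, a previous Attach did not finish.
/// Discard the tenant directory, keep the marker, and report that Attach
/// has to start over. Returns Ok(false) if there is nothing to resume.
fn reset_if_attach_interrupted(tenants_dir: &Path, tenant_id: &str) -> io::Result<bool> {
    if !marker_path(tenants_dir, tenant_id).exists() {
        return Ok(false);
    }
    match fs::remove_dir_all(tenant_dir(tenants_dir, tenant_id)) {
        Ok(()) => {}
        // Crashed before the tenant directory was created: nothing to remove.
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }
    Ok(true)
}
```

Because the marker sits outside the tenant directory, the reset step can use a plain `remove_dir_all` without having to spare any file inside it.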

If we want to improve on this in the future, I think the first order of business will be to
avoid re-downloading the IndexParts and repeating the initial list_remote_timelines().
However, given that crashes should be rare, and attach events also, I think the number
one priority with Attach code should be to make it as simple as possible.

For (2), I changed the location of the attach marker file to be outside the tenant directory,
so that we can use standard functions for removing the tenant directory.
I even wrote a migration function for it, although in retrospect, I think it's quite unlikely that there are any tenants in attaching state deployed.
But oh well, now the code is there, and it even has unit tests.
We can delete the migration code once we've successfully rolled it out to all regions.
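For illustration, a marker migration of this kind could look roughly like the sketch below. It is not the migration function from this PR: the function name, the string tenant ID, and the ordering of steps are assumptions, and only the two constants are taken from the diff. The sketch creates the new marker before removing the legacy one, so a crash in between never leaves the tenant without any marker.

```rust
use std::fs::{self, File};
use std::io;
use std::path::Path;

// Values as they appear in the diff for pageserver/src/tenant.rs.
pub const TENANT_ATTACHING_LEGACY_MARKER_FILENAME: &str = "attaching";
pub const TENANT_ATTACHING_MARKER_SUFFIX: &str = "___attaching";

/// Hypothetical migration: move the marker from `<tenants_dir>/<tenant_id>/attaching`
/// to `<tenants_dir>/<tenant_id>___attaching`.
/// Returns Ok(true) if a legacy marker was found and migrated.
fn migrate_legacy_marker(tenants_dir: &Path, tenant_id: &str) -> io::Result<bool> {
    let tenant_dir = tenants_dir.join(tenant_id);
    let legacy = tenant_dir.join(TENANT_ATTACHING_LEGACY_MARKER_FILENAME);
    if !legacy.exists() {
        return Ok(false);
    }
    let new_marker =
        tenants_dir.join(format!("{}{}", tenant_id, TENANT_ATTACHING_MARKER_SUFFIX));
    // 1. Create the new marker and make it durable (file, then parent directory).
    File::create(&new_marker)?.sync_all()?;
    File::open(tenants_dir)?.sync_all()?;
    // 2. Only then remove the legacy marker and persist the removal.
    fs::remove_file(&legacy)?;
    File::open(&tenant_dir)?.sync_all()?;
    Ok(true)
}
```

Re-running the function after a crash between steps 1 and 2 is harmless in this sketch: the new marker is simply recreated and the legacy one removed.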

The remaining wrinkle with this change is that Attach needs to hint the downloaded IndexParts to load() so that it doesn't download them twice during Attach, which would be wasteful.
The mechanism for this is the new TenantLoadReason and TimelineLoadReason.
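To make the hinting idea concrete, here is one possible shape for such types. The enum names come from the description above, but their variants, fields, the stand-in `IndexPart` struct, and the helper functions are all hypothetical and do not claim to match the PR.

```rust
use std::collections::HashMap;

/// Stand-in for the real IndexPart; the field is made up for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct IndexPart {
    layer_file_names: Vec<String>,
}

/// Hypothetical: why load() is running for a tenant.
enum TenantLoadReason {
    Load,
    /// Attach already downloaded these, keyed by timeline ID.
    Attach { index_parts: HashMap<String, IndexPart> },
}

/// Hypothetical: the per-timeline counterpart.
enum TimelineLoadReason {
    Load,
    Attach { index_part: IndexPart },
}

impl TenantLoadReason {
    /// Hand the already-downloaded IndexPart (if any) to the timeline's load.
    fn for_timeline(&mut self, timeline_id: &str) -> TimelineLoadReason {
        match self {
            TenantLoadReason::Load => TimelineLoadReason::Load,
            TenantLoadReason::Attach { index_parts } => match index_parts.remove(timeline_id) {
                Some(index_part) => TimelineLoadReason::Attach { index_part },
                None => TimelineLoadReason::Load,
            },
        }
    }
}

/// Use the hinted IndexPart when present; otherwise fall back to the
/// caller-supplied download closure.
fn index_part_for(reason: TimelineLoadReason, download: impl FnOnce() -> IndexPart) -> IndexPart {
    match reason {
        TimelineLoadReason::Attach { index_part } => index_part,
        TimelineLoadReason::Load => download(),
    }
}
```

With this shape, a timeline that was just attached never invokes the download closure, which is the "don't download twice" property the paragraph above is after.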

We could eliminate this particular case by on-demand downloading the metadata.
However, that might open up another can of worms which I'd like to avoid.
If we ever want to go that route, I suggest we start tracking the attachment state of a timeline more formally, e.g., in a timelines.json file.

TODOs:

reviews
squash & update commit message

problame · 2023-01-27T19:33:18Z

Rebased to get fixes for GitHub workflows from main.

koivunej

All in all good changes, smaller nits, rename ideas from me so far. I probably missed some important points so will probably come back to this. Would definitely like to see the added test cases split at least.

pageserver/src/tenant.rs

pageserver/src/config.rs

pageserver/src/tenant.rs

koivunej · 2023-01-30T08:41:20Z

(Re-ran failed steps in an effort to help the work with runners.)

pageserver/src/tenant.rs

SomeoneToIgnore · 2023-01-31T08:57:32Z

pageserver/src/tenant/mgr.rs

-
-        let tenant = Tenant::spawn_attach(conf, tenant_id, remote_storage, ctx);
+        let tenant = Tenant::spawn_start_attach(conf, tenant_id, remote_storage, ctx)
+            .map_err(|source| AttachError { tenant_id, source })?;


AttachError from above is used only here, and that gets ?-ed into TenantMapInsertError, via a more explicit route than anyhow::Error before.

Am I wrong, or could all the related changes be replaced with a match and anyhow::anyhow!("attach tenant {tenant_id}: {source:#}") returned one way or another?
The latter looks simpler to me, unless it's preparation for something bigger here.
We can add another variant into TenantMapInsertError.

> AttachError from above is used only here, and that gets ?-ed into TenantMapInsertError, via a more explicit route than anyhow::Error before.
> Am I wrong, or could all the related changes be replaced with a match and anyhow::anyhow!("attach tenant {tenant_id}: {source:#}") returned one way or another?
> The latter looks simpler to me, unless it's preparation for something bigger here.

Yeah, right, there is no distinguished treatment.
Will change that.

> We can add another variant into TenantMapInsertError.

Nah, if we ever need distinguished treatment of attach errors in the caller, I'd rather make its Closure variant generic, i.e., enum TenantMapInsertError<OE> { ..., Other(OE) }.
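A rough sketch of what a generic variant along those lines could look like, using only the standard library. The `AlreadyExists` variant, the `AttachError` variant, and the `insert_tenant` helper are made up for this illustration; only the `TenantMapInsertError<OE> { ..., Other(OE) }` shape comes from the comment above.

```rust
use std::fmt;

/// Generic over the error type of the operation run during insertion.
#[derive(Debug)]
enum TenantMapInsertError<OE> {
    /// Hypothetical variant for this sketch.
    AlreadyExists(String),
    Other(OE),
}

impl<OE: fmt::Display> fmt::Display for TenantMapInsertError<OE> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TenantMapInsertError::AlreadyExists(id) => write!(f, "tenant {} already exists", id),
            TenantMapInsertError::Other(e) => write!(f, "{}", e),
        }
    }
}

/// Hypothetical attach error with a single made-up variant.
#[derive(Debug, PartialEq)]
enum AttachError {
    RemoteListingFailed,
}

impl fmt::Display for AttachError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "listing remote timelines failed")
    }
}

/// Hypothetical helper: run `op` unless the tenant is already in the map.
fn insert_tenant<OE, F>(
    already_present: bool,
    tenant_id: &str,
    op: F,
) -> Result<(), TenantMapInsertError<OE>>
where
    F: FnOnce() -> Result<(), OE>,
{
    if already_present {
        return Err(TenantMapInsertError::AlreadyExists(tenant_id.to_string()));
    }
    op().map_err(TenantMapInsertError::Other)
}
```

A caller that needs distinguished handling can then match on `TenantMapInsertError::Other(AttachError::...)` without the map code knowing anything about attach-specific errors.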

SomeoneToIgnore · 2023-01-31T09:54:06Z

pageserver/src/tenant.rs

+pub const TENANT_ATTACHING_LEGACY_MARKER_FILENAME: &str = "attaching";
+pub const TENANT_ATTACHING_MARKER_SUFFIX: &str = "___attaching";
+
+/// The error message does not include the tenant ID.


Does it make sense?
In every function where this AttachError is constructed, there's a tenant_id: TenantId argument.
I'd add this field to every enum variant instead.

IMO a function should not repeat its arguments in the errors it emits.
Rationale:

- avoids unnecessary copies if the caller has distinguished error handling (i.e., a match, not simply ?)
- DRY: with your proposal, each error variant needs to be enriched with tenant_id. With my approach, the caller is responsible for enriching the error, in exactly one place.
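The "enrich in exactly one place" argument can be illustrated with a small sketch. The error variants, messages, and function names below are hypothetical; the point is only that the callee's error omits the tenant ID and the caller adds it once.

```rust
use std::fmt;

/// Hypothetical error type. The messages do not include the tenant ID.
#[derive(Debug, PartialEq)]
enum AttachError {
    TenantDirAlreadyExists,
    MarkerAlreadyPresent,
}

impl fmt::Display for AttachError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AttachError::TenantDirAlreadyExists => write!(f, "tenant directory already exists"),
            AttachError::MarkerAlreadyPresent => write!(f, "attach marker already present"),
        }
    }
}

/// Hypothetical callee: it receives `tenant_id` but does not copy it into its errors.
fn start_attach(_tenant_id: &str, dir_exists: bool, marker_exists: bool) -> Result<(), AttachError> {
    if dir_exists {
        return Err(AttachError::TenantDirAlreadyExists);
    }
    if marker_exists {
        return Err(AttachError::MarkerAlreadyPresent);
    }
    Ok(())
}

/// Caller that just propagates: it adds the tenant ID once, for all variants.
fn attach_for_api(tenant_id: &str, dir_exists: bool, marker_exists: bool) -> Result<(), String> {
    start_attach(tenant_id, dir_exists, marker_exists)
        .map_err(|source| format!("attach tenant {}: {}", tenant_id, source))
}

/// Caller with distinguished handling: it matches on the variant and never needs a copied ID.
fn attach_is_retryable(tenant_id: &str, dir_exists: bool, marker_exists: bool) -> bool {
    match start_attach(tenant_id, dir_exists, marker_exists) {
        Ok(()) => false,
        Err(AttachError::MarkerAlreadyPresent) => true,
        Err(AttachError::TenantDirAlreadyExists) => false,
    }
}
```

The trade-off raised in the replies still applies: this only works well if every place that prints the error also has the tenant ID available, for example as a span field.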

You could've wrapped the enum with some struct or enforced the ID presence some other way, but up to you.

If we can be sure that the log message where this error is printed will contain the tenant ID as a span field, this sounds OK to me.

problame · 2023-01-31T17:41:24Z

Addressed review comment #3466 (comment).
Squashed commits & updated commit message to write-up in the PR description above.

LizardWizzard

Thanks for cleaning this up! I like that it became simpler. Left some comments, feel free to just resolve NIT ones. Found nothing serious. If you ask me, I'm +1 for not handling attach marker migration at all. We just need to be sure no one is running attaches during the release.

pageserver/src/tenant.rs

LizardWizzard · 2023-02-01T14:06:04Z

pageserver/src/tenant.rs

+pub const TENANT_ATTACHING_LEGACY_MARKER_FILENAME: &str = "attaching";
+pub const TENANT_ATTACHING_MARKER_SUFFIX: &str = "___attaching";
+
+/// The error message does not include the tenant ID.


If we can be sure that the log message where this error is printed will contain the tenant ID as a span field, this sounds OK to me.

pageserver/src/tenant.rs

LizardWizzard · 2023-02-01T15:53:34Z

pageserver/src/tenant.rs

-            crashsafe::fsync_file_and_parent(&marker_file)
-                .context("fsync tenant attaching marker file and parent")?;
+            info!("removing prior attach operation's progress");
+            std::fs::remove_dir_all(&tenant_dir).context("remove attaching tenant dir")?;


One possible problem with this approach in the future: I expect that the control plane will pass the tenant config to the attach call (I think that's a reasonable way to pass it because we don't store it on S3). Previously we could've saved the config as usual, but now we will delete it during the resume phase. We can write the config to the attach marker. Or do it in some other way. WDYT?

> We can write the config to the attach marker. Or do it in some other way. WDYT?

Yeah, that's how I'd solve it in the future. Should we prepare for this by writing

{ "format_version": ` }

into the new attach markers?

For now it's simple: the file is empty, so we can check for that and think about the format later.

The future version won't be able to distinguish a partially written file that it wrote before a crash from a file written by a version with this patch, then.

I'll add this now, this PR is in no big rush anyways, let's get it right now.
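The ambiguity discussed in this thread can be sketched as a small classifier for marker contents. This is a hand-rolled illustration, not the format from #3519: the enum, the function name, the minimal parsing, and the version number 1 used in the usage note are assumptions.

```rust
/// What a reader of the attach marker could conclude from its contents.
#[derive(Debug, PartialEq)]
enum MarkerContents {
    /// Empty file: either written by a version that always writes empty markers,
    /// or a newer write that crashed before any bytes landed. Not distinguishable.
    Empty,
    /// A complete `{ "format_version": N }` document.
    Versioned(u32),
    /// Anything else, e.g. a partially written document.
    Unrecognized,
}

/// Minimal hand-rolled parser that accepts only `{ "format_version": <u32> }`.
/// A real implementation would likely use a JSON library instead.
fn classify_marker(contents: &str) -> MarkerContents {
    let trimmed = contents.trim();
    if trimmed.is_empty() {
        return MarkerContents::Empty;
    }
    let version = trimmed
        .strip_prefix('{')
        .and_then(|s| s.strip_suffix('}'))
        .and_then(|s| s.split_once(':'))
        .and_then(|(key, value)| {
            if key.trim() == "\"format_version\"" {
                value.trim().parse::<u32>().ok()
            } else {
                None
            }
        });
    match version {
        Some(v) => MarkerContents::Versioned(v),
        None => MarkerContents::Unrecognized,
    }
}
```

For example, `classify_marker("{ \"format_version\": 1 }")` yields `Versioned(1)`, while a truncated write such as `{ "format_ver` yields `Unrecognized`. Only the empty case stays ambiguous, which is the argument for writing a version from the start.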

How does #3519 look to you?

Reviewed #3519

problame · 2023-02-01T16:43:37Z

Thanks for the review, will address the nits tomorrow.

> If you ask me, I'm +1 for not handling attach marker migration at all. We just need to be sure no one is running attaches during the release.

You think the risk of a bug in the migration code outweighs the risk that there might be some attach markers somewhere?

Idea:

1. remove the migration code
2. make this PR modify the deploy Ansible script to check that there are no old attach markers present before deploying the new binary
3. after the deployment, remove that check from the deploy Ansible script

LizardWizzard · 2023-02-01T19:16:03Z

> You think the risk of a bug in the migration code outweighs the risk that there might be some attach markers somewhere?

Realistically I think there won't be any attach markers, and it's easy to check for that before the release. So it won't trigger any possible bugs in the migration code.

So IMO this PR illustrates the need for automated changelog entry generation. I.e., there should be a way to leave a note that will be picked up in the release PR so the person who deploys a release can check for additional safety requirements. cc @vadim2404 @shanyp

In this case I think it's OK to go with the simplest approach. The migration code looks OK to me, so it may not be worth spending time hacking something into Ansible.

shanyp · 2023-02-02T08:36:15Z

yes, actually we have the /release-notes label for it

LizardWizzard · 2023-03-21T16:09:21Z

Would be cool to revive this patch

LizardWizzard · 2023-03-21T16:09:42Z

As discussed moving to draft (for now)

problame · 2024-06-19T13:26:28Z

As of many months ago, we're always attaching on startup, there's no more load path.
So, this idea has been implemented, just by a different PR.

problame force-pushed the problame/tenant-attach-share-code-path-with-load branch from d8288a1 to 7936be3 Compare January 27, 2023 11:55

problame force-pushed the problame/tenant-attach-share-code-path-with-load branch from 7936be3 to 3c75c0d Compare January 27, 2023 14:58

problame force-pushed the problame/tenant-attach-share-code-path-with-load branch from 3c75c0d to d206174 Compare January 27, 2023 15:57

problame changed the title ~~WIP rework Tenant attach to share initialization code path with load~~ rework tenant attach code to share the initialization code path with tenant load Jan 27, 2023

problame requested review from SomeoneToIgnore and LizardWizzard January 27, 2023 19:30

problame force-pushed the problame/tenant-attach-share-code-path-with-load branch from fe78e74 to e275772 Compare January 27, 2023 19:32

problame marked this pull request as ready for review January 27, 2023 19:33

problame requested review from a team as code owners January 27, 2023 19:33

problame requested review from bojanserafimov and removed request for a team January 27, 2023 19:33

koivunej reviewed Jan 30, 2023

problame force-pushed the problame/tenant-attach-share-code-path-with-load branch from b63d8f6 to c92ebd3 Compare January 30, 2023 14:10

koivunej reviewed Jan 30, 2023

problame mentioned this pull request Jan 30, 2023

better error message when failing to parse tenants / timelines dir entry [P:3] [S:3] #3488

Open

SomeoneToIgnore approved these changes Jan 31, 2023

problame force-pushed the problame/tenant-attach-share-code-path-with-load branch from 764b199 to 4af7a82 Compare January 31, 2023 17:40

LizardWizzard reviewed Feb 1, 2023

problame mentioned this pull request Feb 1, 2023

trim down merge_local_remote_metadata() #3517

Closed

problame added 2 commits February 1, 2023 19:02

address NITs for the migration code

5e270ec

trim down merge_local_remote_metadata()

11cc103

problame mentioned this pull request Feb 1, 2023

well-defined marker file contents to support storing TenantConfOpt in the future #3519

Closed

1 task

LizardWizzard marked this pull request as draft March 21, 2023 16:09

problame mentioned this pull request Aug 23, 2023

refactor: single phase Timeline::load_layer_map #5074

Merged

problame closed this Jun 19, 2024
