Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rework tenant attach code to share the initialization code path with tenant load #3466

Closed

Conversation

problame
Copy link
Contributor

@problame problame commented Jan 27, 2023

At the time of writing, the only logical difference between Attach and Load is that Attach learns the list of timelines by querying the remote storage, whereas Load learns it by listing the timelines directory of the tenant.

This patch restructures the code such that Attach

  1. prepares the on-disk state and then
  2. calls into the same load() routine that is used by Load.

Further, this patch provides the following fixes & improvements to Attach:

  1. Make Attach durable before acknowledging it to the management API client.

    Before this change, we would acknowledge after just creating the in-memory tenant.
    In the event of a crash before creating the tenant directory & fsyncing the attaching marker,
    the pageserver would come up without the tenant present (404), even though we acknowledged
    success to the client.

  2. Simplified resume logic if we crash during Attach.
    Before this patch, if we crashed during Attach with some timelines downloaded and others not downloaded,
    we would combine existing metadata files with remote ones one-by-one to figure out what's missing.
    That was necessary before on-demand download because we were downloading layer files as part of Attach.
    However, with on-demand download, Attach only downloads & writes the timeline metadata files.

    After this patch, when we crash during Attach, we blow away the tenant's directory while leaving its attach marker file in place. Then, we start over.

    IMO this is significantly easier to reason about compared to what we had before.
    Note that we were losing the work for the downloads even before this change, so that's not a regression (the old reconcile_with_remote would still need to download the IndexParts when resuming Attach after a crash).

    If we want to improve on this in the future, I think the first order of business will be to
    avoiding re-downloading the IndexPart's and the initial list_remote_timelines().
    However, given that crashes should be rare, and attach events also, I don't think the number
    one priority with Attach code should be to make it as simple as possible.

For (2), I changed the location of the attach marker file to be outside the tenant directory,
so that we can use standard functions for removing the tenant directory.
I even wrote a migration function for it, although in retrospect, I think it's quite unlikely that there are any tenants in attaching state deployed.
But oh well, now the code is there, and it even has unit tests.
We can delete the migration code once we've successfully rolled it out to all regions.

The remaining wrinkle with this change is that Attach needs to hint the downloaded IndexParts to load() so that it doesn't download them twice during Attach, which would be wasteful.
The mechanism for this is the new TenantLoadReason and TimelineLoadReason.

We could eliminate this particular case by on-demand downloading the metadata.
However, that might open up another can of worms which I'd like to avoid.
If we ever want to go that route, I suggest we start tracking the attachment state of a timeline more formally, e.g., in a timelines.json file.


TODOs:

  • reviews
  • squash & update commit message

@problame problame force-pushed the problame/tenant-attach-share-code-path-with-load branch from d8288a1 to 7936be3 Compare January 27, 2023 11:55
@koivunej

This comment was marked as outdated.

@problame problame force-pushed the problame/tenant-attach-share-code-path-with-load branch from 7936be3 to 3c75c0d Compare January 27, 2023 14:58
@problame problame force-pushed the problame/tenant-attach-share-code-path-with-load branch from 3c75c0d to d206174 Compare January 27, 2023 15:57
@problame problame changed the title WIP rework Tenant attach to share initialization code path with load rework tenant attach code to share the initialization code path with tenant load Jan 27, 2023
@problame problame force-pushed the problame/tenant-attach-share-code-path-with-load branch from fe78e74 to e275772 Compare January 27, 2023 19:32
@problame
Copy link
Contributor Author

Rebased to get fixes for GitHub workflows from main.

@problame problame marked this pull request as ready for review January 27, 2023 19:33
@problame problame requested review from a team as code owners January 27, 2023 19:33
@problame problame requested review from bojanserafimov and removed request for a team January 27, 2023 19:33
Copy link
Member

@koivunej koivunej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All in all good changes, smaller nits, rename ideas from me so far. I probably missed some important points so will probably come back to this. Would definitetly like to see the added test cases split at least.

pageserver/src/tenant.rs Outdated Show resolved Hide resolved
pageserver/src/tenant.rs Outdated Show resolved Hide resolved
pageserver/src/tenant.rs Show resolved Hide resolved
pageserver/src/tenant.rs Show resolved Hide resolved
pageserver/src/tenant.rs Outdated Show resolved Hide resolved
pageserver/src/config.rs Outdated Show resolved Hide resolved
pageserver/src/tenant.rs Outdated Show resolved Hide resolved
pageserver/src/tenant.rs Show resolved Hide resolved
pageserver/src/tenant.rs Outdated Show resolved Hide resolved
@koivunej
Copy link
Member

(Re-ran failed steps in an effort to help the work with runners.)

@problame problame force-pushed the problame/tenant-attach-share-code-path-with-load branch from b63d8f6 to c92ebd3 Compare January 30, 2023 14:10
pageserver/src/tenant.rs Outdated Show resolved Hide resolved
pageserver/src/tenant.rs Outdated Show resolved Hide resolved

let tenant = Tenant::spawn_attach(conf, tenant_id, remote_storage, ctx);
let tenant = Tenant::spawn_start_attach(conf, tenant_id, remote_storage, ctx)
.map_err(|source| AttachError { tenant_id, source })?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AttachError from above is used only here, and that gets ?-ed into TenantMapInsertError, via more explicit route than anyhow::Error before.

Am I wrong, or all related changes could be replaced with a match and anyhow::anyhow!(attach tenant {tenant_id}: {source:#}) returned one way or another?
Looks simpler the latter way to me, unless it's a preparation for something bigger here.
We can add another variant into TenantMapInsertError.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AttachError from above is used only here, and that gets ?-ed into TenantMapInsertError, via more explicit route than anyhow::Error before.
Am I wrong, or all related changes could be replaced with a match and anyhow::anyhow!(attach tenant {tenant_id}: {source:#}) returned one way or another?
Looks simpler the latter way to me, unless it's a preparation for something bigger here.

Yeah, right, there is no distinguished treatment.
Will change that.

We can add another variant into TenantMapInsertError.

Nah, if we ever need distinguished treatment of attach errors in the caller, I'd rather make it's Closure variant generic, i.e., enum TenantMapInsertError<OE> { ..., Other(OE) }.

pub const TENANT_ATTACHING_LEGACY_MARKER_FILENAME: &str = "attaching";
pub const TENANT_ATTACHING_MARKER_SUFFIX: &str = "___attaching";

/// The error message does not include the tenant ID.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense?
In every function this AttachError is constructed, there's a tenant_id: TenantId argument.
I'd add this field in every enum variant instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO a function should not repeat its arguments in the errors it emits.
Rationale:

  • avoids unnecessary copies if the caller has distinguished error handling (i.e., a match, not simply ?)
  • DRY: with your proposal, each error variant needs to be enrichted with tenant_id. With my approach, the caller is responsible for enrichting the error, in exactly one place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could've wrapped the enum with some struct or enforced the ID presence some other way, but up to you.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can be sure that log message where this error is printed will contain tenant id as a span field this sounds ok to me

…tenant load

At the time of writing, the only logical difference between Attach and Load is
that Attach learns the list of timelines by querying the remote storage,
whereas Load learns it by listing the timelines directory of the tenant.

This patch restructures the code such that Attach

1. prepares the on-disk state and then
2. calls into the same `load()` routine that is used by Load.

Further, this patch provides the following fixes & improvements to Attach:

1. Make Attach durable before acknowledging it to the management API client.

   Before this change, we would acknowledge after just creating the in-memory
   tenant. In the event of a crash before creating the tenant directory and
   fsyncing the attaching marker, the pageserver would come up without the
   tenant present (404), even though we acknowledged success to the client.

2. Simplified resume logic if we crash during Attach.
   Before this patch, if we crashed during Attach with some timelines downloaded
   and others not downloaded, we would combine existing metadata files with
   remote ones one-by-one to figure out what's missing.
   That was necessary before on-demand download because we were downloading
   layer files as part of Attach. However, with on-demand download, Attach
   only downloads & writes the timeline metadata files.

   After this patch, when we crash during Attach, we blow away the tenant's
   directory while leaving its attach marker file in place. Then, we start over.

   IMO this is significantly easier to reason about compared to what we had before.
   Note that we were losing the work for the downloads even before this change,
   so that's not a regression (the old reconcile_with_remote would still need
   to download the `IndexPart`s when resuming Attach after a crash).

   If we want to improve on this in the future, I think the first order of
   business will be to avoiding re-downloading the `IndexPart`'s and the
   initial `list_remote_timelines()`.
   However, given that crashes should be rare, and attach events also, I don't
   think the number one priority with Attach code should be to make it as
   simple as possible.

For (2), I changed the location of the attach marker file to be outside the
tenant directory, so that we can use standard functions for removing the
tenant  directory.
I even wrote a migration function for it, although in retrospect, I think it's
quite unlikely that there are any tenants in attaching state deployed.
But oh well, now the code is there, and it even has unit tests.
We can delete the migration code once we've successfully rolled it out to all
regions.

The remaining wrinkle with this change is that Attach needs to hint the
downloaded `IndexPart`s to `load()` so that it doesn't download them twice
during Attach, which would be wasteful.
The mechanism for this is the new `TenantLoadReason` and `TimelineLoadReason`.

We could eliminate this particular case by on-demand downloading the metadata.
However, that might open up another can of worms which I'd like to avoid.
If we ever want to go that route, I suggest we start tracking the attachment
state of a timeline more formally, e.g., in a `timelines.json` file.

This is PR #3466
@problame problame force-pushed the problame/tenant-attach-share-code-path-with-load branch from 764b199 to 4af7a82 Compare January 31, 2023 17:40
@problame
Copy link
Contributor Author

Copy link
Contributor

@LizardWizzard LizardWizzard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for cleaning this up! I like that it became simpler. Left some comments, feel free to just resolve NIT ones. Found nothing serious. If you'd ask I'm +1 for not handling attach marker migration at all. We just need to be sure no one is running attaches during release

pageserver/src/tenant.rs Outdated Show resolved Hide resolved
pub const TENANT_ATTACHING_LEGACY_MARKER_FILENAME: &str = "attaching";
pub const TENANT_ATTACHING_MARKER_SUFFIX: &str = "___attaching";

/// The error message does not include the tenant ID.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can be sure that log message where this error is printed will contain tenant id as a span field this sounds ok to me

pageserver/src/tenant.rs Outdated Show resolved Hide resolved
pageserver/src/tenant.rs Outdated Show resolved Hide resolved
pageserver/src/tenant.rs Outdated Show resolved Hide resolved
pageserver/src/tenant.rs Outdated Show resolved Hide resolved
pageserver/src/tenant.rs Show resolved Hide resolved
crashsafe::fsync_file_and_parent(&marker_file)
.context("fsync tenant attaching marker file and parent")?;
info!("removing prior attach operation's progress");
std::fs::remove_dir_all(&tenant_dir).context("remove attaching tenant dir")?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One possible problem with this approach in the future. I expect that control plane will pass tenant config to attach call (I think thats reasonable way to pass it because we dont store it on s3). Previously we could've saved the config as usual, but now we will delete it during resume phase. We can write the config to attach marker. Or do it in some other way. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

. We can write the config to attach marker. Or do it in some other way. WDYT?

Yeah, that's how I'd solve it in the future. Should we prepare for this by writing

{ "format_version": ` }

into the new attach markers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now its simple, the file is empty, so we can check for it and think about format later

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The future version won't be able to distinguish a partially written file that it wrote before a crash from a file written by a version with this patch, then.

I'll add this now, this PR is in no big rush anyways, let's get it right now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does #3519 look like to you?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed #3519

@problame
Copy link
Contributor Author

problame commented Feb 1, 2023

Thanks for the review, will address the nits tomorrow.

If you'd ask I'm +1 for not handling attach marker migration at all. We just need to be sure no one is running attaches during release

You think the risk of a bug in the migration code outweighs the risk that there might be some attach markers somewhere?

Idea:

  • remove the migration code
  • make this PR modify the deploy Ansible script to check that there's no old attach markers present before deploying the new binary.
  • after the deployment, remove that check from the deploy Ansible script

@LizardWizzard
Copy link
Contributor

You think the risk of a bug in the migration code outweighs the risk that there might be some attach markers somewhere?

Realistically I think there wont be any attach markers, and its easy to check for that before the release. So it wont trigger any possible bugs in migration code.

So IMO this PR illustrates the need for automated changelog entry generation. I e there should be a possibility to leave a note that will be picked up in release PR so person who will deploy a release can check for additional safety requirements. cc @vadim2404 @shanyp

In this case I think its ok to go with simplest approach. Migration code looks ok to me, so may not worth it to spend time hacking something in ansible

@shanyp
Copy link
Contributor

shanyp commented Feb 2, 2023

yes, actually we have the /release-notes label for it

@LizardWizzard
Copy link
Contributor

Would be cool to revive this patch

@LizardWizzard
Copy link
Contributor

As discussed moving to draft (for now)

@problame
Copy link
Contributor Author

problame commented Jun 19, 2024

As of many months ago, we're always attaching on startup, there's no more load path.
So, this idea has been implemented, just by a different PR.

@problame problame closed this Jun 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants