Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pageserver: refactor TenantId to TenantShardId in Tenant & Timeline #5957

Merged
merged 2 commits into from
Nov 29, 2023

Conversation

jcsp
Copy link
Collaborator

@jcsp jcsp commented Nov 28, 2023

(includes two preparatory commits from #5960)

Problem

To accommodate multiple shards in the same tenant on the same pageserver, we must include the full TenantShardId in local paths. That means that all code touching local storage needs to see the TenantShardId.

Summary of changes

  • Replace tenant_id: TenantId with tenant_shard_id: TenantShardId on Tenant, Timeline and RemoteTimelineClient.
  • Use TenantShardId in helpers for building local paths.
  • Update all the relevant call sites.

This doesn't update absolutely everything: things like PageCache, TaskMgr, WalRedo are still shard-naive. The purpose of this PR is to update the core types so that others code can be added/updated incrementally without churning the most central shared types.

@jcsp jcsp added t/feature Issue type: feature, for new features or requests c/storage/pageserver Component: storage: pageserver labels Nov 28, 2023
Copy link

github-actions bot commented Nov 28, 2023

2400 tests run: 2303 passed, 0 failed, 97 skipped (full report)


Flaky tests (5)

Postgres 14

  • test_branching_with_pgbench[cascade-1-10]: debug
  • test_crafted_wal_end[last_wal_record_xlog_switch_ends_on_page_boundary]: debug
  • test_ondemand_download_timetravel[real_s3]: debug
  • test_pg_regress: release
  • test_timeline_deletion_with_files_stuck_in_upload_queue: debug

Code coverage (full report)

  • functions: 54.5% (9181 of 16844 functions)
  • lines: 82.1% (53537 of 65174 lines)

The comment gets automatically updated with the latest test results
02b8603 at 2023-11-29T15:03:48.860Z :recycle:

@jcsp jcsp force-pushed the jcsp/sharding-tenant-ids branch from 31f613a to ef28282 Compare November 28, 2023 17:03
@jcsp jcsp requested a review from problame November 28, 2023 17:04
@jcsp jcsp marked this pull request as ready for review November 28, 2023 17:05
@jcsp jcsp requested review from a team as code owners November 28, 2023 17:05
@jcsp jcsp requested review from save-buffer and removed request for a team November 28, 2023 17:05
@jcsp jcsp force-pushed the jcsp/sharding-tenant-ids branch from ef28282 to 80e6666 Compare November 28, 2023 22:26
Copy link
Member

@koivunej koivunej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the debug_assert_current_span_has_tenant_id and debug_assert_current_span_has_tenant_and_timeline_id now require shard_id as well?

@jcsp
Copy link
Collaborator Author

jcsp commented Nov 29, 2023

Should the debug_assert_current_span_has_tenant_id and debug_assert_current_span_has_tenant_and_timeline_id now require shard_id as well?

Yes indeed, but later: I intend to do that once all the TenantId locations are updated to be shard-aware (this PR is just the primary Tenant/Timeline path)

@jcsp jcsp force-pushed the jcsp/sharding-tenant-ids branch from 80e6666 to e6ec401 Compare November 29, 2023 09:51
jcsp added a commit that referenced this pull request Nov 29, 2023
…5960)

Precursor for #5957

## Problem

When DeletionList was written, TenantId/TimelineId didn't have
human-friendly modes in their serde. #5335 added those, such that the
helpers used in serialization of HashMaps are no longer necessary.

## Summary of changes

- Add a unit test to ensure that this change isn't changing anything
about the serialized form
- Remove the serialization helpers for maps of Id
Copy link
Contributor

@problame problame left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you ensure that in all the places we use Serialize, Display, Debug impls for TenantShardId, it's actually supposed to print the full TenantShardId and not just TenantId?

It wouldn't be noticable right now because we only have TenantShardId::unsharded. But, it could bite us at some later point.

For example, how did you ensure you caught all the #[instrument()] cases.

The debug_assert_span_contains... is useful for that.

Lastly I wonder whether our tracing spans should includethe monotonic version number of the sharding config (I assume such a thing exists somewhere outside of PS).

pageserver/src/consumption_metrics/metrics.rs Show resolved Hide resolved
pageserver/src/tenant/mgr.rs Show resolved Hide resolved
pageserver/src/tenant/remote_timeline_client.rs Outdated Show resolved Hide resolved
pageserver/src/deletion_queue.rs Outdated Show resolved Hide resolved
pageserver/src/disk_usage_eviction_task.rs Outdated Show resolved Hide resolved
@jcsp jcsp force-pushed the jcsp/sharding-tenant-ids branch from e6ec401 to 02b8603 Compare November 29, 2023 13:48
@jcsp jcsp enabled auto-merge (squash) November 29, 2023 13:48
@jcsp jcsp merged commit 9e55ad4 into main Nov 29, 2023
37 checks passed
@jcsp jcsp deleted the jcsp/sharding-tenant-ids branch November 29, 2023 14:52
jcsp added a commit that referenced this pull request Dec 11, 2023
## Problem

In #5957, the most essential
types were updated to use TenantShardId rather than TenantId. That
unblocked other work, but didn't fully enable running multiple shards
from the same tenant on the same pageserver.

## Summary of changes

- Use TenantShardId in page cache key for materialized pages
- Update mgr.rs get_tenant() and list_tenants() functions to use a shard
id, and update all callers.
- Eliminate the exactly_one_or_none helper in mgr.rs and all code that
used it
- Convert timeline HTTP routes to use tenant_shard_id

Note on page cache:
```
struct MaterializedPageHashKey {
    /// Why is this TenantShardId rather than TenantId?
    ///
    /// Usually, the materialized value of a page@lsn is identical on any shard in the same tenant.  However, this
    /// this not the case for certain internally-generated pages (e.g. relation sizes).  In future, we may make this
    /// key smaller by omitting the shard, if we ensure that reads to such pages always skip the cache, or are
    /// special-cased in some other way.
    tenant_shard_id: TenantShardId,
    timeline_id: TimelineId,
    key: Key,
}
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c/storage/pageserver Component: storage: pageserver t/feature Issue type: feature, for new features or requests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants