Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Post initial tenant_size issues #2748

Closed
koivunej opened this issue Nov 3, 2022 · 3 comments
Closed

Post initial tenant_size issues #2748

koivunej opened this issue Nov 3, 2022 · 3 comments
Milestone

Comments

@koivunej
Copy link
Member

koivunej commented Nov 3, 2022

Initial PR: #2714, condensing the review comments and remaining TODO's, observations here.

#2817 will change a lot, but while that rewrite has been in progress, many of the issues are now handled.


Done or irrelevant now given all of the post-initial changes:

koivunej added a commit that referenced this issue Nov 3, 2022
Tenant size information is gathered by using existing parts of
`Tenant::gc_iteration` which are now separated as
`Tenant::refresh_gc_info`. `Tenant::refresh_gc_info` collects branch
points, and invokes `Timeline::update_gc_info`; nothing was supposed to
be changed there. The gathered branch points (through Timeline's
`GcInfo::retain_lsns`), `GcInfo::horizon_cutoff`, and
`GcInfo::pitr_cutoff` are used to build up a Vec of updates fed into the
`libs/tenant_size_model` to calculate the history size.

The gathered information is now exposed using `GET
/v1/tenant/{tenant_id}/size`, which which will respond with the actual
calculated size. Initially the idea was to have this delivered as tenant
background task and exported via metric, but it might be too
computationally expensive to run it periodically as we don't yet know if
the returned values are any good.

Adds one new metric:
- pageserver_storage_operations_seconds with label `logical_size`
    - separating from original `init_logical_size`

Adds a pageserver wide configuration variable:
- `concurrent_tenant_size_logical_size_queries` with default 1

This leaves a lot of TODO's, tracked on issue #2748.
@LizardWizzard
Copy link
Contributor

Another related point tied to on-demand download. In current on-demand model we need to download significant amount of layers to calculate size. Would be good if we can make incremental calculation reliable through restarts, and if not we'll probably need some tweaking to place sizes in one layer (or metadata) so it is cheaper to obtain it on startup

@koivunej
Copy link
Member Author

koivunej commented Nov 7, 2022

On the #2755 my selection of next_gc_cutoff to represent the non-disk_consistent_lsn oldest retained Lsn raised questions, also the _cutoff name. Originally Heikki proposed last_retain_lsn for this, but I went with next_gc_cutoff because it's calculated so much similarly. My interpretation of how it should be calculated could be equally wrong.

koivunej added a commit that referenced this issue Nov 7, 2022
With more realistic selection of gc_horizon in tests there is an
immediate failure with trying to query logical size with lsn <
initdb_lsn. Fixes that, adds illustration gathered from clarity of
explaining this tenant size calculation to more people.

Cc: #2748, #2599.
@neondatabase-bot neondatabase-bot bot added this to the 2022/12 milestone Nov 16, 2022
@koivunej
Copy link
Member Author

#3377 fixes bugs, adds test cases, some with skipped test annotations and changes how the initial tenant size is calculated to how it should had been all along.

@jcsp jcsp closed this as completed Apr 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants