Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

automatic layer eviction #3552

Merged
merged 1 commit into from
Feb 9, 2023
Merged

automatic layer eviction #3552

merged 1 commit into from
Feb 9, 2023

Conversation

problame
Copy link
Contributor

@problame problame commented Feb 6, 2023

automatic layer eviction

This patch adds a per-timeline periodic task that executes an eviction
policy. The eviction policy is configurable per tenant.

Two policies exist:
- NoEviction (the default one)
- LayerAccessThreshold

The LayerAccessThreshold policy examines the last access timestamp per
layer in the layer map and evicts the layer if that last access is
further in the past than a configurable threshold value.
This policy kind is evaluated periodically at a configurable period.
It logs a summary statistic at `info!()` or `warn!()` level, depending
on whether any evictions failed.

This feature has no explicit killswitch since it's off by default.

Try out with neon_local by putting the following into pageserver.toml

[tenant_config]
eviction_policy = { kind = "LayerAccessThreshold" , period = "2s", threshold = "10s" }

@problame
Copy link
Contributor Author

problame commented Feb 6, 2023

NB: the stats reset feature conflicts with this one: if we continuously reset the stats, then the eviction loop has no last access date to base the decision on. Need to figure out what to do here, reset is quite useful in #3540

@problame problame mentioned this pull request Feb 7, 2023
Copy link
Member

@koivunej koivunej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is looking good! So the minimum value for unused is the periodic e2e check interval, and ... That's it? What's missing for this to be put very aggressively into staging?

Well, could move all of the eviction stuff into tenant/timeline/eviction.rs for example.

@problame
Copy link
Contributor Author

problame commented Feb 7, 2023

The last pushes should address all review comments.

I'll rebase this PR on top of #3557 once that's merged.

@problame problame self-assigned this Feb 7, 2023
problame added a commit that referenced this pull request Feb 8, 2023
The auto-eviction PR (#3552) operates in two phaes:

  1. find candidate layers
  2. evict them.

For (2), a batch API like the one added in this commit is useful.

Note that this PR requires #3558 to be merged first.
Otherwise, the tests won't pass.
@problame
Copy link
Contributor Author

problame commented Feb 8, 2023

NB: the stats reset feature conflicts with this one: if we continuously reset the stats, then the eviction loop has no last access date to base the decision on. Need to figure out what to do here, reset is quite useful in #3540

Addressed this by duplicating the LayerAccessStatsInner.
See commit
maintain stats twice, once for api, once for eviction policy

Well, could move all of the eviction stuff into tenant/timeline/eviction.rs for example.

Good idea, did that.

problame added a commit that referenced this pull request Feb 8, 2023
The auto-eviction PR (#3552) operates in two phaes:

  1. find candidate layers
  2. evict them.

For (2), a batch API like the one added in this commit is useful.

Note that this PR requires #3558 to be merged first.
Otherwise, the tests won't pass.
@problame problame marked this pull request as ready for review February 8, 2023 13:43
@problame problame requested review from a team as code owners February 8, 2023 13:43
@problame problame requested review from funbringer and removed request for a team February 8, 2023 13:43
@problame
Copy link
Contributor Author

problame commented Feb 8, 2023

Need to figure out how to implement tenant config update via API before we can merge this.

Copy link
Member

@koivunej koivunej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First round of questions, minor suggestions.

control_plane/src/pageserver.rs Show resolved Hide resolved
libs/pageserver_api/src/models.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/storage_layer.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/timeline.rs Show resolved Hide resolved
pageserver/src/tenant/timeline/eviction_task.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/storage_layer.rs Outdated Show resolved Hide resolved
@problame
Copy link
Contributor Author

problame commented Feb 9, 2023

For some reason GitHub doesn't show a comment box for some of your comments, so, I'll reply here:

So this is json within json, why? Aah, unsure if I've ever used this neon_local functionlity. Does seem weird but what are the alternatives?

Well, I don't want to spend time on implementing a CLI-friendly representation of the eviction policy variants here. IMO there should be a serde_cli crate that standardizes such stuff.

Any objections with going forward with JSON?

This patch adds a per-timeline periodic task that executes an eviction
policy. The eviction policy is configurable per tenant.

Two policies exist:
- NoEviction (the default one)
- LayerAccessThreshold

The LayerAccessThreshold policy examines the last access timestamp per
layer in the layer map and evicts the layer if that last access is
further in the past than a configurable threshold value.
This policy kind is evaluated periodically at a configurable period.
It logs a summary statistic at `info!()` or `warn!()` level, depending
on whether any evictions failed.

This feature has no explicit killswitch since it's off by default.
@problame
Copy link
Contributor Author

problame commented Feb 9, 2023

Force-push: squash & proper commit message (updated top-level PR description as well)

@problame problame enabled auto-merge (rebase) February 9, 2023 12:12
@problame problame merged commit 175a577 into main Feb 9, 2023
@problame problame deleted the problame/automatic-eviction branch February 9, 2023 12:33
@shanyp shanyp mentioned this pull request Feb 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants