Gc/compaction thread pool, take 2 #1933
Conversation
I like this approach more than the previous one and find it generally simpler, but I'm biased, so take my opinion with a grain of salt.
I find the new GC and compaction task management somewhat concerning, but if it fixes the thread issues on staging, never mind my concerns.
Looks good otherwise.
Thanks! I like the approach. To me, it is a better option than take 1, but in that regard I'm as biased as @SomeoneToIgnore :) +1 for @SomeoneToIgnore's comments.
Thank you for fixing all this.
I think we're getting roughly on par with what was there before, but we lack some error observability: let's add more error logging before anyhow::bail in both gc_loop and compaction_loop.
AFAIK, we were not restarting errored GC and compaction threads before either, so all my other comments are nice to have, but nothing is required for this PR.
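A minimal sketch of the kind of logging meant above, assuming a per-iteration error path; run_one_gc_iteration, gc_step, and the messages are placeholders, not the actual code in gc_loop/compaction_loop:

```rust
use anyhow::bail;
use tracing::error;

// Placeholder standing in for one GC/compaction iteration.
fn run_one_gc_iteration() -> anyhow::Result<()> {
    Ok(())
}

fn gc_step() -> anyhow::Result<()> {
    // Log the error before bailing so it is visible in the pageserver log,
    // not only in the task's join result.
    if let Err(e) = run_one_gc_iteration() {
        error!("GC iteration failed: {:#}", e);
        bail!("GC iteration failed: {:#}", e);
    }
    Ok(())
}
```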
Personally I'm OK with this RwLock approach, though I'm for unification where possible, so if the walreceiver approach with cancellation channels works here too, I would use it here as well. Or, if we're changing it to something different, let's change it in both places. Current tenant state management already has some problems; we've discussed possible improvements with @SomeoneToIgnore, so we can polish it separately. What is important is that this patch should solve the thread count issue. Do you have ideas on tests we can add to check that all this works correctly?
Most tests that I'd like to write I already know will fail; for example, there are the state management race conditions. I also tried testing idle->active->idle->active transitions, but the only way to make a tenant idle is to detach it; stopping the compute node doesn't make it idle. We need to rework tenant state management. I think @SomeoneToIgnore is already working on an RFC, so I don't want to get in too deep here.
I'll write a test to assert that tasks started and stopped, at least.
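For context on the cancellation-channel pattern being discussed, here is a rough sketch of spawning a new task while requesting cancellation of the old one; compaction_loop, spawn_compaction, and the sleep interval are illustrative, not the PR's exact code:

```rust
use std::time::Duration;
use tokio::sync::watch;

// Illustrative loop: exit when the cancellation channel fires, otherwise
// run one iteration per tick.
async fn compaction_loop(mut cancel: watch::Receiver<()>) {
    loop {
        tokio::select! {
            _ = cancel.changed() => break, // cancellation requested
            _ = tokio::time::sleep(Duration::from_secs(10)) => {
                // run one compaction iteration here
            }
        }
    }
}

// Spawn a new task, requesting cancellation of the old one if it exists.
fn spawn_compaction(old_cancel: &mut Option<watch::Sender<()>>) {
    if let Some(sender) = old_cancel.take() {
        let _ = sender.send(());
    }
    let (cancel_send, cancel_recv) = watch::channel(());
    tokio::spawn(compaction_loop(cancel_recv));
    *old_cancel = Some(cancel_send);
}
```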
pageserver/src/tenant_tasks.rs
// Spawn new task, request cancellation of the old one if exists
let (cancel_send, cancel_recv) = watch::channel(());
// TODO this instrument doesn't work
Not sure why. info! traces from inside the future don't show up with compaction loop as context. Will fix before merging.
Replacing trace_span with info_span fixes the issue. What are the logging level semantics here? Does a span with a certain level apply only to events of an equal or stronger level? The docs don't say anything.
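A possible explanation (not verified against this branch): spans are subject to the subscriber's max-level filter just like events, so under an info filter a trace_span! is disabled and never attached as context, even though the info! events inside it still fire. A minimal sketch of the working variant, with the span name taken from the discussion above:

```rust
use tracing::{info, info_span, Instrument};

// An INFO-level span survives an `info` max-level filter, so the event below
// should be reported with "compaction loop" attached as its span context.
let task = async {
    info!("starting compaction iteration");
}
.instrument(info_span!("compaction loop"));
tokio::spawn(task);
```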
Hmm, that's weird. I tried the example from the docs here https://docs.rs/tracing/latest/tracing/index.html#configuring-attributes and it worked for me; the info message was emitted. Though it does not use futures. I'll try to reproduce it tomorrow with the code from this branch.
Looks good to me, thanks!
One nit: I think the place with the new file lock may benefit from an extended comment mentioning the cases that are not covered by the lock. Maybe even point to an issue.
It should conflict with my PR #1936, so go ahead and I'll rebase my patch on top of yours :)
Staging metrics, FYI: https://observer.zenith.tech/d/GOx33ve7z/tenant-tasks?orgId=1
40 worker threads and 100 max_blocking_threads mean that we can actually do 100 compactions in parallel, which seems to be too much. The question is what the proper values of these parameters are. Even two parallel compactions can significantly degrade pageserver performance, but if we limit it to 1, then compaction will be performed too slowly, which can also hurt performance because of read amplification and extra page reconstructions.
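For reference, a hedged sketch of the tokio knobs being discussed; the numbers are just the ones mentioned above, not a recommendation:

```rust
// worker_threads bounds the async workers; max_blocking_threads bounds how many
// spawn_blocking jobs (and thus blocking compaction/GC work) can run at once.
let runtime = tokio::runtime::Builder::new_multi_thread()
    .worker_threads(40)
    .max_blocking_threads(100)
    .enable_all()
    .build()
    .expect("failed to build tenant task runtime");
```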
My three main concerns about this patch:
- The compaction thread performs both compaction and materialization. These are different operations, and I still think they should be separated.
- Besides compaction, a lot of IO is produced by frozen layer flushes, and those are performed by other threads that are not controlled by this pool. If the main goal of this patch is to reduce the number of threads, this may not be a big problem (although the number of spawned flush threads can be as large as the number of tenants, so it can still be too much). But if we also want to avoid a "write storm" by reducing the number of expensive IO operations performed concurrently, then we should take these flush threads into account.
- This thread pool is used for both GC and compaction tasks, but these operations have completely different costs. GC just iterates layers and reads layer metadata; it is relatively fast and not IO intensive. Compaction, by contrast, is very IO intensive and can take a lot of time. Maybe it is not such a good idea to put them in a single thread pool (see the concurrency-cap sketch below).
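One hedged option for keeping heavy compactions from crowding out cheap GC without separate pools would be a per-operation concurrency cap; the constant, function name, and limit of 2 below are illustrative, not what the PR does:

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

// Hypothetical cap: at most 2 compactions run concurrently regardless of how
// many blocking threads the runtime allows; GC tasks would not take a permit.
const MAX_CONCURRENT_COMPACTIONS: usize = 2;

async fn run_compaction(sem: Arc<Semaphore>) {
    let _permit = sem.acquire().await.expect("semaphore closed");
    // perform one compaction iteration while holding the permit
}
```

Usage would be one Arc::new(Semaphore::new(MAX_CONCURRENT_COMPACTIONS)) per pageserver, cloned into each compaction task.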
Resolves #1646
Resolves #1623
Previous attempt: #1794