
Gc/compaction thread pool, take 2 #1933

Merged
merged 39 commits into main from tenant-tasks on Jul 5, 2022

Conversation

@bojanserafimov (Contributor) commented Jun 15, 2022

Resolves #1646

Resolves #1623

Previous attempt: #1794

@SomeoneToIgnore (Contributor) left a comment

I like this approach more than the previous one and find it generally simpler, but I'm biased, so take my opinion with a grain of salt.

I find the new gc and compaction task management somewhat concerning, but if it fixes the thread issues on staging, never mind my concerns.
Looks good otherwise.

@LizardWizzard (Contributor) left a comment

Thanks! I like the approach. To me it is a better option than take 1, but in that regard I'm as biased as @SomeoneToIgnore :) +1 to @SomeoneToIgnore's comments.

@SomeoneToIgnore (Contributor) left a comment

Thank you for fixing all this.

I think we're getting roughly on par with what we had before, but we lack some error observability: let's add more error logging before the anyhow::bail in both gc_loop & compaction_loop.

AFAIK we were not restarting errored gc and compaction threads before either, so all my other comments are nice-to-haves; nothing here is required for this PR.
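
A minimal sketch of that kind of logging, with run_compaction_iteration and period as hypothetical stand-ins for the real pageserver calls and config rather than the actual API:

```rust
// Sketch only: `run_compaction_iteration` and `period` are stand-ins,
// not the actual pageserver API.
async fn compaction_loop(tenant_id: u64, period: std::time::Duration) -> anyhow::Result<()> {
    loop {
        if let Err(err) = run_compaction_iteration(tenant_id).await {
            // Log with the tenant as context before bailing, so the failure
            // is visible in the pageserver log even if nobody awaits the
            // task's JoinHandle.
            tracing::error!("compaction failed for tenant {tenant_id}: {err:#}");
            anyhow::bail!("compaction loop exiting: {err:#}");
        }
        tokio::time::sleep(period).await;
    }
}

async fn run_compaction_iteration(_tenant_id: u64) -> anyhow::Result<()> {
    Ok(()) // placeholder for the real compaction pass
}
```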

@LizardWizzard (Contributor) commented:

Personally I'm OK with this rwlock approach, though I'm for unification where possible, so if the walreceiver approach with cancellation channels works here too, I would use it here as well. Or if we're changing it to something different, let's change it in both places. Tenant state management currently already has some problems; we've discussed possible improvements with @SomeoneToIgnore, so we can polish it separately. What is important is that this patch should solve the thread count issue. Do you have ideas on tests we can add to check that all this works correctly?

@bojanserafimov (Contributor, Author) commented Jun 29, 2022

> Personally I'm OK with this rwlock approach, though I'm for unification where possible, so if the walreceiver approach with cancellation channels works here too, I would use it here as well. Or if we're changing it to something different, let's change it in both places. Tenant state management currently already has some problems; we've discussed possible improvements with @SomeoneToIgnore, so we can polish it separately. What is important is that this patch should solve the thread count issue. Do you have ideas on tests we can add to check that all this works correctly?

Most of the tests I'd like to write I already know will fail. For example, there are the state management race conditions. I also tried testing idle->active->idle->active transitions, but the only way to make a tenant idle is to detach it; stopping the compute node doesn't make it idle.

We need to rework tenant state management. I think @SomeoneToIgnore is already working on an RFC, so I don't want to get in too deep here.

@bojanserafimov (Contributor, Author) commented:

I'll at least write a test to assert that tasks start and stop.
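
A rough Rust-level shape such a check could take, assuming the watch-channel cancellation used in this PR; the names are hypothetical and the real test would more likely live in the integration test suite:

```rust
#[cfg(test)]
mod tests {
    use std::time::Duration;
    use tokio::sync::watch;

    #[tokio::test]
    async fn task_stops_after_cancellation() {
        let (cancel_send, mut cancel_recv) = watch::channel(());
        let handle = tokio::spawn(async move {
            // Stand-in task body: run until cancellation is requested.
            let _ = cancel_recv.changed().await;
        });

        // Request cancellation and assert the task actually finishes.
        cancel_send.send(()).unwrap();
        tokio::time::timeout(Duration::from_secs(1), handle)
            .await
            .expect("task did not stop after cancellation")
            .unwrap();
    }
}
```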


// Spawn new task, request cancellation of the old one if exists
let (cancel_send, cancel_recv) = watch::channel(());
// TODO this instrument doesn't work
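
For context, a self-contained sketch of the spawn-and-cancel-previous pattern that the snippet above begins; the struct and function names are hypothetical rather than the PR's actual code:

```rust
use tokio::sync::watch;
use tokio::task::JoinHandle;

// Hypothetical per-tenant task handle; field and type names are illustrative.
struct TenantTask {
    cancel_send: watch::Sender<()>,
    handle: JoinHandle<()>,
}

// Spawn a new task, requesting cancellation of the old one if it exists.
fn restart_task(slot: &mut Option<TenantTask>) {
    if let Some(old) = slot.take() {
        // The old task observes this on its next `changed().await`.
        let _ = old.cancel_send.send(());
    }
    let (cancel_send, mut cancel_recv) = watch::channel(());
    let handle = tokio::spawn(async move {
        // Stand-in task body: run until cancellation is requested.
        let _ = cancel_recv.changed().await;
    });
    *slot = Some(TenantTask { cancel_send, handle });
}
```
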
@bojanserafimov (Contributor, Author):

Not sure why that instrument call doesn't work: info! traces from inside the future don't show up with the compaction loop as context. Will fix before merging.

@bojanserafimov (Contributor, Author):

Replacing trace_span with info_span fixes the issue. What are the logging level semantics here? Does a span with a certain level apply only to events of an equal or stronger level? The docs don't say anything.
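
A standalone sketch of the behavior being discussed, assuming a plain fmt subscriber with an info max-level filter; tracing's level filters apply to spans as well as events, which would explain why the trace-level span never shows up as context:

```rust
use tracing::{info, info_span, trace_span, Instrument};

#[tokio::main]
async fn main() {
    // A plain fmt subscriber that keeps only `info` and above; this filter
    // applies to spans as well as events.
    tracing_subscriber::fmt()
        .with_max_level(tracing::Level::INFO)
        .init();

    // The trace-level span is disabled by the filter, so it is never entered
    // and this event prints without any span context.
    async { info!("logged without span context") }
        .instrument(trace_span!("compaction_loop"))
        .await;

    // The info-level span is enabled, so the event shows `compaction_loop`
    // as its context.
    async { info!("logged inside compaction_loop") }
        .instrument(info_span!("compaction_loop"))
        .await;
}
```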

A contributor replied:

Hmm, that's weird. I tried the example from the docs here https://docs.rs/tracing/latest/tracing/index.html#configuring-attributes and it worked for me; the info message was emitted. Though it does not use futures. I'll try to reproduce it tomorrow with the code from this branch.

@LizardWizzard (Contributor) left a comment

Looks good to me, thanks!

One nit: I think the place with the new file lock may benefit from an extended comment mentioning the cases which are not covered by the lock. Maybe even point to an issue.

It should conflict with my PR #1936, so go ahead and I'll rebase my patch on top of yours :)

@bojanserafimov bojanserafimov merged commit d29c545 into main Jul 5, 2022
@bojanserafimov bojanserafimov deleted the tenant-tasks branch July 5, 2022 06:06
@knizhnik (Contributor) left a comment

40 worker threads and 100 max_blocking_threads mean that we can actually run 100 compactions in parallel, which seems like too much. The question is what the proper values for these parameters are. Even two parallel compactions can significantly degrade pageserver performance, but if we limit it to 1, compaction will run too slowly, which can also hurt performance because of read amplification and extra page reconstructions.

My three main concerns about this patch:

  1. The compaction thread performs both compaction and materialization. These are different operations, and I still think they should be separated.
  2. Besides compaction, a lot of IO is produced by frozen layer flushes, and those are performed by other threads which are not controlled by this pool. If the main goal of this patch is to reduce the number of threads, this may not be a big problem (although the number of spawned flush threads can be as high as the number of tenants, so it can still be too many). But if we also want to avoid a "write storm" by limiting the number of expensive IO operations performed concurrently, then we should take these flush threads into account.
  3. This thread pool is used for both GC and compaction tasks, but these operations have completely different costs. GC just iterates over layers and reads layer metadata; it is relatively fast and not IO intensive. Compaction, on the other hand, is very IO intensive and can take a lot of time. It may not be a good idea to put them in a single thread pool.
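
For reference on the thread-count remark above, a minimal sketch of how tokio's worker_threads and max_blocking_threads relate, and one possible way (a semaphore with an arbitrary limit) to cap concurrent compactions well below the blocking-thread count; none of this is the pageserver's actual configuration:

```rust
use std::sync::Arc;
use std::time::Duration;

fn main() {
    // Illustrative numbers from the discussion: 40 worker threads for async
    // tasks, plus up to 100 extra threads for spawn_blocking jobs, so up to
    // 100 blocking compaction passes could run at once.
    let runtime = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(40)
        .max_blocking_threads(100)
        .enable_all()
        .build()
        .unwrap();

    runtime.block_on(async {
        // A semaphore is one way to cap concurrent compactions well below
        // the blocking-thread limit; the value 2 here is arbitrary.
        let permits = Arc::new(tokio::sync::Semaphore::new(2));
        let mut handles = Vec::new();
        for tenant in 0..10u32 {
            let permits = Arc::clone(&permits);
            handles.push(tokio::spawn(async move {
                let _permit = permits.acquire_owned().await.unwrap();
                tokio::task::spawn_blocking(move || {
                    // Stand-in for an IO-heavy compaction pass.
                    std::thread::sleep(Duration::from_millis(50));
                    println!("compacted tenant {tenant}");
                })
                .await
                .unwrap();
            }));
        }
        for handle in handles {
            handle.await.unwrap();
        }
    });
}
```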

Successfully merging this pull request may close these issues: "Replace per-tenant GC threads with a thread pool or async" and "Restart dead threads after error/panic".