Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

calculate_logical_size: no longer use spawn_blocking #3664

Merged
merged 2 commits into from
Feb 21, 2023

Conversation

koivunej
Copy link
Member

@koivunej koivunej commented Feb 21, 2023

Calculation of logical size is now async because of layer downloads, so we shouldn't use spawn_blocking for it. Use of spawn_blocking exhausted resources which are needed by tokio::io::copy when copying from a stream to a file which lead to deadlock.

Fixes: #3657

@koivunej koivunej requested review from a team as code owners February 21, 2023 16:55
@koivunej koivunej requested review from knizhnik and hlinnaka and removed request for a team February 21, 2023 16:55
@koivunej koivunej changed the title Not blocking anymore calculate_logical_size: no longer use spawn_blocking Feb 21, 2023
Copy link
Contributor

@hlinnaka hlinnaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is clearly the right thing to do.

I do wonder about the issue mentioned in the comment that this removes, though:

// Run in a separate thread since this can do a lot of
// synchronous file IO without .await inbetween
// if there are no RemoteLayers that would require downloading.

That's still a valid concern, right? I guess it's a more general issue that we use blocking I/O on files, and we don't need to specifically care about it here.

@koivunej
Copy link
Member Author

That's still a valid concern, right? I guess it's a more general issue that we use blocking I/O on files, and we don't need to specifically care about it here.

This is an issue still yes, more like #2975, but we created a deadlock with the block_on from within spawn_blocking, so yeah, not entirely resolved issue. I don't think it's worthwhile to retain this one comment.

this fixes a bug which exhausted the spawn_blocking pool which hung up
the blocking io operations needed by download remote layer, because
these threads used block_on.
@koivunej
Copy link
Member Author

Yay, first non-transient pass of regress-tests of the evening!

@koivunej koivunej enabled auto-merge (squash) February 21, 2023 19:02
@koivunej koivunej merged commit 225add0 into main Feb 21, 2023
@koivunej koivunej deleted the not_blocking_anymore branch February 21, 2023 19:09
koivunej added a commit that referenced this pull request Feb 22, 2023
Calculation of logical size is now async because of layer downloads, so
we shouldn't use spawn_blocking for it. Use of `spawn_blocking`
exhausted resources which are needed by `tokio::io::copy` when copying
from a stream to a file which lead to deadlock.

Fixes: #3657
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pageserver_remote_timeline_client_calls_unfinished metric never reaches 0 for layer downloads
4 participants