Remove synchronous postgres_backend #3576
Conversation
There is a TODO list above, but otherwise it is ready for review; main tests pass. CC @petuhovskiy
Force-pushed 62ce9ba to 5eeda9d
Force-pushed b17ec1c to 378d0d7
Looked at everything but safekeeper/src/send_wal.rs; going to take another look tomorrow and probably post more minor comments, but so far looks good.
Found some minor doc issues which are not worth making GitHub comments about, so as not to pollute the discussion:
- There were several typos in the comments; I think it should be possible to run some sort of grammar linter to fix them quickly (e.g. Fix typos #1818 (comment))
- rustdoc comments (`///`) are missing in several places, with the usual `//` used instead
Also:
- `write_message_noflush -> write_message` and `write_message -> write_message_flush` can be confusing or error-prone; inverting the `write_message` semantics looks strange
- Calling `unwrap()` and `expect()` in functions which return `Result<>` doesn't feel right IMO (except when checked with `is_ok` on the previous line), even if we are sure that it won't panic normally
Simplified.
@funbringer Changes to proxy are not too invasive, but what is the basic way of testing this? I replaced the guts of stream.rs and switched mgmt.rs to async.
Ran codespell. Most of the suggestions were s/crate/create, but I fixed the rest :)
Tried to look through most of the files and added some, but point them out if you see more.
Yeah, I already bumped into this a couple of times. However, postgres_backend_async already has a different meaning, so we break the semantics anyway; and I'm much more proficient in safekeeper code than in pageserver, so I adopted what was already there.
Removed most of these. A couple are left, but they are mostly independent of this patch and should be fixed separately.
```rust
// sends, so this avoids deadlocks.
let mut pgb_reader = pgb.split().context("START_WAL_PUSH split")?;
let peer_addr = *pgb.get_peer_addr();
let res = tokio::select! {
```
Maybe use futures::future::join_all here?
No. The very idea is that we stop as soon as either of the parts stops; this allows us to easily retrieve the error (without choosing among two, trying to guess which was the initial cause) and avoids synchronization.
Looks like try_join_all can be helpful
It would work too, but select! here is enough.
I would argue that try_join_all has simpler semantics and thus is a better option here.
It has just different semantics. Once any task finishes, whether with Ok or an error, we need to stop polling. So on closer look it doesn't fit here.
This exactly matches the try_join_all doc:
If any future returns an error then all other futures will be canceled and an error will be returned immediately.
Am I missing something?
We need to stop even if one future returned Ok.
Also reworked error handling; now the double logging of closing connections on safekeepers is gone, and generally it is more accurate. In total, this became this
Force-pushed 771e372 to 386a6e4
AFAICS I addressed all the reviews so far, please have another look.
Force-pushed a44e5e6 to 818c34f
I took another look. I'm still not on board with the changes.
Additionally, this patch contains unrelated changes (renaming functions, a new psql show command). I would recommend splitting them out, because the patch is really big and it's hard to keep track of everything during reviews.
I'm still thinking that a custom implementation of Framed that is slightly different from the one in tokio is not what we need here.
The main motivation for the patch seems to be this change in safekeeper. First of all, the PR title and description do not mention this, and the commit messages reflect neither this intention nor the other changes included in this patch. As I said in one of the comments, IMO the PR would benefit from splitting into independent parts, each with its own motivation. Mixing independent changes leads to extended review time because of the increased patch size, and at some point many discussions start to block code that has no concerns and could be merged without waiting for other conversations to resolve.
Secondly, to my mind a similar result can be achieved without requiring the split machinery (either the tokio or the futures one). Both of them seem to have their own cons: allocation of a waker in futures, thread::yield_now() in tokio, and in general this concept of split contains pitfalls exactly around the use case of polling from different tasks (it is mentioned in your comments too). To me a better way to achieve the desired goal would be to change the way these two sk tasks communicate with each other. See https://github.com/neondatabase/neon/pull/3576/files#r1118916294 and #3576 (comment)
Additionally, taking into account #3667, I'm really not a fan of such a big refactoring without good unit test coverage. This refactoring is an opportunity to move to a better structure for our whole postgres protocol handling machinery. Since we deal with parsing data that came through the network, it makes sense to add some fuzzing routines. With all the Rust tooling available it shouldn't be a really big task.
If you are talking about
This PR added
Yes, it's easier to review several small patches instead of one big one, but it takes more time to prepare several small independent PRs.
If you are talking about neon/safekeeper/src/receive_wal.rs, lines 146 to 205 at 000eb1b:
This PR mostly preserves it, but also adds a buffered channel for responses, which is now easy to do because of async tasks. I'd say that this PR does mostly what it says, i.e. removes the synchronous postgres_backend, adds framed.rs, and fixes CopyBoth errors in the logs. The diff looks large because a lot of code was moved in the process, but the logic stays the same in most places.
I'm talking about the whole idea around split and the way a different task is used, rather than this exact line.
I think you're trying to optimize the wrong part here: for the person who submits the patch it's faster to prepare a bigger one, but at the end of the day merging smaller patches is faster than going through reviews for one big PR. Additionally, in bigger patches it is easier to miss bugs, just because of the amount of code and the increased number of iterations. So IMO optimizing for review speed, and thus the total amount of time it takes to deliver a feature, is a better strategy. We should optimize for working better as a team, not for individual contributor convenience.
Not only this; another example is pg_receivewal support. Regarding the rename: it looks like more places used the sync semantics, and you suggested renaming them in your first comment:
Which led to an increased diff. Personally I don't see a reason why this needs to be done in this patch. It can be split out into a separate one that unifies naming between sync and async. Also, I do think that write_message -> serialize_message could completely sidestep the confusion.
Yes, you are proposing to remove split and write slightly different code, with its own advantages and disadvantages. We can try to do that, but the current version is ported from the sync code that is known to work well.
Sure, I forgot to mention that I'm also in favor of submitting smaller PRs; they're much easier to work with. But I'm not sure if it's worth splitting a PR that accidentally turned out larger than expected.
+1, it looks unrelated now. But for me it's a small addition that doesn't make anything worse; I'm fine with both leaving and deleting it.
I don't get it. There was no rename, and I haven't suggested renaming anything. As I explained above, the sync code was migrated to async and that made it look like a rename, but
I propose avoiding the necessity to write extra code. To me the custom framed implementation is completely unnecessary.
The code in main doesn't rely on tokio::io::split. I don't see why this is an argument here.
For code that was previously sync it is exactly a rename, and this patch inverts the semantics. Additionally see #3576 (comment)
This is why the entire naming conversion should be a separate patch.
As mentioned above, I'm going to switch to write_message_noflush/write_message in the whole project. I can extract that into a separate commit, would that help?
Of course, because it is sync. It instead impressively relies on our own crate specifically designed for splitting potentially TLS-enabled sync streams.
I don't really mind adding more tests. However, neither do I view that as the highest priority here; our postgres_backend usage is internal and really limited, it is mostly about the handshake and the simple query protocol, both of which are covered in
Force-pushed 470a11e to 48bef87
Did that; it doesn't help much with the diff...
…nc.rs To make it uniform across the project; proxy stream.rs and the older postgres_backend use write_message_noflush.
Force-pushed 48bef87 to 061055e
Rebased again.
Force-pushed 061055e to 788a24b
Force-pushed 788a24b to 3fa31d8
It helped a bit. There is a separate commit with this change now, so the second commit became a little smaller. If you split out more changes I believe it will shrink a bit more.
I don't understand this as an argument.
Could you elaborate? I may lack some context
This is only partially true. Rust can be annoying, and sometimes the suitable approach significantly differs from what you can think of ahead of time. As I previously said in the comment, is there a reason why you cannot hold an owned stream without using split, and dispatch messages in it? Rust's strictness is about guarantees. If you cannot guarantee that it would be the same task, then you should consider the worst case. Can someone misuse the split abstraction and write code that starts polling it from multiple tasks? Sure. Can we do better? I think so, and a refactoring IMO is a perfect place to explore the opportunity.
OK, when can this be a priority, other than in a refactoring PR that tries to reduce tech debt?
Integration tests have their weaknesses. In the motivation paragraph of #3667 I explain why having unit tests increases developer velocity. This is true especially for small independent components like the pg backend. Is it that hard to test Copy that we need to add a follow-up issue for it? Additionally, being "simple" and "internal" cannot be a reason why this shouldn't be covered. You changed a whole lot of code here, and as I already mentioned, this PR tries to be a refactoring.
I disagree. The refactoring you did tries hard to mirror the logic from the previous sync version; a significant part of the code is dedicated to this job. So this is exactly about the code added in this patch, and thus perfectly relevant.
So we fully rewrite this code only for the logic to remain the same. This refactoring is an opportunity to make it clearer and not come back to this code. Imagine we merged the PR: the tech debt associated with this is still there, and someone needs to get back to it and rewrite it again. Why do we need to rewrite the same thing twice if we already know which problems exist in the newer version?
Sure, I'm fine with it. Please create a corresponding issue.
IMO this PR in its current form moves forward with uniting everything under the async umbrella, but it still keeps the tech debt that was there, just in a slightly different form, despite the fact that it is a significant rewrite of the component. So if your proposal is to leave it here, merge the PR and be done, I'm not on board with the approach. Sorry. Maybe you need someone else to break the knot.
- Use postgres_backend_async throughout safekeeper.
- Use framed.rs in postgres_backend_async, similar to tokio_util::codec::Framed but with a slightly more efficient split. Now IO functions are also cancellation safe. Also, there is no allocation in each message read anymore, and data in read messages still directly points to the input buffer without copies.
- In both safekeeper COPY streams, do read-write from the same thread/task with select! for easier error handling.
- Tidy up finishing CopyBoth streams in safekeeper sending and receiving WAL -- join split parts back, catching errors from them before returning.

Initially I hoped to do that read-write without split at all, through polling IO: #3522. However, that turned out to be more complicated than I initially expected due to 1) borrow checking and 2) anonymous Future types. 1) required an Rc<RefCell<...>>-which-is-Send construct just to satisfy the checker; 2) can be worked around with transmute. But this is so messy that I decided to keep split.

proxy stream.rs is adapted with minimal changes. It also benefits from the framed.rs improvements described above.
Force-pushed 3fa31d8 to 85b3964
Some thoughts after reading this thread.
So to sum it up, I'm +1 to proceed with the current PR. I'm also not against trying to avoid sock_split if it will simplify things, but better as a separate PR, to check for any perf impact.
Just a minor correction: there are two 'allocation during each message processing' issues. The first is in bilock.rs, which is used when you split tokio Framed; in non-split usage it doesn't exist, so pageserver and proxy would be OK with that. The other is in our message reading code, which affects everything; switching to framed, whether custom or tokio's, removes it.
As Arseny pointed out, this is not relevant to tokio Framed. Instead, with the current usage of split there is a "spin lock" with thread::yield_now() on an AtomicBool. See https://github.com/tokio-rs/tokio/blob/3d33610ed2420111982e5a42c764761c9060e6ab/tokio/src/io/split.rs#L147. Do we know for sure that this does not have any negative performance impact?
Do we know for sure that this allocation has an impact on performance? I agree that it is not good if we can easily avoid it. But if it doesn't have a perf impact, then having simpler code, or less of it, looks like a better option to me.
Do we have a test case in our test suite that can be used for validation? Did we check our usual perf numbers to make sure this patch doesn't cause any perf degradation?
IMO the arguments are clear:
If these still are not arguments for you, OK. Please note that this PR doesn't have an approval from a storage team member (pageserver side of things). Codeowner-wise, IMO it is important that each involved team gives an approval on a patch that touches core parts of the code base that the team uses to build the components it is responsible for. So IMO it's reasonable to ask someone else; I would suggest Joonas or Christian. And I haven't seen an approval from @funbringer either. Anyway, thanks @arssher for fixing all the comments.
Can we close this?
Superseded by #3759
Remove sync postgres_backend_async, tidy up its split usage.