Add support for binary_mode #357
Conversation
We need the ability to return rows objects which don't have an underlying prepared statement in order to support the single round-trip mode.
I think some of your comment was lost.
Hi @johto, I'm going to try and summarize how I understand things to make sure that I am understanding your concerns correctly.

**The problem**

It seems to me that the major worry here is that because we are doing a single round trip, we can't get any feedback from the server as to what types we should encode our parameters as. If we just assume the type based on the Go type, then we will be breaking backwards compatibility, because there are some people who rely on being able to pass string and []byte interchangeably.

**Current solution**

Since we have a binary_mode connection parameter, the new behavior is opt-in and existing code keeps working unchanged.

**My thoughts**

I agree that there isn't really a need to use the binary encodings to do this, but rather a way to tell the client that we want it to assume how to encode parameters based on the Go type rather than the type on the postgres server. Am I right in assuming that the type that is a problem here is bytea? If this is the case, perhaps rather than having a separate mode, we could always treat []byte as bytea and send everything else as text.
Right now, this seems like the most sane approach to me. Can you also outline what would happen to the other types (ints and so on)?
Indeed. I'm having trouble finding documentation about the binary format of various types, but I suspect we can discover a lot with a few tests.
Interesting. I would have thought skipping hex-encoding would be noticeable. Were the values very large? I think the binary format will really shine with temporal types, which send/recv as int64 (or float8, depending on the server's integer_datetimes setting).
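For reference, with integer datetimes the binary wire format for a timestamp is an 8-byte big-endian count of microseconds since 2000-01-01 00:00:00 UTC. A minimal sketch of encoding one (an illustration, not code from this PR) might look like:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"time"
)

// postgresEpoch is the zero point of the binary timestamp format.
var postgresEpoch = time.Date(2000, time.January, 1, 0, 0, 0, 0, time.UTC)

// encodeTimestampBinary encodes t as int64 microseconds since the Postgres
// epoch, in network byte order (assumes integer_datetimes=on).
func encodeTimestampBinary(t time.Time) []byte {
	micros := t.Sub(postgresEpoch).Nanoseconds() / 1000
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, uint64(micros))
	return buf
}

func main() {
	fmt.Printf("% x\n", encodeTimestampBinary(time.Date(2015, 1, 1, 0, 0, 0, 0, time.UTC)))
}
```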
We should also not forget about receiving/scanning values in binary format, or should that be a separate discussion?
UUID is a type I care about as well. My initial research indicates the binary format is 16 bytes.
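A quick sketch of that (again only an illustration, not code from this PR): the 16 bytes are simply the UUID's hex digits with the dashes stripped:

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// uuidToBinary converts the textual form into the 16-byte wire format.
func uuidToBinary(s string) ([]byte, error) {
	b, err := hex.DecodeString(strings.Replace(s, "-", "", -1))
	if err != nil {
		return nil, err
	}
	if len(b) != 16 {
		return nil, fmt.Errorf("expected 16 bytes, got %d", len(b))
	}
	return b, nil
}

func main() {
	b, err := uuidToBinary("123e4567-e89b-12d3-a456-426614174000")
	fmt.Println(b, err)
}
```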
Your summary looks correct to me.
Yes, bytea is pretty much the only problem here. All other common types either work nicely, or don't work at all without a Valuer.
I think that's worse than what this PR does. That's essentially the same as saying "[]byte is always bytea, everything else should be a string", which is a proper subset of the functionality here. If we define []byte to be the binary representation of the type, the behavior in the bytea case is the exact same, except as an additional nicety the user can choose to use binary mode for e.g. her own custom types. The downside is that there is already code out there which assumes that []byte and string are interchangeable. The only two options I'm considering right now are (optional, as implemented here) binary_mode, or a new mode where everything is assumed to be a string, and even bytea values need a Valuer.
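To make the trade-off concrete, the heuristic under discussion boils down to something like the following sketch, where the parameter format code (0 = text, 1 = binary) is chosen purely from the Go type; this is an illustration, not the PR's implementation:

```go
package main

import (
	"database/sql/driver"
	"fmt"
)

// paramFormat picks the wire format for a parameter based only on its Go
// type: []byte is sent binary, everything else as text.
func paramFormat(v driver.Value) int16 {
	if _, ok := v.([]byte); ok {
		return 1 // binary
	}
	return 0 // text
}

func main() {
	fmt.Println(paramFormat([]byte{0xde, 0xad, 0xbe, 0xef})) // 1
	fmt.Println(paramFormat("hello"))                        // 0
	fmt.Println(paramFormat(int64(42)))                      // 0
}
```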
Right now, they're just sent over as text. It wouldn't be too difficult to send them over in binary, but the problem is that it can really screw up programs which worked previously. E.g.:

```go
package main

import (
	"database/sql"
	"fmt"

	_ "github.com/lib/pq"
)

func main() {
	db, err := sql.Open("postgres", "sslmode=disable host=/tmp binary_mode=on")
	if err != nil {
		panic(err)
	}
	err = db.Ping()
	if err != nil {
		panic(err)
	}
	var v int64
	// assume: begin; create table foo(bar int4); insert into foo values (1); commit;
	err = db.QueryRow(`SELECT 1 FROM foo WHERE bar = $1`, int32(1)).Scan(&v)
	fmt.Println(err)
}
```

will produce an error instead of scanning the row.
I think the way postgres does this is quite useless. You can only choose to either get all types as binary, or all types as text, so you'd have to also know how to decode values from any types created by extensions you're possibly using. Combine this with the fact that oids of said types change between installations, and you've got yourself a mess. If there was a way to specify that e.g. only built-in data types should be sent over as binary, and then maybe apps could opt in for other types if they know which oid is which type and know how to decode it, it might be sensible to use the binary mode for receiving values from the database. But given the interface we currently have to work with, I wouldn't spend even a minute on trying to make anything happen on that front.

Yeah, I think that was supposed to say "Both approaches would break backwards compatibility" or something similar.
Yeesh, makes sense. Only
The Bind message allows the format of each result column to be specified, but of course we'd need the RowDescription to make it work. So, I dislike the name "binary_mode" for what is really single-round-trip. With two round trips, we can do lots of things with binary formats.
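For context, the tail of the Bind ('B') message is an int16 count of result-column format codes followed by one code per column, which is where per-column binary results would be requested if a RowDescription were available. A rough sketch of building just that section (a hypothetical helper, not pq's writeBuf API):

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// appendResultFormats writes the result-column format-code section that
// terminates a Bind message: a count followed by one int16 per column
// (0 = text, 1 = binary).
func appendResultFormats(buf *bytes.Buffer, formats []int16) {
	binary.Write(buf, binary.BigEndian, int16(len(formats)))
	for _, f := range formats {
		binary.Write(buf, binary.BigEndian, f)
	}
}

func main() {
	var buf bytes.Buffer
	appendResultFormats(&buf, []int16{0, 1}) // first column text, second binary
	fmt.Printf("% x\n", buf.Bytes())
}
```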
```go
b := cn.writeBuf('Q')
b.string(q)
cn.send(b)
func (cn *conn) awaitSynchronizationPoint() {
```
This doesn't do much waiting. It looks a lot more like the read* methods (one or two expected messages, else errorf).

These small methods are starting to look a lot more like a state machine. Does Postgres publish anything more ... structured about acceptable states/transitions than the Message Flow documentation?
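As an illustration (using a hypothetical readMessage helper rather than the code in this PR), such a method essentially drains backend messages until ReadyForQuery ('Z'), the protocol's synchronization point:

```go
package main

import "fmt"

type message struct {
	typ  byte
	body []byte
}

// readMessage stands in for reading one backend message off the wire.
func readMessage() (message, error) {
	return message{typ: 'Z'}, nil // stubbed for the sketch
}

// awaitSync consumes messages until ReadyForQuery; anything unexpected is
// an error, much like the read* methods mentioned above.
func awaitSync() error {
	for {
		m, err := readMessage()
		if err != nil {
			return err
		}
		switch m.typ {
		case 'Z': // ReadyForQuery: we're back at a known state
			return nil
		case 'E': // ErrorResponse: a real implementation would parse and surface it
			continue
		default:
			return fmt.Errorf("unexpected message %q while waiting for ReadyForQuery", m.typ)
		}
	}
}

func main() {
	fmt.Println(awaitSync())
}
```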
Oh, sorry, you're right. So we could actually do something like this for proper prepared statements, just not in the single round-trip mode. It could be interesting to see whether that would affect performance or not.

Right. Thing is, with two round-trips we can do all of that transparently, and there's no need to have a user-visible mode. "[]byte means binary input" is something that's easier to explain to a user than "single round-trip mode", I think.
Almost, I think? A
I think this "binary mode" heuristic is excellent; I really want to explore what can be done with it.
This may be necessary to support single-round-trip, but I'm not yet convinced it is sufficient to trigger single-round-trip. Your example for int32 highlights my concerns. I suspect there are cases where single-round-trip is not the right choice (and I don't like making poor choices on behalf of the user.) I need to think about the
```go
// Cheat a bit since the result should be exactly the same as when
// describing a portal.
```
Not true. If I understand correctly,
- A RowDescription after Query or Bind includes the text/binary format indicators; in both cases a portal exists.
- Otherwise, we must expect retrieved values in the formats specified in Bind.
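For reference, this is the per-field layout a RowDescription ('T') message carries; the trailing format code is the text/binary indicator being discussed (a sketch, not pq's internal representation):

```go
package main

import "fmt"

// fieldDescription mirrors one entry of a RowDescription message.
type fieldDescription struct {
	name         string
	tableOID     uint32
	attrNumber   int16
	typeOID      uint32
	typeSize     int16
	typeModifier int32
	formatCode   int16 // 0 = text, 1 = binary
}

func main() {
	f := fieldDescription{name: "bar", typeOID: 23 /* int4 */, formatCode: 0}
	fmt.Printf("%+v\n", f)
}
```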
Right. But there's currently only support for text mode, so that shouldn't be an issue yet.
With regards to whether to require Valuers for everything or just send everything as a string to postgres except for []byte: I think encoding everything in the binary format is much more complicated, given the troubles you ran into with the binary formats of the various types.
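A sketch of what a Valuer-based opt-in for bytea could look like (the ByteaArg name is made up for illustration; nothing like it exists in lib/pq):

```go
package main

import (
	"database/sql/driver"
	"fmt"
)

// ByteaArg wraps raw bytes so callers can opt in to bytea explicitly,
// while plain string/[]byte arguments keep being treated as text.
type ByteaArg []byte

// Value satisfies driver.Valuer by handing the raw bytes to the driver.
func (b ByteaArg) Value() (driver.Value, error) {
	return []byte(b), nil
}

func main() {
	var v driver.Valuer = ByteaArg{0x01, 0x02, 0x03}
	val, err := v.Value()
	fmt.Println(val, err)
}
```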
I believe "binary mode" is misleading because this approach precludes the sending of binary formats by the backend. I'm okay with this option being called "binary parameters".

I'm fine with just never sending ints in binary with the non-prepared Query(). It's never going to be slower to do one round trip with integers sent over as text compared to two round trips with integers sent over as binary. And if you really want the binary optimizations, you can always Prepare().

binary_parameters works for me.

I'm not sure whether it would ever be sensible for a Valuer to emit binary representations.

You are probably right!

Anything I can do to help push this guy along? Are there tests that I could help write?

It's mostly missing docs and tests, and I'd be glad if you wanted to help with those. That said, I have the next four days off so I'm likely going to work on this over those days. Which one would you prefer to work on, so we don't step on each others' toes too much? (Or I guess you can also just sit back and relax if you want.)
I won't have time till this weekend; I'll check in then to see what's left to do. I think if you worked on the docs first, it would make clear exactly what situations we want to test for and would give me more guidance as to what areas/conditions to test. So if you got that done by the weekend I could pick up some tests then.

Sounds good to me.

@johto, any direction as to what I should focus on? If not, I'll look and see what tests I can contribute.
Sorry, I had way less time available than I expected during last weekend, and I poured all of it into improving Listener and ListenerConn (and didn't finish even that).
Hmm. I think the way this should work is to have a test-only environment variable which controls whether to use binary mode in tests or not, and then run the tests with binary_mode both on and off in Travis. I think there was one test which didn't work with binary mode on, so that test would either have to be fixed to work with both (possibly not a good idea, I forget which exact test it was). That should cover a lot of the new code.

Then on top of that, a couple of tests for cases which only work with binary_parameters (forcing it on regardless of the env var), e.g. sending a binary []byte blob into a bytea column. Does that sound reasonable?
Didn't quite finish my thought there: either fixed, or just
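A minimal sketch of that test-only switch, using the PGTEST_BINARY_MODE variable mentioned later in this thread (the helper name and conninfo are assumptions, not the eventual implementation):

```go
package pq_test

import (
	"database/sql"
	"os"
	"testing"

	_ "github.com/lib/pq"
)

// openTestConn opens the usual test connection, adding binary_parameters
// when PGTEST_BINARY_MODE is set so the whole suite runs in both modes.
func openTestConn(t *testing.T) *sql.DB {
	conninfo := "sslmode=disable"
	if os.Getenv("PGTEST_BINARY_MODE") == "yes" {
		conninfo += " binary_parameters=yes"
	}
	db, err := sql.Open("postgres", conninfo)
	if err != nil {
		t.Fatal(err)
	}
	return db
}
```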
Looks like I'll finally have some time to start tackling this today. Just a quick question, as I haven't worked on pull requests from other people's forks before: do you have a recommended way of doing this? I currently have my own fork, and then I cloned that and added yours as a remote so I could pull down the branch. Should I issue a pull request to your branch with my test changes when I'm done?

I think it's easier if you just tell me where I can find the branch and we can work based on commit hashes. E.g. you can say here "pushed [fixes] in 1a2b3c".

Here is an initial stab: cgilling@e5debbe. A few things to note:

I need to step away for a while; hopefully I'll have some more time this weekend to try to think of more cases that would be good to test.

Even if it's not that much, it's just two tests. I'd go the full mile and check the SQLSTATE of the error so that not just any error would do.

Can we somehow add PGTEST_BINARY_MODE to the matrix so that it's clearer which combination failed in Travis? It's been a while since I've touched Travis, but my recollection is that this should be possible. It'll make the tests run a bit slower due to all of the extra setup/teardown work, but I'd expect there to be a different way of solving that problem.
I should have some more time and energy this weekend (it's been a crazy week so far). I'll be focusing on getting this and #359 in before Monday.

I updated my changes with the feedback you gave: cgilling@49990e3. Unfortunately I don't think I have time today to read through the postgres binary format docs to see if there are any more test cases that I should add. Can you think of any more off the top of your head?

I've been looking at this again, and I'm kind of unhappy about how it rewrites practically the entire code for handling prepared statements and query results. But I guess ultimately this is more like the way we want the code to look, so maybe that's not so bad. I wonder if this should still be split into multiple commits, where each one just takes a part of the code and turns it into the "new" state-machine-like way of handling the protocol messages. Changes like that should be more easily reviewable, in case something goes wrong and we have to debug this mess a month down the road.
lots of tests help too :) .. a shared test suite for running queries could work well

Not really. I'm happy with these tests if you are.

Yes, especially if they are just setup/prep for the actual feature here. IMO, refactors like that should not touch tests and should be merged before this feature changes any behavior.
I agree that refactors should be separate commits when possible, but I think that requiring them to be separate pull requests might actually discourage refactoring. The reason I think this is that most refactoring will happen as part of trying to implement something else. This pull request is a good example of that: the reason there was a refactor is that the same code was suddenly needed in multiple places, so rather than duplicating code, it was refactored into some common functions for easy re-use. And writeBuf was refactored to make it easier to work with when sending multiple commands in a single request. I would think that the refactor being in a separate commit that can be shown to have passing tests would be sufficient, perhaps even with more tests added in places where test coverage was found to be lacking.

@johto, is there anything I can do to help push this along? If you won't be able to get to this for a while I could work on splitting up the code into multiple commits.
I'm hoping to commit #364 in some form today, and then move on to finishing this one.

Well that plan didn't work out quite that well. I've now implemented this PR in a private branch which I'll push today or tomorrow. How would you like to get acknowledged, cgilling? Just as "cgilling", or with your real name? If the latter, what is it?

Great news to hear about the progress. Thanks for doing the bulk of the work. Really looking forward to starting to use it. My name is Chris Gilling, I suppose using my real name would be a good idea. Thanks again.

I've pushed this in 2997d16, albeit without docs. I can't finish the docs before next week, and I'm tired of having the code sitting on my hard drive unused. Please give it a go and report any problems you might experience. Thanks for your patience!
btw, my coworker mentioned this fix in a lightning talk he gave at gophercon: https://sourcegraph.com/blog/live/gophercon2015/123745231430 (the summary isn't really accurate to the main point of his talk)

Cool! Are the problems fixed now? I haven't had the chance to play around with this code yet personally :-(

It definitely works with pgbouncer in transaction mode and we haven't run into any problems. We're now using it for all our postgres db stuff in go, so it passes all our automated tests (unfortunately, haven't gotten any of them to production yet)

this link is broken https://sourcegraph.com/blog/live/gophercon2015/123745231430
@cgilling - I was wondering, it's been 5 years; how has the binary_parameters support been working out for you? I would love to use it if it's still working :)

Oh - I see that
@shannontan-bolt It has been working. The main reason we wanted it was because the implementation of it played nicely with pgbouncer running in transaction mode, whereas the standard mode doesn't, because it would make two round trips which could go to two different connections. That being said, somewhere in the past ~6 months we migrated over to pgx, which is recommended by lib/pq itself: https://github.com/lib/pq#status. That works with pgbouncer in transaction mode out of the box, so I would look at moving over to that. IIRC it wasn't very much work to move over.

Awesome! Thanks so much, @cgilling. I saw that there were some gotchas around JSONB, but with the exception of that, did you have to take into account anything else? I'm also considering moving over to jackc/pgx - although I think you still need to set some of the statement caching/protocol behaviors. It is a bit harder than I expected though, not quite as drag-and-drop, because we use an ORM that uses lib/pq (GORM v1; GORM v2 uses jackc/pgx).
Here's my work so far towards making #209 happen. It passes all tests with both binary_mode off and on, which suggests that we don't have enough tests.
This functionality is split into three separate commits. Among other things, they separate `rows` and `stmt` so that a `rows` object can live without having an associated `stmt` object; that's needed for the single round-trip mode.

There's still likely a lot of work to do:

3. `driver.Value`rs which return a byte slice which isn't actually valid as input for the type in binary mode. If we can adopt a general guideline of "string means text, []byte means binary" then things might work out great, but e.g. the allocation penalties of such a guideline are not clear to me. This has some impact on e.g. the ongoing work for supporting arrays. Another approach might be to give up on binary mode altogether and instead always send everything over as text, but provide a special Valuer for bytea values (similarly to how we're likely going to have to have a separate Valuer for bytea arrays). Both approaches would break backwards compatibility.

I'd really like to stress point 3 directly above; this is not the only way to achieve single round-trip mode, and at the moment I'm not convinced it's the best one, either. The performance gains from using binary mode have been in my tests almost negligible (though I've mostly tested with bytea values, since this patch doesn't use binary mode for e.g. ints even though it theoretically could), so don't get too hung up on that.
Any thoughts?