-
Notifications
You must be signed in to change notification settings - Fork 914
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Request binary format for bytea row data #359
Conversation
if o == oid.T_bytea { | ||
binary = true | ||
st.rowFmts[i] = formatBinary | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd imagined this would happen in stmt.prepare() and this code would just send over the pre-prepared byte slice stored in the stmt
struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds reasonable. Is it worthwhile to compress this when possible, e.g. all binary or all text?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably. But I don't think the first version has to do it. Let's see what the numbers say first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 21774b5.
Yeah, I have a server I can use to test this. I can have a look once you've fixed the small issue we talked about in the code comments; I'd expect that to show up in repeated execution. |
@@ -460,7 +461,7 @@ func TestByteaOutputFormats(t *testing.T) { | |||
return | |||
} | |||
|
|||
testByteaOutputFormat := func(f string) { | |||
testByteaOutputFormat := func(f string, s bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the name "s" here; it's not at all clear what it means. Maybe "prepare" or "prepared" or "usePrepared" would be more clear?
I had a look at this, and for bytea the results look really promising. Up to 40% speed improvement on bytea values as small as 1kB! Nice work! I also did some hacking on it (you can see my work here), and was quite disappointed to find that I couldn't come up with a test case where scanning 64-bit integers was consistently faster than just decoding their text representations. I didn't try floats, though; there might be an improvement to be had there. Another interesting case might be decoding timestamps from their binary format, but I didn't implement it today. Did I miss something? Are there other cases where binary decoding could be faster? Can someone see ways of speeding up the code? |
.. and now it of course hit me that I should test for queries which returns a lot of rows with an int64 column. I feel really dumb. Returning 10k rows with the integer value 4611686018427387904 I get around 30% of an increase in performance. The value being 65536 results in a ~16% increase, and finally with a value of 1 the performance is comparable. So I think we should use the binary mode for integers as well. |
Very good news! My remaining doubts were around small values (e.g. 1-byte decimal vs 8-byte binary.) |
Even if it looks like this feature is never a net loss, I think it might make sense to provide an option to turn it off. Shame that there doesn't seem to be a nice way of doing that on a per-statement basis, but I guess being able to set it in the connection string is better than nothing. |
That seems fair. What might this config be called? Will it be separate from #357? |
Yeah, these should be two separate options. I'm not having any solid ideas pop up right now. Maybe |
I've pushed this in c4afb3f after a fair bit of additional hacking. I was afraid to do floats because floats scare the hell out of me. I also didn't do any time-related types, though I think therein lies a ton of extra performance. UUIDs might be useful as well. Patches with performance benchmarks welcome. Anyway, I'm done hacking on this for now unless I broke something. Thanks for your work, and I hope I didn't break things too badly! |
Awesome! |
I think this will not be backward compatible. The UUID package I've been using lately handles it nicely, however the top result in Google for "go uuid" (for me) is really awful. I have one or two old projects that call Those old projects can burn, of course. I think we should return UUIDs in binary format. If compatibility is a concern, we can allow some configuration to re-enable "the old way." |
I agree completely, but my initial dives into these binary formats left me with more questions than answers (e.g. where are the timezones?) In any case, you've made it very easy to add support for binary formats incrementally! |
The idea there was to do it in a backwards compatible way, i.e. transfer over in binary and then format to text on the Go side. The gain would be in the fact that there's less data transferred over the wire; the amount of work done is still the same since we can't just return a []byte with the binary UUID. |
This is some quick code to demonstrate what it would be like to receive binary formats for some data types in prepared statements. Integer, float and timestamp types may be other candidates, but I haven't looked into it enough.
@johto if you have a setup to reasonably benchmark this, I would greatly appreciate it.