Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request binary format for bytea row data #359

Closed
wants to merge 2 commits into from

Conversation

cbandy
Copy link
Contributor

@cbandy cbandy commented Apr 7, 2015

This is some quick code to demonstrate what it would be like to receive binary formats for some data types in prepared statements. Integer, float and timestamp types may be other candidates, but I haven't looked into it enough.

@johto if you have a setup to reasonably benchmark this, I would greatly appreciate it.

if o == oid.T_bytea {
binary = true
st.rowFmts[i] = formatBinary
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd imagined this would happen in stmt.prepare() and this code would just send over the pre-prepared byte slice stored in the stmt struct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable. Is it worthwhile to compress this when possible, e.g. all binary or all text?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably. But I don't think the first version has to do it. Let's see what the numbers say first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 21774b5.

@johto
Copy link
Contributor

johto commented Apr 7, 2015

@johto if you have a setup to reasonably benchmark this, I would greatly appreciate it.

Yeah, I have a server I can use to test this. I can have a look once you've fixed the small issue we talked about in the code comments; I'd expect that to show up in repeated execution.

@@ -460,7 +461,7 @@ func TestByteaOutputFormats(t *testing.T) {
return
}

testByteaOutputFormat := func(f string) {
testByteaOutputFormat := func(f string, s bool) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the name "s" here; it's not at all clear what it means. Maybe "prepare" or "prepared" or "usePrepared" would be more clear?

@johto
Copy link
Contributor

johto commented Apr 23, 2015

I had a look at this, and for bytea the results look really promising. Up to 40% speed improvement on bytea values as small as 1kB! Nice work!

I also did some hacking on it (you can see my work here), and was quite disappointed to find that I couldn't come up with a test case where scanning 64-bit integers was consistently faster than just decoding their text representations. I didn't try floats, though; there might be an improvement to be had there. Another interesting case might be decoding timestamps from their binary format, but I didn't implement it today.

Did I miss something? Are there other cases where binary decoding could be faster? Can someone see ways of speeding up the code?

@johto
Copy link
Contributor

johto commented Apr 23, 2015

I also did some hacking on it (you can see my work here), and was quite disappointed to find that I couldn't come up with a test case where scanning 64-bit integers was consistently faster than just decoding their text representations.

.. and now it of course hit me that I should test for queries which returns a lot of rows with an int64 column. I feel really dumb.

Returning 10k rows with the integer value 4611686018427387904 I get around 30% of an increase in performance. The value being 65536 results in a ~16% increase, and finally with a value of 1 the performance is comparable. So I think we should use the binary mode for integers as well.

@cbandy
Copy link
Contributor Author

cbandy commented Apr 24, 2015

Very good news! My remaining doubts were around small values (e.g. 1-byte decimal vs 8-byte binary.)

@johto
Copy link
Contributor

johto commented May 14, 2015

Even if it looks like this feature is never a net loss, I think it might make sense to provide an option to turn it off. Shame that there doesn't seem to be a nice way of doing that on a per-statement basis, but I guess being able to set it in the connection string is better than nothing.

@cbandy
Copy link
Contributor Author

cbandy commented May 14, 2015

That seems fair. What might this config be called? Will it be separate from #357?

@johto
Copy link
Contributor

johto commented May 29, 2015

That seems fair. What might this config be called? Will it be separate from #357?

Yeah, these should be two separate options. I'm not having any solid ideas pop up right now. Maybe disable_prepared_statement_binary_params? It shouldn't matter if it's longish, since it should be only used very rarely, if by anyone at all, ever.

@johto
Copy link
Contributor

johto commented May 30, 2015

I've pushed this in c4afb3f after a fair bit of additional hacking. I was afraid to do floats because floats scare the hell out of me. I also didn't do any time-related types, though I think therein lies a ton of extra performance. UUIDs might be useful as well. Patches with performance benchmarks welcome.

Anyway, I'm done hacking on this for now unless I broke something. Thanks for your work, and I hope I didn't break things too badly!

@johto johto closed this May 30, 2015
@cbandy
Copy link
Contributor Author

cbandy commented May 30, 2015

Awesome!

@cbandy
Copy link
Contributor Author

cbandy commented May 30, 2015

UUIDs might be useful as well.

I think this will not be backward compatible. The UUID package I've been using lately handles it nicely, however the top result in Google for "go uuid" (for me) is really awful. I have one or two old projects that call Parse() a lot.

Those old projects can burn, of course. I think we should return UUIDs in binary format. If compatibility is a concern, we can allow some configuration to re-enable "the old way."

@cbandy
Copy link
Contributor Author

cbandy commented May 30, 2015

I also didn't do any time-related types, though I think therein lies a ton of extra performance.

I agree completely, but my initial dives into these binary formats left me with more questions than answers (e.g. where are the timezones?)

In any case, you've made it very easy to add support for binary formats incrementally!

@johto
Copy link
Contributor

johto commented May 30, 2015

UUIDs might be useful as well.

I think this will not be backward compatible. The UUID package I've been using lately handles it nicely, however the top result in Google for "go uuid" (for me) is really awful. I have one or two old projects that call Parse() a lot.

The idea there was to do it in a backwards compatible way, i.e. transfer over in binary and then format to text on the Go side. The gain would be in the fact that there's less data transferred over the wire; the amount of work done is still the same since we can't just return a []byte with the binary UUID.

@cbandy cbandy deleted the binary-row-data branch June 11, 2015 02:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants