Request binary format for bytea row data #359

cbandy · 2015-04-07T04:56:27Z

This is some quick code to demonstrate what it would be like to receive binary formats for some data types in prepared statements. Integer, float and timestamp types may be other candidates, but I haven't looked into it enough.
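A minimal Go sketch of the idea: pick a result format code for each column from the column type OIDs the server reports when the statement is prepared. This is not the PR's code. The constant and function names are hypothetical, and the OIDs (17 for bytea, 20 for int8, 25 for text) are hardcoded from pg_type rather than taken from lib/pq's oid package. Only bytea is switched to binary, matching the scope of this first version. Other types would become extra `case` arms.

```go
package main

import "fmt"

// PostgreSQL type OIDs from pg_type (hardcoded here for self-containment).
const (
	oidBytea = 17
	oidInt8  = 20
	oidText  = 25
)

// Format codes used by the extended query protocol.
const (
	formatText   int16 = 0
	formatBinary int16 = 1
)

// resultFormats (hypothetical name) chooses a format code per result column:
// binary for types we know how to decode from binary, text otherwise.
func resultFormats(colOIDs []uint32) []int16 {
	fmts := make([]int16, len(colOIDs))
	for i, o := range colOIDs {
		switch o {
		case oidBytea:
			fmts[i] = formatBinary
		default:
			fmts[i] = formatText
		}
	}
	return fmts
}

func main() {
	fmt.Println(resultFormats([]uint32{oidInt8, oidBytea, oidText})) // [0 1 0]
}
```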

@johto if you have a setup to reasonably benchmark this, I would greatly appreciate it.

johto · 2015-04-07T08:00:22Z

conn.go

+		if o == oid.T_bytea {
+			binary = true
+			st.rowFmts[i] = formatBinary
+		}


I'd imagined this would happen in stmt.prepare() and this code would just send over the pre-prepared byte slice stored in the stmt struct.
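A sketch of this suggestion, assuming the per-column format codes are known once the statement has been prepared. The result-format part of a Bind message (an Int16 count followed by one Int16 code per column, in network byte order) is encoded once, and each execution just writes the cached bytes. The `stmt` type, its `fmtBytes` field and `prepare` are hypothetical stand-ins, not lib/pq's actual structures.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeResultFormats builds the result-format section of a Bind message:
// an Int16 count followed by one Int16 format code per column, big-endian.
func encodeResultFormats(fmts []int16) []byte {
	buf := make([]byte, 2+2*len(fmts))
	binary.BigEndian.PutUint16(buf, uint16(len(fmts)))
	for i, f := range fmts {
		binary.BigEndian.PutUint16(buf[2+2*i:], uint16(f))
	}
	return buf
}

// stmt is a hypothetical prepared-statement struct that caches the bytes.
type stmt struct {
	fmtBytes []byte // reused verbatim on every execution
}

// prepare (hypothetical) computes the encoding once, at prepare time.
func prepare(colFmts []int16) *stmt {
	return &stmt{fmtBytes: encodeResultFormats(colFmts)}
}

func main() {
	st := prepare([]int16{0, 1, 0})
	fmt.Printf("% x\n", st.fmtBytes) // 00 03 00 00 00 01 00 00
}
```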

cbandy: Sounds reasonable. Is it worthwhile to compress this when possible, e.g. when the columns are all binary or all text?

johto: Probably. But I don't think the first version has to do it. Let's see what the numbers say first.

cbandy: Fixed in 21774b5.
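The compression idea deferred above maps directly onto the protocol: a Bind message may carry zero result format codes (all columns use text), a single code (applied to every column), or one code per column. A minimal sketch with a hypothetical function name:

```go
package main

import "fmt"

// compressFormats shortens a per-column format-code list where the
// protocol allows it:
//   - all text:      send zero codes (text is the default)
//   - all identical: send a single code that applies to every column
//   - mixed:         send one code per column
func compressFormats(fmts []int16) []int16 {
	if len(fmts) == 0 {
		return nil
	}
	same := true
	for _, f := range fmts[1:] {
		if f != fmts[0] {
			same = false
			break
		}
	}
	switch {
	case same && fmts[0] == 0:
		return nil
	case same:
		return fmts[:1]
	default:
		return fmts
	}
}

func main() {
	fmt.Println(len(compressFormats([]int16{0, 0, 0}))) // 0
	fmt.Println(compressFormats([]int16{1, 1, 1}))      // [1]
	fmt.Println(compressFormats([]int16{0, 1, 0}))      // [0 1 0]
}
```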

johto · 2015-04-07T11:31:17Z

> @johto if you have a setup to reasonably benchmark this, I would greatly appreciate it.

Yeah, I have a server I can use to test this. I can have a look once you've fixed the small issue we discussed in the code comments; I'd expect it to show up when a statement is executed repeatedly.

johto · 2015-04-11T19:36:13Z

encode_test.go

@@ -460,7 +461,7 @@ func TestByteaOutputFormats(t *testing.T) {
 		return
 	}

-	testByteaOutputFormat := func(f string) {
+	testByteaOutputFormat := func(f string, s bool) {


I don't like the name "s" here; it's not at all clear what it means. Maybe "prepare" or "prepared" or "usePrepared" would be more clear?

johto · 2015-04-23T22:25:44Z

I had a look at this, and for bytea the results look really promising. Up to 40% speed improvement on bytea values as small as 1kB! Nice work!
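One likely reason bytea benefits so much: with the default `bytea_output = 'hex'`, the text representation is `\x` followed by two hex digits per byte, so a 1 kB value is roughly 2 kB on the wire and must be hex-decoded, whereas the binary representation is just the raw bytes. The sketch below shows the two decoding paths side by side. The function names are hypothetical, and the older `escape` output format is deliberately not handled.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
)

// decodeByteaText decodes the hex text format: a "\x" prefix followed by
// two hex digits per byte.
func decodeByteaText(s []byte) ([]byte, error) {
	if len(s) < 2 || s[0] != '\\' || s[1] != 'x' {
		return nil, errors.New("not hex-format bytea")
	}
	out := make([]byte, hex.DecodedLen(len(s)-2))
	if _, err := hex.Decode(out, s[2:]); err != nil {
		return nil, err
	}
	return out, nil
}

// decodeByteaBinary handles the binary format, which is the raw bytes;
// the only work left is copying them out of the read buffer.
func decodeByteaBinary(b []byte) []byte {
	out := make([]byte, len(b))
	copy(out, b)
	return out
}

func main() {
	text := []byte(`\xdeadbeef`)
	bin := []byte{0xde, 0xad, 0xbe, 0xef}
	a, err := decodeByteaText(text)
	fmt.Println(a, err, decodeByteaBinary(bin))
	fmt.Println(len(text), "bytes as text vs", len(bin), "as binary")
}
```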

I also did some hacking on it (you can see my work here), and was quite disappointed to find that I couldn't come up with a test case where scanning 64-bit integers was consistently faster than just decoding their text representations. I didn't try floats, though; there might be an improvement to be had there. Another interesting case might be decoding timestamps from their binary format, but I didn't implement it today.

Did I miss something? Are there other cases where binary decoding could be faster? Can someone see ways of speeding up the code?

johto · 2015-04-23T22:36:28Z

> I also did some hacking on it (you can see my work here), and was quite disappointed to find that I couldn't come up with a test case where scanning 64-bit integers was consistently faster than just decoding their text representations.

... and now, of course, it hit me that I should test queries which return a lot of rows with an int64 column. I feel really dumb.

Returning 10k rows with the integer value 4611686018427387904, I get around a 30% increase in performance. With the value 65536 the increase is ~16%, and with a value of 1 the performance is comparable. So I think we should use the binary format for integers as well.
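A sketch comparing the two ways of decoding an int8 column. The text form is parsed as a decimal string; the binary form is always 8 bytes, big-endian two's complement. For the three values tested above, the text form is 19, 5 and 1 bytes long against a fixed 8 bytes in binary, which is consistent with the gain shrinking as the values get shorter, although parsing cost probably matters as much as size. Function names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strconv"
)

// decodeInt8Text parses the decimal text representation of an int8.
func decodeInt8Text(s []byte) (int64, error) {
	return strconv.ParseInt(string(s), 10, 64)
}

// decodeInt8Binary reads the binary representation: exactly 8 bytes,
// big-endian two's complement.
func decodeInt8Binary(b []byte) (int64, error) {
	if len(b) != 8 {
		return 0, fmt.Errorf("int8: expected 8 bytes, got %d", len(b))
	}
	return int64(binary.BigEndian.Uint64(b)), nil
}

func main() {
	for _, v := range []int64{4611686018427387904, 65536, 1} {
		text := []byte(strconv.FormatInt(v, 10))
		bin := make([]byte, 8)
		binary.BigEndian.PutUint64(bin, uint64(v))
		t, _ := decodeInt8Text(text)
		b, _ := decodeInt8Binary(bin)
		fmt.Printf("%d: %d text bytes, %d binary bytes, equal=%v\n",
			v, len(text), len(bin), t == b)
	}
}
```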

cbandy · 2015-04-24T14:41:21Z

Very good news! My remaining doubts were around small values (e.g. 1-byte decimal vs. 8-byte binary).

johto · 2015-05-14T16:52:42Z

Even if it looks like this feature is never a net loss, I think it might make sense to provide an option to turn it off. Shame that there doesn't seem to be a nice way of doing that on a per-statement basis, but I guess being able to set it in the connection string is better than nothing.

cbandy · 2015-05-14T18:39:27Z

That seems fair. What might this config be called? Will it be separate from #357?

johto · 2015-05-29T23:09:49Z

> That seems fair. What might this config be called? Will it be separate from #357?

Yeah, these should be two separate options. No solid name comes to mind right now. Maybe disable_prepared_statement_binary_params? It shouldn't matter if it's longish, since it should only be used very rarely, if by anyone at all, ever.
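A sketch of how such a connection-string option might gate the feature, assuming the options have already been parsed into a map. The option name is the one floated in this thread and is illustrative only; it is not necessarily what the driver ended up accepting. The accepted boolean spellings are also an assumption.

```go
package main

import "fmt"

// optDisableBinary is the name suggested in this discussion (illustrative only).
const optDisableBinary = "disable_prepared_statement_binary_params"

// binaryResultsEnabled reports whether prepared statements should request
// binary result formats. The feature stays on unless the option is
// explicitly set to a true value; unknown values are rejected.
func binaryResultsEnabled(opts map[string]string) (bool, error) {
	switch v := opts[optDisableBinary]; v {
	case "", "no", "off", "false", "0":
		return true, nil
	case "yes", "on", "true", "1":
		return false, nil
	default:
		return false, fmt.Errorf("invalid value for %s: %q", optDisableBinary, v)
	}
}

func main() {
	on, _ := binaryResultsEnabled(map[string]string{})
	off, _ := binaryResultsEnabled(map[string]string{optDisableBinary: "yes"})
	fmt.Println(on, off) // true false
}
```

Keeping the feature on by default matches the expectation above that the option would rarely, if ever, be used.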

johto · 2015-05-30T02:09:57Z

I've pushed this in c4afb3f after a fair bit of additional hacking. I was afraid to do floats because floats scare the hell out of me. I also didn't do any time-related types, though I think therein lies a ton of extra performance. UUIDs might be useful as well. Patches with performance benchmarks welcome.

Anyway, I'm done hacking on this for now unless I broke something. Thanks for your work, and I hope I didn't break things too badly!

cbandy · 2015-05-30T05:34:52Z

Awesome!

cbandy · 2015-05-30T05:58:04Z

> UUIDs might be useful as well.

I think this will not be backward compatible. The UUID package I've been using lately handles it nicely; however, the top Google result for "go uuid" (for me) is really awful. I have one or two old projects that call Parse() a lot.

Those old projects can burn, of course. I think we should return UUIDs in binary format. If compatibility is a concern, we can allow some configuration to re-enable "the old way."

cbandy · 2015-05-30T06:04:31Z

> I also didn't do any time-related types, though I think therein lies a ton of extra performance.

I agree completely, but my initial dives into these binary formats left me with more questions than answers (e.g. where are the timezones?)

In any case, you've made it very easy to add support for binary formats incrementally!

johto · 2015-05-30T08:24:19Z

> UUIDs might be useful as well.

> I think this will not be backward compatible. The UUID package I've been using lately handles it nicely; however, the top Google result for "go uuid" (for me) is really awful. I have one or two old projects that call Parse() a lot.

The idea there was to do it in a backwards-compatible way, i.e. transfer the value in binary and then format it as text on the Go side. The gain would be that less data goes over the wire; the amount of work done is still the same, since we can't just return a []byte containing the binary UUID.
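A sketch of that backwards-compatible approach: receive the 16-byte binary UUID and render it in the canonical lowercase 8-4-4-4-12 form that the text protocol would have returned (36 bytes on the wire instead of 16). The function name is hypothetical.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// formatUUID renders a 16-byte binary UUID as the canonical
// 8-4-4-4-12 lowercase hex string.
func formatUUID(b []byte) (string, error) {
	if len(b) != 16 {
		return "", fmt.Errorf("uuid: expected 16 bytes, got %d", len(b))
	}
	var buf [36]byte
	hex.Encode(buf[0:8], b[0:4])
	buf[8] = '-'
	hex.Encode(buf[9:13], b[4:6])
	buf[13] = '-'
	hex.Encode(buf[14:18], b[6:8])
	buf[18] = '-'
	hex.Encode(buf[19:23], b[8:10])
	buf[23] = '-'
	hex.Encode(buf[24:36], b[10:16])
	return string(buf[:]), nil
}

func main() {
	b := make([]byte, 16)
	for i := range b {
		b[i] = byte(i)
	}
	s, err := formatUUID(b)
	fmt.Println(s, err) // 00010203-0405-0607-0809-0a0b0c0d0e0f <nil>
}
```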

Request binary format for bytea row data (109393f)

johto reviewed Apr 7, 2015

Send format codes for every result column (21774b5)

johto reviewed Apr 11, 2015

johto mentioned this pull request May 14, 2015 in "Add support for binary_mode #357" (closed)

johto closed this May 30, 2015

cbandy deleted the binary-row-data branch June 11, 2015 02:28
