Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement sql.Scanner and driver.Valuer interfaces #5

Closed
wants to merge 6 commits into from

Conversation

gust1n
Copy link

@gust1n gust1n commented Aug 12, 2014

To support mapping a struct to db with e.g. https://github.com/jmoiron/modl. But since they are interfaces they are not specific to that implementation.

@gust1n
Copy link
Author

gust1n commented Aug 18, 2014

Any thoughts of this? Otherwise I might start migrating all my projects to use my fork. But I'd rather use your implementation (and give credit to you)

@satori satori self-assigned this Aug 18, 2014
@satori
Copy link
Owner

satori commented Aug 18, 2014

@gust1n I'm sorry to have kept you waiting so long. Thank you for your pull request, I think SQL driver support is a great enhancement for go.uuid.

@satori
Copy link
Owner

satori commented Aug 18, 2014

Generally, UUID can be represented in at least two ways: as a 32 or 36 character ASCII string or as 16 byte array ([16]byte in Go). Although there are some cons and pros on storing UUID as a string, I believe that []byte is the more technically correct database representation for UUID datatype.

Let's take a look at a string representation. The advantages are simple: UUID string is recognizable as UUID and easy to read by human. The disadvantage is mainly that of space, a 36 character string is going to take at least 36 bytes to store.

Let's take a look at database world now.

On the one side, PostgreSQL has builtin UUID datatype, represented as 16 byte array. PostgreSQL command-line client (psql) is able to represent UUIDs in human-readable form (just like String method). In that case Value method could be simply implemented in term of MarshalBinary.

On the other side, MySQL and SQLite lack of builtin UUID datatype. If we are looking at SQLite, for example, there is at least two ways to store UUID: as a TEXT column or BLOB column. The first way will take at least 36 bytes for storage, the second one will take at least 16 bytes for storage. Why does the size matter? SQLite needs to read data from disk during a query, the OS reads from the disk in blocks (4 KB is a common size) and the more rows that can fit in a block, the more likely it is that a row will be cached in memory when SQLite requests it from disk. Bigger rows mean fewer rows per block and fewer rows cached in memory. Although there is a strong disadvantage of BLOB column as its data is not human-readable, I think that Value method still should be implemented in term of MarshalBinary.
The same stays true for MySQL and BINARY(16) column datatype.

According to the information mentioned above, Value method should be just:

return u.MarshalBinary()

As UUID should not be written to database as a string, there is no need in trying to scan it from data input, so Scan method could be simplified to:

return u.UnmarshalBinary(src.([]byte))

Is there anything I missed?

@gust1n
Copy link
Author

gust1n commented Aug 18, 2014

Nice walk through! I'll be testing this tomorrow. The reason for my type switch was from earlier experiences when pg returned a string for the UUID type (yours). But that might have been since the type didn't implement valuer then. Let me test tomorrow.

Skickat från min telefon

18 aug 2014 kl. 23:09 skrev Maxim Bublis notifications@github.com:

Generally, UUID can be represented in at least two ways: as a 32 or 36 character ASCII string or as 16 byte array ([16]byte in Go). Although there are some cons and pros on storing UUID as a string, I believe that []byte is the more technically correct database representation for UUID datatype.

Let's take a look at a string representation. The advantages are simple: UUID string is recognizable as UUID and easy to read by human. The disadvantage is mainly that of space, a 36 character string is going to take at least 36 bytes to store.

Let's take a look at database world now.

On the one side, PostgreSQL has builtin UUID datatype, represented as 16 byte array. PostgreSQL command-line client (psql) is able to represent UUIDs in human-readable form (just like String method). In that case Value method could be simply implemented in term of MarshalBinary.

On the other side, MySQL and SQLite lack of builtin UUID datatype. If we are looking at SQLite, for example, there is at least two ways to store UUID: as a TEXT column or BLOB column. The first way will take at least 36 bytes for storage, the second one will take at least 16 bytes for storage. Why does the size matter? SQLite needs to read data from disk during a query, the OS reads from the disk in blocks (4 KB is a common size) and the more rows that can fit in a block, the more likely it is that a row will be cached in memory when SQLite requests it from disk. Bigger rows mean fewer rows per block and fewer rows cached in memory. Although there is a strong disadvantage of BLOB column as its data is not human-readable, I think that Value method still should be implemented in term of MarshalBinary.
The same stays true for MySQL and BINARY(16) column datatype.

According to the information mentioned above, Value method should be just:

return u.MarshalBinary()
As UUID should not be written to database as a string, there is no need in trying to scan it from data input, so Scan method could be simplified to:

return u.UnmarshalBinary(src.([]byte))
Is there anything I missed?


Reply to this email directly or view it on GitHub.

@satori
Copy link
Owner

satori commented Aug 18, 2014

Oh, I see now, the problem is definitely in lib/pq#263 and lib/pq#209. Pure Go lib/pq library doesn't implement PostgreSQL binary protocol unlike native libpq.

@gust1n
Copy link
Author

gust1n commented Aug 19, 2014

Yeah ok so it seems that using return u.MarshalBinary() for the Valuer interface using lib/pq causes: ERROR: invalid byte sequence for encoding "UTF8": 0xba (probably as you predicted).

So my code actually works since the UUID inserted actually is a string (which postgres doesn't complain about), hence the Scan will take the path case []byte -> len(b) != 16 -> FromString(string(b))

It actually seems the postgres uuid type (http://www.postgresql.org/docs/9.1/static/datatype-uuid.html) is a string, or am I reading it wrong? Should we instead use the approach you suggested for MySQL and SQlite and save as a bytea?

@satori
Copy link
Owner

satori commented Aug 19, 2014

@gust1n Documentation states only about human-readable representation but not about implementation. Actually UUID is implemented as unsigned char[16] in PostgreSQL:

/* uuid size in bytes */
#define UUID_LEN 16

/* pg_uuid_t is declared to be struct pg_uuid_t in uuid.h */
struct pg_uuid_t
{
    unsigned char data[UUID_LEN];
};

I'll check if u.MarshalBinary() will work for MySQL and SQLite with BINARY(16) and BLOB column types.

@gust1n
Copy link
Author

gust1n commented Aug 20, 2014

I didn't understand, do you want me to check and investigate further or will you do it?

@satori
Copy link
Owner

satori commented Aug 20, 2014

@gust1n I'll check it in the short term, a bit busy right now with my full-time duties.

@satori
Copy link
Owner

satori commented Aug 25, 2014

I've investigated on that problem a bit more.

Both MySQL and SQLite feel just fine with BINARY(16) and BLOB columns. During my tests, I was using go-sql-driver/mysql and mattn/go-sqlite3 correspondingly. I've found them able to deal with binary UUID representation.

Albeit PostgreSQL has builtin UUID datatype, driver implementations do not support binary wire protocol, only its text form. I've tested it against jbarham/gopgsqldriver and lib/pq.

The first one is a simple cgo binding for native libpq library and it shouldn't be too difficult to make it use binary format instead of text one. Actually I was able to make it read and scan UUIDs from binary protocol in a just one-line fix. Write support is a bit harder and requires complete library refactoring. Anyway, this project seems to be a bit dead, last commit was pushed almost 3 years ago.

Also I've discovered deafbybeheading/femebe – a library implementing PostgreSQL's FEBE protocol, unfortunately text version again. Looks like it is not too hard to add binary protocol support, it seems to me a bit cleaner protocol implementation than lib/pq has and can be used to reimplement lib/pq on top of it. Besides, femebe's author is one of active lib/pq committers.

Anyway, in my opinion sql.Scanner and driver.Valuer interfaces should be implemented in terms of MarshalBinary and UnmarshalBinary methods and I'd like to merge such pull request with that changes. Inability to handle binary data is definitely a bug in lib/pq.

@satori
Copy link
Owner

satori commented Aug 25, 2014

Here is an example code I've used during my tests.

@deoxxa
Copy link

deoxxa commented Dec 2, 2014

Oh my god, how did I miss this? I'm going to close my PR now haha!

@satori satori mentioned this pull request Dec 24, 2014
@cbandy
Copy link
Contributor

cbandy commented Mar 1, 2015

Can this be closed now that #10 is merged?

@gust1n gust1n closed this Mar 1, 2015
@satori
Copy link
Owner

satori commented Mar 5, 2015

This pull request also addresses driver.Valuer interface implementation. I'm reopening it as a reminder.

@satori satori reopened this Mar 5, 2015
@satori
Copy link
Owner

satori commented Oct 28, 2015

driver.Valuer interface implementation have been merged by #16. Thanks to all you guys who participated in discussion. In case of need to use binary representation of UUID user can use .Bytes() as suggested by @cbandy in #10.

@satori satori closed this Oct 28, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants