Ability to set collation on connection #157

Closed
prencher opened this issue Nov 3, 2013 · 5 comments · Fixed by #242

Comments

prencher commented Nov 3, 2013

By default, you set utf8_general_ci. However, for better language support, utf8_unicode_ci is more appropriate. It would be nice if you could specify the collation along with the charset.

@arnehormann
Member

By default, we don't set any collation. The MySQL handshake supports setting a character set, but not a collation.
You have to set the collation yourself.
You can set it by appending collation_connection=utf8_unicode_ci to the DSN, which executes SET collation_connection=utf8_unicode_ci as the first query when a new connection is established.
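
A minimal sketch of building such a DSN, using only the standard library. It assumes the usual `user:password@tcp(host:port)/dbname?param=value` DSN layout; the helper name `withCollation` and the credentials are hypothetical and not part of the driver.

```go
package main

import (
	"fmt"
	"strings"
)

// withCollation is a hypothetical helper (not a driver API). It appends
// collation_connection=<collation> to a DSN string.
// Assumption: a "?" in the DSN only ever starts the parameter list.
func withCollation(dsn, collation string) string {
	sep := "?"
	if strings.Contains(dsn, "?") {
		// The DSN already carries parameters, so chain with "&".
		sep = "&"
	}
	return dsn + sep + "collation_connection=" + collation
}

func main() {
	// Placeholder credentials and address, for illustration only.
	dsn := withCollation("user:password@tcp(127.0.0.1:3306)/dbname", "utf8_unicode_ci")
	fmt.Println(dsn)
}
```

The resulting string would then be passed as the second argument to `sql.Open("mysql", dsn)`.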

@julienschmidt
Member

Oh yes, the driver sets a collation

@julienschmidt
Member

I must admit that I have no idea what this actually does...
In the protocol specification it is called "Character Set" but the value must be the name of a collation. See: http://dev.mysql.com/doc/internals/en/character-set.html#packet-Protocol::CharacterSet

I assume that the collation also determines the character set used. For example utf8_general_ci also sets the character set to UTF-8.

xaprb commented Nov 3, 2013

Just going from memory. A character set has a default collation. If you
don't specify one, the default will be used. If you specify a collation, it
automatically selects the character set to which it belongs. I could be
wrong about some or all of this.
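
To illustrate the collation-to-character-set relationship: MySQL collation names start with the name of the character set they belong to (for example `utf8_general_ci` belongs to `utf8`). The sketch below relies only on that naming convention; `charsetOf` is a hypothetical helper, not something the server or the driver provides.

```go
package main

import (
	"fmt"
	"strings"
)

// charsetOf is a hypothetical helper that derives the character set from a
// collation name by taking everything before the first underscore.
// Assumption: the name follows the <charset>_<rest> convention; a name
// without an underscore (such as "binary") is returned unchanged.
func charsetOf(collation string) string {
	if i := strings.Index(collation, "_"); i >= 0 {
		return collation[:i]
	}
	return collation
}

func main() {
	for _, c := range []string{"utf8_general_ci", "utf8_unicode_ci", "latin1_swedish_ci"} {
		fmt.Printf("%s belongs to %s\n", c, charsetOf(c))
	}
}
```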

@arnehormann
Member

We could fix this with an exported var in the driver:
var Collation uint8 = utf8_general_ci
If we use that in the handshake, it can be set for all new connections if the driver is imported directly (like for the LOAD DATA LOCAL INFILE support).
On the plus side, it won't change anything for regular users not using this feature.

But it will need some documentation to clarify that setting it to a different collation does not change existing connections (which is hard to reason about because of database/sql's pooling). It would only really make sense to set it before first calling Open. The documentation would also need to say that you get the supported collations with SHOW COLLATION (column Id), that there's no reasonable way to support error handling if an unavailable value is set, and that reading the variable is useless: it's not changed when a collation is unavailable and may not match the collation used in a specific connection.

I think that's viable and desirable. What do you think, Julien?
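
A rough sketch of how the proposed variable would behave, without any real networking. The `conn` type and `newConn` function are hypothetical stand-ins for the driver's connection setup; the IDs 33 and 192 are the `Id` values that SHOW COLLATION reports for utf8_general_ci and utf8_unicode_ci.

```go
package main

import "fmt"

// Collation IDs as reported in the Id column of SHOW COLLATION.
const (
	utf8GeneralCI uint8 = 33
	utf8UnicodeCI uint8 = 192
)

// Collation mirrors the proposed exported variable. It is read once,
// when a connection performs its handshake.
var Collation uint8 = utf8GeneralCI

// conn is a hypothetical stand-in for a driver connection.
type conn struct {
	collation uint8 // value sent in the handshake
}

// newConn simulates the handshake by capturing the current Collation.
func newConn() *conn {
	return &conn{collation: Collation}
}

func main() {
	before := newConn()
	Collation = utf8UnicodeCI // only affects connections opened from now on
	after := newConn()
	fmt.Println(before.collation, after.collation) // prints "33 192"
}
```

The connection opened before the assignment keeps its old collation, which is why the variable would have to be set before the first call to Open.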
