Change MySQL UTF-8 examples to use utf8mb4 #5100
…andard most people would expect
If you are using MySQL, its `utf8` character set has some shortcomings
which may cause problems. Prefer the `utf8mb4` character set instead, if
your version supports it.
You will have to indent all these lines by four spaces for them to be rendered inside the enclosing sidebar block.
Hmm, I guess I did want to have a caution-block inside the sidebar-block, but offhand I'm not sure if that's a thing that is done elsewhere in the documentation.
@DHager Thanks for your suggestion. Do you have some resources to which we can refer here?
@xabbuh Sure. First, the MySQL 5.5 docs, section 10.1.10.6, "The utf8mb4 Character Set":
In addition, other people have reported that the failure mode is complete string truncation, losing everything past the first problem-symbol:
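The reported failure mode can be illustrated without a database. The following is a minimal Python sketch, not MySQL's own code: the hypothetical helper `simulate_utf8mb3_truncation` cuts a string just before the first character whose UTF-8 encoding needs four bytes, which is the behaviour the reports above describe for a `utf8` column.

```python
def simulate_utf8mb3_truncation(text: str) -> str:
    """Mimic the reported truncation: keep only the text before the
    first character that needs 4 bytes in UTF-8 (hypothetical helper)."""
    for index, char in enumerate(text):
        # MySQL's `utf8` stores at most 3 bytes per character.
        if len(char.encode("utf-8")) > 3:
            return text[:index]
    return text


if __name__ == "__main__":
    # U+1F600 (an emoji) needs 4 bytes, so everything from it onward is lost.
    print(repr(simulate_utf8mb3_truncation("Hello \U0001F600 world")))
```

Characters that fit in three bytes or fewer, such as `é` or `€`, pass through unchanged in this sketch.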
Crap, an unrelated doc-fix seems to have become part of the pull request; I forgot GitHub automatically draws new commits in. I'll see if I can fix it.
.. caution::

    If you are using MySQL, its `utf8` character set actually only supports
In reStructuredText, you have to use double backticks.
OK, my local
Hi @DHager! Thanks for bringing this up - I think it's a good note, especially since it causes silent issues (truncations). Since this character set is new in 5.5.3, I think we should: A) Add 2 new (commented out) lines in the What do you think? If you agree, can you make these changes? Thanks!
I would personally not add the commented lines, but would instead reword the caution section a bit and add a link to the official MySQL docs. MySQL 5.5.3 was released 5 years ago, and I think it is good for this setting to become more popular, as many people (myself included, until I read this PR) still believe utf8 is the way to go. Plus, I think we should take the opportunity to review all the Symfony and Silex docs regarding this. A good opportunity to get easy doc badges as well :) Just my 2 cents, @weaverryan and @DHager :)
@weaverryan, @ricardclau : I'm inclined to put
The reverse-case, using
I've "promoted" the caution block to another sentence, and added comments to the configuration sample.
I am 100% with @DHager on this one, but of course it's up to you, @weaverryan :)
I think it's great now - you're right that the new one should be the default, and I like that you show the old one and have some (short) words explaining. I'll merge this shortly :). Thanks!
@DHager do you want to add this to the Silex / Doctrine docs as well? I can have a look at them, but since you opened this PR I think it is fair that you go for them if you have time.
Thanks again! And yes, I think adding this to Silex or Doctrine, if they have similar notes, makes great sense. Cheers!
…ien Hager)

This PR was submitted for the 2.6 branch but it was merged into the 2.3 branch instead (closes #5100).

Discussion
----------

Change MySQL UTF-8 examples to use utf8mb4

You might think MySQL's `utf8` is the right choice, but it's actually got some problems handling certain character inputs. The later, corrected mode of `utf8mb4` has fewer surprises.

Commits
-------

7d7d94e Rewrite utf8mb4 cautions, add comment into sample configuration
55874c4 Add backticks for code-styling
e3c2fb6 Indenting caution block to nest it inside the sidebar
6406f22 Revert "Fix example name to avoid breaking collision with standard data-collectors"
dfc5620 Revert "Add a cautionary note telling users where the "standard" data-collector names can be found."
216ae51 Add a cautionary note telling users where the "standard" data-collector names can be found.
f0ced91 Fix example name to avoid breaking collision with standard data-collectors
f9cae6c Change MySQL UTF-8 examples to use utf8mb4, which is closer to the standard most people would expect
Forgot to mention: Using I'd call this minor because if someone already has a database, those columns will already be in some other character set, and if they're creating a new one... they probably shouldn't be trying to index longer text fields in the first place.
This PR was merged into the 1.2 branch.

Discussion
----------

Changed Doctrine page to use utf8mb4 as sample

MySQL's `utf8` character set is a little broken, and does not cover 4-byte UTF-8 characters. In most cases it will quietly truncate the string whenever it sees one, saving incomplete text data.

In 5.5.3 they introduced `utf8mb4` to fix this inconsistency, and given that it's been 5 years, it's probably safe to encourage people to use it. If their MySQL installation is older, it should be easy for them to find the distinctive string and change it back to `utf8`, and for a new project.

Additional details can be found in the equivalent [pull-request for Symfony-2](symfony/symfony-docs#5100).

Commits
-------

a20f8f6 Changed Doctrine page to use utf8mb4 as sample
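To make "4-byte UTF-8 characters" concrete: these are exactly the code points above U+FFFF, such as most emoji and some rarer CJK ideographs. The sketch below is an illustration only, and the function name `four_byte_characters` is made up for this example. It lists the characters in a string that would not fit into a 3-byte-per-character set such as MySQL's `utf8`.

```python
from typing import List


def four_byte_characters(text: str) -> List[str]:
    """Return the characters in `text` that need 4 bytes in UTF-8.

    Code points above U+FFFF (outside the Basic Multilingual Plane)
    are the ones that encode to four bytes.
    """
    return [char for char in text if ord(char) > 0xFFFF]


if __name__ == "__main__":
    sample = "price: 5\u20ac \U0001F600"
    for char in four_byte_characters(sample):
        # Print the code point and the encoded byte length of each offender.
        print(f"U+{ord(char):04X} needs {len(char.encode('utf-8'))} bytes")
```

A check like this could be run over existing application data before deciding whether a move to `utf8mb4` is urgent.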
You might think MySQL's `utf8` is the right choice, but it's actually got some problems handling certain character inputs. The later, corrected mode of `utf8mb4` has fewer surprises.