Change MySQL defaults from broken utf8 to fixed utf8mb4 #851
Hello, thank you for creating this pull request. I have automatically opened an issue http://www.doctrine-project.org/jira/browse/DBAL-1224 We use Jira to track the state of pull requests and the versions they got |
This seems reasonable to me, but only if a functional test with an example string is also provided (to prove the validity of the change), especially one covering the data truncation you reported in the description |
Doctrine is already parsing the database version in order to create Platform objects. I would prefer a condition for this, like Nette did in nette/http#28 + nette/database@7988663 |
@Ocramius: I don't think a test is worthwhile in this case, because there's no new Doctrine behavior to test, nor any particular way for Doctrine to "solve" or react to the issue (yet?). We'd just be writing tests to assert how third-party, non-PHP code behaves. Here's a simple way to validate the issue, with thanks to @mathiasbynens for the example problem character:
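A quick way to see which characters are affected: MySQL's 3-byte `utf8` cannot store anything outside the Basic Multilingual Plane (code points U+10000 to U+10FFFF), and those are exactly the characters that take 4 bytes in UTF-8. The sketch below flags such characters in PHP before they reach the database. `findFourByteCharacters` is a hypothetical helper, and U+1D306 is chosen here only for illustration; it is not necessarily the character from the original comment.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns every character in a UTF-8 string that lies
 * outside the Basic Multilingual Plane (U+10000..U+10FFFF). These characters
 * need 4 bytes in UTF-8, so MySQL's 3-byte "utf8" charset cannot store them.
 */
function findFourByteCharacters(string $text): array
{
    // The /u modifier makes PCRE treat both pattern and subject as UTF-8;
    // preg_match_all() returns false if the subject is not valid UTF-8.
    if (preg_match_all('/[\x{10000}-\x{10FFFF}]/u', $text, $matches) === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    return $matches[0];
}

// U+1D306 TETRAGRAM FOR CENTRE, encoded as 4 bytes in UTF-8.
$sample = "foo\u{1D306}bar";

echo strlen("\u{1D306}"), PHP_EOL;                    // 4
echo count(findFourByteCharacters($sample)), PHP_EOL; // 1
```

A check like this only detects the problem on the PHP side; it does not change what the database stores, which is why the discussion focuses on the table defaults.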
Turning on MySQL's "strict" mode should promote the warning to an error, but that's not the default, and people would still need to go through the effort of changing their tables if they need to support the problem characters. |
@fprochazka: I notice there's a Alternately, it might be possible to initialize the Platform to a charset/collation default based on what we can glean from the database connection (i.e. with |
A change is only justified by a test here, IMO: we show why this default is You are not really testing the functionality, you are just hardening Marco Pivetta On 5 May 2015 at 03:56, Darien notifications@github.com wrote:
|
@fprochazka: I've got a side-branch that uses a |
Ah, yes, now the build-server is telling me something I should've remembered from my own work: It seems like there are a few different choices, turning on MySQL-specific behavior:
|
@Ocramius, @fprochazka : At least for my immediate use-case, I think I've found a better solution with doctrine/DoctrineBundle#420 instead. |
Reopening as it is worth discussing the upgrade. |
As a follow-up on the post about This is because the maximum length of an InnoDB index key prefix is 767 bytes. Thus, with 3-byte utf8, a VARCHAR column can be at most 255 characters long to be fully indexed (255 * 3 = 765 bytes), while with 4-byte utf8mb4 it can only be 191 characters long (191 * 4 = 764 bytes). Starting with MySQL 5.6.3 you can circumvent this by configuring it with the setting Maybe these things should be considered when building a stable solution for this. What do you think @Ocramius? |
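The arithmetic in this comment can be checked with a small sketch. It assumes the 767-byte key-prefix limit quoted in the comment (i.e. large index prefixes are not enabled); `maxIndexableVarcharLength` is a hypothetical helper, not part of Doctrine or MySQL.

```php
<?php
declare(strict_types=1);

// Assumption: the 767-byte InnoDB key-prefix limit that applies when
// large index prefixes are not enabled.
const INNODB_MAX_KEY_PREFIX_BYTES = 767;

/**
 * Hypothetical helper: the longest VARCHAR that can be fully indexed when
 * every character may take up to $maxBytesPerChar bytes.
 */
function maxIndexableVarcharLength(int $maxBytesPerChar, int $limitBytes = INNODB_MAX_KEY_PREFIX_BYTES): int
{
    if ($maxBytesPerChar < 1) {
        throw new InvalidArgumentException('Bytes per character must be positive.');
    }
    // Integer division: a partial character does not fit.
    return intdiv($limitBytes, $maxBytesPerChar);
}

foreach (['utf8' => 3, 'utf8mb4' => 4] as $charset => $bytes) {
    $chars = maxIndexableVarcharLength($bytes);
    printf("%s: VARCHAR(%d) uses at most %d bytes\n", $charset, $chars, $chars * $bytes);
}
// utf8: VARCHAR(255) uses at most 765 bytes
// utf8mb4: VARCHAR(191) uses at most 764 bytes
```

This is why a default `VARCHAR(255)` column that is indexable under `utf8` stops being indexable once the charset switches to `utf8mb4`.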
This one is rather tricky, and considering the risks involved I doubt we can easily change this in 2.x. Also, if we did change it somehow, I wonder how we could keep BC and consistency here. When users started to upgrade, they'd have new tables with charset |
Disagree with this, according to history (remember |
@Ocramius yeah what about it? Is it wrong to use |
So what can we do in 2.x to get at least an intermediate solution? I tend to prefer the idea of introducing a |
@deeky666 Even if all your customers are in Germany, they might have a name that doesn't fit into latin1. Storing non-legacy data in anything but Unicode really shouldn't be encouraged. |
I am very much in favour of this; there is a small backwards-compatibility issue that is worth talking about. The default length of a This will cause the generated index SQL for many |
@mcfedr See #851 (comment) for details about this. |
any news on this? |
I also just wanted to switch "easily", but got caught by this issue. Since there is already a MySQL57Platform, what about changing the default only there? (I know it's sadly not that easy for everyone.) EDIT: Just register your custom platform in the config:
<?php
namespace YOUR_SPACE\Doctrine\DBAL\Platforms;

use Doctrine\DBAL\Platforms\MySQL57Platform;

/**
 * Custom MySQL 5.7 platform that uses utf8mb4 by default
 */
class MySQL57PlatformCustom extends MySQL57Platform
{
    protected function _getCreateTableSQL($tableName, array $columns, array $options = array())
    {
        // Charset: default to utf8mb4 unless the table defines its own
        if (! isset($options['charset'])) {
            $options['charset'] = 'utf8mb4';
        }

        // Collation: default to a utf8mb4 collation so it matches the charset
        if (! isset($options['collate'])) {
            $options['collate'] = 'utf8mb4_unicode_ci';
        }

        return parent::_getCreateTableSQL($tableName, $columns, $options);
    }
} |
Switch to utf8mb4 instead of utf8 because f*** MySQL See doctrine/dbal#851
@ThaDafinser thank you for that piece of code, that's perfect! And for those using MariaDB you will need a version > 10.2. |
FWIW, this would be an awesome addition if it weren't for MySQL's awful index size limits :-\ If we provide it out of the box, then the VARCHAR size should go down by default. |
@Ocramius Or generate index declarations with size specified, although that might be another can of worms |
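One way to read the "index declarations with size specified" idea is a MySQL prefix index, where only the first N characters of a column are indexed. The sketch below generates such a statement. `buildPrefixIndexSql` is hypothetical (not a Doctrine API); it assumes trusted identifiers (backticks are not escaped), applies the 767-byte limit per column, and ignores any limit on the combined key length of a multi-column index.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not a Doctrine API): builds a MySQL CREATE INDEX
 * statement that indexes only a prefix of long string columns, so each
 * column's key part fits within $limitBytes.
 *
 * @param array<string, int> $columns column name => declared VARCHAR length
 */
function buildPrefixIndexSql(
    string $table,
    string $index,
    array $columns,
    int $bytesPerChar = 4,
    int $limitBytes = 767
): string {
    $maxPrefix = intdiv($limitBytes, $bytesPerChar);
    $parts = [];
    foreach ($columns as $name => $length) {
        // Only columns that would exceed the limit get an explicit prefix length.
        $parts[] = $length > $maxPrefix
            ? sprintf('`%s`(%d)', $name, $maxPrefix)
            : sprintf('`%s`', $name);
    }
    return sprintf('CREATE INDEX `%s` ON `%s` (%s)', $index, $table, implode(', ', $parts));
}

echo buildPrefixIndexSql('users', 'idx_email', ['email' => 255]), PHP_EOL;
// CREATE INDEX `idx_email` ON `users` (`email`(191))
```

Part of the "can of worms": a UNIQUE prefix index only enforces uniqueness of the prefix, so two values that share their first 191 characters would be rejected as duplicates.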
Also adding to this conversation the 'correct' collate to use with |
@mcfedr do you have a source for the |
Unfortunately there isn't good information in the MySQL docs, but the best source I have found is the WordPress bug tracker: https://core.trac.wordpress.org/ticket/32105#comment:3 WordPress recently changed to using utf8mb4_unicode_520_ci by default. The key reason
|
What's the status on this 'defaults' merge, and utf8mb4 support in general? Currently, in a new Symfony4 project
returns
also noting
the current config includes,
|
You seem to be mixing the two; you might need to change your table charset to utf8mb4.
On Thu, 25 Jan 2018, 18:30, pgnd wrote:
What's the status on this 'defaults' merge, and utf8mb4 *support* in general?
Currently, in a new Symfony4 project
bin/console doctrine:phpcr:repository:init
returns
Successfully registered system node types.
Executing initializer: CmfRoutingBundle
In ObjectManager.php line 859:
Error inside the transport layer: An exception occurred while executing 'SELECT id FROM phpcr_nodes WHERE path COLLATE utf8mb4_bin = ? AND workspace_name = ?' with params ["\/", "default"]:
SQLSTATE[42000]: Syntax error or access violation: 1253 COLLATION 'utf8mb4_bin' is not valid for CHARACTER SET 'utf8'
In AbstractMySQLDriver.php line 121:
An exception occurred while executing 'SELECT id FROM phpcr_nodes WHERE path COLLATE utf8mb4_bin = ? AND workspace_name = ?' with params ["\/", "default"]:
SQLSTATE[42000]: Syntax error or access violation: 1253 COLLATION 'utf8mb4_bin' is not valid for CHARACTER SET 'utf8'
In PDOStatement.php line 107:
SQLSTATE[42000]: Syntax error or access violation: 1253 COLLATION 'utf8mb4_bin' is not valid for CHARACTER SET 'utf8'
In PDOStatement.php line 105:
SQLSTATE[42000]: Syntax error or access violation: 1253 COLLATION 'utf8mb4_bin' is not valid for CHARACTER SET 'utf8'
also noting
vendor/doctrine/dbal/UPGRADE.md
## Creating MySQL Tables now defaults to UTF-8
If you are creating a new MySQL Table through the Doctrine API, charset/collate are now set to 'utf8'/'utf8_unicode_ci' by default. Previously the MySQL server defaults were used.
the current config includes,
config/packages/doctrine.yaml
...
doctrine:
dbal:
default_connection: dev
connections:
dev:
...
driver: 'pdo_mysql'
server_version: '5.7'
charset: utf8mb4
default_table_options:
charset: utf8mb4
collate: utf8mb4_unicode_ci
phpcr_dev:
...
driver: 'pdo_mysql'
server_version: '5.7'
charset: utf8mb4
default_table_options:
charset: utf8mb4
collate: utf8mb4_unicode_ci
...
doctrine_phpcr:
session:
backend:
type: doctrinedbal
connection: phpcr_dev
...
|
@mcfedr
consistent with MariaDB defaults ... |
Hard to say. The error message shows that the column or table is set to use utf8; you need to look at the generated `CREATE` queries to work out where you've gone wrong.
On Thu, 25 Jan 2018, 19:12, pgnd wrote:
@mcfedr <https://github.com/mcfedr>
mixing how/where?
both DBs have
charset: utf8mb4
default_table_options:
charset: utf8mb4
collate: utf8mb4_unicode_ci
consistent with MariaDB defaults ...
|
@mcfedr EDIT: I think my issue better belongs here: |
The change is obsolete as of #4644. |
This is a conservative echo of #317.

Essentially, MySQL's utf8 character set is broken and does not support full UTF-8, and a better alternative, utf8mb4, has existed for about five years now. When a 4-byte UTF-8 character comes in for a utf8 table, by default MySQL will truncate the string and log a warning.

Insofar as Doctrine has any default configuration values for MySQL, I think utf8mb4 is a better, safer choice. If anybody is running MySQL <5.5.3 and has problems, they should have a fairly easy time figuring out what's going wrong due to how distinctive the string is. (The other way around, searching for utf8, is not.)
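To make the truncation described in this description concrete, the sketch below models it in PHP. It assumes the non-strict behavior reported for such inserts, where the stored value is cut off at the first 4-byte character; `simulateUtf8Truncation` is a hypothetical illustration, not how MySQL is implemented.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical simulation (not MySQL code): approximates what a non-strict
 * MySQL server stores when a string containing a 4-byte UTF-8 character is
 * written to a 3-byte "utf8" column, i.e. everything before the first such
 * character, with the rest silently dropped.
 */
function simulateUtf8Truncation(string $value): string
{
    // Split once at the first character outside the BMP (U+10000..U+10FFFF);
    // preg_split() returns false if the input is not valid UTF-8.
    $pieces = preg_split('/[\x{10000}-\x{10FFFF}]/u', $value, 2);
    if ($pieces === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    return $pieces[0];
}

echo simulateUtf8Truncation("foo\u{1D306}bar"), PHP_EOL; // foo
echo simulateUtf8Truncation('plain text'), PHP_EOL;       // plain text
```

With `utf8mb4` the same value would be stored intact, which is the argument for making it the default.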