make utf8mb4_unicode_ci default collation for new mysql tables #1875

Merged
merged 2 commits into from
Sep 30, 2020

MasterOdin
Member

@MasterOdin MasterOdin commented Sep 1, 2020

closes #1763

This makes utf8mb4_general_ci the default collation for MySQL tables instead of utf8_general_ci. The prior default made sense 7+ years ago, when server computing power was more limited. Nowadays the performance gain is minimal compared with the benefit of handling, by default, the capitalization nuances of different languages and their characters, which MySQL's original utf8 charset lacked.

@MasterOdin
Member Author

Something I failed to consider in the original PR is that, for a string primary key in InnoDB on MySQL 5.7, the maximum key size in bytes works out to 255 characters for utf8 (which uses 3 bytes per character) and 191 characters for utf8mb4 (which uses up to 4 bytes per character). This makes the transition a bit more difficult for a default case such as:

        $table = $this->table('table1', ['id' => false, 'primary_key' => ['column1']]);

        $table->addColumn('column1', 'string')
            ->addColumn('column2', 'integer')
            ->create();

Should the default of string be made 191 instead of 255? Only if it's the primary key? Require explicitly setting the length for primary key strings?
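The 255 and 191 figures above can be reproduced with a short calculation. This is a minimal sketch, assuming the 767-byte index key prefix limit that InnoDB applies to tables using the COMPACT or REDUNDANT row format; the constant and the helper function `maxIndexableChars` are hypothetical names used only for illustration and are not part of Phinx.

```php
<?php

// Assumption: 767 bytes is the InnoDB index key prefix limit for the
// COMPACT and REDUNDANT row formats. Other row formats allow larger keys.
const INNODB_KEY_LIMIT_BYTES = 767;

// Hypothetical helper: how many characters fit in an index key when each
// character may take up to $bytesPerChar bytes.
function maxIndexableChars(int $bytesPerChar, int $limitBytes = INNODB_KEY_LIMIT_BYTES): int
{
    return intdiv($limitBytes, $bytesPerChar);
}

echo maxIndexableChars(3), "\n"; // utf8 (3 bytes per character): 255
echo maxIndexableChars(4), "\n"; // utf8mb4 (up to 4 bytes per character): 191
```

Under that assumption, a string column left at the default length of 255 fits in the key with utf8 but exceeds it with utf8mb4, which is why the default case above becomes a problem.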

@MasterOdin
Member Author

For now, I'm leaving this open, but I may not come back to it for a while, as I would like to re-examine how other migration software handles default limits for the string / varchar type, especially where primary keys are concerned.

@dereuromark
Member

Just use

charset: utf8mb4
collation: utf8mb4_unicode_ci

as commented above and we can merge this

@MasterOdin MasterOdin changed the title from "make utf8mb4_general_ci default collation for new mysql tables" to "make utf8mb4_unicode_ci default collation for new mysql tables" Sep 30, 2020

@MasterOdin
Member Author

@dereuromark done. Instead of adjusting the default limit, I decided to just add a note to the documentation about having to explicitly set the primary key length on MySQL 5.7 and below when using the string type with utf8mb4_unicode_ci. I view changing the default as a much larger BC break, as it would probably mean changing the default for all adapters to keep them equal. Hopefully 5.7 is phased out in favor of 8.0+ quickly enough that this isn't an issue for too long.
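As a rough illustration of what that note asks of users, the earlier example can set the primary key length explicitly. This is a sketch rather than the documented wording: the migration class name `CreateTable1` is hypothetical, and the value 191 assumes the 4-bytes-per-character limit discussed earlier in this thread.

```php
<?php

use Phinx\Migration\AbstractMigration;

// Hypothetical migration class name, for illustration only.
class CreateTable1 extends AbstractMigration
{
    public function change(): void
    {
        $table = $this->table('table1', ['id' => false, 'primary_key' => ['column1']]);

        // Explicit limit so the utf8mb4 primary key stays within the
        // key size limit on MySQL 5.7 and below (assumed here to be 191 characters).
        $table->addColumn('column1', 'string', ['limit' => 191])
            ->addColumn('column2', 'integer')
            ->create();
    }
}
```

Only the primary key column needs the explicit limit here; other string columns can keep the default length.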

That note should probably also be repeated in the changelog when 0.13 is released, to raise awareness of it.

@dereuromark dereuromark merged commit 6ef9016 into cakephp:0.next Sep 30, 2020