Ability to define connection and table creation charset and collation #1007

Open
wants to merge 2 commits into base: bugfix

pupi1985
Contributor

Given these premises:

  • utf8 is an alias of utf8mb3
  • utf8mb3 has been deprecated in MySQL 8 and will be removed in future releases
  • utf8mb4 was introduced in MySQL 5.5 (2010)
  • The minimum MySQL version currently supported is 5.0, according to the Q2A documentation
  • There are explicit references to utf8 charset and utf8_bin collation in Q2A

I reach these conclusions:

  • utf8mb4 should be the default charset used by Q2A
  • Collations should not be hardcoded in Q2A but rather configured by each user
  • Documentation should be updated to state that MySQL 5.5 is the minimum supported version

I'm not sure why, but a few collations are hardcoded and used only when matching tags. This doesn't make much sense on its own, because a tag is no different from any other word used in a title or post content. The rule should apply to everything or to nothing, not just to tags.

Now, between all and none, it should be none. Hardcoded collations do not allow for customization. Whether papa, papá and PAPA are the same tag, or are all considered equal when searching, should depend on how the database was set up (with or without accent sensitivity, with or without case sensitivity).
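To make the papa / papá / PAPA point concrete, here is a minimal Python sketch of how case and accent sensitivity change whether two tags compare equal. It only approximates the idea; it is not MySQL's collation algorithm, and the helper names (`strip_accents`, `collation_key`, `same_tag`) are hypothetical.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining marks after canonical decomposition (á -> a)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def collation_key(text: str, case_sensitive: bool, accent_sensitive: bool) -> str:
    """Build a comparison key that mimics a collation's sensitivity flags."""
    key = text if accent_sensitive else strip_accents(text)
    return key if case_sensitive else key.casefold()


def same_tag(a: str, b: str, case_sensitive: bool, accent_sensitive: bool) -> bool:
    """Two tags match when their keys are equal under the chosen rules."""
    return (collation_key(a, case_sensitive, accent_sensitive)
            == collation_key(b, case_sensitive, accent_sensitive))


if __name__ == "__main__":
    for cs, acs in [(True, True), (False, True), (False, False)]:
        print(f"case_sensitive={cs} accent_sensitive={acs}:",
              same_tag("papa", "papá", cs, acs),
              same_tag("papa", "PAPA", cs, acs))
```

With both flags on, the comparison behaves roughly like a binary collation such as utf8_bin, where all three spellings are distinct tags. Turning the flags off progressively merges them, which is the kind of choice that would be left to whoever sets up the database.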

However, Q2A should allow this to be configured, and qa-config.php seems to be the most logical place. The setting would mainly be relevant at installation time and when the client connects. Changing the charset or collation of an existing database is a different story and requires changing the structure of the database.

To close the idea, plugin developers should also be able to query the configured charset and collation so they can use them when creating their own tables.
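As a rough illustration of the idea, the sketch below shows how one configured charset/collation pair could drive both the connection statement and table creation, including tables created by plugins. It is written in Python for brevity even though Q2A is PHP, and every name in it (`DB_CHARSET`, `DB_COLLATION`, `connection_sql`, `create_table_sql`, the example table) is a hypothetical stand-in, not the options or functions this PR adds.

```python
import re

# Hypothetical settings standing in for values a site owner might put in qa-config.php.
DB_CHARSET = "utf8mb4"
DB_COLLATION = "utf8mb4_unicode_ci"

_IDENTIFIER = re.compile(r"^[A-Za-z0-9_]+$")


def _checked(name: str) -> str:
    """Reject anything that is not a plain identifier before it reaches SQL."""
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def connection_sql(charset: str, collation: str) -> str:
    """Statement a client could send right after connecting."""
    return f"SET NAMES {_checked(charset)} COLLATE {_checked(collation)}"


def create_table_sql(table: str, columns_sql: str, charset: str, collation: str) -> str:
    """CREATE TABLE using the configured charset and collation instead of hardcoded ones."""
    return (f"CREATE TABLE {_checked(table)} ({columns_sql}) "
            f"DEFAULT CHARSET={_checked(charset)} COLLATE={_checked(collation)}")


if __name__ == "__main__":
    print(connection_sql(DB_CHARSET, DB_COLLATION))
    # A plugin would read the same two values rather than assuming utf8.
    print(create_table_sql("qa_myplugin_notes", "noteid INT PRIMARY KEY, content TEXT",
                           DB_CHARSET, DB_COLLATION))
```

The identifier check is there because charset and collation names cannot be passed as bound parameters, so values read from configuration should be validated before being interpolated.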

Next steps:

The next steps are not included in this PR, as they are waiting for feedback.

Some references to the deprecations in MySQL:

https://dev.mysql.com/doc/refman/8.4/en/charset-unicode-utf8.html#:~:text=utf8%20has%20been%20used%20by%20MySQL%20in%20the%20past%20as%20an%20alias%20for%20the%20utf8mb3%20character%20set%2C%20but%20this%20usage%20is%20now%20deprecated

utf8 has been used by MySQL in the past as an alias for the utf8mb3 character set, but this usage is now deprecated

https://dev.mysql.com/doc/refman/8.4/en/charset-unicode-utf8mb3.html#:~:text=The%20utf8mb3%20character%20set%20is,lifetimes%20of%20the%20MySQL%208.0.

The utf8mb3 character set is deprecated. utf8mb3 remains supported for the lifetimes of the MySQL 8.0.x and MySQL 8.4.x LTS release series.

Expect utf8mb3 to be removed in a future major release of MySQL.

https://dev.mysql.com/blog-archive/mysql-8-0-when-to-use-utf8mb3-over-utf8mb4/#:~:text=MySQL%205.5%20(2010)%20added%20support%20for%20up%20to%204%20byte%20utf8%20using%20the%20new%20utf8mb4%20character%20set.

MySQL 5.5 (2010) added support for up to 4 byte utf8 using the new utf8mb4 character set.
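The practical difference between the two character sets is the maximum number of bytes per character: utf8mb3 stores at most 3, utf8mb4 up to 4. A small Python check of UTF-8 byte lengths shows which text falls outside the 3-byte limit. This only inspects the encoding; it does not talk to MySQL, and `fits_in_utf8mb3` is a hypothetical helper name.

```python
def fits_in_utf8mb3(text: str) -> bool:
    """Return True if every character needs at most 3 bytes in UTF-8."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


if __name__ == "__main__":
    for sample in ["papá", "日本語", "😀"]:
        sizes = [len(ch.encode("utf-8")) for ch in sample]
        print(sample, sizes, fits_in_utf8mb3(sample))
```

Accented Latin letters and most CJK text fit in 3 bytes, while characters outside the Basic Multilingual Plane, such as most emoji, need 4 and therefore require utf8mb4.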
