Ability to define connection and table creation charset and collation #1007
Given these premises:

- `utf8` is an alias of `utf8mb3`
- `utf8mb3` has been deprecated in MySQL 8 and will be removed in future releases
- `utf8mb4` was introduced in MySQL 5.5 (2010)
- Q2A uses the `utf8` charset and the `utf8_bin` collation

I get to these conclusions:
- `utf8mb4` should be the default charset used by Q2A.
- I'm not sure why, but a few hardcoded collations are used only when matching tags. This doesn't make much sense on its own, because there is no difference between a tag and any other word used in a title or in post content. It should be all or none, but not just tags.
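To illustrate the first conclusion: `utf8mb3` stores at most 3 bytes per character, so characters that need 4 bytes in UTF-8 (most emoji, for example) can only be stored with `utf8mb4`. This is a minimal sketch in Python, used only for brevity since Q2A itself is PHP; the helper names `max_utf8_bytes` and `fits_utf8mb3` are hypothetical and not part of Q2A. It checks whether a string would fit in a `utf8mb3` column by measuring the UTF-8 width of each character.

```python
def max_utf8_bytes(text: str) -> int:
    """Widest UTF-8 encoding (in bytes) of any single character in text."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


def fits_utf8mb3(text: str) -> bool:
    """utf8mb3 allows at most 3 bytes per character (hypothetical helper)."""
    return max_utf8_bytes(text) <= 3


for sample in ["papá", "日本語", "🙂 thanks"]:
    # Prints the sample, its widest character in bytes, and whether utf8mb3 can hold it.
    print(sample, max_utf8_bytes(sample), fits_utf8mb3(sample))
```

Accented Latin letters take 2 bytes and common CJK characters take 3, so both fit; the emoji takes 4 and does not, which is the practical reason to prefer `utf8mb4`.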
Now, between all or none, it should be none: hardcoding collations would not allow for customization. Whether `papa`, `papá` or `PAPA` should be the same tag or not, or whether all of them should be considered the same when searching, should be up to how the database was set up (accent-sensitive or not, case-sensitive or not). However, Q2A should give the option to configure this.
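The `papa` / `papá` / `PAPA` question can be approximated outside MySQL to show that the comparison rule, not the code, decides which tags are equal. This is only a rough stand-in: it mimics case-insensitivity with `str.casefold()` and accent-insensitivity by stripping combining marks after NFD normalization. Real MySQL collations such as `utf8mb4_bin`, `utf8mb4_0900_as_ci` and `utf8mb4_0900_ai_ci` use their own weight tables, and the function names here are hypothetical.

```python
import unicodedata


def strip_accents(text: str) -> str:
    # NFD splits "á" into "a" + U+0301 (combining acute accent); drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def compare_key(text: str, case_sensitive: bool, accent_sensitive: bool) -> str:
    # Accent-sensitive: normalize to NFC so precomposed and decomposed accents match.
    key = unicodedata.normalize("NFC", text) if accent_sensitive else strip_accents(text)
    return key if case_sensitive else key.casefold()


def same_tag(a: str, b: str, case_sensitive: bool, accent_sensitive: bool) -> bool:
    return (compare_key(a, case_sensitive, accent_sensitive)
            == compare_key(b, case_sensitive, accent_sensitive))


tags = ["papa", "papá", "PAPA"]
modes = {
    "case- and accent-sensitive (roughly *_bin)": (True, True),
    "accent-sensitive, case-insensitive (roughly *_as_ci)": (False, True),
    "accent- and case-insensitive (roughly *_ai_ci)": (False, False),
}
for label, (cs, acc) in modes.items():
    distinct = {compare_key(t, cs, acc) for t in tags}
    print(f"{label}: {len(distinct)} distinct tag(s)")  # 3, then 2, then 1
```

The same three words end up as three, two, or one tag depending only on the comparison rule, which is why hardcoding a collation for tags takes that decision away from whoever sets up the database.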
`qa-config.php` seems to be the most logical place for this. It would mainly be relevant at installation time and when the client connects. Changing the charset/collation of an existing database is a different story, as it requires changing the structure of the database. Finally, to close the idea, plugin developers should also be able to query the charset and collation so they can create their own tables consistently.
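A minimal sketch of how two configured values could drive both the connection and table creation, assuming hypothetical settings named `QA_MYSQL_CHARSET` and `QA_MYSQL_COLLATION` here (these names do not exist in Q2A). It is written in Python for brevity; a real implementation would live in Q2A's PHP database layer. The same helper is the kind of thing plugins could call so their tables match the core ones.

```python
import re

# Hypothetical settings standing in for values an admin might define in qa-config.php.
QA_MYSQL_CHARSET = "utf8mb4"
QA_MYSQL_COLLATION = None  # None: let MySQL use the charset's default collation

# Charset/collation names cannot be bound as query parameters, so allow plain identifiers only.
_NAME = re.compile(r"[A-Za-z0-9_]+")


def _validate(charset, collation):
    if not _NAME.fullmatch(charset):
        raise ValueError(f"invalid charset name: {charset!r}")
    if collation is not None:
        # Simple sanity check: MySQL collation names start with their charset name
        # (e.g. utf8mb4_unicode_ci). The special "binary" charset is not handled here.
        if not _NAME.fullmatch(collation) or not collation.startswith(charset + "_"):
            raise ValueError(f"collation {collation!r} does not belong to {charset!r}")


def connection_sql(charset=QA_MYSQL_CHARSET, collation=QA_MYSQL_COLLATION):
    """Statement to run right after the client connects."""
    _validate(charset, collation)
    sql = f"SET NAMES {charset}"
    if collation:
        sql += f" COLLATE {collation}"
    return sql


def table_options(charset=QA_MYSQL_CHARSET, collation=QA_MYSQL_COLLATION):
    """Options clause appended to CREATE TABLE, by core and by plugins alike."""
    _validate(charset, collation)
    sql = f"DEFAULT CHARSET={charset}"
    if collation:
        sql += f" COLLATE={collation}"
    return sql


print(connection_sql())
print("CREATE TABLE example (id INT) " + table_options(collation="utf8mb4_unicode_ci"))
```

Keeping the charset and collation in one place means the connection and every `CREATE TABLE` agree; leaving the collation unset defers to whatever default the server defines for the charset.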
Next steps:

- Change the `utf8_bin` column, which is `passhash`, to `binary`. Clearly, a delicate move:
  - `question2answer/qa-include/db/install.php`, line 110 in bc1a8bc
  - `question2answer/qa-include/util/string.php`, line 602 in bc1a8bc
The next steps are not addressed in this PR, as they are waiting for feedback.
Some references to the deprecations in MySQL:
https://dev.mysql.com/doc/refman/8.4/en/charset-unicode-utf8.html#:~:text=utf8%20has%20been%20used%20by%20MySQL%20in%20the%20past%20as%20an%20alias%20for%20the%20utf8mb3%20character%20set%2C%20but%20this%20usage%20is%20now%20deprecated
https://dev.mysql.com/doc/refman/8.4/en/charset-unicode-utf8mb3.html#:~:text=The%20utf8mb3%20character%20set%20is,lifetimes%20of%20the%20MySQL%208.0.
https://dev.mysql.com/blog-archive/mysql-8-0-when-to-use-utf8mb3-over-utf8mb4/#:~:text=MySQL%205.5%20(2010)%20added%20support%20for%20up%20to%204%20byte%20utf8%20using%20the%20new%20utf8mb4%20character%20set.