MySQL case sensitivty fix #28651
Remove the (undocumented) `Database.MysqlCharset` setting, and introduce `Database.DefaultCharset` and `Database.DefaultCollation` settings instead. The reason for the `MysqlCharset` removal is that in a later patch, we're going to adjust the engine initialization so that it achieves similar results in a different way. The new settings will remain undocumented for reasons similar to why `MysqlCharset` was undocumented: the defaults work for the vast majority of cases, and it's too easy to break things by changing them. Signed-off-by: Gergely Nagy <forgejo@gergo.csillger.hu>
@wxiaoguang I invited you as a collaborator to my repo, so you can edit this PR as you want, as promised in #28633. Feel free to add new commits, rebase, change, whatever. If I disagree, I'll comment, and we'll figure out how to make everyone happy.
Thanks. I will try in a few days. And keep this comment as dummy (to record changes & screenshots) (added "refactoring" label, maybe this is the most suitable one in the list, it mainly refactors the database tables ....) Brief ideas (for reference)
Changes (TODO)... Screenshots (TODO)... Update: for reference, I would not be able to work if there are negative voices from Forgejo.
By |
Sweet, that means I can get rid of the auto-detection. Will update the PR accordingly in a bit. Thanks! EDIT: Updated,
Replace `db.ConvertUtf8ToUtf8mb4()` with `db.ConvertCharsetAndCollation()`, which does essentially the same thing, but the charset and the collation are controlled by the caller. Also introduce `db.findCaseSensitiveCollation()`, which attempts to auto-detect the best supported case-sensitive collation for MySQL and MariaDB databases. With these two functions, `gitea doctor check` can be adapted to use the charset and collation set in the configuration (or auto-detect them, if unset), and this patch does that too. Signed-off-by: Gergely Nagy <forgejo@gergo.csillger.hu>
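To illustrate the auto-detection idea, here is a minimal sketch, not the code from this PR. The helper name `pickCaseSensitiveCollation`, the candidate list, and its ordering are assumptions made for the example; the input stands in for the collation names a server would report (for instance via `SHOW COLLATION`).

```go
package main

// preferredCollations lists case-sensitive collations in descending order of
// preference. Both the candidates and their order are assumptions for this
// sketch, not necessarily what the PR uses.
var preferredCollations = []string{
	"utf8mb4_0900_as_cs", // available on MySQL 8.0 and later
	"uca1400_as_cs",      // available on newer MariaDB releases
	"utf8mb4_bin",        // binary collation, widely available fallback
}

// pickCaseSensitiveCollation returns the first preferred collation that is
// present in the list reported by the server, or "" if none is available.
func pickCaseSensitiveCollation(available []string) string {
	have := make(map[string]bool, len(available))
	for _, name := range available {
		have[name] = true
	}
	for _, candidate := range preferredCollations {
		if have[candidate] {
			return candidate
		}
	}
	return ""
}
```

Returning an empty string when nothing matches lets the caller decide whether to fall back to a configured value or to report an error.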
What would need to be done to remove the WIP status of this PR?
Pressing a button (done). It was only set to WIP so @wxiaoguang can do a review, but since that's not going to happen, and I consider this ready for a review, I've flipped it back.
I will write a new one from scratch, because I believe it needs to address all the problems in #28651 (comment)
It already addresses those. Well, most of them. An empty I understand you have a beef with Forgejo people, but I am not Forgejo people. For all practical purposes, I'm just a random guy who happens to contribute there too. (And the reason I do that, and why this change was submitted to Forgejo first, is because my personal policy is that I do not contribute to projects with a CLA, unless I am paid to do so - which in this case, I am.)
I will submit my proposal,
This is also my personal decision: I would like to keep a distance from Forgejo (under and related to Codeberg) because there is a long history. Again, disclaimer: that's just my personal decision and opinion. It doesn't represent the Gitea community's, and I don't really know what the Gitea community's is. And off-topic: I really appreciate your work and I think you have shown very experienced programming skills and an open social attitude 🙏
There is no CLA, we use DCO. https://github.com/go-gitea/gitea/blob/main/DCO
You do require copyright assignment. Effectively the same problem I have with CLAs. But this is not relevant to this PR anymore, I only mentioned it as an explanation, and that side-discussion has concluded.
When starting the web server, perform a sanity check on the database. The sanity check currently runs for MySQL/MariaDB only, and verifies that the charset and collation are correctly set on both the database, and on all tables in it. If there is a discrepancy, it does not error out, but prints a warning only. Signed-off-by: Gergely Nagy <forgejo@gergo.csillger.hu>
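A sanity check of this kind can be sketched as a pure comparison step. This is a minimal illustration, not the PR's implementation: the type `charsetCollation` and the function `sanityCheckWarnings` are hypothetical names, and the inputs are assumed to have been read from the server beforehand.

```go
package main

import (
	"fmt"
	"sort"
)

// charsetCollation pairs a character set with a collation (hypothetical type).
type charsetCollation struct {
	Charset   string
	Collation string
}

// sanityCheckWarnings compares the database-level and per-table settings with
// the desired ones and returns one warning per mismatch. It never fails;
// callers are expected to log the warnings and carry on.
func sanityCheckWarnings(desired, database charsetCollation, tables map[string]charsetCollation) []string {
	var warnings []string
	if database != desired {
		warnings = append(warnings, fmt.Sprintf("database uses %s/%s, expected %s/%s",
			database.Charset, database.Collation, desired.Charset, desired.Collation))
	}

	// Sort table names so that the warnings come out in a stable order.
	names := make([]string, 0, len(tables))
	for name := range tables {
		names = append(names, name)
	}
	sort.Strings(names)

	for _, name := range names {
		if got := tables[name]; got != desired {
			warnings = append(warnings, fmt.Sprintf("table %q uses %s/%s, expected %s/%s",
				name, got.Charset, got.Collation, desired.Charset, desired.Collation))
		}
	}
	return warnings
}
```

Keeping the comparison separate from the queries that collect the values makes the warn-but-do-not-fail behaviour easy to test without a database.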
When initializing an empty MySQL/MariaDB database, ensure that the character set and collation are set to the desired values. With these set, creating a table will inherit these settings, unless table creation specifies a different charset or collate function. This is the reason why `setting.Database.MysqlCharset` was removed: that forced table creation to explicitly set a charset, rather than inherit the database's default, and in doing so, also changed the collate function to the charset's default (which may be, and usually is, different from the one we want). Signed-off-by: Gergely Nagy <forgejo@gergo.csillger.hu>
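As a rough sketch of the statement such an initialization step might issue, the following builds an `ALTER DATABASE ... CHARACTER SET ... COLLATE ...` string. The function name `alterDatabaseSQL` and the validation rule are assumptions for illustration; how the PR actually assembles and runs the statement is not shown in this conversation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// safeName matches the plain identifiers used for charset and collation
// names. These cannot be passed as bound parameters, so this sketch validates
// them before interpolating (an assumption of the example).
var safeName = regexp.MustCompile(`^[A-Za-z0-9_]+$`)

// alterDatabaseSQL builds a statement that sets the default charset and
// collation of a database, so that later CREATE TABLE calls inherit them.
func alterDatabaseSQL(dbName, charset, collation string) (string, error) {
	if !safeName.MatchString(charset) || !safeName.MatchString(collation) {
		return "", fmt.Errorf("unexpected charset %q or collation %q", charset, collation)
	}
	// Quote the database name with backticks, doubling any embedded backtick.
	quoted := "`" + strings.ReplaceAll(dbName, "`", "``") + "`"
	return fmt.Sprintf("ALTER DATABASE %s CHARACTER SET %s COLLATE %s", quoted, charset, collation), nil
}
```

Calling `alterDatabaseSQL("gitea", "utf8mb4", "utf8mb4_bin")` yields a statement that could then be executed once against the empty database, before any tables are created.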
Signed-off-by: Gergely Nagy <forgejo@gergo.csillger.hu>
For MySQL, suggest not setting a collate function, to let Gitea deal with it. But also mention some of the options, should one want to set it up in advance rather than rely on Gitea. Signed-off-by: Gergely Nagy <forgejo@gergo.csillger.hu>
-> Recommend/convert to use case-sensitive collation for MySQL/MSSQL #28662 And the case-sensitive branch test passes all databases including MSSQL. |
This is a... complicated pull request, and does a number of things:

- Introduces `setting.Database.DefaultCharset` and `setting.Database.DefaultCollation`, and removes `setting.Database.MysqlCharset`.
- Refactors `db.ConvertUtf8ToUtf8mb4()` into `db.ConvertCharsetAndCollation()`, a function that can convert a database to the desired charset + collation, rather than hardcoding either of them.
- Introduces `db.findCaseSensitiveCollation()` to auto-detect the most fitting collate, and `db.GetDesiredCharsetAndCollation()` to return either the charset/collation set in the config, or (if either is empty) the auto-detected default.
- `gitea doctor convert` will convert the db to the desired charset + collation combo, rather than hardcoded values.
- In `InitDBWithMigrations`, if it sees an empty MySQL database, it will `ALTER DATABASE` it to have the correct charset & collate.
- Recommends `gitea doctor convert` more strongly.

The reason for the `setting.Database.MysqlCharset` removal, and the `InitDBWithMigrations` change, is to let `CREATE TABLE` calls inherit the charset and the collate function from the database. If we don't remove `MysqlCharset` from the connection string, `xorm` will add `CHARSET <foo>` to each `CREATE TABLE` call, which in turn will implicitly set the collate function to the charset's default, rather than the database's. If we remove the charset from the connection string, `xorm` will not add `CHARSET <foo>` to the `CREATE TABLE` calls, and they will correctly inherit both charset & collate.

A better solution for this would be to teach `xorm` to let us set a database-level default collate, and add that to each `CREATE TABLE` call. If it did that, we could remove the `ALTER DATABASE` from `InitDBWithMigrations`. This is something that is worth doing longer term, and I'm more than willing to do that and submit a PR against `xorm`, and once that's updated (if such a change is accepted) submit an update to Gitea as well. Meanwhile, I think this is the best we can do.

By the way, the `ALTER DATABASE` thing (and the `xorm`-based fix later on) will both help new installations when the database was automatically created with the wrong collate, such as when using a dockerized MySQL/MariaDB. Existing installs will have to use `gitea doctor convert`.

This PR does not address the problem for MSSQL, however. I don't have the resources to make a fix for that, but I'm reasonably sure something similar could be accomplished there, too.
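The `CREATE TABLE` inheritance rule that motivates the `MysqlCharset` removal can be modelled in a few lines. This is only a toy model of the server-side behaviour, written for illustration: `effectiveTableCollation` is a hypothetical helper, and the map of per-charset default collations is supplied by the caller because the real defaults depend on the server and its version.

```go
package main

// effectiveTableCollation models which collation a new table ends up with.
// If the CREATE TABLE statement names no charset, the table inherits the
// database's default collation. If it names a charset without a collation,
// the charset's own default collation applies instead.
func effectiveTableCollation(dbCollation, tableCharset string, charsetDefaults map[string]string) string {
	if tableCharset == "" {
		return dbCollation
	}
	return charsetDefaults[tableCharset]
}
```

For example, with a database default of `utf8mb4_bin` and a server whose default collation for `utf8mb4` is `utf8mb4_general_ci` (an assumed value for the example), a table created with an explicit `CHARSET utf8mb4` would get `utf8mb4_general_ci`, while a table created without it would keep `utf8mb4_bin`.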