Table is not configured with utf8mb4, after upgrade #3959

Closed
julpec opened this issue Apr 29, 2020 · 10 comments

julpec commented Apr 29, 2020

Hello
I am in the middle of migrating my Etherpad instance. I migrated my database and then updated Etherpad. Now I see this in the logs:

[2020-04-29 10:43:29.440] [ERROR] console - table is not configured with charset utf8mb4 -- This may lead to crashes when certain characters are pasted in pads

I don't know whether this can cause errors…


JohnMcLear commented Apr 29, 2020

So yeah, this is usually due to misconfigured databases/tables/schemas, but it's new code and I want to make sure it's working properly because I'm not 100% sure it is.

It's actually a UeberDB issue, but whatever, here is fine. The logic is here: https://github.com/ether/ueberDB/blob/master/mysql_db.js#L81

I'm going to assume your Etherpad database is called etherpad_lite_db. If it isn't, replace the name as appropriate.

Run the following commands:

mysql -uroot -p

use etherpad_lite_db

SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA WHERE SCHEMA_NAME = 'etherpad_lite_db';

Copy and paste the output (here is fine).

Next, run the final command:

SELECT CCSA.character_set_name FROM information_schema.`TABLES` T,information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` CCSA WHERE CCSA.collation_name = T.table_collation AND T.table_schema = 'etherpad_lite_db' AND T.table_name = 'store';

Copy and paste the output.

Your two outputs should be

+----------------------------+------------------------+
| DEFAULT_CHARACTER_SET_NAME | DEFAULT_COLLATION_NAME |
+----------------------------+------------------------+
| utf8mb4                    | utf8mb4_general_ci     |
+----------------------------+------------------------+

and

+--------------------+
| character_set_name |
+--------------------+
| utf8mb4            |
+--------------------+

If they are not, your database was not properly configured, and certain characters can break pads on your instance. If you see latin in there at all, y'dun fucked up and you will need to run these commands (I think; not tested):

ALTER DATABASE `etherpad_lite_db` CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

and

ALTER TABLE `store` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

If they don't work lemme know.
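
As a rough sketch of the check-and-fix flow above (not UeberDB's actual code), the following Python helper takes the two character set names returned by the queries and lists the ALTER statements that would still be needed. The function name `charset_fixes` and its parameters are hypothetical; the default database name, table name, and `utf8mb4_bin` collation are the ones used in the commands above.

```python
def charset_fixes(schema_charset, table_charset,
                  database="etherpad_lite_db", table="store"):
    """Return the ALTER statements needed to reach utf8mb4 (hypothetical helper)."""
    target = "utf8mb4"
    statements = []
    # The schema default only affects tables created later.
    if schema_charset != target:
        statements.append(
            f"ALTER DATABASE `{database}` CHARACTER SET {target} COLLATE {target}_bin;"
        )
    # CONVERT TO rewrites the existing table and its text columns.
    if table_charset != target:
        statements.append(
            f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET {target} COLLATE {target}_bin;"
        )
    return statements


# Example: the schema default is already utf8mb4, but the table is still utf8.
for statement in charset_fixes("utf8mb4", "utf8"):
    print(statement)
```

An empty list means both queries already report utf8mb4 and nothing needs to be run.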


julpec commented Apr 29, 2020

Here is the output for the first command:

+----------------------------+------------------------+
| DEFAULT_CHARACTER_SET_NAME | DEFAULT_COLLATION_NAME |
+----------------------------+------------------------+
| utf8mb4                    | utf8mb4_general_ci     |
+----------------------------+------------------------+

Such was your wish

And this is for the second...

+--------------------+
| character_set_name |
+--------------------+
| utf8               |
+--------------------+

Oops…
What can I do now?


julpec commented Apr 29, 2020

Sorry, I went a bit too fast and missed the end of the message. I'm going to test the database modification now.


julpec commented Apr 29, 2020

Great! It worked.
No more messages in the logs

JohnMcLear commented

Win. It's great to see my work / efforts validated so thanks man! :)


RalfJung commented May 5, 2020

Thanks for these commands! One of my servers had "latin1" here.

However I certainly never told etherpad to use latin1, why is etherpad not able to properly configure its database itself? Is the expectation that every admin has to manually configure the character set of each database?

JohnMcLear commented

> Thanks for these commands! One of my servers had "latin1" here.
>
> However I certainly never told etherpad to use latin1, why is etherpad not able to properly configure its database itself? Is the expectation that every admin has to manually configure the character set of each database?

  1. latin1 is the default in MySQL, still, in 2020. That has nothing to do with Etherpad!
  2. Etherpad is able to configure the database itself, but we chose not to because we don't want to force utf8mb4 et al. on admins, as admins may prefer another charset. We're not DBAs.
  3. Admins have to create the database, so yeah, when you create the database you copy/paste the alter command. This only applies to MySQL; if you use another database it's not an issue. We can't handle every edge case of every deployment in every instance. There is a limit to the scope of our work (which is to build a collaborative editor).
  4. Etherpad is not responsible here, UeberDB is. This issue is only here because Etherpad is the biggest user of UeberDB. We're in the process of modernizing UeberDB and this warning was step one. We can't force an alter because of point 2.
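
To make point 3 concrete: the character set can be given when the database is first created, so the server default never applies. A minimal sketch, assuming the same database name and collation used earlier in this thread; `create_database_sql` is a hypothetical helper, not part of Etherpad or UeberDB.

```python
import re


def create_database_sql(name, charset="utf8mb4", collation="utf8mb4_bin"):
    """Build a CREATE DATABASE statement with an explicit character set."""
    # Only accept simple identifiers, since the values are interpolated into SQL.
    for value in (name, charset, collation):
        if not re.fullmatch(r"[A-Za-z0-9_]+", value):
            raise ValueError(f"unexpected identifier: {value!r}")
    return f"CREATE DATABASE `{name}` CHARACTER SET {charset} COLLATE {collation};"


print(create_database_sql("etherpad_lite_db"))
```

The printed statement can be pasted into the mysql prompt in place of a bare CREATE DATABASE.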

I think our approach here was balanced and right, but I can understand if you feel a little frustrated that it's not click and go! <3


RalfJung commented May 5, 2020

> Etherpad is properly able to configure the database itself but we chose not to because we don't want to force utf8mb4 et al on admins as admins may prefer another charset. We're not DBAs.

I'm the admin but I don't think I can decide which charset an application runs with -- does it even make sense to leave that choice to me? I would expect the wrong choice to just break the application. I am not deciding about the charset of the HTTP request/responses either, or the charset of the files stored in the file system.

So, it seems rather strange to me that it would be up to the admin to decide internal details of an application such as the charset used for the database. You're not leaving the database schema to me, either. ;)

> latin1 is the default in MySQL, still, in 2020. That's nothing to do with Etherpad!

Oh I see. Well that's... sad, I guess? Seems like everyone got the charset problem solved these days by moving to utf8 except for mysql... :(

So for now the answer seems to be "yes, admins are expected to configure this themselves". I realize this is the situation in the mysql ecosystem and not your doing, I just wanted to give some feedback from the perspective of a user. Having an error about it in the logs is a great start, but I also don't scan the logs for errors regularly; there's just too much stuff in them.

Maybe one possible solution would be for etherpad to alter the database as appropriate by default, but have a config option to turn that off in case DBAs want to make a different choice? That would avoid the footgun -- it is only by pure chance that I even saw this issue, usually I would never have noticed. Maybe I am naive, but I expect the number of admins that want to override that choice to be tiny.
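
The suggestion above could look roughly like this. It is only a sketch of the proposed behaviour, not something Etherpad or UeberDB implements; the `auto_fix_charset` setting and the `on_charset_mismatch` function are made-up names for illustration, and the table name and collation are the ones used earlier in this thread.

```python
def on_charset_mismatch(table_charset, auto_fix_charset=True, table="store"):
    """Decide what to do at startup: nothing, convert the table, or only warn."""
    if table_charset == "utf8mb4":
        return ("ok", None)
    if auto_fix_charset:
        # Default path: convert the table so admins do not hit the footgun.
        sql = f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;"
        return ("alter", sql)
    # Opt-out path for DBAs who chose another charset on purpose: only log.
    return ("warn", f"table is not configured with charset utf8mb4 (found {table_charset})")


print(on_charset_mismatch("latin1"))
print(on_charset_mismatch("latin1", auto_fix_charset=False))
```

With the option left at its default the table would be converted; setting it to false keeps today's warn-only behaviour.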

JohnMcLear commented

Noted with thanks :)


y377 commented Mar 13, 2022

Noted with thanks :) +1

ether locked and limited conversation to collaborators Mar 21, 2022