Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MySQL/MariDB broken on default Debian installations #377

Closed
berlincount opened this issue Feb 12, 2017 · 33 comments
Closed

MySQL/MariDB broken on default Debian installations #377

berlincount opened this issue Feb 12, 2017 · 33 comments
Labels
bug Something is not working.
Milestone

Comments

@berlincount
Copy link

berlincount commented Feb 12, 2017

Debian bug https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=848603 is affecting Hydra, but should be fixable in the application.

When creating database table indexes are created as VARCHAR(255) fields, which usually should be well within the limit of 1000 bytes with the default latin1 encoding used by previous MySQL version's storage formats.

Debian maintainers made the questionable choice of setting the default string encoding to utf8mb4 (for emoji comments in WordPress or whatever), effectively quadrupling the storage size from 255 to 255*4 = 1020, which exceeds the pre-5.7 key size limit of 1000 bytes already. The limit was lowered to 767 bytes as well:

If innodb_large_prefix is enabled (the default in MySQL 5.7.7), the index key prefix limit is 3072 bytes for InnoDB tables that use DYNAMIC or COMPRESSED row format. If innodb_large_prefix is disabled, the index key prefix limit is 767 bytes for tables of any row format.

innodb_large_prefix is deprecated in MySQL 5.7.7 and will be removed in a future release. innodb_large_prefix was introduced in MySQL 5.5 to disable large index key prefixes for compatibility with earlier versions of InnoDB that do not support large index key prefixes."

This results in the following hydra host error:

FATA[0000] Could not create policy schema: Could not migrate sql schema, applied 0 migrations: Error 1071: Specified key was too long; max key length is 767 bytes handling 1

For

CREATE TABLE IF NOT EXISTS ladon_policy (
        id           varchar(255) NOT NULL PRIMARY KEY,
        description  text NOT NULL,
        effect       text NOT NULL CHECK (effect='allow' OR effect='deny'),
        conditions       text NOT NULL
)

This is also discussed in https://stackoverflow.com/questions/6172798/mysql-varchar255-utf8-is-too-long-for-key-but-max-length-is-1000-bytes

Had Debian maintainers also opted to set innodb_large_prefix=1 in their MySQL/MariaDB configuration, this would be fine (will be the default in the future). The documentation could hint at this to help users.

What else can Hydra do? Set the charset for tables it creates to "latin1" .. which should be fine anyway.

e.g.:

CREATE TABLE IF NOT EXISTS ladon_policy (
        id           varchar(255) NOT NULL PRIMARY KEY,
        description  text NOT NULL,
        effect       text NOT NULL CHECK (effect='allow' OR effect='deny'),
        conditions       text NOT NULL
) CHARACTER SET latin1

should work.

I've gone with a changed db.opt for my hydra database to force the use of latin1. *sigh*

@aeneasr
Copy link
Member

aeneasr commented Feb 13, 2017

Is there an alternative to "latin1"? I'd like to keep utf8 because we store json data which is (usually) utf8 encoded

@aeneasr
Copy link
Member

aeneasr commented Feb 13, 2017

Is reducing the key size a possibility? uuid use 32 only anyways so if we quadrupel that (128) we're still ready for some long af keys

@aeneasr
Copy link
Member

aeneasr commented Feb 13, 2017

Another issue is that we need to alter the table's charset which may cause cruel conversion errors (at least that was the case a few years ago) which are really hard to recover from (LAMP nightmares). So I'd like to find a solution that is less prone to error.

@berlincount
Copy link
Author

well, the only issue is the primary key .. if that's not a VARCHAR(255) in actuality, it shouldn't be stored as such ...

if it's actually a UUID it might be worth looking at http://mysqlserverteam.com/storing-uuid-values-in-mysql-tables/ - there are some serious performance improvements possible, but it goes for one hell of a migration (once) .. and the charset could be kept.

@aeneasr
Copy link
Member

aeneasr commented Feb 17, 2017

The issue is that the default id generation is uuid but it can be overriden by the user in some cases (e.g. clients), so we can't force uuid here, unfortunately.

@aeneasr
Copy link
Member

aeneasr commented Feb 17, 2017

But would reducing the key size resolve the issue described?

@berlincount
Copy link
Author

It should, if it's reduced below 767 effective bytes, making it VARCHAR(191) .. 191*4 = 764 ..

@aeneasr aeneasr added the bug Something is not working. label Feb 24, 2017
@aeneasr
Copy link
Member

aeneasr commented Feb 24, 2017

One difficulty is that we don't know the primary key constraint's name, as it was not forced during table construction:

CREATE TABLE IF NOT EXISTS hydra_warden_group (
	id      	varchar(255) NOT NULL PRIMARY KEY
)

I'm not sure if the index name is the same on mysql/mariadb/postgres across versions. So I actually don't know how to write a ALTER TABLE statement for this that works everywhere. Any ideas?

@berlincount
Copy link
Author

Hmm. Not really. Existing tables should also be fine, so no need to alter them. Don't know whether the DB layer can live with differences there, though ;)

CREATE for a new table on a system with issues will simply fail.

@aeneasr
Copy link
Member

aeneasr commented Feb 25, 2017

Yeah that makes sense. I don't think it would be wise to have different versions around. It might make sense to fix that for the 1.0.0 release and let people know who used prior versions that they need to upgrade the key length themselves.

@aeneasr aeneasr modified the milestones: unplanned, 1.0.0: stable release Jun 5, 2017
@aeneasr
Copy link
Member

aeneasr commented Jun 5, 2017

Creating and updating database schemas is now a separate command (hydra migrate dsn://tosql), does this help here?

@aeneasr aeneasr modified the milestones: unplanned, 1.0.0: stable release Jun 5, 2017
@aeneasr aeneasr added the other label Jun 5, 2017
@DallanQ
Copy link
Contributor

DallanQ commented Oct 31, 2017

I just ran into this as well while trying to initialize a new hydra db. In order to enable large key support by default, you need to set:
innodb_large_prefix=on
innodb_file_format=Barracuda
innodb_file_per_table=on
innodb_default_row_format=dynamic

Unfortunately, innodb_default_row_format isn't available as an option until MySQL 5.7, and RDS Aurora (which is what I'm using) only supports MySQL 5.6, so I can't enable large key support by default on RDS Aurora.

There are several options:

  1. alter the database to use latin1 as the default character set
  2. reduce the key sizes to 191 characters
  3. modify the SQL create statements to append "row_format=dynamic" for MySQL. The dynamic row format is available since 5.5 I think, but it isn't the default
  4. require MySQL 5.7 and tell people to set all four settings above

To get started I'm going with option 1, but option 2 would be really nice if the keys don't really need to be over 191 characters.

I'd be happy to submit a PR to reduce the VARCHAR(255) statements to VARCHAR(191), but I don't know if that would break anything.

@aeneasr
Copy link
Member

aeneasr commented Oct 31, 2017

Yeah this should definitely be addressed in the 0.10.0 branch. I think VARCHAR(128) should already be more than enough. The best thing would be to write a migration set for using that varchar length and I gladly accept a PR for that. Keep in mind that it might become a bit tricky when it comes to updating the keys as well.

@DallanQ
Copy link
Contributor

DallanQ commented Oct 31, 2017

Ok, I'll put something together next week.

@aeneasr
Copy link
Member

aeneasr commented Nov 5, 2017

An alternative would be to move the current primary key to a surrogate key with a unique index and use a (maybe unqueriable) auto-increment INT primary key. That way we could even allow larger keys than 255 for the surrogate key whilst being nice to ops and generally more interoperable. What do you think?

@aeneasr
Copy link
Member

aeneasr commented Nov 5, 2017

I want to quickly re-iterate my reasoning here:

  1. By using a SERIAL (or auto_increment) we'll be using INTs for the primary key which will in turn gain some significant performance improvements over large JOINs (the case for policies)
  2. By using the surrogate ID for querying through the public API it's still possible to allow large ID names, fast lookup and unguessability (in case of using hashes for IDs)

@aeneasr
Copy link
Member

aeneasr commented Nov 5, 2017

Here's a quick PR I drafted out for showing you what I mean by that exactly: #642

Please note that this is only PostgreSQL so far but it should be equally easy for mysql.

@ruhavingfun22
Copy link

I hit this too over the weekend. Doesn't work on mySQL5.6, but works fine on mySQL5.7 I looked over your change and think that is a neat way to get around the size problem without having to reduce the length of the keys. Would be nice to mention somewhere this limitation until the fix is committed.

@aeneasr
Copy link
Member

aeneasr commented Nov 6, 2017

I agree that this should be addressed, I'm currently preoccupied with some other things but I definitely want this to land in 0.10.0

@aeneasr aeneasr modified the milestones: unplanned, 0.10.0 Nov 6, 2017
@DallanQ
Copy link
Contributor

DallanQ commented Nov 6, 2017

If I'm reading the PR correctly, it looks like this line
ADD CONSTRAINT hydra_client_surrogate_id_unique UNIQUE (surrogate_id)
would still require that the varchar(255) column to have a unique key. Unfortunately it's not just primary keys that can't be more than 767 bytes, but no key can be more than 767 bytes.

@aeneasr
Copy link
Member

aeneasr commented Nov 6, 2017

Oh snap, that is truly unfortunate. We certainly shouldn't remove indices from the surrogate_id since that will be queried often. Are there any other options?

@aeneasr
Copy link
Member

aeneasr commented Nov 6, 2017

I think we could live with going for 191 length surrogate_id (I became quite fond of the idea of using a surrogate id!) here.

@DallanQ
Copy link
Contributor

DallanQ commented Nov 8, 2017

If you think we can live with 191 length for surrogate_id, that's a pretty straightforward migration. I can build on your PR and alter the tables to make surrogate_id varchar(191) if you like. What do you think?

@aeneasr
Copy link
Member

aeneasr commented Nov 8, 2017

I would love that! Before you start, how would you migrate FOREIGN KEY relations? We have two modules using those:

I think that, if you solve the migration for the first module, the second module will work the same way. My idea so far was to (table A being the table, that table B has a foreign relation to):

  1. Drop the foreign key from B
  2. Move id to surrogate_id in A
  3. Alter surrogate_id to VARCHAR(191) in A
  4. Create a new primary key id SERIAL (postgres) and id INT AUTO_INCREMENT in A
  5. Move the foreign key column in B to <fk_column>_surrogate_id and change to VARCHAR(191)
  6. Create a new column <fk_column>_id with INT
  7. Create a more complex query that does roughly:
    1. SELECT A.id FROM A WHERE A.surrogate_id = <the surrogate value from column B>
    2. UPDATE B SET (id= <select value from above>
  8. Re-add the foreign keys
  9. Change the VARCHAR lengths to 191 in previous migrations as they will only affect new systems
  10. Write a big fucking warning in HISTORY.md :D

@DallanQ
Copy link
Contributor

DallanQ commented Nov 14, 2017

I reviewed all of the table keys and constraints that need to be modified:

KEYS
hydra_client.id(255)
hydra_client_migration.id(255)
hydra_consent_request_migration.id(255)
hydra_groups_migration.id(255)
hydra_jwk.sid(255) + kid(255)
hydra_jwk_migration.id(255)
hydra_oauth2_access.signature(255)
hydra_oauth2_code.signature(255)
hydra_oauth2_migration.id(255)
hydra_oauth2_oidc.signature(255)
hydra_oauth2_refresh.signature(255)
hydra_policy_migration.id(255)
hydra_warden_group.id(255)
hydra_warden_group_member.member(255) + group_id(255)
ladon_action.compiled(512)
ladon_action.template(512)
ladon_policy.id(255)
ladon_policy_action_rel.policy(255) + action(64)
ladon_policy_permission.policy(255)
ladon_policy_resource.policy(255)
ladon_policy_resource_rel.policy(255) + resource(64)
ladon_policy_subject.policy(255)
ladon_policy_subject_rel.policy(255) + subject(64)
ladon_resource.compiled(512)
ladon_resource.template(512)
ladon_subject.compiled(512)
ladon_subject.template(512)

CONSTRAINTS
hydra_warden_group_member.group_id -> hydra_warden_group.id
ladon_policy_action_rel.policy -> ladon_policy.id
ladon_policy_action_rel.action -> ladon_action.id
ladon_policy_permission.policy -> ladon_policy.id
ladon_policy_resource.policy -> ladon_policy.id
ladon_policy_resource_rel.policy -> ladon_policy.id
ladon_policy_resource_rel.resource -> ladon_resource.id
ladon_policy_subject.policy -> ladon_policy.id
ladon_policy_subject_rel.policy -> ladon_policy.id
ladon_policy_subject_rel.subject -> ladon_subject.id

I'm starting to have concerns about whether shortening the key lengths will work, because we're not just shortening id to 191. Some keys have to be shortened to 95. I don't know the code well enough to say whether shortening has the possibility of truncating keys. Here what would need to happen:

  • all id fields: 255 -> 191
  • hydra_oauth2_access, hydra_oauth2_code, hydra_oauth2_oidc, hydra_oauth2_refresh tables: signature 255 -> 191
  • hydra_jwk table: sid 255 -> 95, kid 255 -> 95
  • hydra_warden_group_member table: member 255 -> 95, group_id 255 -> 95
  • ladon_action, ladon_resource, ladon_subject tables: compiled 512 -> 191, template 512 -> 191
  • ladon_policy_action_rel table, ladon_policy_permission, ladon_policy_resource, ladon_policy_resource_rel, ladon_subject, ladon_subject_rel tables: policy 255 -> 127

Note that in the hydra_jwk table, the sid and kid fields would have to be reduced to 95 each since the key includes both fields, similarly in the hydra_warden_group_member table, the member and group_id fields would have to be reduced to 95 each. The compiled and template fields in three tables would have to be reduced from 512 characters to 191. And the policy field would have to be reduced to 127 since the keys include policy and another 64-character field.

Is this reducing the fields too much? If not, I will write a PR to shorten the keys in this way, along with renaming id to surrogate_id, introducing a new id field, and dropping and re-adding the foreign-key constraints.

Other possible approaches:

  • use latin1 as the default character set for mysql. Do the tables need to store multi-byte unicode characters?
  • specify only fields involved in keys be latin1 in mysql, but allow other fields to be utf8mb4.
  • append "row_format=dynamic" to the create table statements for mysql, tell people to use mysql version >= 5.5, and to set innodb_large_prefix=on, innodb_file_format=Barracuda, and innodb_file_per_table=on.

Which is the best approach?

@aeneasr
Copy link
Member

aeneasr commented Nov 15, 2017

Thank you for the thorough investigation! Reducing some of those keys that drastically is indeed a bit tricky, especially with the compiled fields in ladon_action/ladon_resource etc. Regarding some of the others:

hydra_jwk table: sid 255 -> 95, kid 255 -> 95

We could split those into two columns, right?

hydra_warden_group_member table: member 255 -> 95, group_id 255 -> 95

Is this due to the composed primary key?


One question I have though is, we're using MySQL 5.7 from library/docker for integration tests - why aren't those catching these issues?

@DallanQ
Copy link
Contributor

DallanQ commented Nov 16, 2017

The hydra_jwk table has a combined key: sid + kid. If we split this into two separate (non-unique) keys, then each key could be 191 bytes long. Does the database have to enforce that sid + kid be unique? If not, then creating two separate non-unique keys is the way to go.

A similar question applies to the hydra_warden_group_member table, which currently has a combined key of member + group_id. If the database doesn't have to enforce that member + group_id be unique, then we could create two separate keys here and allow member and group_id to be 191 bytes long.

The reason that Hydra works in MySQL 5.7 is that the four fields that I mentioned all have the required values as their default values in MySQL 5.7. Unfortunately the innodb_default_row_format option wasn't added until MySQL 5.7. Before 5.7, you have to add "row_format=dynamic" to the create table statements.

Is it possible to construct different create table statements for MySQL vs Postgres? If so, then appending "row_format=dynamic" to MySQL create-table statements is an easy way to enable large-key support in MySQL 5.5 and 5.6 (along with setting the other three values). This would avoid having to reduce the keys sizes.

@aeneasr
Copy link
Member

aeneasr commented Nov 16, 2017

Is it possible to construct different create table statements for MySQL vs Postgres? If so, then appending "row_format=dynamic" to MySQL create-table statements is an easy way to enable large-key support in MySQL 5.5 and 5.6 (along with setting the other three values). This would avoid having to reduce the keys sizes.

Yes that is possible, it would also probably be the cleanest solution for now.

@DallanQ
Copy link
Contributor

DallanQ commented Nov 16, 2017

Ok, what's the best way to do this? I see CreateSchemas functions in client/manager_sql, jwk_manager_sql, oauth2/consent_manager_sql and fosite_store_sql, and warden/group/manager_sql. I could make a function: customize(migrations, s.DB.DriverName()) that would customize the migrations depending upon the driver name, which would append "row_format=dynamic" if the driver looked like a mysql driver. What do you think?

The same thing would have to be done for the ladon project as well, since it also creates tables.

@aeneasr
Copy link
Member

aeneasr commented Nov 16, 2017

Yes that looks good. In ladon, we already separate between postgres and mysql so you would simply append the row_format there. For the other cases where the same schema is used it would be ok to have a helper function such as enhanceSqlSchema(schema string, driverName string) which would wrap the first up statement.

@DallanQ
Copy link
Contributor

DallanQ commented Nov 17, 2017

Ok, I'll draft a PR this weekend for your approval.

@aeneasr
Copy link
Member

aeneasr commented Nov 17, 2017

Awesome!

@aeneasr aeneasr modified the milestones: 0.10.0, 1.0.0-alpha1 Dec 8, 2017
@aeneasr aeneasr modified the milestones: 1.0.0-alpha1, unplanned Jan 15, 2018
@aeneasr
Copy link
Member

aeneasr commented Aug 21, 2018

I'm closing this because it was added to the limitations section of the docs. It's unlikely that we'll change this. We don't have extensive foreign key relations which means that shortening private keys doesn't make a lot of sense (apart from this issue), as we always query the long ID. I think all cloud providers are on 5.7+ anyways today and there is a workaround for older installations which is documented in this issue.

@aeneasr aeneasr closed this as completed Aug 21, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is not working.
Projects
None yet
Development

No branches or pull requests

4 participants