Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Actually run against the version of the DB we select in the matrix. #12591

Merged
merged 1 commit into from
Nov 25, 2020

Conversation

ashb
Copy link
Member

@ashb ashb commented Nov 24, 2020

Due to a bug in Breeze initialization code, we were always running
against Postgres 9.6 and MySQL 5.7, even when the matrix selected
something else.

(We were overwriting the POSTGRES_VERSION and MYSQL_VERSION environment
variables in initialization code)

I wouldn't be surprised if we now discover that we have some bugs on MySQL 8


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@ashb ashb added the full tests needed We need to run full set of tests for this PR to merge label Nov 24, 2020
@ashb ashb requested review from dimberman, kaxil and potiuk November 24, 2020 16:26
@ashb ashb added this to the Airflow 2.0.0-beta4 milestone Nov 24, 2020
@ashb
Copy link
Member Author

ashb commented Nov 24, 2020

#12588 for instance.

@potiuk
Copy link
Member

potiuk commented Nov 24, 2020

Who tests the tester ..... Happy to take a look and fix those tests. Good that you found it @ashb

@potiuk
Copy link
Member

potiuk commented Nov 24, 2020

| Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs

I saw this before with indexes. It's likely tth e are rruning utf8mb4 charset in our tests it likely causes some columns too be too big. I wonder though why it does not fail for 5.7.

@ashb
Copy link
Member Author

ashb commented Nov 24, 2020

Could this be related:

/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py:593 Warning: (3778, "'utf8_general_ci' is a collation of the deprecated character set UTF8MB3. Please consider using UTF8MB4 with an appropriate collation instead.")

@ashb
Copy link
Member Author

ashb commented Nov 24, 2020

The "fix" may be to make extra a TEXT rather than varchar column, that's why Mysql seems to suggest in their docs.

@potiuk
Copy link
Member

potiuk commented Nov 24, 2020

The "fix" may be to make extra a TEXT rather than varchar column, that's why Mysql seems to suggest in their docs.

Yep. The only thing that makes it really strange that it does not fail in MySQL 5.7

@ashb
Copy link
Member Author

ashb commented Nov 24, 2020

The only thing that doesn't surprise me about MySQL versions is just how surprising each version is in it's behaviour :)

@potiuk
Copy link
Member

potiuk commented Nov 24, 2020

I start hating it for that reason too.

@potiuk
Copy link
Member

potiuk commented Nov 24, 2020

Checking why

@potiuk
Copy link
Member

potiuk commented Nov 24, 2020

This table works fin on 5.7:

+--------------------+---------------+------+-----+---------+----------------+
| Field              | Type          | Null | Key | Default | Extra          |
+--------------------+---------------+------+-----+---------+----------------+
| id                 | int(11)       | NO   | PRI | NULL    | auto_increment |
| conn_id            | varchar(250)  | NO   | UNI | NULL    |                |
| conn_type          | varchar(500)  | NO   |     | NULL    |                |
| host               | varchar(500)  | YES  |     | NULL    |                |
| schema             | varchar(500)  | YES  |     | NULL    |                |
| login              | varchar(500)  | YES  |     | NULL    |                |
| password           | varchar(5000) | YES  |     | NULL    |                |
| port               | int(11)       | YES  |     | NULL    |                |
| extra              | varchar(5000) | YES  |     | NULL    |                |
| is_encrypted       | tinyint(1)    | YES  |     | NULL    |                |
| is_extra_encrypted | tinyint(1)    | YES  |     | NULL    |                |
| description        | varchar(5000) | YES  |     | NULL    |                |
+--------------------+---------------+------+-----+---------+----------------+

@ashb
Copy link
Member Author

ashb commented Nov 24, 2020

I make that ~17k of length, which if we assume Mysql8 now uses 4 bytes per char pushses itjust over to 68k, where as at 3 bytes per it's 51k.

@potiuk
Copy link
Member

potiuk commented Nov 24, 2020

I KNOW. It is a total 🤦

@kaxil
Copy link
Member

kaxil commented Nov 24, 2020

Well atleast we didn't find any issues with Postgres13.

For MySQL I think let's just change the column to TEXT

@potiuk
Copy link
Member

potiuk commented Nov 24, 2020

It's so bad on the MYSQL side, that I even do not know how to react .....

It seems that when you use Mysql8 Client and request utf8mb4 character, the database is created with 🤦 🤦 🤦 🤦 🤦 🤦 🤦 latin1 collation....

This is what happens in our case. When we changed client in the docker image to use mysql8 client, our 5.7 database silently started to be created as latin1 database instead of utf8mb4 and it worked fine with the increased size, because in latin1 the row size is much smaller.

Mysql 8 client's utf8mb4 is turned into utf8mb4_0900_ai_ci which is not known to mysql 5.7 and it falls back to default which is latin1 in MySQL5.7.

TRUE STORY.

https://dev.mysql.com/doc/refman/8.0/en/charset-connection.html

Why does this occur? After all, utf8mb4 is known to the 8.0 client and the 5.7 server, so both of them recognize it. To understand this behavior, it is necessary to understand that when the client tells the server which character set it wants to use, it really tells the server the default collation for that character set. Therefore, the aforementioned behavior occurs due to a combination of factors:

 The default collation for utf8mb4 differs between MySQL 5.7 and 8.0 (utf8mb4_general_ci for 5.7, utf8mb4_0900_ai_ci for 8.0).

When the 8.0 client requests a character set of utf8mb4, what it sends to the server is the default 8.0 utf8mb4 collation; that is, the utf8mb4_0900_ai_ci.

utf8mb4_0900_ai_ci is implemented only as of MySQL 8.0, so the 5.7 server does not recognize it.

Because the 5.7 server does not recognize utf8mb4_0900_ai_ci, it cannot satisfy the client character set request, and falls back to its default character set and collation (latin1 and latin1_swedish_ci).

@potiuk
Copy link
Member

potiuk commented Nov 24, 2020

So even our tests for MySQL 5.7 are wrong!

@kaxil
Copy link
Member

kaxil commented Nov 24, 2020

https://media.giphy.com/media/gd09Y2Ptu7gsiPVUrv/giphy.gif

@potiuk
Copy link
Member

potiuk commented Nov 24, 2020

Why don't we say that we do not support utf8mb4 at all for mysql? Just utf8mb3?
I think this way we do not have to migrate anything and I think it will work just fine.

Seriously - with this problem in MySQL client, it is simply SAFER not to run utf8mb4 AT ALL

@potiuk
Copy link
Member

potiuk commented Nov 24, 2020

And Who needs it anyway?

@ashb
Copy link
Member Author

ashb commented Nov 24, 2020

I have no problem with limiting the possible combinations of mysql we support, so sure.

Slight correction: let's phrase it as some form of "PR's welcome" rather than a "we won't ever support this".

potiuk added a commit to PolideaInternal/airflow that referenced this pull request Nov 25, 2020
Due to not executing MySQL8 tests Fixed in apache#12591 added
description for connection table was not compatible with
MySQL8 with utf8mb4 character set.

This change adds migration and fixes the previous migration
to make it compatible.
@potiuk
Copy link
Member

potiuk commented Nov 25, 2020

I did a thorough check - and I run a few tests and It is actually a bit better than that. The "fallback" behavior is still there.
But utf8mb4 should continue to work as long as we set it as default. We had it before, but it has been removed when we added MYSQL8, and I am restoring it now.

Still the clients's behavior remains very bad but at least we mitigate it in tests.

@ashb
Copy link
Member Author

ashb commented Nov 25, 2020

@potiuk Order to merge PRs is #12596, #12614 then #12615 right?

potiuk added a commit that referenced this pull request Nov 25, 2020
Due to not executing MySQL8 tests Fixed in #12591 added
description for connection table was not compatible with
MySQL8 with utf8mb4 character set.

This change adds migration and fixes the previous migration
to make it compatible.
potiuk added a commit that referenced this pull request Nov 25, 2020
* Fix Connection.description migration for MySQL8

Due to not executing MySQL8 tests Fixed in #12591 added
description for connection table was not compatible with
MySQL8 with utf8mb4 character set.

This change adds migration and fixes the previous migration
to make it compatible.

* Fixes inconsistent setting of encoding on Mysql 5.7/8

We missed that when we added support
for differnet mysql versions in #7717 when we removed default
character set setting for the database server.

This change forces the default on database server to be
utf8mb4 - regardless if MySQL 5.7 or MySQL8 is used.
Utf8mb4 is default for MySQL8 but latin1 is default fo MySQL 5.7.

There was a suspected root cause of the problem:

https://dev.mysql.com/doc/refman/8.0/en/charset-connection.html
where mysql client falls back to the default collation if
the client8 is used with 5.7 database, but this should be
no problem if the default DB character set is forced to be
utf8mb4

This PR restores forcing the server-side encoding.
@potiuk
Copy link
Member

potiuk commented Nov 25, 2020

@potiuk Order to merge PRs is #12596, #12614 then #12615 right?

Correct.

@potiuk
Copy link
Member

potiuk commented Nov 25, 2020

@ashb -> you can rebase this one now.

@potiuk
Copy link
Member

potiuk commented Nov 25, 2020

In orer to make sure that it works faster I rebased it and pushed in our repo -> #12621

Due to a bug in Breeze initialization code, we were always running
against Postgres 9.6 and MySQL 5.7, even when the matrix selected
something else.

(We were overwriting the POSTGRES_VERSION and MYSQL_VERSION environment
variables in initialization code)
@ashb ashb force-pushed the actually-test-right-db-version branch from 07234dd to a86379d Compare November 25, 2020 16:34
@potiuk potiuk merged commit 54adda5 into apache:master Nov 25, 2020
@potiuk
Copy link
Member

potiuk commented Nov 25, 2020

All tests completed - just upload coverage to run.

potiuk pushed a commit that referenced this pull request Nov 26, 2020
…12591)

Due to a bug in Breeze initialization code, we were always running
against Postgres 9.6 and MySQL 5.7, even when the matrix selected
something else.

(We were overwriting the POSTGRES_VERSION and MYSQL_VERSION environment
variables in initialization code)

(cherry picked from commit 54adda5)
kaxil pushed a commit that referenced this pull request Nov 27, 2020
…12591)

Due to a bug in Breeze initialization code, we were always running
against Postgres 9.6 and MySQL 5.7, even when the matrix selected
something else.

(We were overwriting the POSTGRES_VERSION and MYSQL_VERSION environment
variables in initialization code)

(cherry picked from commit 54adda5)
kaxil pushed a commit to astronomer/airflow that referenced this pull request Nov 27, 2020
…pache#12591)

Due to a bug in Breeze initialization code, we were always running
against Postgres 9.6 and MySQL 5.7, even when the matrix selected
something else.

(We were overwriting the POSTGRES_VERSION and MYSQL_VERSION environment
variables in initialization code)

(cherry picked from commit 54adda5)
(cherry picked from commit ca53e70)
potiuk pushed a commit to PolideaInternal/airflow that referenced this pull request Nov 27, 2020
…pache#12591)

Due to a bug in Breeze initialization code, we were always running
against Postgres 9.6 and MySQL 5.7, even when the matrix selected
something else.

(We were overwriting the POSTGRES_VERSION and MYSQL_VERSION environment
variables in initialization code)

(cherry picked from commit 54adda5)
potiuk added a commit that referenced this pull request Nov 27, 2020
…12591) (#12668)

Due to a bug in Breeze initialization code, we were always running
against Postgres 9.6 and MySQL 5.7, even when the matrix selected
something else.

(We were overwriting the POSTGRES_VERSION and MYSQL_VERSION environment
variables in initialization code)

(cherry picked from commit 54adda5)

Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
@potiuk potiuk added the type:misc/internal Changelog: Misc changes that should appear in change log label Nov 30, 2020
@ashb ashb deleted the actually-test-right-db-version branch December 7, 2020 17:26
cfei18 pushed a commit to cfei18/incubator-airflow that referenced this pull request Mar 5, 2021
…pache#12591) (apache#12668)

Due to a bug in Breeze initialization code, we were always running
against Postgres 9.6 and MySQL 5.7, even when the matrix selected
something else.

(We were overwriting the POSTGRES_VERSION and MYSQL_VERSION environment
variables in initialization code)

(cherry picked from commit 54adda5)

Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
@potiuk potiuk mentioned this pull request Jul 20, 2022
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:dev-tools full tests needed We need to run full set of tests for this PR to merge type:misc/internal Changelog: Misc changes that should appear in change log
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants