Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance degradation in MySQLSchemaManager #5638

Closed
m-vo opened this issue Aug 31, 2022 · 19 comments · Fixed by #5639
Closed

Performance degradation in MySQLSchemaManager #5638

m-vo opened this issue Aug 31, 2022 · 19 comments · Fixed by #5639

Comments

@m-vo
Copy link
Contributor

m-vo commented Aug 31, 2022

Bug Report

Q A
Version 3.4.1

Summary

The changes from #5580 unfortunately severely degrade the performance for schema lookups with MySQL/MariaDB. The culprit is the newly added JOIN to filter out views in selectTableColumns(). On my test setup the query took over 50x longer (~480ms compared to ~8ms) which results in quite a big hit in some applications.

This seems like an odd DB performance problem. Interestingly, the following query (for the foo table) still is fast:

SELECT TABLE_NAME FROM information_schema.TABLES WHERE TABLE_TYPE = 'BASE TABLE' AND TABLE_SCHEMA = 'foo'

So the workaround could be to replace the JOIN with a subquery:

[…]
WHERE TABLE_SCHEMA = 'foo' AND TABLE_NAME IN (
    SELECT TABLE_NAME FROM information_schema.TABLES
    WHERE TABLE_TYPE = 'BASE TABLE' AND TABLE_SCHEMA = 'foo'
) 

At least in my tests this solved the issue.

/cc @morozov

@morozov
Copy link
Member

morozov commented Aug 31, 2022

Thanks for reporting it as a new issue, @m-vo. The previous regression I referred to earlier is #5202. It was also caused by a schema introspection query using JOIN.

The next steps would be:

  1. Identify the root cause of the issue from the MySQL point of view (e.g. is it also related to bug #81347?). If that's the case, see if the issue is reproducible on MySQL 8. BTW, which version do you use which is affected by the issue?
  2. Create a pull request with your proposed fix.

Are you up making any of the above steps?

@morozov
Copy link
Member

morozov commented Aug 31, 2022

Looking closer at #5195, I recall that the problem was caused specifically by the fact that we used the TABLE_SCHEMA column of the main table in the JOIN condition (e.g. ON t.TABLE_SCHEMA = c.TABLE_SCHEMA). Prior to this bug being fixed, it resulted in MySQL prior to 8 and the corresponding versions of MariaDB scanning all databases on disk using filesystem operations instead of using a more optimized internal schema.

Could you check if replacing the right part in the ON clause of the JOIN condition with a literal solves the problem? Additionally, you can try running EXPLAIN on different versions of the query and see if there is a difference.

@m-vo
Copy link
Contributor Author

m-vo commented Aug 31, 2022

Could you check if replacing the right part in the ON clause of the JOIN condition with a literal solves the problem?

Ah yes, that indeed does work as well. Without a fixed literal all tables get scanned (Using where; Open_frm_only; Scanned all databases; Using join buffer (flat, BNL join)).

I've had the problem on MariaDB 10.6 and also got it reported for MySQL 5.7. Judging by the last comment of your referenced MySQL bug, this should probably be fixed in MySQL 8. But it would be interesting if this is indeed the case.

@m-vo
Copy link
Contributor Author

m-vo commented Aug 31, 2022

See #5639 for a first go.

@pxpm
Copy link

pxpm commented Sep 6, 2022

Please also see if this relates or is fixed by this issue: #5658

Cheers

@brainformatik
Copy link

Seems like we are having the same issue with the method selectForeignKeyColumns.
If the query is executed on our DB (~800 tables), it takes 0.5 seconds.
If we remove the inline comments of the query, it takes 0.0005 seconds.

@pxpm
Copy link

pxpm commented Sep 12, 2022

@brainformatik @morozov replied to me the other day that they are aware of that issue, not sure if there is already something open to fix it, or even if it is going to be fixed, check the reply here: #5658 (comment)

@morozov
Copy link
Member

morozov commented Sep 12, 2022

@brainformatik thanks for confirming the issue. There's no dedicated issue open for it and it's not being actively worked. A pull request implementing the approach from #5639 is welcome.

@brainformatik
Copy link

brainformatik commented Sep 13, 2022

The same solution as for selectTableColumns should be applicable here, we removed the CONSTRAINT_SCHEMA from the ON clause and put it as an additional condition, which seems to work fine and reduces the execution time to 0.005 seconds.

$sql .= <<<'SQL'
            k.CONSTRAINT_NAME,
            k.COLUMN_NAME,
            k.REFERENCED_TABLE_NAME,
            k.REFERENCED_COLUMN_NAME,
            k.ORDINAL_POSITION /*!50116,
            c.UPDATE_RULE,
            c.DELETE_RULE */
FROM information_schema.key_column_usage k /*!50116
INNER JOIN information_schema.referential_constraints c
ON c.CONSTRAINT_NAME = k.CONSTRAINT_NAME
AND c.TABLE_NAME = k.TABLE_NAME */
SQL;

        $conditions = ['k.TABLE_SCHEMA = ?'];
        $params     = [$databaseName];

        if ($tableName !== null) {
            $conditions[] = 'k.TABLE_NAME = ?';
            $params[]     = $tableName;
        }

        $conditions[] = 'k.REFERENCED_COLUMN_NAME IS NOT NULL';

        $conditions[] = 'c.CONSTRAINT_SCHEMA = ?';
        $params[]     = $databaseName;

        $sql .= ' WHERE ' . implode(' AND ', $conditions) . ' ORDER BY k.ORDINAL_POSITION';

The only problem with this would be, that if the executable comment is not applied so basically MySQL prior to 5.1.16, this will fail.

@brainformatik
Copy link

I think this rework should work in any case, do you have any concerns by putting the database name directly in the query without using a ? placeholder?

$sql .= <<<'SQL'
            k.CONSTRAINT_NAME,
            k.COLUMN_NAME,
            k.REFERENCED_TABLE_NAME,
            k.REFERENCED_COLUMN_NAME,
            k.ORDINAL_POSITION /*!50116,
            c.UPDATE_RULE,
            c.DELETE_RULE */
FROM information_schema.key_column_usage k /*!50116
INNER JOIN information_schema.referential_constraints c
ON c.CONSTRAINT_NAME = k.CONSTRAINT_NAME
AND c.TABLE_NAME = k.TABLE_NAME */
SQL;

        $conditions = ['k.TABLE_SCHEMA = ?'];
        $params     = [$databaseName];

        if ($tableName !== null) {
            $conditions[] = 'k.TABLE_NAME = ?';
            $params[]     = $tableName;
        }

        $conditions[] = 'k.REFERENCED_COLUMN_NAME IS NOT NULL';

        $constraintCondition = '/*!50116 AND c.CONSTRAINT_SCHEMA = "' . $databaseName . '" */';

        $sql .= ' WHERE ' . implode(' AND ', $conditions) . $constraintCondition . ' ORDER BY k.ORDINAL_POSITION';

@morozov
Copy link
Member

morozov commented Sep 13, 2022

do you have any concerns by putting the database name directly in the query without using a ? placeholder?

Please try using ? and adding the database to the list of parameters ($params[] = $databaseName).

@brainformatik
Copy link

Tried it but it doesn't work. Seems like the parser ignores the ? inside the executable comment for the condition and then complains about not matching number of tokens and bound variables.

@morozov
Copy link
Member

morozov commented Sep 14, 2022

That makes sense, the parser isn't aware of conditional comments. Please do it as originally proposed then, but make sure the database name is quoted before concatenating it with the query.

@brainformatik
Copy link

Will do, but because I never contributed before, I'm not sure about best practice for quoting in this situation. I found the following methods that I can use:

  • quote or quoteIdentifier on the connection
  • quoteStringLiteral on the platform

Which one would you recommend to use in this case? Searched the repository and quoteStringLiteral is used a lot so this would be my guess.

@morozov
Copy link
Member

morozov commented Sep 14, 2022

In this case, it's best to use $this->connection->quote().

@morozov
Copy link
Member

morozov commented Sep 14, 2022

the parser isn't aware of conditional comments.

FWIW, I was referring to the DBAL's SQL parser which is used only for named or list parameters, which isn't the case here. MySQL itself seems to work fine in this case:

mysql> PREPARE t FROM 'SELECT /*! ? */';
Query OK, 0 rows affected (0.00 sec)
Statement prepared

mysql> SET @v = 1;
Query OK, 0 rows affected (0.00 sec)

mysql> EXECUTE t USING @v;
+------+
| ?    |
+------+
|    1 |
+------+
1 row in set (0.00 sec)

Which parser were you referring to?

@morozov
Copy link
Member

morozov commented Sep 14, 2022

It should be the PDO SQL parser that is causing trouble. If we use a placeholder, the tests pass on mysqli but fail on pdo_mysql.

@brainformatik please see if #5667 works for you.

@brainformatik
Copy link

It should be the PDO SQL parser that is causing trouble. If we use a placeholder, the tests pass on mysqli but fail on pdo_mysql.

@brainformatik please see if #5667 works for you.

Good to know.

Yes it is working for us. Didn't have the time yesterday to prepare the PR properly.

Thanks for quick solution!

@github-actions
Copy link

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants