Optimize OracleSchemaManager::listTables #2766

Closed
wants to merge 54 commits into from

Contributor

@mondrake mondrake commented Jul 3, 2017

A PR replicating the patch included in #2676 by @mathieubouchard.

Tests executed on continuousphp show that running createSchema on an Oracle db with about 170 tables takes over a minute. See #2767.

Time: 4.18 minutes, Memory: 80.00MB 
 
There was 1 failure: 
 
1) Doctrine\Tests\DBAL\Functional\Schema\OracleSchemaManagerTest::testCreateSchemaOnLargeNumberOfTables 
createSchema() executed in less than 15 sec. 
Failed asserting that 64.17334699630737 is less than 15. 
 
tests/Doctrine/Tests/DBAL/Functional/Schema/OracleSchemaManagerTest.php:292 
 
FAILURES! 
Tests: 4613, Assertions: 8763, Failures: 1, Skipped: 519, Incomplete: 11. 

@Ocramius
Member

Ocramius commented Jul 3, 2017

That's a lot of additional code with no additional tests covering all the newly introduced decision paths: if this patch is not a functional change, I'm inclined to reject it, sorry.

@mondrake
Contributor Author

mondrake commented Jul 3, 2017

@Ocramius can you please leave it open for a while to see whether I or anyone else can add appropriate tests? The underlying issue is relevant: you can't really use the schema manager's createSchema() productively on an Oracle database that has some tables in it.

@Ocramius
Member

Ocramius commented Jul 3, 2017 via email

@mondrake
Contributor Author

mondrake commented Jul 4, 2017

Added a functional test that creates 150 test tables, then runs createSchema on the database. The test fails; apparently there are some issues with the original patch. I will create a separate PR with just the test, which is relevant anyway in order to measure the time createSchema needs to run.

@mathieubouchard
Contributor

Looking at the code of the Oracle test file, it seems there are some tables created for other tests that are not deleted before your new method creates its 150 new tables. So the total number of tables in the database will be greater than 150.

@mondrake
Contributor Author

mondrake commented Jul 4, 2017

@mathieubouchard yes, I changed the test in #2767 to test for more than 150 tables, but also most importantly to measure the execution time.

Any idea why we have this failure in your original code

Caused by 
PHPUnit_Framework_Error_Notice: Undefined index: table_name 
 
lib/Doctrine/DBAL/Schema/OracleSchemaManager.php:438 
lib/Doctrine/DBAL/Schema/OracleSchemaManager.php:379 
tests/Doctrine/Tests/DBAL/Functional/Schema/OracleSchemaManagerTest.php:36 

which also affects other tests?

@mathieubouchard
Contributor

The problem is with the use of array_key_exists($foreignKeyRow['table_name'], $tablesForeignKeys) in the method listTablesForeignKeys. The table_name column is not in $foreignKeyRow the first time a column is parsed for a given table.

    private function listTablesForeignKeys($database = null)
    {
        $sql = $this->_platform->getListTablesForeignKeysSQL($database);
        $tablesForeignKeysRows = $this->_conn->fetchAll($sql);

        $tablesForeignKeys = array();
        foreach ($tablesForeignKeysRows as $foreignKeyRow) {
            $foreignKeyRow = \array_change_key_case($foreignKeyRow, CASE_LOWER);
            
            if (!array_key_exists($foreignKeyRow['table_name'], $tablesForeignKeys)) {
                $tablesForeignKeys[$foreignKeyRow['table_name']] = array($foreignKeyRow);
            } else {
                $tablesForeignKeys[$foreignKeyRow['table_name']][] = $foreignKeyRow;
            }
        }

There should be an additional check for array_key_exists('table_name', $foreignKeyRow) before the assignment.

Per the PHP documentation, this behavior is:

Attempting to access an array key which has not been defined is the same as accessing any other undefined variable: an E_NOTICE-level error message will be issued, and the result will be NULL.

I suppose that the PHPUnit test treats an E_NOTICE as an error and throws the PHPUnit_Framework_Error_Notice exception in that case.
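To illustrate the guard described above, here is a minimal sketch, not the code from the patch. The function name `groupRowsByTable` is hypothetical, and silently skipping rows that lack a `table_name` key is an assumption made only for the example; the real fix might need to handle such rows differently.

```php
<?php

/**
 * Groups result rows by their table name.
 *
 * Hypothetical helper for illustration only; it is not part of DBAL.
 */
function groupRowsByTable(array $rows): array
{
    $grouped = [];

    foreach ($rows as $row) {
        // Oracle returns upper-case column names, so normalise the keys first.
        $row = array_change_key_case($row, CASE_LOWER);

        // Check that the key exists before reading it, so no E_NOTICE is raised.
        if (!array_key_exists('table_name', $row)) {
            continue; // assumption: rows without a table name are skipped
        }

        // Appending with [] creates the per-table list on first use.
        $grouped[$row['table_name']][] = $row;
    }

    return $grouped;
}
```

Because `$grouped[$key][] = $row` creates the inner array when it is missing, the separate if/else on `$tablesForeignKeys` is not needed in this form.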

@mondrake
Contributor Author

mondrake commented Jul 4, 2017

Thanks @mathieubouchard , is your comment already based on the latest commit 2ffcec0?

@mathieubouchard
Contributor

Based on the error detailed in the following build: https://app.continuousphp.com/git-hub/doctrine/dbal/build/41d54cad-99e4-4e4c-be18-4c06fb7fa26c

@mondrake
Contributor Author

mondrake commented Jul 5, 2017

OK, so I think that this comment from #2676

... optimize the creation of a Schema from an Oracle database by overriding the OracleSchemaManager->listTables method to get the list of columns, foreign keys and indexes one time for all the tables in the current database. That information is then used to create the table schemas.

is the way to go - that means 4 SQL queries regardless of the number of tables in the database, instead of [1 (get the tables) + #tables * 3 (to get for each table its columns, indexes and foreign keys separately)]. For 150 tables that means 4 queries instead of 451.
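The arithmetic behind those numbers can be written out as a small sketch. The function names are hypothetical and exist only to make the two query-count formulas explicit.

```php
<?php

/**
 * Queries issued when each table is introspected separately:
 * one query to list the tables, then three per table
 * (columns, indexes, foreign keys).
 */
function queriesPerTableApproach(int $tableCount): int
{
    return 1 + 3 * $tableCount;
}

/**
 * Queries issued by the bulk approach: tables, columns, indexes and
 * foreign keys are each fetched once for the whole schema.
 */
function queriesBulkApproach(): int
{
    return 4;
}

echo queriesPerTableApproach(150), PHP_EOL; // 451
echo queriesBulkApproach(), PHP_EOL;        // 4
```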

However, the changes so far need further work. My suggestion for now is to focus on the test: make it more robust by adding indexes and foreign keys to the test case, and make it pass on the current code, with the current performance. With that done, we can turn to the approach above.

@mondrake
Contributor Author

mondrake commented Jul 5, 2017

The test now fails only on the last assertion, which measures the performance of createSchema. We have a large number (150) of tables, indexes, and foreign keys created, including a table with quoted identifiers. The test checks the consistency of the objects returned by createSchema, so we can now go back to adding the code of the original patch, override the ::listTables method with one specific to Oracle, and see that it keeps returning the same results.

@mondrake
Contributor Author

mondrake commented Jul 5, 2017

Maybe we do not need to add separate methods to OraclePlatform to return SQL for all tables' columns / indexes / foreign keys.

If we accept the convention that passing null as the $table parameter to getListTableColumnsSQL, getListTableIndexesSQL, and getListTableForeignKeysSQL returns the SQL for querying the entire database, we can fit this into the current methods.

Doing that for getListTableColumnsSQL in next commit. Just checking that we do not break anything, no implementation yet.

@mathieubouchard
Contributor

The first approach I had was to have only one method to support both use cases. This will reduce the number of lines of code changed and eliminate duplicate SQL code.

But this is a breaking change on the AbstractPlatform API. That's why I created other public methods, specific to Oracle.

@mondrake
Contributor Author

mondrake commented Jul 5, 2017

Thanks @mathieubouchard

this is a breaking change on the AbstractPlatform API

Why?

@mathieubouchard
Contributor

The method signatures don't accept null for the table parameter, for example:
http://www.doctrine-project.org/api/dbal/2.0/class-Doctrine.DBAL.Platforms.AbstractPlatform.html#_getListTableColumnsSQL

@mondrake
Contributor Author

mondrake commented Jul 5, 2017

Ah OK, but there doesn't seem to be any code checking or preventing passing null there, neither in the abstract class nor in its implementations. Would it be a major problem to allow that?

Maybe this is a question for @Ocramius .

@morozov
Member

morozov commented Oct 9, 2017

It would solve the issue I mentioned above but still would leave the possibility for other similar issues to exist. To me, it would make more sense to remove code duplication instead of adding more tests which cover the same cases.

Why do we need two implementations per use case?

@mondrake
Contributor Author

mondrake commented Oct 31, 2017

@morozov this is now taking away the code duplication. getListTableColumnsSQL, getListTableForeignKeysSQL, and getListTableIndexesSQL remain as one-liners calling the new methods that can all manage to fetch info for either the full db or a single table.

ContinuousPHP that runs Oracle tests is green.

Member

@morozov morozov left a comment

Structure-wise looks good to me.

@mondrake please see some more comments.

Someone from the team, please take a look at the usage of quoted/unquoted table names here. I'm not really familiar with the APIs which produce the quoted ones.

}
else {
$quotedDatabaseIdentifier = "(SELECT SYS_CONTEXT('userenv', 'current_schema') FROM DUAL)";
}
Member

Could the block from the beginning of the method until here be moved into a method? Looks like it's the same everywhere.

$quotedDatabaseIdentifier = $this->quoteStringLiteral($databaseIdentifier->getName());
}
else {
$quotedDatabaseIdentifier = "(SELECT SYS_CONTEXT('userenv', 'current_schema') FROM DUAL)";
Member

@morozov morozov Nov 3, 2017

It's not a quoted identifier, it's an expression. The variable name is confusing.

Member

Do we really want to execute this sub-query for every row in the query? I believe the current DB name should be obtained in one query and then used as a value in the second.

Contributor Author

The point here is that we do not have a connection to query against in the Platform object.

Member

Then we should use ALL_* objects for the non-current schema and USER_* ones for the current one.

Because in the new concept we use the ALL_* views that provide information about Oracle DBMS objects, instead of the slower USER_* views that do a lot more checking of whether the user is authorised to access an object (checked with EXPLAIN PLAN).

Is using USER_* slower than using a sub-query on each row of ALL_*?

*
* @return string
*/
public function getListIndexesSQL(?string $database, ?string $table): string
Member

This function should be made private since it implements two different behaviors (all indices and table indices). To retrieve all indices, the caller should use another method like getListAllIndexesSQL() which would call this one with the $table = NULL. Thus, we'll have a cleaner public API.
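A minimal sketch of the shape suggested in this comment, under stated assumptions: the class `IndexSqlSketch`, the private `buildListIndexesSQL()` and the helpers `ownerExpression()` and `quoteLiteral()` are hypothetical stand-ins, and the SQL is reduced to a single view so the example stays short. It is not the query from the patch, and identifier normalisation is left out.

```php
<?php

/**
 * Hypothetical stand-in for a platform class; not DBAL's OraclePlatform.
 */
final class IndexSqlSketch
{
    /** Public entry point: indexes of a single table. */
    public function getListTableIndexesSQL(string $table, ?string $database = null): string
    {
        return $this->buildListIndexesSQL($database, $table);
    }

    /** Public entry point: indexes of every table in the schema. */
    public function getListAllIndexesSQL(?string $database): string
    {
        return $this->buildListIndexesSQL($database, null);
    }

    /** The two-behaviour builder stays private; null $table means "all tables". */
    private function buildListIndexesSQL(?string $database, ?string $table): string
    {
        $sql = 'SELECT index_name FROM all_indexes WHERE owner = ' . $this->ownerExpression($database);

        if ($table !== null) {
            // The leading space belongs to the optional condition,
            // so the statement has no trailing space when $table is null.
            $sql .= ' AND table_name = ' . $this->quoteLiteral($table);
        }

        return $sql;
    }

    /** Falls back to the current-schema expression quoted earlier in this review. */
    private function ownerExpression(?string $database): string
    {
        if ($database === null) {
            return "(SELECT SYS_CONTEXT('userenv', 'current_schema') FROM DUAL)";
        }

        return $this->quoteLiteral($database);
    }

    /** Simplified string-literal quoting, for illustration only. */
    private function quoteLiteral(string $value): string
    {
        return "'" . str_replace("'", "''", $value) . "'";
    }
}
```

Callers then choose the behaviour through the method name rather than by passing null, which keeps the nullable `$table` parameter out of the public API.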

else {
$quotedDatabaseIdentifier = "(SELECT SYS_CONTEXT('userenv', 'current_schema') FROM DUAL)";
}
$tableCondition = NULL;
Member

Technically, it's an empty string, not NULL.

FROM all_ind_columns ind_col
LEFT JOIN all_indexes ind ON ind.owner = ind_col.index_owner AND ind.index_name = ind_col.index_name
LEFT JOIN all_constraints con ON con.owner = ind_col.index_owner AND con.index_name = ind_col.index_name
WHERE ind_col.index_owner = $quotedDatabaseIdentifier $tableCondition
Member

The space between $quotedDatabaseIdentifier and $tableCondition should be part of the non-empty value of $tableCondition. Otherwise, we'll have an unneeded trailing space in case of empty $tableCondition.


$tables = [];
foreach ($tableNames as $tableName) {
$unquotedTableName = trim($tableName, '"');
Member

The quote should be replaced with $this->_platform->getIdentifierQuoteCharacter(). It's not an unquoted table name, it's just a table name. And the original $tableName is the quoted table name.
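As a rough sketch of that suggestion, the quote character can be passed in rather than hard-coded. The function `stripIdentifierQuotes` is hypothetical; in the schema manager the second argument would presumably come from `$this->_platform->getIdentifierQuoteCharacter()`, as named in the comment above.

```php
<?php

/**
 * Removes the platform's quote character from both ends of an identifier.
 *
 * Hypothetical helper; $quoteChar defaults to the double quote only for this example.
 */
function stripIdentifierQuotes(string $identifier, string $quoteChar = '"'): string
{
    // trim() treats its second argument as a list of characters to strip.
    return trim($identifier, $quoteChar);
}

echo stripIdentifierQuotes('"my_table"'), PHP_EOL; // my_table
echo stripIdentifierQuotes('PLAIN'), PHP_EOL;      // PLAIN
```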

con.constraint_type AS is_primary
FROM all_ind_columns ind_col
LEFT JOIN all_indexes ind ON ind.owner = ind_col.index_owner AND ind.index_name = ind_col.index_name
LEFT JOIN all_constraints con ON con.owner = ind_col.index_owner AND con.index_name = ind_col.index_name
Member

Extra whitespace in ON con.owner.

* Returns the SQL for a list of indexes in the database.
*
* @param string $database
* @param string $table
Member

It would be nice to have some explanation of the meaning of the parameters given they are nullable.


$indexes = [];
if (isset($indexesByTable[$unquotedTableName])) {
$indexes = $this->_getPortableTableIndexesList($indexesByTable[$unquotedTableName], $tableName);
Member

Does _getPortableTableIndexesList() expect the quoted table name as the 2nd parameter? I cannot see it used anywhere in the implementation of _getPortableTableIndexesList() besides dispatching some event. Could you please explain what it is for here?

Contributor Author

Just passing it along to honour the arguments and the chain of parent calls. The code here does not do anything with that.

@morozov
Member

morozov commented Nov 3, 2017

The type of OracleSchemaManager#$_platform should be changed to OraclePlatform since it uses the newly introduced Oracle-specific methods.

@mondrake
Contributor Author

@morozov made the changes you suggested. I added some comments of my own to yours, which now show as outdated.

*
* @return string
*/
public function getListAllIndexesSQL(?string $database = null): string
Member

It should be either nullable or optional, not both. I think the former is preferable.

Contributor

@Majkl578 Majkl578 Dec 16, 2017

I'll strongly disagree, it should be either defaulting to null AND nullable or neither or just nullable. Optionality and nullability are two things (and for null, the former implies the latter, but not the other way). Missing nullability sign for parameters defaulting to null is PHP's design flaw introduced only because 7.0 had no nullables.
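The distinction being argued here can be seen in a short sketch. Both function names are hypothetical, and the `'(current schema)'` placeholder is only there to make the result visible.

```php
<?php

/** Nullable but required: the caller must pass something, and null is allowed. */
function nullableRequired(?string $database): string
{
    return $database ?? '(current schema)';
}

/** Nullable and optional: the argument may be omitted, in which case it is null. */
function nullableOptional(?string $database = null): string
{
    return $database ?? '(current schema)';
}

echo nullableRequired(null), PHP_EOL; // (current schema)
echo nullableOptional(), PHP_EOL;     // (current schema)

try {
    nullableRequired(); // omitting a required argument is an error
} catch (ArgumentCountError $e) {
    echo 'required argument missing', PHP_EOL;
}
```

With the nullable-only form every call site has to state the schema choice explicitly, while the nullable-with-default form allows the argument to be dropped.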

Contributor

@Majkl578 Majkl578 left a comment

Just some CS nitpicking, mostly.

*
* @param string $database The database schema. If NULL or '/', the currently active schema will be queried.
*
* @return string
Contributor

Please remove useless (duplicated) @return, it doesn't add any extra information.

$databaseIdentifier = $this->normalizeIdentifier($database);
return $this->quoteStringLiteral($databaseIdentifier->getName());
}
else {
Contributor

no need for else, we have return above

$tableCondition = " AND ind_col.table_name = $quotedTableIdentifier";
}
return
<<<SQL
Contributor

Should be on the same line as return, but I'm not sure ATM how this is handled by CS.

* @param string $database The database schema. If NULL or '/', the currently active schema will be queried.
* @param string $table The table. If left NULL, the SQL will return all the indexes in the database.
*
* @return string
Contributor

this @return is also useless

*
* @param string $database The database schema. If NULL or '/', the currently active schema will be queried.
*
* @return string
Contributor

this @return is also useless

*
* @param string $database The database schema. If NULL or '/', the currently active schema will be queried.
*
* @return string
Contributor

this @return is also useless

* @param string $database The database schema. If NULL or '/', the currently active schema will be queried.
* @param string $table The table. If left NULL, the SQL will return all the indexes in the database.
*
* @return string
Contributor

this @return is also useless

*
* @param string $database The database schema. If NULL or '/', the currently active schema will be queried.
*
* @return string
Contributor

this @return is also useless

* @param string $database The database schema. If NULL or '/', the currently active schema will be queried.
* @param string $table The table. If left NULL, the SQL will return all the indexes in the database.
*
* @return string
Contributor

this @return is also useless

{
$databaseCondition = $this->getDatabaseCondition($database);
$tableCondition = '';
if ($table !== null) {
Contributor

please add newlines around if to improve readability

@mondrake
Contributor Author

@Ocramius can we remove the Missing Tests tag? That's been addressed based on @morozov's directions.

@mondrake mondrake removed their assignment Oct 25, 2020
@greg0ire greg0ire closed this Jan 22, 2021
@greg0ire greg0ire deleted the branch doctrine:master January 22, 2021 07:44
@mondrake
Contributor Author

The problem is still there: schema introspection is practically unusable on an Oracle database. If anyone is interested, https://github.com/mondrake/dbal/tree/oracle-schema-list-tables has the fix for DBAL 3.

@greg0ire
Member

Sorry, I didn't mean to close this, I just meant to rename master to 4.0.x. Why does it say unknown repository as source repository?

@mondrake
Contributor Author

@greg0ire because in the meantime I actually dropped and re-forked the repo, so the originally referenced fork branch is gone. Probably the best course is to keep this PR closed now, and open a new one instead.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 24, 2022