-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize OracleSchemaManager::listTables #2766
Conversation
That's a lot of additional code with no additional tests covering all the newly introduced decision paths: if this patch is not a functional change, I'm inclined to rejecting it, sorry. |
@Ocramius can you please leave it open for a while and see if I or anyone else can add appropriate tests. The underlying issue is relevant, you can't really use productively schema manager's |
That's fine by me: I realise that you put a lot of effort in it, but any
code path that isn't tested is a possible bug if not a security issue, so
it won't be merged until carefully tested.
Since this code runs oracledb specific queries, the tests should be
integration ones.
…On 3 Jul 2017 15:40, "mondrake" ***@***.***> wrote:
@Ocramius <https://github.com/ocramius> can you please leave it open for
a while and see if I or anyone else can add appropriate tests, The
underlying issue is relevant, you can't really use productively schema
manager's createSchema() over an Oracle database that has some tables in
it.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#2766 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAJakMJL65BfXWZkp7UQSOyZ2kAeKTbNks5sKO9ggaJpZM4OMSnL>
.
|
…ge number of tables.
Added a functional test to create 150 test tables, then run |
Looking at the code of the Oracle test file, it seems there are some tables created for other tests that are not deleted before your new method create it's 150 new tables. So the total number of tables in the database will be greater than 150. |
@mathieubouchard yes, I changed the test in #2767 to test for more than 150 tables, but also most importantly to measure the execution time. Any idea why we have this failure in your original code
which affects also other tests? |
The problem is with the use of array_key_exists($foreignKeyRow['table_name'], $tablesForeignKeys) in the method listTablesForeignKeys. the table_name column is not in foreignKeyRow the first time a column is parsed for a given table.
There should be an additional check for array_key_exists('table_name', $foreignKeyRow) before the assignment. Per the PHP documentation, this comportement is : Attempting to access an array key which has not been defined is the same as accessing any other undefined variable: an E_NOTICE-level error message will be issued, and the result will be NULL. I suppose that the PHPUnit test considers an E_NOTICE as an error and throw the PHPUnit_Framework_Error_Notice exception in that case. |
Thanks @mathieubouchard , is your comment already based on the latest commit 2ffcec0? |
Based on the error detailed in the following build: https://app.continuousphp.com/git-hub/doctrine/dbal/build/41d54cad-99e4-4e4c-be18-4c06fb7fa26c |
OK, so I think that this comment from #2676
is the way to go - that means 4 SQL queries regardless of the number of tables in the database, instead of [1 (get the tables) + #tables * 3 (to get for each table its columns, indexes and foreign keys separately)]. For 150 tables that means 4 queries instead of 451. However, the changes so far need further work. My suggestion for now is to focus on the test - make it more robust by adding indexes and foreign keys to the test case, and make it pass on currrent code, with the current performance. With that done, then turn to the approach above. |
…ys of test tables.
The test now fails only on the last assert, that measures the performance of |
Maybe we do not need to add separate methods to If we accept a convention that passing Doing that for |
…es' columns in a single query.
The first approach I had was to have only one method to support both use cases. This will reduce the number of lines of code changed and eliminate duplicate SQL code. But this is a breaking change on the AbstractPlatform API. That's why I created other public methods, specific to Oracle. |
Thanks @mathieubouchard
Why? |
The method signatures doesn't accept null for the table parameter, for example : |
Ah OK, but there's doesn't seem to be any code checking or preventing passing Maybe this is a question for @Ocramius . |
It would solve the issue I mentioned above but still would leave the possibility for other similar issues to exist. To me, it would make more sense to remove code duplication instead of adding more tests which cover the same cases. Why do we need two implementations per use case? |
@morozov this is now taking away the code duplication. ContinuousPHP that runs Oracle tests is green. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Structure-wise looks good to me.
@mondrake please see some more comments.
Someone from the team, please take a look at the usage of quoted/unquoted table names here. I'm not really familiar with the APIs which produce the quoted ones.
} | ||
else { | ||
$quotedDatabaseIdentifier = "(SELECT SYS_CONTEXT('userenv', 'current_schema') FROM DUAL)"; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the block from the beginning of the method until here moved into a method? Looks like it's the same everywhere.
$quotedDatabaseIdentifier = $this->quoteStringLiteral($databaseIdentifier->getName()); | ||
} | ||
else { | ||
$quotedDatabaseIdentifier = "(SELECT SYS_CONTEXT('userenv', 'current_schema') FROM DUAL)"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a quoted identifier, it's an expression. The variable name is confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want to execute this sub-query for every row in the query? I believe the current DB name should be obtained in one query and then used as a value in the second.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point here is that we do not have a connection to query against in the Platform object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we should use ALL_*
objects for the non-current schema and USER_*
ones for the current one.
Because in the new concept we use the
ALL_*
views that provide information about Oracle DBMS objects, instead of the slowerUSER_*
views that do a lot of more checking about user being authorised to access an object (checked with EXPLAIN PLAN).
Is using USER_*
slower than using a sub-query on each row of ALL_*
?
* | ||
* @return string | ||
*/ | ||
public function getListIndexesSQL(?string $database, ?string $table): string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function should be made private
since it implements two different behaviors (all indices and table indices). To retrieve all indices, the caller should use another method like getListAllIndexesSQL()
which would call this one with the $table = NULL
. Thus, we'll have a cleaner public API.
else { | ||
$quotedDatabaseIdentifier = "(SELECT SYS_CONTEXT('userenv', 'current_schema') FROM DUAL)"; | ||
} | ||
$tableCondition = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, it's an empty string, not NULL
.
FROM all_ind_columns ind_col | ||
LEFT JOIN all_indexes ind ON ind.owner = ind_col.index_owner AND ind.index_name = ind_col.index_name | ||
LEFT JOIN all_constraints con ON con.owner = ind_col.index_owner AND con.index_name = ind_col.index_name | ||
WHERE ind_col.index_owner = $quotedDatabaseIdentifier $tableCondition |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The space between $quotedDatabaseIdentifier
and $tableCondition
should be part of the non-empty value of $tableCondition
. Otherwise, we'll have an unneeded trailing space in case of empty $tableCondition
.
|
||
$tables = []; | ||
foreach ($tableNames as $tableName) { | ||
$unquotedTableName = trim($tableName, '"'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The quote should be replaced with $this->_platform->getIdentifierQuoteCharacter()
. It's not an unquoted table name, it's just a table name. And the original $tableName
is the quoted table name.
con.constraint_type AS is_primary | ||
FROM all_ind_columns ind_col | ||
LEFT JOIN all_indexes ind ON ind.owner = ind_col.index_owner AND ind.index_name = ind_col.index_name | ||
LEFT JOIN all_constraints con ON con.owner = ind_col.index_owner AND con.index_name = ind_col.index_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra whitespace in ON con.owner
.
* Returns the SQL for a list of indexes in the database. | ||
* | ||
* @param string $database | ||
* @param string $table |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have some explanation of the meaning of the parameters given they are nullable.
|
||
$indexes = []; | ||
if (isset($indexesByTable[$unquotedTableName])) { | ||
$indexes = $this->_getPortableTableIndexesList($indexesByTable[$unquotedTableName], $tableName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does _getPortableTableIndexesList()
expect the quoted table name as the 2nd parameter? I cannot see it used anywhere in the implementation of _getPortableTableIndexesList()
besides dispatching some event. Could you please explain what it is for here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just passing over to honour the arguments and the chain or parent calls. The code here does not do anything with that.
The type of |
@morozov made the changes you suggested. Some comments from my side added to yours that now show having been outdated. |
* | ||
* @return string | ||
*/ | ||
public function getListAllIndexesSQL(?string $database = null): string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be either nullable or optional, not both. I think the former is preferable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll strongly disagree, it should be either defaulting to null AND nullable or neither or just nullable. Optionality and nullablility are two things (and for null, the former implies the latter, but not the other way). Missing nullability sign for parameters defaulting to null is PHP's design flaw introduced only because 7.0 had no nullables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some CS nitpicking, mostly.
* | ||
* @param string $database The database schema. If NULL or '/', the currently active schema will be queried. | ||
* | ||
* @return string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove useless (duplicated) @return
, it doesn't add any extra information.
$databaseIdentifier = $this->normalizeIdentifier($database); | ||
return $this->quoteStringLiteral($databaseIdentifier->getName()); | ||
} | ||
else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for else, we have return above
$tableCondition = " AND ind_col.table_name = $quotedTableIdentifier"; | ||
} | ||
return | ||
<<<SQL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be on the same line as return, but I'm not sure ATM how this is handled by CS.
* @param string $database The database schema. If NULL or '/', the currently active schema will be queried. | ||
* @param string $table The table. If left NULL, the SQL will return all the indexes in the database. | ||
* | ||
* @return string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this @return
is also useless
* | ||
* @param string $database The database schema. If NULL or '/', the currently active schema will be queried. | ||
* | ||
* @return string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this @return
is also useless
* | ||
* @param string $database The database schema. If NULL or '/', the currently active schema will be queried. | ||
* | ||
* @return string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this @return
is also useless
* @param string $database The database schema. If NULL or '/', the currently active schema will be queried. | ||
* @param string $table The table. If left NULL, the SQL will return all the indexes in the database. | ||
* | ||
* @return string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this @return
is also useless
* | ||
* @param string $database The database schema. If NULL or '/', the currently active schema will be queried. | ||
* | ||
* @return string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this @return
is also useless
* @param string $database The database schema. If NULL or '/', the currently active schema will be queried. | ||
* @param string $table The table. If left NULL, the SQL will return all the indexes in the database. | ||
* | ||
* @return string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this @return
is also useless
{ | ||
$databaseCondition = $this->getDatabaseCondition($database); | ||
$tableCondition = ''; | ||
if ($table !== null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add newlines around if to improve readability
# Conflicts: # lib/Doctrine/DBAL/Platforms/OraclePlatform.php # tests/Doctrine/Tests/DBAL/Platforms/OraclePlatformTest.php
…ke/dbal into opt_oracle_schema_manager
The problem is still there, schema introspection is practically unusable on an Oracle database. If anyone interested, https://github.com/mondrake/dbal/tree/oracle-schema-list-tables has the fix for DBAL 3. |
Sorry, I didn't mean to close this, I just meant to rename |
@greg0ire because in the meantime I actually dropped and re-forked the repo, so the originally referenced fork branch is gone. Probably the best course is to keep this PR closed now, and open a new one instead. |
A PR replicating the patch included in #2676 by @mathieubouchard .
Tests executed on continuousphp show that running
createSchema
on an Oracle db with about 170 tables takes over a minute. See #2767.