-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CTE support to select in QueryBuilder #6621
Conversation
@nio-dtp, thanks for the PR. As this is a new feature, please retarget against |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a very basic unit test that would demonstrate the expected SQL and an integration test that would run the resulting query on the platforms that support CTE.
I'm curious to see how this will work with query parameters (both positional and named).
2936e39
to
686bf02
Compare
79cf2fb
to
1e545fc
Compare
@morozov Thanks for the review. I've added some tests and a I'm not sure if it should check the platform supports in the QueryBuilder or if it is enough to mention in the documentation that it is not supported for deprecated mysql 5.7 |
We try to avoid those If we did that, what would be the worst this that could happen? MySQL 5.7 users will get a syntax error if and only if they try to run a query with a CTE. I could live with that. If we really want a nicer error message for MySQL 5.7 users, we could move the Either way: Please remove the |
Thanks @nio-dtp for providing this PR. ❤️ Love to see that other has the same needs and staying on the DBAL way to implement this. Just let me add my five cents to this. First, I would remove the platform support cte methods as @derrabus already mentioned and asked for. In the end, we would end up with multiple support methods if we want to take care beforehand of support questions, because even a lot of vendors supports CTE in general, they have dirty little differences. I'm here more for let the database report if it does not understand something and that it needs to be done from the application then. DBAL should here only provide a generic way to let WITH (CTE) syntax in general created and used. The second point I want to put a light on is the current implementation of the Regarding the WITH
baseCTE AS (SELECT id, title FROM tableName),
secondCTE AS (SELECT id, title FROM baseCTE where id < 1000)
SELECT * from secondCTE If public function with(string $name, string|\Stringable|self $part): self
{
$this->with = [];
$this->with[] = [ /* adding first part */ ];
}
public function addWith(string $name, string|\Stringable|self $part, string ...$dependsOn): self
{
$this->with[] = [
/* adding additional part */
];
} The current implemetation would already allow to add a WITH RECURSIVE
recursiveCTE AS (
-- initial query
SELECT id, parentId, title FROM table_a WHERE parentId = 0
UNION
-- recursive query
SELECT b.id, b.parentId, b.title
FROM table_b AS b
WHERE b.parentId > 0 AND b.parentId = recursiveCTE.id
)
SELECT * FROM recursiveCTE With recursive CTE the whole thing gets a little more nifty, as different vendors requires or allow lazyness on different levels. For example:
WITH RECURSIVE
recursiveCTE(virtual_id, virtual_title, virtual_parentid) AS (
-- initial query
SELECT
id AS virtual_id,
title AS virtual_title,
parentId AS virtual_parent_id
FROM table_a WHERE parentId = 0
UNION
-- recursive query
SELECT
b.id AS virtual_id,
b.title AS virtual_title,
b.parentId AS virtual_parentid
FROM table_b AS b
WHERE b.parentId > 0 AND b.parentId = recursiveCTE.virtual_id
)
SELECT * FROM recursiveCTE That is not fully possible with the current implementation. So we could consider
In any case, the From a internal handling perspective, I suggest to introduce a internal class for cte parts similar to the internal Join part class along with the options required (name, field, depends, isRecursive etc) which would allow to omit a dedicated flag and later iterate the array and add the RECURSIVE keywoard as soon as at least one part has it set. Or a additional container around the parts could be implemented, which internal tracks that when a recursive part gets added. I would not make that class public API. With such a To be honest, I did not investigated yet which DBAL supported platforms allows RECURSIVE cte's on top of normal CTE's and which not and this was still outstanding. I started a custom implementation a couple of months ago [1], but delayed it due to time constraints and implemented it as a custom solution within the TYPO3 decorated QueryBuilder (prefixed) as a test-baloon [2] as we needed it for the release after implementing a first (quite advanced) usage [3] To summerize this, this PR is a good start but in my eyes not finished yet (without wanting to blame it), because in general it is the same I started with before adding the additional stuff. To be honest, in my working state I had also the support methods but already considered to drop them again before making a pull-request out of it. I propose that we clarify first what Doctrine DBAL wants to support and what not out of all these things and than where to continue. Either (if @nio-dtp) is open to adopt it and continue or if I should polish and finish my work (which is alrady some steps further) and adopt to the decisions made (dependency sorting or not, internal class usage or not, ...) Suggested method(s): /**
* @param non-empty-string[] $fields
*/
public function with(
string $identifier,
string|\Stringable|QueryBuilder $part,
array $fields = [],
WithType $type = WithType::SIMPLE, // WithType::RECURSIVE for recursive part
): self {}
/**
* @param non-empty-string[] $dependsOn
* @param non-empty-string[] $fields
*/
public function addWith(
string $identifier,
string|\Stringable|QueryBuilder $part,
array $dependsOn = [],
array $fields = [],
WithType $type = WithType::SIMPLE, // WithType::RECURSIVE for recursive part
): self {} Developers implementing a recursive CTE needs to use a dedicated QueryBuilder instance to build a union query using the union support to define it, if not done the database should report this as an issue and not tried to scan or throw a custom exception from doctrine. Recursive CTE's have been the reason why I started and contributed the union support for the QueryBuilder as a preparation for providing a CTE implementation. In my POC/WIP i extendted the I just would not merge this one here to quickly before considering the aforementioned points, at least for a overall strategy and either making it directly or with followups.
|
Thanks you to all the reviewers and their detailed feedback, I'll rework my PR and make a more complete proposal.
|
I'd improve the following aspects:
As for the "reset" method – what is the use case for it? It would basically allow to disregard the logic of a pre-built query. If it wasn't necessary, why declare it in the first place? |
I talked with @derrabus on sunday about this and came up with following using 4 methods: /**
* @param string[] $fields
*/
public function with(
string $name,
string|\Stringable|self $part,
array $fields = [],
): self {}
/**
* @param string[] $fields
*/
public function addWith(
string $name,
string|\Stringable|self $part,
array $fields = [],
): self {}
/**
* @param string[] $fields
*/
public function withRecursive(
string $name,
string|\Stringable|self $initialOrUnionPart,
string|\Stringable|self|null $recursivePart,
array $fields = [],
): self {}
/**
* @param string[] $fields
*/
public function addRecursive(
string $name,
string|\Stringable|self $initialOrUnionpart,
string|\Stringable|self|null $recursivePart,
array $fields = [],
): self {} If a UNION query is passed as the first requried part for the recursive vairants, it is simply used. If both parts are passed, internally a union query should be created (using the union querybuilder api without allowing duplicates). Should be explained within the method phpdocblocks and by providing a concrete single part the developer has full power about the recursive union block. to follow semantic and logic similar to with or withRecursive will reset the internal array and create a first element, whereas the addWith* methods adds an additionall entry to the internal array (DTO object). The We think that a dedicated It should be possible to define the fields for the CTE part, but having that optional: WITH
customCte(virtual_id, virtual_field)
AS (SELECT id AS virtual_id, somefield AS virtual_field FROM sometable)
SELECT * from customCte Regarding my point for the `depends, in special for the recursive CTE's we discussed and decided to not provide an API in this low-level implementation. Developers or frameworks (for example ORM or similar) should keep track on there own and add the parts in the required and correect order. No custom sorting or cylcing detection, will be reported by the database when executed. We had no hard meaning for the internal implementation and building the query, so the current switch form may be okay. Personally, I think it would make more sense to move that into the DefaultSelectQueryBuilder and pass the with array (of DTO's) within the with/addWith and withRecursive/addWithRecursive naming would follow the semantic Doctrine already has for the other parts, exceopt that the *Recursive variant makes it more clear. Sure, PHPDoc block needs to make that clear. In the end, it kind of follows the design approach of the QueryBuilder. And it is noticable that the top (most outer) querybuilder instance (in most cases that instance where the with/addWith/withRecursive/addWithRecursive methods are used) needs to be used for creating named placeholders (parameters). That matches the same requriement and flow as it is needed for using QueryBuilder to create sub queries or for the union support already and can therefore be taken as expectable. (Remark to #6621 (comment)) Taking this, we could make this one a first implementation for the My question here is, if @nio-dtp wants to update and work on this and has the time for it the next time. Otherwise I would rebase and finish my work in my fork and provide an additional PR in the next two weeks (which was a start after an original pitch and discussion with @derrabus but not added as PR(draft pr) yet due to time constraints). I'm totally fine not to do anything and test/verify/review this later on but also being fine finishing mine (because of access) and mention @nio-dtp as co-author then. |
If it is ok for all, I'll make a proposal by the end of the week without the recursive part. |
A small remark on this: I would remove the |
Thanks @nio-dtp, and that is pretty fine. I will add the recursive part afterwards in a follow up PR. |
Going to review/test this after day job today. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR still needs some work.
Regarding the let's start with merging parameters from sub-query-builder-instances
:
I get the point, basically, but in my eye's this "first" incomplete implementation for the new CTE
leads to more confusion than the current state.
- Until know the
top/most-outer
QueryBuilder instance needs to be used when a SQL query is composed using nested (multiple) QueryBuilder instances, for example as calculatedselect()
field or forwhere/having/ordering
orsub-select tables to join/from
union()
don't merge the placeholders also for QueryBuilder instances.- Using plain SQL parts needs to set it in the
top/most outer
instance again. - Wrapping within expressions (ExpressionBuilder) cannot hold a dedicated sub instance and needs to be converted to SQL already, which means that it needs to set it on the outer instance (top/most outer instance) already.
I think, until the "merging" feature can be ensured throughout the whole API with nesting parts, this leads to more issues and confusion than the current "use the top/most-outer" part.
Specially the "merging" and thus overriding the values for auto-named placeholders is hard to debug for people not that deep into the Doctrine DBAL implementation (casual user / developers).
Personally, I would prefere to remove that again for now and make it in a dedicated overall-concept and API(signature) changes instead of a "partly" working solution. As I'm not a maintainer I cannot block it anyway, but at least the "support" in the current incomplete form should at least get additional tests to either show-case the short-commings and merging the same named placeholder from different instances (with differnt values?) should get some thoughts how to handle these - as there will be no error from the database, just simply weired results which are not that obvious in most cases.
If the merging is kept and okay for the Doctrine DBAL maintainers, we should really quickly add some kind of merging logic at least to the union()/addUnion()
API to increase that support (and prepare towards the recursive CTE parts requireing UNION and needs to merge that along with the with parts - otherwise it will be broken in between.
|
||
private function platformSupportsCTEColumnDefinition(): bool | ||
{ | ||
return $this->connection->getDatabasePlatform() instanceof SQLitePlatform; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQLite is not the only platform allowing CTE column definition - why is this limited to SQLite only ?
WITH cte_a(col_a, col_b) AS(....) SELECT * FROM cte_a
I know for sure that this works for:
- MySQL8.0+
- MariaDB 10.4.3+
- PostgresSQL 10+
- SQLite
as we are already testing this within TYPO3 with the "custom prefixed" solution in our decoracted QueryBuilder (with some auto quoting stuff).
If tests are failing because of "colums" there must be another reason for it, and not that it generally does not work:
#[Test]
public function twoSimpleValueCommonTableExpressionsJoinedReturnsExpectedResult(): void
{
// # --
// # Builds following SQL (for MySQL - other databases varies)
// # --
//
// WITH
//
// cte1 (a, b) AS (
// SELECT
// 1 AS `a`,
// 'value-a' AS `b`
// ),
//
// cte2 (c, d) AS (
// SELECT
// 1 AS `c`,
// 'value-c' AS `d`
// )
//
// SELECT
// `a` AS `id`,
// `b` AS `value1`,
// `d` AS `value2`
// FROM `cte1`
// INNER JOIN `cte2` `cte2` ON (cte1.a = cte2.c)
$expectedRows = [
['id' => 1, 'value1' => 'value-a', 'value2' => 'value-c'],
];
$connection = $this->getConnectionPool()->getConnectionByName('Default');
$selectQueryBuilder = $connection->createQueryBuilder();
$expr = $selectQueryBuilder->expr();
$simpleValueListQueryBuilder1 = $connection->createQueryBuilder();
$simpleValueListQueryBuilder1
->selectLiteral(
$expr->as('1', $selectQueryBuilder->quoteIdentifier('a')),
$expr->as($selectQueryBuilder->quote('value-a'), 'b'),
);
$simpleValueListQueryBuilder2 = $connection->createQueryBuilder();
$simpleValueListQueryBuilder2
->selectLiteral(
$expr->as('1', 'c'),
$expr->as($selectQueryBuilder->quote('value-c'), 'd'),
);
$selectQueryBuilder
->typo3_with('cte1', $simpleValueListQueryBuilder1, ['a', 'b'])
->typo3_addWith('cte2', $simpleValueListQueryBuilder2, ['c', 'd'])
->selectLiteral(
$expr->as($selectQueryBuilder->quoteIdentifier('a'), 'id'),
$expr->as($selectQueryBuilder->quoteIdentifier('b'), 'value1'),
$expr->as($selectQueryBuilder->quoteIdentifier('d'), 'value2'),
)
->from('cte1')
->join('cte1', 'cte2', 'cte2', '(cte1.a = cte2.c)');
//$sql = $selectQueryBuilder->getSQL();
self::assertSame($expectedRows, $selectQueryBuilder->executeQuery()->fetchAllAssociative());
}
which is tested against MySQL 8.0.17+, MariaDB 10.4.3+, PostgresSQL 10+ and SQLite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I had a wrong syntax that was resulting in error for some platforms
WITH cte_a(col_a, col_b) AS(....) SELECT * FROM cte_a
Space is necessary between cte name and columns list
WITH cte_a (col_a, col_b) AS(....) SELECT * FROM cte_a
Tests should be ok now for MySQL 8.0.17+, MariaDB 10.4.3+, PostgresSQL 10+ and SQLite
{ | ||
$platform = $this->connection->getDatabasePlatform(); | ||
|
||
return ! $platform instanceof MySQLPlatform || $platform instanceof MySQL80Platform; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is hard to read and easy to get wrong. Can we use parenthesis here to make it clear what the check is ?
return ! $platform instanceof MySQLPlatform || $platform instanceof MySQL80Platform; | |
return (! $platform instanceof MySQLPlatform) || $platform instanceof MySQL80Platform; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need these parenthesis.
src/Query/QueryBuilder.php
Outdated
continue; | ||
} | ||
|
||
$cteParams = array_merge($cteParams, $cte->query->params); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This overrides named placehoders, if two part QueryBuilder instances auto creates named placeholders, specially when QueryBuilder->createNamedParameter()
are used with automatic name generation based on a counter:
$cteQueryBuilder1 = $this->connection->createQueryBuilder();
$cteQueryBuilder1->select('id AS virtual_id')
->from('for_update')
->where('virtual_id = ' . $cteQueryBuilder1->createNamedParameter(123, ParameterType::INTEGER));
$cteQueryBuilder2 = $this->connection->createQueryBuilder();
$cteQueryBuilder2->select('id AS virtual_id')
->from('cte_a')
->where('virtual_id = ' . $cteQueryBuilder2->createNamedParameter(987, ParameterType::INTEGER));
would merge them in a weired way. Additionally, the tests related to named placeholder only covers manual defining them with setting parameter, not using the create methods along with autogeneration and using in two sub-querybuilder instances which gets merged.
dbal/src/Query/QueryBuilder.php
Lines 1403 to 1440 in c91166a
/** | |
* Creates a new named parameter and bind the value $value to it. | |
* | |
* This method provides a shortcut for {@see Statement::bindValue()} | |
* when using prepared statements. | |
* | |
* The parameter $value specifies the value that you want to bind. If | |
* $placeholder is not provided createNamedParameter() will automatically | |
* create a placeholder for you. An automatic placeholder will be of the | |
* name ':dcValue1', ':dcValue2' etc. | |
* | |
* Example: | |
* <code> | |
* $value = 2; | |
* $q->eq( 'id', $q->createNamedParameter( $value ) ); | |
* $stmt = $q->executeQuery(); // executed with 'id = 2' | |
* </code> | |
* | |
* @link http://www.zetacomponents.org | |
* | |
* @param string|null $placeHolder The name to bind with. The string must start with a colon ':'. | |
* | |
* @return string the placeholder name used. | |
*/ | |
public function createNamedParameter( | |
mixed $value, | |
string|ParameterType|Type|ArrayParameterType $type = ParameterType::STRING, | |
?string $placeHolder = null, | |
): string { | |
if ($placeHolder === null) { | |
$this->boundCounter++; | |
$placeHolder = ':dcValue' . $this->boundCounter; | |
} | |
$this->setParameter(substr($placeHolder, 1), $value, $type); | |
return $placeHolder; | |
} |
Tests related to the placeholder
merging and usage is not sufficient enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That does not work with default (auto) placeholder because of the boundCounter
value which start to 1 for each CTE.
But it is ok when setting placeholder
->where($cteQueryBuilder1->expr()->eq(
'id',
$cteQueryBuilder1->createNamedParameter(1, ParameterType::INTEGER, ':id1'),
));
->where($cteQueryBuilder2->expr()->in(
'id',
$cteQueryBuilder2->createNamedParameter([1, 2], ArrayParameterType::INTEGER, ':id2'),
));
I've added some tests for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My whole point is, that starting this leads to more things some developers needs to know beside a simple "use the top QueryBuilder" instance. Not a fan of starting this and leave it half way broken and unfinished. But that is more a personal taste:
->where($cteQueryBuilder1->expr()->eq(
'id',
$cteQueryBuilder1->createNamedParameter(1, ParameterType::INTEGER, ':id'),
));
->where($cteQueryBuilder2->expr()->in(
'id',
$cteQueryBuilder2->createNamedParameter([1, 2], ArrayParameterType::INTEGER, ':id'),
));
In that case database error should occor and be a little bit of help that something is off when merging the CTE parts on query execution.
->where($cteQueryBuilder1->expr()->eq(
'id',
$cteQueryBuilder1->createNamedParameter(1, ParameterType::INTEGER, ':id'),
));
->where($cteQueryBuilder2->expr()->in(
'id',
$cteQueryBuilder2->createNamedParameter(2, ParameterType::INTEGER, ':id'),
));
would merge that and use 2
for both places for the same :id
without any notices - and will give a hard time to debug and find this.
Askin me the whole merging should vanish again until it could be implemented completly in all parts and with a concept behind it. But that is not my decision.
Let's see what @derrabus would say.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, it could give a hard time to debug this. We could prevent this by adding a check during the merge process and reject a parameter's name that is already used.
src/Query/QueryBuilder.php
Outdated
|
||
foreach ($this->commonTableExpressions as $cte) { | ||
if (! $cte->query instanceof self) { | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code coverage complains about this check not beeing covered.
I'd say additional tests using plain sql strings
as parts instead of QueryBuilder instances should be added anyway, and would cover this line here two (guessing).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a test with a sql string as cte query
48c0cb7
to
82a9c20
Compare
I'm a bit confused why we're merging parameters from different QBs now. Didn't agree that we don't do that? |
So, yes, please remove the whole parameter merging stuff. It bloats and complicates this PR. Just as in subqueries and unions, parameters are managed on the outermost QB. |
I think there was a misunderstanding, that's why I've added this part |
In my opinion, the behavior of ignoring the parameters bound to subqueries is wrong, and #6525 is the evidence of that. The API allows binding parameters and ignores them, so the real behavior can be discovered only via testing. This is bad. If there was no precedent with the UNION queries, I wouldn't allow an API like this to be introduced. On the other hand, this PR won't break anything, it will introduce just another poor abstraction, so I'm not going to push back. |
* | ||
* @throws QueryException Setting an empty array as columns is not allowed. | ||
*/ | ||
public function with(string $name, string|QueryBuilder $part, ?array $columns = null): self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe if the outer builder will ignore the parameters bound to the inner one, it shouldn't accept QueryBuilder
and should only accept string
? This way, the caller will have to call toSql()
on the inner builder and thereby acknowledge that its parameters won't be used.
This is my first contribution to this repository and I admit that I was missing some knowledge about this project and his conception, especially #6525 It would be great to decide first if the method |
That's okay. You wouldn't have gone through all the discussions back and forth if we had more consistent views between maintainers but both, Alexander and I have quite limited time that we can dedicate to the project, so this is a side effect of that. Please bear with us and sorry for such a rough experience.
This sounds like an option but still, we'll have to develop a plan forward. Let me outline the options that I'd be comfortable with. Option 1. The most puristicThe UNION design isn't the best, so before reusing it for other features, we can fix it. Make it respect the parameters bound to the inner builders (i.e., address #6525) and then add the support for CTEs with the same behavior. Option 2. The quickest oneAccept only strings for CTE expressions as proposed in #6621 (comment). This way, we will have the behavior consistent with the UNION queries but less confusing. Whether or not we make the same changes in the UNION is out of the scope. I'm fine with either and am open to other proposals. Depending on your @nio-dtp goals (whether you just want to contribute the support for CTEs or make other contributions to DBAL), please choose whatever works for you.
Not sure what you mean by "considered". Could you elaborate? |
I was trying to tell that calling
Between the 2 options, I'd prefer the first. But maybe we could think about a third option. An object could manage the query parameters for all the query builders instanced by a single connection instance. This commit is an example of such implementation, using a singleton. Tests are passing locally but I'm sure I've missed some points, as it is out the box. Tell me if you think It would be a good thing to work more further this solution. Or if we keep going with the first solution. |
The third option looks like over-complication. The connection should not be involved in the process of building a query, it's the job of the query builder. The fact that some queries are built from subqueries is an implementation detail that shouldn't leak outside of the builder.
We definitely could but the one proposed above doesn't look better than the others. |
Now I understand what you're saying but I'm not sure I agree. In my understanding, the state of a query builder should encapsulate the query and the values of the bound parameters. Having Let's assume the outer query uses positional parameters and two inner query internally. How is the user of the outer query supposed to address the parameters of the second inner query? Do they need to keep in mind the number of parameters in the outer query that precede the inner ones and the number of parameters in the first inner query? This looks unmanageable. |
I agree with that and I didn't found other way to address an object that manage all the parameters.
Assuming this, I'll address the Option 1, and first #6525
Ok you're right, I didn't imagine this use case |
@morozov I don't think that the back-and-forth on this PR is especially helpful. I understand that merging parameters might be desirable or feel more intuitive, but we have not done this yet. Not for subqueries, not for unions. This PR is about adding CTEs to the QB which is something we can deliver without merging parameters. Let's use #6525 to discuss if and how we want to merge parameters. This can be done in a follow-up, independently from the CTE feature. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, this PR looks ready to be merged. I understand that not everybody is happy with not merging parameters of different query builders, but that's something we can discuss for a follow-up.
I'm fine with merging it as is. |
Thanks @nio-dtp ! |
Fixes #5018.
Summary
This pull request introduces support for Common Table Expressions (CTEs) across various database platforms and updates the
QueryBuilder
to utilize this feature.