Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CTE support to select in QueryBuilder #6621

Merged
merged 6 commits into from
Jan 5, 2025

Conversation

nio-dtp
Copy link

@nio-dtp nio-dtp commented Nov 21, 2024

Q A
Type feature

Fixes #5018.

Summary

This pull request introduces support for Common Table Expressions (CTEs) across various database platforms and updates the QueryBuilder to utilize this feature.

@morozov
Copy link
Member

morozov commented Nov 22, 2024

@nio-dtp, thanks for the PR.

As this is a new feature, please retarget against 4.3.x. We only accept bug fixes and upgrade path improvements in the 3.x series.

Copy link
Member

@morozov morozov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a very basic unit test that would demonstrate the expected SQL and an integration test that would run the resulting query on the platforms that support CTE.

I'm curious to see how this will work with query parameters (both positional and named).

src/Query/QueryBuilder.php Outdated Show resolved Hide resolved
src/Query/QueryBuilder.php Outdated Show resolved Hide resolved
@nio-dtp nio-dtp force-pushed the cte-support branch 2 times, most recently from 2936e39 to 686bf02 Compare November 23, 2024 07:58
@nio-dtp nio-dtp changed the base branch from 3.9.x to 4.3.x November 23, 2024 07:58
@nio-dtp nio-dtp force-pushed the cte-support branch 4 times, most recently from 79cf2fb to 1e545fc Compare November 23, 2024 20:35
src/Query/QueryBuilder.php Outdated Show resolved Hide resolved
@nio-dtp
Copy link
Author

nio-dtp commented Nov 23, 2024

@morozov Thanks for the review. I've added some tests and a supports method for the platforms.

I'm not sure if it should check the platform supports in the QueryBuilder or if it is enough to mention in the documentation that it is not supported for deprecated mysql 5.7

@nio-dtp nio-dtp marked this pull request as ready for review November 23, 2024 21:05
@nio-dtp nio-dtp requested a review from morozov November 23, 2024 21:05
src/Query/QueryBuilder.php Outdated Show resolved Hide resolved
src/Query/QueryBuilder.php Outdated Show resolved Hide resolved
src/Query/QueryBuilder.php Outdated Show resolved Hide resolved
tests/Functional/Query/QueryBuilderTest.php Outdated Show resolved Hide resolved
@derrabus
Copy link
Member

I'm not sure if it should check the platform supports in the QueryBuilder or if it is enough to mention in the documentation that it is not supported for deprecated mysql 5.7

We try to avoid those supportsX() methods in our platform classes and this one in particular looks like we want to remove it in 5.0. I think, we should take the optimistic approach here and assume that every platform supports CTEs with the standard syntax. This would also make this feature available immediately for 3rd-party drivers.

If we did that, what would be the worst this that could happen? MySQL 5.7 users will get a syntax error if and only if they try to run a query with a CTE. I could live with that.

If we really want a nicer error message for MySQL 5.7 users, we could move the getSQLForCTEs() method to the abstract platform and for MySQL 5.7 we override that method and raise an exception.

Either way: Please remove the supportsCTEs() methods everywhere. We don't need it. 🙂

@sbuerk
Copy link
Contributor

sbuerk commented Nov 24, 2024

Thanks @nio-dtp for providing this PR. ❤️ Love to see that other has the same needs and staying on the DBAL way to implement this. Just let me add my five cents to this.

First, I would remove the platform support cte methods as @derrabus already mentioned and asked for. In the end, we would end up with multiple support methods if we want to take care beforehand of support questions, because even a lot of vendors supports CTE in general, they have dirty little differences. I'm here more for let the database report if it does not understand something and that it needs to be done from the application then. DBAL should here only provide a generic way to let WITH (CTE) syntax in general created and used.

The second point I want to put a light on is the current implementation of the with() method of this PR. It simply adds additional parts with each call, which in general is fully fine. But there is no way to reset it. Instead of introducing a resetWith() method I would tend more adopt logic of select, which means having a with() methd which resets the internal array and adds the first with as the first part and a addWith() to add additional parts.

Regarding the multi part aspect I would question a third point. A with part can depend on one or more other parts, which requires that the parts a part depends on must be defined before that part:

WITH
 baseCTE AS (SELECT id, title FROM tableName),
 secondCTE AS (SELECT id, title FROM baseCTE where id < 1000)
SELECT * from secondCTE

If secondCTE would have been added before the baseCTE which would lead to a database error reported by the database. Here we should decide, if we want to help users of the QueryBuilder or not. Personally, I would add a automatic sorting for depending parts which means we need the depending setting which further encourage for point 2 to use two methods with() and addWith() to allow defining depending CTE part names with the added part.

public function with(string $name, string|\Stringable|self $part): self
{
  $this->with = [];
  $this->with[] = [ /* adding first part */ ];
}
public function addWith(string $name, string|\Stringable|self $part, string ...$dependsOn): self
{
  $this->with[] = [
    /* adding additional part */
  ];
}

The current implemetation would already allow to add a recursive CTE part in the current form which itself is a two part UNION query, where the first part is the initial query and the second part the recursive part, for example as pseudo SQL:

WITH RECURSIVE
recursiveCTE AS (
    -- initial query
    SELECT id, parentId, title FROM table_a WHERE parentId = 0

  UNION

    -- recursive query
    SELECT b.id, b.parentId, b.title
    FROM table_b AS b
    WHERE b.parentId > 0 AND b.parentId = recursiveCTE.id 

)
SELECT * FROM recursiveCTE

With recursive CTE the whole thing gets a little more nifty, as different vendors requires or allow lazyness on different levels. For example:

  • Some vendors requires that WITH RECURSIVE must be set in case at least one recursive part is in the chain, others allows to omit that or does not really care. Having it set for a query with a recursive part does not harm any database (as far as I have investigated it yet).
  • For recursive parts, the order of the fields between the initial and recusive *must be the same order and count, and also of the same type (some vendors are more sensible to that for example Postgres). For that, it is possible to also define the fieldnames within the cte part definition, which helps to keep track of the order and naming:
WITH RECURSIVE
recursiveCTE(virtual_id, virtual_title, virtual_parentid) AS (
    -- initial query
    SELECT
      id       AS virtual_id,
      title    AS virtual_title,
      parentId AS virtual_parent_id
    FROM table_a WHERE parentId = 0

  UNION

    -- recursive query
    SELECT
      b.id        AS virtual_id,
      b.title     AS virtual_title,
      b.parentId  AS virtual_parentid
    FROM table_b AS b
    WHERE b.parentId > 0 AND b.parentId = recursiveCTE.virtual_id 

)
SELECT * FROM recursiveCTE

That is not fully possible with the current implementation. So we could consider

  • having additionel methods withRecursive() and addWithRecursive() which tracks that a recursive part is added to ensure the WITH RECURSIVE syntax is added
  • or adding a type enum as argument for the with() and addWith() method.

In any case, the with() and addWith() should get a additional (optional) fields array to define the fieldnames for the (recursive) CTE part.

From a internal handling perspective, I suggest to introduce a internal class for cte parts similar to the internal Join part class along with the options required (name, field, depends, isRecursive etc) which would allow to omit a dedicated flag and later iterate the array and add the RECURSIVE keywoard as soon as at least one part has it set. Or a additional container around the parts could be implemented, which internal tracks that when a recursive part gets added. I would not make that class public API. With such a collection/part implemenation a sorting could also be implemented (with a exception if cycling has been etablished) on query building.

To be honest, I did not investigated yet which DBAL supported platforms allows RECURSIVE cte's on top of normal CTE's and which not and this was still outstanding.

I started a custom implementation a couple of months ago [1], but delayed it due to time constraints and implemented it as a custom solution within the TYPO3 decorated QueryBuilder (prefixed) as a test-baloon [2] as we needed it for the release after implementing a first (quite advanced) usage [3]

To summerize this, this PR is a good start but in my eyes not finished yet (without wanting to blame it), because in general it is the same I started with before adding the additional stuff. To be honest, in my working state I had also the support methods but already considered to drop them again before making a pull-request out of it.

I propose that we clarify first what Doctrine DBAL wants to support and what not out of all these things and than where to continue. Either (if @nio-dtp) is open to adopt it and continue or if I should polish and finish my work (which is alrady some steps further) and adopt to the decisions made (dependency sorting or not, internal class usage or not, ...)

Suggested method(s):

/**
 * @param non-empty-string[] $fields
 */
public function with(
  string $identifier,
  string|\Stringable|QueryBuilder $part,
  array $fields = [],
  WithType $type = WithType::SIMPLE, // WithType::RECURSIVE for recursive part
): self {}

/**
 * @param non-empty-string[] $dependsOn
 * @param non-empty-string[] $fields
 */
public function addWith(
  string $identifier,
  string|\Stringable|QueryBuilder $part,
  array $dependsOn = [],
  array $fields = [],
  WithType $type = WithType::SIMPLE, // WithType::RECURSIVE for recursive part
): self {}

Developers implementing a recursive CTE needs to use a dedicated QueryBuilder instance to build a union query using the union support to define it, if not done the database should report this as an issue and not tried to scan or throw a custom exception from doctrine.

Recursive CTE's have been the reason why I started and contributed the union support for the QueryBuilder as a preparation for providing a CTE implementation.

In my POC/WIP i extendted the src/SQL/Builder/DefaultSelectSQLBuilder.php to add the WITH SQL building support. That could also be considered if that is the better way or by doing it as additionel PRE rendering like in this change. I know my personal prefernce on that, but should also be considered decided here before going on with any change.

I just would not merge this one here to quickly before considering the aforementioned points, at least for a overall strategy and either making it directly or with followups.

src/Query/QueryBuilder.php Outdated Show resolved Hide resolved
@nio-dtp
Copy link
Author

nio-dtp commented Nov 25, 2024

Thanks you to all the reviewers and their detailed feedback, I'll rework my PR and make a more complete proposal.
This is my first contribution and I'm interested in achieve it.
I think I'll rework the code this way:

  • an array that will contains With Objects. Maybe I will first implement non recursive version of the CTE, but keeping it in mind for a future enhancement.
  • I think that the end user shall declare theirs CTEs in correct order (like he does in sql) and I guess this is not necessary maintaining a property that checks if a CTE is dependent on others. But maybe I've missed something ?
  • I'd like to separate the generation of the WithQuery and the SelectQuery as later we could more easily implement the WithQuery for an update or a delete request.
  • Also, as a general request, I'll not add the supportsCTE method 😆.

@morozov
Copy link
Member

morozov commented Nov 26, 2024

Suggested method(s):

I'd improve the following aspects:

  1. Naming. It's unclear from the names what the difference between with() and addWith() is. At least PHPDoc should clearly explain that.
  2. Too many optional parameters. Out of 5 parameters in addWith(), 3 are optional. It means that a single method can be used for too many different purposes.
  3. Do not expose enums like WithType::SIMPLE (especially, optional) to the end user. Make the intended behavior clear from the method name (e.g., addWithRecursive() is much clearer).

As for the "reset" method – what is the use case for it? It would basically allow to disregard the logic of a pre-built query. If it wasn't necessary, why declare it in the first place?

@sbuerk
Copy link
Contributor

sbuerk commented Nov 26, 2024

I talked with @derrabus on sunday about this and came up with following using 4 methods:

/**
 * @param string[] $fields
 */
public function with(
  string $name,
  string|\Stringable|self $part,
  array $fields = [],
): self {}

/**
 * @param string[] $fields
 */
public function addWith(
  string $name,
  string|\Stringable|self $part,
  array $fields = [],
): self {}

/**
 * @param string[] $fields
 */
public function withRecursive(
  string $name,
  string|\Stringable|self $initialOrUnionPart,
  string|\Stringable|self|null $recursivePart,
  array $fields = [],
): self {}

/**
 * @param string[] $fields
 */
public function addRecursive(
  string $name,
  string|\Stringable|self $initialOrUnionpart,
  string|\Stringable|self|null $recursivePart,
  array $fields = [],
): self {}

If a UNION query is passed as the first requried part for the recursive vairants, it is simply used. If both parts are passed, internally a union query should be created (using the union querybuilder api without allowing duplicates). Should be explained within the method phpdocblocks and by providing a concrete single part the developer has full power about the recursive union block.

to follow semantic and logic similar to select() / addSelect() or order() / addOrder().

with or withRecursive will reset the internal array and create a first element, whereas the addWith* methods adds an additionall entry to the internal array (DTO object).

The *Recursive() method should set a flag within the DTO object that recursive is needed.

We think that a dedicated resetWith() method is not needed as it does not make sense to transform a created CTE query to something else again at a later point, in contrast to a normal select query to create a count query out of it (and reseting group/order things etc).

It should be possible to define the fields for the CTE part, but having that optional:

WITH
 customCte(virtual_id, virtual_field)
   AS (SELECT id AS virtual_id, somefield AS virtual_field FROM sometable)
SELECT * from customCte

Regarding my point for the `depends, in special for the recursive CTE's we discussed and decided to not provide an API in this low-level implementation. Developers or frameworks (for example ORM or similar) should keep track on there own and add the parts in the required and correect order. No custom sorting or cylcing detection, will be reported by the database when executed.

We had no hard meaning for the internal implementation and building the query, so the current switch form may be okay. Personally, I think it would make more sense to move that into the DefaultSelectQueryBuilder and pass the with array (of DTO's) within the SelectQuery DTO. Not sure if that should count as breaking though or not.

with/addWith and withRecursive/addWithRecursive naming would follow the semantic Doctrine already has for the other parts, exceopt that the *Recursive variant makes it more clear. Sure, PHPDoc block needs to make that clear. In the end, it kind of follows the design approach of the QueryBuilder.

And it is noticable that the top (most outer) querybuilder instance (in most cases that instance where the with/addWith/withRecursive/addWithRecursive methods are used) needs to be used for creating named placeholders (parameters). That matches the same requriement and flow as it is needed for using QueryBuilder to create sub queries or for the union support already and can therefore be taken as expectable. (Remark to #6621 (comment))

Taking this, we could make this one a first implementation for the with/addWith part and adding the recursive support later on - or doing both in one go.

My question here is, if @nio-dtp wants to update and work on this and has the time for it the next time. Otherwise I would rebase and finish my work in my fork and provide an additional PR in the next two weeks (which was a start after an original pitch and discussion with @derrabus but not added as PR(draft pr) yet due to time constraints).

I'm totally fine not to do anything and test/verify/review this later on but also being fine finishing mine (because of access) and mention @nio-dtp as co-author then.

@nio-dtp
Copy link
Author

nio-dtp commented Nov 26, 2024

If it is ok for all, I'll make a proposal by the end of the week without the recursive part.

@derrabus
Copy link
Member

I talked with @derrabus on sunday about this and came up with following using 4 methods:

A small remark on this: I would remove the Stringable from those signatures. We don't allow stringables anywhere else on the query builder and I don't think we should start allowing them here.

@sbuerk
Copy link
Contributor

sbuerk commented Nov 27, 2024

If it is ok for all, I'll make a proposal by the end of the week without the recursive part.

Thanks @nio-dtp, and that is pretty fine. I will add the recursive part afterwards in a follow up PR.

@sbuerk
Copy link
Contributor

sbuerk commented Dec 9, 2024

Going to review/test this after day job today.

Copy link
Contributor

@sbuerk sbuerk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR still needs some work.

Regarding the let's start with merging parameters from sub-query-builder-instances:

I get the point, basically, but in my eye's this "first" incomplete implementation for the new CTE leads to more confusion than the current state.

  • Until know the top/most-outer QueryBuilder instance needs to be used when a SQL query is composed using nested (multiple) QueryBuilder instances, for example as calculated select() field or for where/having/ordering or sub-select tables to join/from
  • union() don't merge the placeholders also for QueryBuilder instances.
  • Using plain SQL parts needs to set it in the top/most outer instance again.
  • Wrapping within expressions (ExpressionBuilder) cannot hold a dedicated sub instance and needs to be converted to SQL already, which means that it needs to set it on the outer instance (top/most outer instance) already.

I think, until the "merging" feature can be ensured throughout the whole API with nesting parts, this leads to more issues and confusion than the current "use the top/most-outer" part.

Specially the "merging" and thus overriding the values for auto-named placeholders is hard to debug for people not that deep into the Doctrine DBAL implementation (casual user / developers).

Personally, I would prefere to remove that again for now and make it in a dedicated overall-concept and API(signature) changes instead of a "partly" working solution. As I'm not a maintainer I cannot block it anyway, but at least the "support" in the current incomplete form should at least get additional tests to either show-case the short-commings and merging the same named placeholder from different instances (with differnt values?) should get some thoughts how to handle these - as there will be no error from the database, just simply weired results which are not that obvious in most cases.

If the merging is kept and okay for the Doctrine DBAL maintainers, we should really quickly add some kind of merging logic at least to the union()/addUnion() API to increase that support (and prepare towards the recursive CTE parts requireing UNION and needs to merge that along with the with parts - otherwise it will be broken in between.


private function platformSupportsCTEColumnDefinition(): bool
{
return $this->connection->getDatabasePlatform() instanceof SQLitePlatform;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQLite is not the only platform allowing CTE column definition - why is this limited to SQLite only ?

WITH cte_a(col_a, col_b) AS(....) SELECT * FROM cte_a

I know for sure that this works for:

  • MySQL8.0+
  • MariaDB 10.4.3+
  • PostgresSQL 10+
  • SQLite

as we are already testing this within TYPO3 with the "custom prefixed" solution in our decoracted QueryBuilder (with some auto quoting stuff).

If tests are failing because of "colums" there must be another reason for it, and not that it generally does not work:

https://github.com/TYPO3/typo3/blob/c53cf9d80934687cf1a54bdd98bc2dde1dcb8e87/typo3/sysext/core/Tests/Functional/Database/Query/QueryBuilder/CommonTableExpression/SimpleSelectCommonTableExpressionTest.php#L27-L82

    #[Test]
    public function twoSimpleValueCommonTableExpressionsJoinedReturnsExpectedResult(): void
    {
        // # --
        // # Builds following SQL (for MySQL - other databases varies)
        // # --
        //
        //    WITH
        //
        //        cte1 (a, b) AS (
        //            SELECT
        //                1 AS `a`,
        //                'value-a' AS `b`
        //        ),
        //
        //        cte2 (c, d) AS (
        //            SELECT
        //                1 AS `c`,
        //                'value-c' AS `d`
        //        )
        //
        //    SELECT
        //        `a` AS `id`,
        //        `b` AS `value1`,
        //        `d` AS `value2`
        //    FROM `cte1`
        //    INNER JOIN `cte2` `cte2` ON (cte1.a = cte2.c)
        $expectedRows = [
            ['id' => 1, 'value1' => 'value-a', 'value2' => 'value-c'],
        ];
        $connection = $this->getConnectionPool()->getConnectionByName('Default');
        $selectQueryBuilder = $connection->createQueryBuilder();
        $expr = $selectQueryBuilder->expr();
        $simpleValueListQueryBuilder1 = $connection->createQueryBuilder();
        $simpleValueListQueryBuilder1
            ->selectLiteral(
                $expr->as('1', $selectQueryBuilder->quoteIdentifier('a')),
                $expr->as($selectQueryBuilder->quote('value-a'), 'b'),
            );
        $simpleValueListQueryBuilder2 = $connection->createQueryBuilder();
        $simpleValueListQueryBuilder2
            ->selectLiteral(
                $expr->as('1', 'c'),
                $expr->as($selectQueryBuilder->quote('value-c'), 'd'),
            );
        $selectQueryBuilder
            ->typo3_with('cte1', $simpleValueListQueryBuilder1, ['a', 'b'])
            ->typo3_addWith('cte2', $simpleValueListQueryBuilder2, ['c', 'd'])
            ->selectLiteral(
                $expr->as($selectQueryBuilder->quoteIdentifier('a'), 'id'),
                $expr->as($selectQueryBuilder->quoteIdentifier('b'), 'value1'),
                $expr->as($selectQueryBuilder->quoteIdentifier('d'), 'value2'),
            )
            ->from('cte1')
            ->join('cte1', 'cte2', 'cte2', '(cte1.a = cte2.c)');
        //$sql = $selectQueryBuilder->getSQL();
        self::assertSame($expectedRows, $selectQueryBuilder->executeQuery()->fetchAllAssociative());
    }

which is tested against MySQL 8.0.17+, MariaDB 10.4.3+, PostgresSQL 10+ and SQLite.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I had a wrong syntax that was resulting in error for some platforms

WITH cte_a(col_a, col_b) AS(....) SELECT * FROM cte_a

Space is necessary between cte name and columns list

WITH cte_a (col_a, col_b) AS(....) SELECT * FROM cte_a

Tests should be ok now for MySQL 8.0.17+, MariaDB 10.4.3+, PostgresSQL 10+ and SQLite

{
$platform = $this->connection->getDatabasePlatform();

return ! $platform instanceof MySQLPlatform || $platform instanceof MySQL80Platform;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is hard to read and easy to get wrong. Can we use parenthesis here to make it clear what the check is ?

https://3v4l.org/VoZEa

Suggested change
return ! $platform instanceof MySQLPlatform || $platform instanceof MySQL80Platform;
return (! $platform instanceof MySQLPlatform) || $platform instanceof MySQL80Platform;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need these parenthesis.

continue;
}

$cteParams = array_merge($cteParams, $cte->query->params);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This overrides named placehoders, if two part QueryBuilder instances auto creates named placeholders, specially when QueryBuilder->createNamedParameter() are used with automatic name generation based on a counter:

        $cteQueryBuilder1 = $this->connection->createQueryBuilder();
        $cteQueryBuilder1->select('id AS virtual_id')
            ->from('for_update')
            ->where('virtual_id = ' . $cteQueryBuilder1->createNamedParameter(123, ParameterType::INTEGER));

        $cteQueryBuilder2 = $this->connection->createQueryBuilder();
        $cteQueryBuilder2->select('id AS virtual_id')
            ->from('cte_a')
            ->where('virtual_id = ' . $cteQueryBuilder2->createNamedParameter(987, ParameterType::INTEGER));

would merge them in a weired way. Additionally, the tests related to named placeholder only covers manual defining them with setting parameter, not using the create methods along with autogeneration and using in two sub-querybuilder instances which gets merged.

/**
* Creates a new named parameter and bind the value $value to it.
*
* This method provides a shortcut for {@see Statement::bindValue()}
* when using prepared statements.
*
* The parameter $value specifies the value that you want to bind. If
* $placeholder is not provided createNamedParameter() will automatically
* create a placeholder for you. An automatic placeholder will be of the
* name ':dcValue1', ':dcValue2' etc.
*
* Example:
* <code>
* $value = 2;
* $q->eq( 'id', $q->createNamedParameter( $value ) );
* $stmt = $q->executeQuery(); // executed with 'id = 2'
* </code>
*
* @link http://www.zetacomponents.org
*
* @param string|null $placeHolder The name to bind with. The string must start with a colon ':'.
*
* @return string the placeholder name used.
*/
public function createNamedParameter(
mixed $value,
string|ParameterType|Type|ArrayParameterType $type = ParameterType::STRING,
?string $placeHolder = null,
): string {
if ($placeHolder === null) {
$this->boundCounter++;
$placeHolder = ':dcValue' . $this->boundCounter;
}
$this->setParameter(substr($placeHolder, 1), $value, $type);
return $placeHolder;
}

Tests related to the placeholder merging and usage is not sufficient enough.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does not work with default (auto) placeholder because of the boundCounter value which start to 1 for each CTE.
But it is ok when setting placeholder

->where($cteQueryBuilder1->expr()->eq(
    'id',
    $cteQueryBuilder1->createNamedParameter(1, ParameterType::INTEGER, ':id1'),
));

->where($cteQueryBuilder2->expr()->in(
    'id',
    $cteQueryBuilder2->createNamedParameter([1, 2], ArrayParameterType::INTEGER, ':id2'),
));     

I've added some tests for this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My whole point is, that starting this leads to more things some developers needs to know beside a simple "use the top QueryBuilder" instance. Not a fan of starting this and leave it half way broken and unfinished. But that is more a personal taste:

->where($cteQueryBuilder1->expr()->eq(
    'id',
    $cteQueryBuilder1->createNamedParameter(1, ParameterType::INTEGER, ':id'),
));

->where($cteQueryBuilder2->expr()->in(
    'id',
    $cteQueryBuilder2->createNamedParameter([1, 2], ArrayParameterType::INTEGER, ':id'),
));  

In that case database error should occor and be a little bit of help that something is off when merging the CTE parts on query execution.

->where($cteQueryBuilder1->expr()->eq(
    'id',
    $cteQueryBuilder1->createNamedParameter(1, ParameterType::INTEGER, ':id'),
));

->where($cteQueryBuilder2->expr()->in(
    'id',
    $cteQueryBuilder2->createNamedParameter(2, ParameterType::INTEGER, ':id'),
));  

would merge that and use 2 for both places for the same :id without any notices - and will give a hard time to debug and find this.

Askin me the whole merging should vanish again until it could be implemented completly in all parts and with a concept behind it. But that is not my decision.

Let's see what @derrabus would say.

Copy link
Author

@nio-dtp nio-dtp Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, it could give a hard time to debug this. We could prevent this by adding a check during the merge process and reject a parameter's name that is already used.


foreach ($this->commonTableExpressions as $cte) {
if (! $cte->query instanceof self) {
continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code coverage complains about this check not beeing covered.

I'd say additional tests using plain sql strings as parts instead of QueryBuilder instances should be added anyway, and would cover this line here two (guessing).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a test with a sql string as cte query

@nio-dtp nio-dtp force-pushed the cte-support branch 2 times, most recently from 48c0cb7 to 82a9c20 Compare December 11, 2024 07:19
@nio-dtp
Copy link
Author

nio-dtp commented Dec 21, 2024

Hi @derrabus @morozov
Should I rework this PR ?

@derrabus
Copy link
Member

derrabus commented Dec 21, 2024

I'm a bit confused why we're merging parameters from different QBs now. Didn't agree that we don't do that?

@derrabus
Copy link
Member

So, yes, please remove the whole parameter merging stuff. It bloats and complicates this PR. Just as in subqueries and unions, parameters are managed on the outermost QB.

@nio-dtp
Copy link
Author

nio-dtp commented Dec 21, 2024

I'm a bit confused why we're merging parameters from different QBs now. Didn't agree that we don't do that?

I think there was a misunderstanding, that's why I've added this part
#6621 (comment)
I'll rollback it 😕

@nio-dtp nio-dtp requested review from morozov and sbuerk December 21, 2024 17:10
@morozov
Copy link
Member

morozov commented Dec 21, 2024

In my opinion, the behavior of ignoring the parameters bound to subqueries is wrong, and #6525 is the evidence of that. The API allows binding parameters and ignores them, so the real behavior can be discovered only via testing. This is bad.

If there was no precedent with the UNION queries, I wouldn't allow an API like this to be introduced. On the other hand, this PR won't break anything, it will introduce just another poor abstraction, so I'm not going to push back.

*
* @throws QueryException Setting an empty array as columns is not allowed.
*/
public function with(string $name, string|QueryBuilder $part, ?array $columns = null): self
Copy link
Member

@morozov morozov Dec 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe if the outer builder will ignore the parameters bound to the inner one, it shouldn't accept QueryBuilder and should only accept string? This way, the caller will have to call toSql() on the inner builder and thereby acknowledge that its parameters won't be used.

@nio-dtp
Copy link
Author

nio-dtp commented Dec 22, 2024

This is my first contribution to this repository and I admit that I was missing some knowledge about this project and his conception, especially #6525
I agree with @morozov
I think it is non intuitive to allow to bind and define parameters from a subquery but ignore this binding when the main query is executed.
Maybe we should put on standby this PR ? I'm not satisfy to merge changes with such different positions.

It would be great to decide first if the method setParameter(s) could be considered by (all) subqueries too or only by the main query.
First option would be an interesting but hard development.

@morozov
Copy link
Member

morozov commented Dec 22, 2024

This is my first contribution to this repository and I admit that I was missing some knowledge about this project and his conception, especially #6525

That's okay. You wouldn't have gone through all the discussions back and forth if we had more consistent views between maintainers but both, Alexander and I have quite limited time that we can dedicate to the project, so this is a side effect of that. Please bear with us and sorry for such a rough experience.

Maybe we should put on standby this PR ? I'm not satisfy to merge changes with such different positions.

This sounds like an option but still, we'll have to develop a plan forward. Let me outline the options that I'd be comfortable with.

Option 1. The most puristic

The UNION design isn't the best, so before reusing it for other features, we can fix it. Make it respect the parameters bound to the inner builders (i.e., address #6525) and then add the support for CTEs with the same behavior.

Option 2. The quickest one

Accept only strings for CTE expressions as proposed in #6621 (comment). This way, we will have the behavior consistent with the UNION queries but less confusing. Whether or not we make the same changes in the UNION is out of the scope.

I'm fine with either and am open to other proposals. Depending on your @nio-dtp goals (whether you just want to contribute the support for CTEs or make other contributions to DBAL), please choose whatever works for you.

It would be great to decide first if the method setParameter(s) could be considered by (all) subqueries too or only by the main query.
First option would be an interesting but hard development.

Not sure what you mean by "considered". Could you elaborate?

@nio-dtp
Copy link
Author

nio-dtp commented Jan 2, 2025

Not sure what you mean by "considered". Could you elaborate?

I was trying to tell that calling setParameters should really set parameters for all the queries, even for the subqueries used in where or select. This is not what is happening.

Option 1. The most puristic

The UNION design isn't the best, so before reusing it for other features, we can fix it. Make it respect the parameters bound to the inner builders (i.e., address #6525) and then add the support for CTEs with the same behavior.

Option 2. The quickest one

Accept only strings for CTE expressions as proposed in #6621 (comment). This way, we will have the behavior consistent with the UNION queries but less confusing. Whether or not we make the same changes in the UNION is out of the scope.

Between the 2 options, I'd prefer the first.

But maybe we could think about a third option. An object could manage the query parameters for all the query builders instanced by a single connection instance.
Parameters and types of all query builders could be merge just before executing the query.

This commit is an example of such implementation, using a singleton. Tests are passing locally but I'm sure I've missed some points, as it is out the box. Tell me if you think It would be a good thing to work more further this solution.

Or if we keep going with the first solution.

@morozov
Copy link
Member

morozov commented Jan 3, 2025

The third option looks like over-complication. The connection should not be involved in the process of building a query, it's the job of the query builder. The fact that some queries are built from subqueries is an implementation detail that shouldn't leak outside of the builder.

But maybe we could think about a third option.

We definitely could but the one proposed above doesn't look better than the others.

@morozov
Copy link
Member

morozov commented Jan 3, 2025

I was trying to tell that calling setParameters should really set parameters for all the queries, even for the subqueries used in where or select.

Now I understand what you're saying but I'm not sure I agree. In my understanding, the state of a query builder should encapsulate the query and the values of the bound parameters. Having setParameters() called on an outer query influence the state of the inner one will break this encapsulation.

Let's assume the outer query uses positional parameters and two inner query internally. How is the user of the outer query supposed to address the parameters of the second inner query? Do they need to keep in mind the number of parameters in the outer query that precede the inner ones and the number of parameters in the first inner query? This looks unmanageable.

@nio-dtp
Copy link
Author

nio-dtp commented Jan 3, 2025

The connection should not be involved in the process of building a query, it's the job of the query builder.

I agree with that and I didn't found other way to address an object that manage all the parameters.

We definitely could but the one proposed above doesn't look better than the others.

Assuming this, I'll address the Option 1, and first #6525

Let's assume the outer query uses positional parameters and two inner query internally. How is the user of the outer query supposed to address the parameters of the second inner query? Do they need to keep in mind the number of parameters in the outer query that precede the inner ones and the number of parameters in the first inner query? This looks unmanageable.

Ok you're right, I didn't imagine this use case

@derrabus
Copy link
Member

derrabus commented Jan 4, 2025

@morozov I don't think that the back-and-forth on this PR is especially helpful. I understand that merging parameters might be desirable or feel more intuitive, but we have not done this yet. Not for subqueries, not for unions. This PR is about adding CTEs to the QB which is something we can deliver without merging parameters.

Let's use #6525 to discuss if and how we want to merge parameters. This can be done in a follow-up, independently from the CTE feature.

Copy link
Member

@derrabus derrabus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, this PR looks ready to be merged. I understand that not everybody is happy with not merging parameters of different query builders, but that's something we can discuss for a follow-up.

@morozov
Copy link
Member

morozov commented Jan 5, 2025

I'm fine with merging it as is.

@greg0ire greg0ire merged commit 4d8badd into doctrine:4.3.x Jan 5, 2025
68 checks passed
@greg0ire
Copy link
Member

greg0ire commented Jan 5, 2025

Thanks @nio-dtp !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants