PG17 compatibility: add/fix tests with correlated subqueries that can be pulled to a join #7745

Merged
naisila merged 1 commit into release-13.0 from cmchugh/pg17-set_operations
Nov 20, 2024

Conversation

colm-mchugh
Contributor

@colm-mchugh colm-mchugh commented Nov 14, 2024

Fix Test Failure in subquery_in_where, set_operations, dml_recursive in PG17 #7741

The test failures are caused by this commit in PG17 (https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9f1337639), which enables correlated subqueries to be pulled up to a join. Prior to this, the correlated subquery was implemented as a subplan. In Citus, it is not possible to push down a correlated subplan, but with the different plan in PG17 the query can be executed, per the test diff from subquery_in_where:

37,39c37,41
< DEBUG:  generating subplan XXX_1 for CTE event_id: SELECT user_id AS events_user_id, "time" AS events_time, event_type FROM public.events_table
< DEBUG:  Plan XXX query after replacing subqueries and CTEs: SELECT count(*) AS count FROM (SELECT intermediate_result.events_user_id, intermediate_result.events_time, intermediate_result.event_type FROM read_intermediate_result('XXX_1'::text, 'binary'::citus_copy_format) intermediate_result(events_user_id integer, events_time timestamp without time zone, event_type integer)) event_id WHERE (events_user_id OPERATOR(pg_catalog.=) ANY (SELECT users_table.user_id FROM public.users_table WHERE (users_table."time" OPERATOR(pg_catalog.=) event_id.events_time)))
< ERROR:  correlated subqueries are not supported when the FROM clause contains a CTE or subquery
---
>  count
> ---------------------------------------------------------------------
>      0
> (1 row)
> 

This is because with pg17 the = ANY subquery in these queries can be implemented as a join, instead of as a subplan filter on a table scan. For example, SELECT * FROM test a WHERE x IN (SELECT x FROM test b UNION SELECT y FROM test c WHERE a.x = c.x) ORDER BY 1,2 (from set_operations) has this plan in pg17; note that the subquery is the inner side of a nested loop join:

┌───────────────────────────────────────────────────┐
│                    QUERY PLAN                     │
├───────────────────────────────────────────────────┤
│ Sort                                              │
│   Sort Key: a.x, a.y                              │
│   ->  Nested Loop                                 │
│         ->  Seq Scan on test a                    │
│         ->  Subquery Scan on "ANY_subquery"       │
│               Filter: (a.x = "ANY_subquery".x)    │
│               ->  HashAggregate                   │
│                     Group Key: b.x                │
│                     ->  Append                    │
│                           ->  Seq Scan on test b  │
│                           ->  Seq Scan on test c  │
│                                 Filter: (a.x = x) │
└───────────────────────────────────────────────────┘

and this plan in pg16 (and previous pg versions); the subquery is a correlated subplan filter on a table scan:

┌───────────────────────────────────────────────┐
│                  QUERY PLAN                   │
├───────────────────────────────────────────────┤
│ Sort                                          │
│   Sort Key: a.x, a.y                          │
│   ->  Seq Scan on test a                      │
│         Filter: (SubPlan 1)                   │
│         SubPlan 1                             │
│           ->  HashAggregate                   │
│                 Group Key: b.x                │
│                 ->  Append                    │
│                       ->  Seq Scan on test b  │
│                       ->  Seq Scan on test c  │
│                             Filter: (a.x = x) │
└───────────────────────────────────────────────┘
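To see why the two plan shapes return the same rows, here is a toy Python model (not PostgreSQL or Citus code). The sample rows and all function names are made up for illustration, and a Python set stands in for the HashAggregate that deduplicates the UNION.

```python
# Toy model of the two plan shapes for
#   SELECT * FROM test a
#   WHERE x IN (SELECT x FROM test b UNION SELECT y FROM test c WHERE a.x = c.x)
#   ORDER BY 1, 2;
# Rows are (x, y) tuples; all data and names here are hypothetical.


def any_subquery(outer_x, b, c):
    """Evaluate the UNION for one outer row; the set models HashAggregate."""
    from_b = {x for x, _ in b}                    # SELECT x FROM test b
    from_c = {y for x, y in c if x == outer_x}    # SELECT y FROM test c WHERE a.x = c.x
    return from_b | from_c


def subplan_filter(a, b, c):
    """pg16 shape: Seq Scan on a with the subquery as a per-row SubPlan filter."""
    return sorted(row for row in a if row[0] in any_subquery(row[0], b, c))


def nested_loop_join(a, b, c):
    """pg17 shape: Nested Loop with the subquery as the inner side."""
    result = []
    for row in a:                                 # outer side: Seq Scan on test a
        for sub_x in any_subquery(row[0], b, c):  # inner side: Subquery Scan
            if row[0] == sub_x:                   # Filter: a.x = "ANY_subquery".x
                result.append(row)
    return sorted(result)


a = [(1, 1), (2, 2), (3, 3)]
b = [(1, 0)]
c = [(2, 9), (3, 3)]
print(subplan_filter(a, b, c))    # [(1, 1), (3, 3)]
print(nested_loop_join(a, b, c))  # [(1, 1), (3, 3)]
```

Because the inner side is deduplicated, each outer row matches at most once, so the join form cannot introduce duplicate rows. In the real test all three aliases refer to the same table; separate lists are used here only to make the filtering visible.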

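The fix modifies the queries causing the test failures so that the ANY subquery is not folded to a join, preserving the expected output of the tests. For example, the set_operations query now uses (x + random()) IN (...); the volatile random() call on the left-hand side keeps the planner from pulling the subquery up. A similar approach was taken for existing regress tests in the postgres commit. See the join regress test, for example.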

@colm-mchugh colm-mchugh self-assigned this Nov 14, 2024

codecov bot commented Nov 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (release-13.0@0fed87a). Learn more about missing BASE report.

Additional details and impacted files
@@               Coverage Diff               @@
##             release-13.0    #7745   +/-   ##
===============================================
  Coverage                ?   89.64%           
===============================================
  Files                   ?      274           
  Lines                   ?    59583           
  Branches                ?     7436           
===============================================
  Hits                    ?    53413           
  Misses                  ?     4037           
  Partials                ?     2133           

@@ -134,7 +134,7 @@ SELECT * FROM test a WHERE x NOT IN (SELECT x FROM test b WHERE y = 1 UNION SELE
SELECT * FROM test a WHERE x IN (SELECT x FROM test b UNION SELECT y FROM test c) ORDER BY 1,2;

-- correlated subquery with union in WHERE clause
-SELECT * FROM test a WHERE x IN (SELECT x FROM test b UNION SELECT y FROM test c WHERE a.x = c.x) ORDER BY 1,2;
+SELECT * FROM test a WHERE (x + random()) IN (SELECT x FROM test b UNION SELECT y FROM test c WHERE a.x = c.x) ORDER BY 1,2;
Member

@naisila naisila Nov 14, 2024

A similar approach was taken for existing regress tests in the postgres commit.

They followed this approach in the postgres tests because they were having EXPLAIN diffs, and they wanted to avoid adding a new alternative test output file for PG17. In Citus, note that in these two tests we are trying to run the query, not to explain it. When we run these queries, both of them unexpectedly work.

My point is, we also need to understand what changed in the Citus planner path, in the codebase, and make sure that Citus is running these queries correctly.

Current fix is great, by the way, no extra output file, but we may need to test this more extensively in Citus through this PR.

Contributor Author

Got it. I think that the Citus planner is running the queries correctly (in pg17) because it is getting a different plan from the pg planner, but I will verify, and see what tests can be added (maybe to pg17 regress test?) to test the new behavior in pg17.

Member

Thank you, that sounds great.

maybe to pg17 regress test

Yes, makes sense.

Contributor Author

The latest push contains a pg17 regress test that tests the pg17 feature of pulling up correlated ANY subqueries. It can be extended to test other 17-related functionality as appropriate.

Member

naisila commented Nov 14, 2024

By the way, can we add a similar fix to dml_recursive test to avoid the extra output? #7727

@colm-mchugh colm-mchugh force-pushed the cmchugh/pg17-set_operations branch from 0cb74b8 to 1a6ef7c Compare November 14, 2024 19:09
Contributor Author

By the way, can we add a similar fix to dml_recursive test to avoid the extra output? #7727

Yes, it looks like dml_recursive can have a similar fix. In all three cases - set_operations, subquery_in_where and dml_recursive - the plan created by the Postgres planner pre-pg17 implemented the correlated subquery as a SubPlan filter. In all three cases with pg17 the pg optimizer can fold the correlated subquery to a join, so the pg plan does not have any correlated SubPlans, which seems to avoid the limitations in Citus.

@colm-mchugh colm-mchugh force-pushed the cmchugh/pg17-set_operations branch from 1a6ef7c to 7b7d2d0 Compare November 15, 2024 18:06
Member

@naisila naisila left a comment

Beautiful PR, thank you.
We can merge after the team sync on the queries that work, given that we don't discover any issues in that meeting.

Contributor Author

@microsoft-github-policy-service agree company="Microsoft"

Member

@naisila naisila left a comment

The reason why I was holding off on merging is that I forgot to test the PR with PG16 😆 Sorry about that.
So, I realized we are missing the pg17_0.sql file, which is the alternative output file for pg16/pg15/pg14 runs. That's why I am requesting changes with this PR review.
Also, the check-style test is failing; it looks like there is some stray whitespace: https://github.com/citusdata/citus/actions/runs/11861227624/job/33058148915?pr=7745

Additionally, I really like that you provided the query version rewritten with subquery pulled up to a join, which Citus can execute in all PG versions. So, I was thinking, we can include these outputs in pg17_0.sql file
Usually pgxx_0.sql file only has the following lines as we don't execute in previous versions:

--
-- PG16
--
SHOW server_version \gset
SELECT substring(:'server_version', '\d+')::int >= 16 AS server_version_ge_16
\gset
\if :server_version_ge_16
\else
\q
\endif

However, we might let it execute in this case. What do you think?
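As a reading aid for the version gate quoted above, here is a small Python sketch of the check it performs: take the first run of digits in server_version and compare it with the minimum major version. The function name and the sample version strings are illustrative assumptions, not part of the Citus test suite.

```python
import re


def server_version_ge(server_version, minimum):
    # Same idea as: substring(:'server_version', '\d+')::int >= minimum
    # The first run of digits in the version string is the major version.
    match = re.search(r"\d+", server_version)
    if match is None:
        raise ValueError(f"no major version found in {server_version!r}")
    return int(match.group()) >= minimum


# With the gate in place, psql runs \q (quit) when the check is false,
# so the rest of the test file is skipped on older servers.
print(server_version_ge("17.1", 16))   # True
print(server_version_ge("15.8", 16))   # False
```

Letting the file execute on older versions, as proposed here, amounts to dropping the early quit for the queries that work on every supported version.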

Contributor Author

colm-mchugh commented Nov 18, 2024

The reason why I was holding off on merging is that I forgot to test the PR with PG16 😆 Sorry about that. So, I realized we are missing the pg17_0.sql file, which is the alternative output file for pg16/pg15/pg14 runs. That's why I am requesting changes with this PR review. Also, the check-style test is failing; it looks like there is some stray whitespace: https://github.com/citusdata/citus/actions/runs/11861227624/job/33058148915?pr=7745

Ah, I was not aware of the pgxx_0.sql convention; let me address that, and also check-style.

However, we might let it execute in this case. What do you think?

I think that's reasonable! (include queries that Citus can run with pg < pg17)

@naisila naisila force-pushed the naisila/pg17_support branch from 1cf690f to e12686a Compare November 18, 2024 15:11
@colm-mchugh colm-mchugh force-pushed the cmchugh/pg17-set_operations branch 3 times, most recently from 7214cf7 to bd082e1 Compare November 18, 2024 17:40
@naisila naisila force-pushed the naisila/pg17_support branch from e12686a to 46dc966 Compare November 19, 2024 09:27
@colm-mchugh colm-mchugh force-pushed the cmchugh/pg17-set_operations branch from bd082e1 to 7cf76b8 Compare November 19, 2024 16:09
Member

@naisila naisila left a comment

Looks good for merging to me, thanks! A couple of things:

  • check-style is failing because of some whitespaces I think
  • before merging we need to change the base PR to release-13.0

@naisila naisila changed the title Fix Test Failure in subquery_in_where, set_operations in PG17 (#7741) PG17 compatibility: add/fix tests with correlated subqueries that can be pulled to a join Nov 19, 2024
@colm-mchugh colm-mchugh force-pushed the cmchugh/pg17-set_operations branch from 7cf76b8 to f5a98b9 Compare November 19, 2024 21:05
Contributor Author

colm-mchugh commented Nov 19, 2024

  • check-style is failing because of some whitespaces I think

Fixed; forgot to run after changing the test table population

  • before merging we need to change the base PR to release-13.0

Just want to sanity-check how changing the base of the PR to release-13.0 is done; is it:

git checkout dev-branch
git rebase -i --onto release-13.0 naisila/pg17_support dev-branch
< drop irrelevant commits >
git push -f

?

Thanks!
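For readers less familiar with the three-argument form, `git rebase --onto <newbase> <upstream> <branch>` replays the commits that are on `<branch>` but not on `<upstream>` on top of `<newbase>`. The toy Python model below shows only that selection rule; the commit labels and the function are made up, and real git works on a commit graph rather than flat lists.

```python
# Toy model of `git rebase --onto <newbase> <upstream> <branch>`.
# Histories are lists of commit labels, oldest first; all labels are hypothetical.


def rebase_onto(newbase, upstream, branch, drop=()):
    """Replay commits that are on branch but not on upstream onto newbase."""
    upstream_commits = set(upstream)
    replay = [c for c in branch if c not in upstream_commits and c not in drop]
    # Replayed commits get new identities, marked here with a trailing quote.
    return list(newbase) + [c + "'" for c in replay]


release = ["r1", "r2"]                  # stands in for release-13.0
support = ["r1", "s1", "s2"]            # stands in for naisila/pg17_support
dev = ["r1", "s1", "s2", "d1", "d2"]    # stands in for dev-branch

print(rebase_onto(release, support, dev))               # ['r1', 'r2', "d1'", "d2'"]
print(rebase_onto(release, support, dev, drop=("d1",)))  # ['r1', 'r2', "d2'"]
```

The `drop` argument plays the role of deleting lines in the interactive (`-i`) todo list. Commits that belong only to the old base (`s1`, `s2` here) are excluded by the `<upstream>` argument itself.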

@colm-mchugh colm-mchugh changed the base branch from naisila/pg17_support to release-13.0 November 19, 2024 21:30
@colm-mchugh colm-mchugh changed the base branch from release-13.0 to naisila/pg17_support November 19, 2024 21:31
@colm-mchugh colm-mchugh changed the base branch from naisila/pg17_support to release-13.0 November 20, 2024 09:23
…in PG17 (#7741)

Change the queries causing the test failures so that the ANY subquery
cannot be pulled up to a join, preserving the expected output of the test.

Add pg17 regress test for correlated ANY subqueries that can be folded
to a join in pg17, and for testing other pg17 features as required.
@colm-mchugh colm-mchugh force-pushed the cmchugh/pg17-set_operations branch from f5a98b9 to 60b9ff7 Compare November 20, 2024 09:40
Contributor Author

The PR has been rebased to release-13.0, should be good to merge pending any relevant checks

@naisila naisila merged commit 680c23f into release-13.0 Nov 20, 2024
121 checks passed
@naisila naisila deleted the cmchugh/pg17-set_operations branch November 20, 2024 11:51
colm-mchugh added a commit that referenced this pull request Nov 22, 2024
Preserve the test error message by adjusting the query so that PG17
cannot pull it up to a join. Another instance of a subquery that can
be pulled up to a join with PG17 (#7745)
colm-mchugh added a commit that referenced this pull request Nov 22, 2024
Preserve the test error message by adjusting the query so that PG17
cannot pull it up to a join. Another instance of a subquery that can be
pulled up to a join with PG17 (#7745)

This should have been fixed in, but slipped by, #7745
m3hm3t pushed a commit that referenced this pull request Nov 28, 2024
… be pulled to a join (#7745)

Fix Test Failure in subquery_in_where, set_operations, dml_recursive in
PG17 #7741

The test failures are caused by [this commit in
PG17](https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9f1337639),
which enables correlated subqueries to be pulled up to a join. Prior to
this, the correlated subquery was implemented as a subplan. In Citus, it
is not possible to push down a correlated subplan, but with a different
plan in PG17 the query can be executed, per the test diff from
`subquery_in_where`:

```
37,39c37,41
< DEBUG:  generating subplan XXX_1 for CTE event_id: SELECT user_id AS events_user_id, "time" AS events_time, event_type FROM public.events_table
< DEBUG:  Plan XXX query after replacing subqueries and CTEs: SELECT count(*) AS count FROM ...
< ERROR:  correlated subqueries are not supported when the FROM clause contains a CTE or subquery
---
>  count
> ---------------------------------------------------------------------
>      0
> (1 row)
> 
```

This is because with pg17 `= ANY subquery` in the queries can be
implemented as a join, instead of as a subplan filter on a table scan.
For example, `SELECT * FROM test a WHERE x IN (SELECT x FROM test b
UNION SELECT y FROM test c WHERE a.x = c.x) ORDER BY 1,2` (from
set_operations) has this plan in pg17; note that the subquery is the
inner side of a nested loop join:
```
┌───────────────────────────────────────────────────┐
│                    QUERY PLAN                     │
├───────────────────────────────────────────────────┤
│ Sort                                              │
│   Sort Key: a.x, a.y                              │
│   ->  Nested Loop                                 │
│         ->  Seq Scan on test a                    │
│         ->  Subquery Scan on "ANY_subquery"       │
│               Filter: (a.x = "ANY_subquery".x)    │
│               ->  HashAggregate                   │
│                     Group Key: b.x                │
│                     ->  Append                    │
│                           ->  Seq Scan on test b  │
│                           ->  Seq Scan on test c  │
│                                 Filter: (a.x = x) │
└───────────────────────────────────────────────────┘
```
and this plan in pg16 (and previous pg versions); the subquery is a
correlated subplan filter on a table scan:
```
┌───────────────────────────────────────────────┐
│                  QUERY PLAN                   │
├───────────────────────────────────────────────┤
│ Sort                                          │
│   Sort Key: a.x, a.y                          │
│   ->  Seq Scan on test a                      │
│         Filter: (SubPlan 1)                   │
│         SubPlan 1                             │
│           ->  HashAggregate                   │
│                 Group Key: b.x                │
│                 ->  Append                    │
│                       ->  Seq Scan on test b  │
│                       ->  Seq Scan on test c  │
│                             Filter: (a.x = x) │
└───────────────────────────────────────────────┘
```

The fix modifies the queries causing the test failures so that an ANY
subquery is not folded to a join, preserving the expected output of the
tests. A similar approach was taken for existing regress tests in the
[postgres commit](https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9f1337639).
See the `join` regress test, for example.

We also add pg17 specific tests that leverage this improvement in Postgres
with Citus distributed planning as well.
m3hm3t pushed a commit that referenced this pull request Nov 28, 2024
Preserve the test error message by adjusting the query so that PG17
cannot pull it up to a join. Another instance of a subquery that can be
pulled up to a join with PG17 (#7745)

This should have been fixed in, but slipped by, #7745
naisila added a commit that referenced this pull request Dec 24, 2024
This is the final commit that adds
PG17 compatibility with Citus's current capabilities.

You can use Citus community, release-13.0 branch, with PG17.1.

---------

Specifically, this commit:

- Enables PG17 in the configure script.

- Adds PG17 tests to CI using test images that have 17.1

- Fixes an upgrade test: see below for details
In `citus_prepare_upgrade()`, don't drop any_value when upgrading from
PG16+, because PG16+ has its own any_value function. Attempting to do so
results in the error seen in [pg16-pg17
upgrade](https://github.com/citusdata/citus/actions/runs/11768444117/job/32778340003?pr=7661):
```
ERROR:  cannot drop function any_value(anyelement) because it is required by the database system
CONTEXT:  SQL statement "DROP AGGREGATE IF EXISTS pg_catalog.any_value(anyelement)"
```
When 16 becomes the minimum supported Postgres version, the drop
statements can be removed.
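
The version-dependent behaviour described above can be sketched in a few lines of Python. This is only an illustration of the decision: the real logic lives in the SQL of `citus_prepare_upgrade()`, and the helper name and constant below are hypothetical.

```python
# Hypothetical helper, for illustration only.
PG16_VERSION_NUM = 160000  # server_version_num reported by PostgreSQL 16.0


def any_value_drop_statements(server_version_num):
    """Return the DROP statements to run before upgrading from this version."""
    if server_version_num >= PG16_VERSION_NUM:
        # PG16+ ships its own any_value aggregate; dropping it raises an error.
        return []
    return ["DROP AGGREGATE IF EXISTS pg_catalog.any_value(anyelement)"]


print(any_value_drop_statements(150008))  # one DROP statement
print(any_value_drop_statements(160004))  # []
```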

---------

Several PG17 Compatibility commits have been merged before this final one.
All these subtasks are done #7653

See the list below:

Compilation PR: #7699
Ruleutils PR: #7725
Sister PR for tests: citusdata/the-process#159

Helpful smaller PRs:
- #7714
- #7726
- #7731
- #7732
- #7733
- #7738
- #7745
- #7747
- #7748
- #7749
- #7752
- #7755
- #7757
- #7759
- #7760
- #7761
- #7762
- #7765
- #7766
- #7768
- #7769
- #7771
- #7774
- #7776
- #7780
- #7781
- #7785
- #7788
- #7793
- #7796

---------

Co-authored-by: Colm <colmmchugh@microsoft.com>

Successfully merging this pull request may close these issues.

📘Fix missing ERROR: cannot push down this subquery in set_operations and subquery_in_where
2 participants