Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PG17 compatibility: Preserve DEBUG output in cte_inline #7755

Merged
merged 1 commit into from
Nov 19, 2024

Conversation

colm-mchugh
Copy link
Contributor

@colm-mchugh colm-mchugh commented Nov 19, 2024

Regression test cte_inline has the following diff;

DEBUG: CTE cte_1 is going to be inlined via distributed planning 
DEBUG: CTE cte_1 is going to be inlined via distributed planning 
DEBUG: Creating router plan 
-DEBUG: query has a single distribution column value: 1 

DEBUG message query has a single distribution column value does not appear with PG17. This is because PG17 can recognize when a Result node does not need to have an input node, so the predicate on the distribution column is not present in the query plan. Comparing the query plan obtained before PG17:

┌────────────────────────────────────────────────────────────────────────────────┐
│                                   QUERY PLAN                                   │
├────────────────────────────────────────────────────────────────────────────────┤
│ Result                                                                         │
│   One-Time Filter: false                                                       │
│   ->  GroupAggregate                                                           │
│         ->  Seq Scan on public.test_table                                      │
│               Filter: (test_table.key = 1)                                     │
└────────────────────────────────────────────────────────────────────────────────┘

with the PG17 query plan:

┌──────────────────────────────────┐
│            QUERY PLAN            │
├──────────────────────────────────┤
│ Result                           │
│   One-Time Filter: false         │
└──────────────────────────────────┘

we see that the Result node in the PG16 plan has an Aggregate node, but the Result node in the PG17 plan does not have any input node; PG17 recognizes it is not needed given a Filter that evaluates to False at compile-time. The Result node is present in both plans because PG in both versions can recognize when a combination of predicates equate to false at compile time; this is the because the successive predicates in the test query (key=6, key=5, key=4, etc) become contradictory when the CTEs are inlined. Here is an example query showing the effect of the CTE inlining:

select count(*), key FROM test_table WHERE key = 1 AND key = 2 GROUP BY key;

In this case, the WHERE clause obviously evaluates to False. The PG16 query plan for this query is:

┌────────────────────────────────────┐
│             QUERY PLAN             │
├────────────────────────────────────┤
│ GroupAggregate                     │
│   ->  Result                       │
│         One-Time Filter: false     │
│         ->  Seq Scan on test_table │
│               Filter: (key = 1)    │
└────────────────────────────────────┘

The PG17 query plan is:

┌────────────────────────────────┐
│           QUERY PLAN           │
├────────────────────────────────┤
│ GroupAggregate                 │
│   ->  Result                   │
│         One-Time Filter: false │
└────────────────────────────────┘

In both plans the PG optimizer is able to derive the predicate 1=2 from the equivalence class { key, 1, 2 } and then constant fold this to False. But, in the PG16 plan the Result node has an input node (a sequential scan on test_table), while in the PG17 plan the Result node does not have any input. This is because PG17 recognizes that when the Result filter resolves to False at compile time it is not necessary to set an input on the Result. I think this is a consequence of this PG17 commit:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b262ad440
which handles redundant IS [NOT] NULL predicates, but also refactored evaluating of predicates to true/false at compile-time, enabling optimizations such as those seen here.

Given the reason for the diff, the fix preserves the test output by modifying the query so the predicates are not contradictory when the CTEs are inlined.

Fixes #7754

Copy link

codecov bot commented Nov 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (release-13.0@b29ecd1). Learn more about missing BASE report.

Additional details and impacted files
@@               Coverage Diff               @@
##             release-13.0    #7755   +/-   ##
===============================================
  Coverage                ?   89.65%           
===============================================
  Files                   ?      274           
  Lines                   ?    59584           
  Branches                ?     7436           
===============================================
  Hits                    ?    53418           
  Misses                  ?     4032           
  Partials                ?     2134           

@naisila naisila changed the title Preserve DEBUG output in cte_inline PG17Preserve DEBUG output in cte_inline Nov 19, 2024
@naisila naisila changed the title PG17Preserve DEBUG output in cte_inline PG17 compatibility: Preserve DEBUG output in cte_inline Nov 19, 2024
Copy link
Member

@naisila naisila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix, thanks.
Let's change base branch to release-13.0 and merge.

@naisila naisila changed the base branch from naisila/pg17_support to release-13.0 November 19, 2024 20:24
Preserve DEBUG output in cte_inline
@naisila naisila force-pushed the cmchugh/pg17-cte_inline branch from 4bb05b9 to 33b610c Compare November 19, 2024 20:26
@naisila naisila merged commit 0fed87a into release-13.0 Nov 19, 2024
121 checks passed
@naisila naisila deleted the cmchugh/pg17-cte_inline branch November 19, 2024 21:14
m3hm3t pushed a commit that referenced this pull request Nov 28, 2024
Regression test cte_inline has the following diff;
```
DEBUG: CTE cte_1 is going to be inlined via distributed planning 
DEBUG: CTE cte_1 is going to be inlined via distributed planning 
DEBUG: Creating router plan 
-DEBUG: query has a single distribution column value: 1 
```
DEBUG message `query has a single distribution column value` does not
appear with PG17. This is because PG17 can recognize when a Result node
does not need to have an input node, so the predicate on the
distribution column is not present in the query plan. Comparing the
query plan obtained before PG17:
```
│ Result                                                                         │
│   One-Time Filter: false                                                       │
│   ->  GroupAggregate                                                           │
│         ->  Seq Scan on public.test_table                                      │
│               Filter: (test_table.key = 1)                                     │

```
with the PG17 query plan:
```
┌──────────────────────────────────┐
│            QUERY PLAN            │
├──────────────────────────────────┤
│ Result                           │
│   One-Time Filter: false         │
└──────────────────────────────────┘
```
we see that the Result node in the PG16 plan has an Aggregate node, but
the Result node in the PG17 plan does not have any input node; PG17
recognizes it is not needed given a Filter that evaluates to False at
compile-time. The Result node is present in both plans because PG in
both versions can recognize when a combination of predicates equate to
false at compile time; this is the because the successive predicates in
the test query (key=6, key=5, key=4, etc) become contradictory when the
CTEs are inlined. Here is an example query showing the effect of the CTE
inlining:
```
select count(*), key FROM test_table WHERE key = 1 AND key = 2 GROUP BY key;
```
In this case, the WHERE clause obviously evaluates to False. The PG16
query plan for this query is:
```
┌────────────────────────────────────┐
│             QUERY PLAN             │
├────────────────────────────────────┤
│ GroupAggregate                     │
│   ->  Result                       │
│         One-Time Filter: false     │
│         ->  Seq Scan on test_table │
│               Filter: (key = 1)    │
└────────────────────────────────────┘
```
The PG17 query plan is:
```
┌────────────────────────────────┐
│           QUERY PLAN           │
├────────────────────────────────┤
│ GroupAggregate                 │
│   ->  Result                   │
│         One-Time Filter: false │
└────────────────────────────────┘
```
In both plans the PG optimizer is able to derive the predicate 1=2 from
the equivalence class { key, 1, 2 } and then constant fold this to
False. But, in the PG16 plan the Result node has an input node (a
sequential scan on test_table), while in the PG17 plan the Result node
does not have any input. This is because PG17 recognizes that when the
Result filter resolves to False at compile time it is not necessary to
set an input on the Result. I think this is a consequence of this PG17
commit:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b262ad440
which handles redundant IS [NOT] NULL predicates, but also refactored
evaluating of predicates to true/false at compile-time, enabling
optimizations such as those seen here.

Given the reason for the diff, the fix preserves the test output by
modifying the query so the predicates are not contradictory when the
CTEs are inlined.
m3hm3t pushed a commit that referenced this pull request Nov 28, 2024
Regression test cte_inline has the following diff;
```
DEBUG: CTE cte_1 is going to be inlined via distributed planning 
DEBUG: CTE cte_1 is going to be inlined via distributed planning 
DEBUG: Creating router plan 
-DEBUG: query has a single distribution column value: 1 
```
DEBUG message `query has a single distribution column value` does not
appear with PG17. This is because PG17 can recognize when a Result node
does not need to have an input node, so the predicate on the
distribution column is not present in the query plan. Comparing the
query plan obtained before PG17:
```
│ Result                                                                         │
│   One-Time Filter: false                                                       │
│   ->  GroupAggregate                                                           │
│         ->  Seq Scan on public.test_table                                      │
│               Filter: (test_table.key = 1)                                     │

```
with the PG17 query plan:
```
┌──────────────────────────────────┐
│            QUERY PLAN            │
├──────────────────────────────────┤
│ Result                           │
│   One-Time Filter: false         │
└──────────────────────────────────┘
```
we see that the Result node in the PG16 plan has an Aggregate node, but
the Result node in the PG17 plan does not have any input node; PG17
recognizes it is not needed given a Filter that evaluates to False at
compile-time. The Result node is present in both plans because PG in
both versions can recognize when a combination of predicates equate to
false at compile time; this is the because the successive predicates in
the test query (key=6, key=5, key=4, etc) become contradictory when the
CTEs are inlined. Here is an example query showing the effect of the CTE
inlining:
```
select count(*), key FROM test_table WHERE key = 1 AND key = 2 GROUP BY key;
```
In this case, the WHERE clause obviously evaluates to False. The PG16
query plan for this query is:
```
┌────────────────────────────────────┐
│             QUERY PLAN             │
├────────────────────────────────────┤
│ GroupAggregate                     │
│   ->  Result                       │
│         One-Time Filter: false     │
│         ->  Seq Scan on test_table │
│               Filter: (key = 1)    │
└────────────────────────────────────┘
```
The PG17 query plan is:
```
┌────────────────────────────────┐
│           QUERY PLAN           │
├────────────────────────────────┤
│ GroupAggregate                 │
│   ->  Result                   │
│         One-Time Filter: false │
└────────────────────────────────┘
```
In both plans the PG optimizer is able to derive the predicate 1=2 from
the equivalence class { key, 1, 2 } and then constant fold this to
False. But, in the PG16 plan the Result node has an input node (a
sequential scan on test_table), while in the PG17 plan the Result node
does not have any input. This is because PG17 recognizes that when the
Result filter resolves to False at compile time it is not necessary to
set an input on the Result. I think this is a consequence of this PG17
commit:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b262ad440
which handles redundant IS [NOT] NULL predicates, but also refactored
evaluating of predicates to true/false at compile-time, enabling
optimizations such as those seen here.

Given the reason for the diff, the fix preserves the test output by
modifying the query so the predicates are not contradictory when the
CTEs are inlined.
naisila added a commit that referenced this pull request Dec 24, 2024
This is the final commit that adds
PG17 compatibility with Citus's current capabilities.

You can use Citus community, release-13.0 branch, with PG17.1.

---------

Specifically, this commit:

- Enables PG17 in the configure script.

- Adds PG17 tests to CI using test images that have 17.1

- Fixes an upgrade test: see below for details
In `citus_prepare_upgrade()`, don't drop any_value when upgrading from
PG16+, because PG16+ has its own any_value function. Attempting to do so
results in the error seen in [pg16-pg17
upgrade](https://github.com/citusdata/citus/actions/runs/11768444117/job/32778340003?pr=7661):
```
ERROR:  cannot drop function any_value(anyelement) because it is required by the database system
CONTEXT:  SQL statement "DROP AGGREGATE IF EXISTS pg_catalog.any_value(anyelement)"
```
When 16 becomes the minimum supported Postgres version, the drop
statements can be removed.

---------

Several PG17 Compatibility commits have been merged before this final one.
All these subtasks are done #7653

See the list below:

Compilation PR: #7699
Ruleutils PR: #7725
Sister PR for tests: citusdata/the-process#159

Helpful smaller PRs:
- #7714
- #7726
- #7731
- #7732
- #7733
- #7738
- #7745
- #7747
- #7748
- #7749
- #7752
- #7755
- #7757
- #7759
- #7760
- #7761
- #7762
- #7765
- #7766
- #7768
- #7769
- #7771
- #7774
- #7776
- #7780
- #7781
- #7785
- #7788
- #7793
- #7796

---------

Co-authored-by: Colm <colmmchugh@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix missing DEBUG: query has a single distribution column value: 1, seen in cte_inline
2 participants