Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PG17 compatibility: Preserve DEBUG output in cte_inline #7755

Merged
merged 1 commit into from
Nov 19, 2024

Conversation

colm-mchugh
Copy link
Contributor

@colm-mchugh colm-mchugh commented Nov 19, 2024

Regression test cte_inline has the following diff;

DEBUG: CTE cte_1 is going to be inlined via distributed planning 
DEBUG: CTE cte_1 is going to be inlined via distributed planning 
DEBUG: Creating router plan 
-DEBUG: query has a single distribution column value: 1 

DEBUG message query has a single distribution column value does not appear with PG17. This is because PG17 can recognize when a Result node does not need to have an input node, so the predicate on the distribution column is not present in the query plan. Comparing the query plan obtained before PG17:

┌────────────────────────────────────────────────────────────────────────────────┐
│                                   QUERY PLAN                                   │
├────────────────────────────────────────────────────────────────────────────────┤
│ Result                                                                         │
│   One-Time Filter: false                                                       │
│   ->  GroupAggregate                                                           │
│         ->  Seq Scan on public.test_table                                      │
│               Filter: (test_table.key = 1)                                     │
└────────────────────────────────────────────────────────────────────────────────┘

with the PG17 query plan:

┌──────────────────────────────────┐
│            QUERY PLAN            │
├──────────────────────────────────┤
│ Result                           │
│   One-Time Filter: false         │
└──────────────────────────────────┘

we see that the Result node in the PG16 plan has an Aggregate node, but the Result node in the PG17 plan does not have any input node; PG17 recognizes it is not needed given a Filter that evaluates to False at compile-time. The Result node is present in both plans because PG in both versions can recognize when a combination of predicates equate to false at compile time; this is the because the successive predicates in the test query (key=6, key=5, key=4, etc) become contradictory when the CTEs are inlined. Here is an example query showing the effect of the CTE inlining:

select count(*), key FROM test_table WHERE key = 1 AND key = 2 GROUP BY key;

In this case, the WHERE clause obviously evaluates to False. The PG16 query plan for this query is:

┌────────────────────────────────────┐
│             QUERY PLAN             │
├────────────────────────────────────┤
│ GroupAggregate                     │
│   ->  Result                       │
│         One-Time Filter: false     │
│         ->  Seq Scan on test_table │
│               Filter: (key = 1)    │
└────────────────────────────────────┘

The PG17 query plan is:

┌────────────────────────────────┐
│           QUERY PLAN           │
├────────────────────────────────┤
│ GroupAggregate                 │
│   ->  Result                   │
│         One-Time Filter: false │
└────────────────────────────────┘

In both plans the PG optimizer is able to derive the predicate 1=2 from the equivalence class { key, 1, 2 } and then constant fold this to False. But, in the PG16 plan the Result node has an input node (a sequential scan on test_table), while in the PG17 plan the Result node does not have any input. This is because PG17 recognizes that when the Result filter resolves to False at compile time it is not necessary to set an input on the Result. I think this is a consequence of this PG17 commit:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b262ad440
which handles redundant IS [NOT] NULL predicates, but also refactored evaluating of predicates to true/false at compile-time, enabling optimizations such as those seen here.

Given the reason for the diff, the fix preserves the test output by modifying the query so the predicates are not contradictory when the CTEs are inlined.

Fixes #7754

Copy link

codecov bot commented Nov 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (release-13.0@b29ecd1). Learn more about missing BASE report.

Additional details and impacted files
@@               Coverage Diff               @@
##             release-13.0    #7755   +/-   ##
===============================================
  Coverage                ?   89.65%           
===============================================
  Files                   ?      274           
  Lines                   ?    59584           
  Branches                ?     7436           
===============================================
  Hits                    ?    53418           
  Misses                  ?     4032           
  Partials                ?     2134           
---- 🚨 Try these New Features:

@naisila naisila changed the title Preserve DEBUG output in cte_inline PG17Preserve DEBUG output in cte_inline Nov 19, 2024
@naisila naisila changed the title PG17Preserve DEBUG output in cte_inline PG17 compatibility: Preserve DEBUG output in cte_inline Nov 19, 2024
Copy link
Member

@naisila naisila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix, thanks.
Let's change base branch to release-13.0 and merge.

@naisila naisila changed the base branch from naisila/pg17_support to release-13.0 November 19, 2024 20:24
Preserve DEBUG output in cte_inline
@naisila naisila merged commit 0fed87a into release-13.0 Nov 19, 2024
121 checks passed
@naisila naisila deleted the cmchugh/pg17-cte_inline branch November 19, 2024 21:14
m3hm3t pushed a commit that referenced this pull request Nov 28, 2024
Regression test cte_inline has the following diff;
```
DEBUG: CTE cte_1 is going to be inlined via distributed planning 
DEBUG: CTE cte_1 is going to be inlined via distributed planning 
DEBUG: Creating router plan 
-DEBUG: query has a single distribution column value: 1 
```
DEBUG message `query has a single distribution column value` does not
appear with PG17. This is because PG17 can recognize when a Result node
does not need to have an input node, so the predicate on the
distribution column is not present in the query plan. Comparing the
query plan obtained before PG17:
```
│ Result                                                                         │
│   One-Time Filter: false                                                       │
│   ->  GroupAggregate                                                           │
│         ->  Seq Scan on public.test_table                                      │
│               Filter: (test_table.key = 1)                                     │

```
with the PG17 query plan:
```
┌──────────────────────────────────┐
│            QUERY PLAN            │
├──────────────────────────────────┤
│ Result                           │
│   One-Time Filter: false         │
└──────────────────────────────────┘
```
we see that the Result node in the PG16 plan has an Aggregate node, but
the Result node in the PG17 plan does not have any input node; PG17
recognizes it is not needed given a Filter that evaluates to False at
compile-time. The Result node is present in both plans because PG in
both versions can recognize when a combination of predicates equate to
false at compile time; this is the because the successive predicates in
the test query (key=6, key=5, key=4, etc) become contradictory when the
CTEs are inlined. Here is an example query showing the effect of the CTE
inlining:
```
select count(*), key FROM test_table WHERE key = 1 AND key = 2 GROUP BY key;
```
In this case, the WHERE clause obviously evaluates to False. The PG16
query plan for this query is:
```
┌────────────────────────────────────┐
│             QUERY PLAN             │
├────────────────────────────────────┤
│ GroupAggregate                     │
│   ->  Result                       │
│         One-Time Filter: false     │
│         ->  Seq Scan on test_table │
│               Filter: (key = 1)    │
└────────────────────────────────────┘
```
The PG17 query plan is:
```
┌────────────────────────────────┐
│           QUERY PLAN           │
├────────────────────────────────┤
│ GroupAggregate                 │
│   ->  Result                   │
│         One-Time Filter: false │
└────────────────────────────────┘
```
In both plans the PG optimizer is able to derive the predicate 1=2 from
the equivalence class { key, 1, 2 } and then constant fold this to
False. But, in the PG16 plan the Result node has an input node (a
sequential scan on test_table), while in the PG17 plan the Result node
does not have any input. This is because PG17 recognizes that when the
Result filter resolves to False at compile time it is not necessary to
set an input on the Result. I think this is a consequence of this PG17
commit:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b262ad440
which handles redundant IS [NOT] NULL predicates, but also refactored
evaluating of predicates to true/false at compile-time, enabling
optimizations such as those seen here.

Given the reason for the diff, the fix preserves the test output by
modifying the query so the predicates are not contradictory when the
CTEs are inlined.
m3hm3t pushed a commit that referenced this pull request Nov 28, 2024
Regression test cte_inline has the following diff;
```
DEBUG: CTE cte_1 is going to be inlined via distributed planning 
DEBUG: CTE cte_1 is going to be inlined via distributed planning 
DEBUG: Creating router plan 
-DEBUG: query has a single distribution column value: 1 
```
DEBUG message `query has a single distribution column value` does not
appear with PG17. This is because PG17 can recognize when a Result node
does not need to have an input node, so the predicate on the
distribution column is not present in the query plan. Comparing the
query plan obtained before PG17:
```
│ Result                                                                         │
│   One-Time Filter: false                                                       │
│   ->  GroupAggregate                                                           │
│         ->  Seq Scan on public.test_table                                      │
│               Filter: (test_table.key = 1)                                     │

```
with the PG17 query plan:
```
┌──────────────────────────────────┐
│            QUERY PLAN            │
├──────────────────────────────────┤
│ Result                           │
│   One-Time Filter: false         │
└──────────────────────────────────┘
```
we see that the Result node in the PG16 plan has an Aggregate node, but
the Result node in the PG17 plan does not have any input node; PG17
recognizes it is not needed given a Filter that evaluates to False at
compile-time. The Result node is present in both plans because PG in
both versions can recognize when a combination of predicates equate to
false at compile time; this is the because the successive predicates in
the test query (key=6, key=5, key=4, etc) become contradictory when the
CTEs are inlined. Here is an example query showing the effect of the CTE
inlining:
```
select count(*), key FROM test_table WHERE key = 1 AND key = 2 GROUP BY key;
```
In this case, the WHERE clause obviously evaluates to False. The PG16
query plan for this query is:
```
┌────────────────────────────────────┐
│             QUERY PLAN             │
├────────────────────────────────────┤
│ GroupAggregate                     │
│   ->  Result                       │
│         One-Time Filter: false     │
│         ->  Seq Scan on test_table │
│               Filter: (key = 1)    │
└────────────────────────────────────┘
```
The PG17 query plan is:
```
┌────────────────────────────────┐
│           QUERY PLAN           │
├────────────────────────────────┤
│ GroupAggregate                 │
│   ->  Result                   │
│         One-Time Filter: false │
└────────────────────────────────┘
```
In both plans the PG optimizer is able to derive the predicate 1=2 from
the equivalence class { key, 1, 2 } and then constant fold this to
False. But, in the PG16 plan the Result node has an input node (a
sequential scan on test_table), while in the PG17 plan the Result node
does not have any input. This is because PG17 recognizes that when the
Result filter resolves to False at compile time it is not necessary to
set an input on the Result. I think this is a consequence of this PG17
commit:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b262ad440
which handles redundant IS [NOT] NULL predicates, but also refactored
evaluating of predicates to true/false at compile-time, enabling
optimizations such as those seen here.

Given the reason for the diff, the fix preserves the test output by
modifying the query so the predicates are not contradictory when the
CTEs are inlined.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix missing DEBUG: query has a single distribution column value: 1, seen in cte_inline
2 participants