PG17 compatibility: Preserve DEBUG output in cte_inline #7755

colm-mchugh · 2024-11-19T17:11:17Z

Regression test cte_inline has the following diff:

DEBUG: CTE cte_1 is going to be inlined via distributed planning 
DEBUG: CTE cte_1 is going to be inlined via distributed planning 
DEBUG: Creating router plan 
-DEBUG: query has a single distribution column value: 1

The DEBUG message "query has a single distribution column value" does not appear with PG17. This is because PG17 can recognize when a Result node does not need an input node, so the predicate on the distribution column is not present in the query plan. Comparing the query plan obtained before PG17:

┌────────────────────────────────────────────────────────────────────────────────┐
│                                   QUERY PLAN                                   │
├────────────────────────────────────────────────────────────────────────────────┤
│ Result                                                                         │
│   One-Time Filter: false                                                       │
│   ->  GroupAggregate                                                           │
│         ->  Seq Scan on public.test_table                                      │
│               Filter: (test_table.key = 1)                                     │
└────────────────────────────────────────────────────────────────────────────────┘

with the PG17 query plan:

┌──────────────────────────────────┐
│            QUERY PLAN            │
├──────────────────────────────────┤
│ Result                           │
│   One-Time Filter: false         │
└──────────────────────────────────┘

we see that the Result node in the PG16 plan has a GroupAggregate input node, but the Result node in the PG17 plan does not have any input node; PG17 recognizes that it is not needed given a filter that evaluates to false at compile time. The Result node is present in both plans because both PG versions can recognize when a combination of predicates evaluates to false at compile time; this happens because the successive predicates in the test query (key=6, key=5, key=4, etc.) become contradictory when the CTEs are inlined. Here is an example query showing the effect of the CTE inlining:

select count(*), key FROM test_table WHERE key = 1 AND key = 2 GROUP BY key;

In this case, the WHERE clause obviously evaluates to False. The PG16 query plan for this query is:

┌────────────────────────────────────┐
│             QUERY PLAN             │
├────────────────────────────────────┤
│ GroupAggregate                     │
│   ->  Result                       │
│         One-Time Filter: false     │
│         ->  Seq Scan on test_table │
│               Filter: (key = 1)    │
└────────────────────────────────────┘

The PG17 query plan is:

┌────────────────────────────────┐
│           QUERY PLAN           │
├────────────────────────────────┤
│ GroupAggregate                 │
│   ->  Result                   │
│         One-Time Filter: false │
└────────────────────────────────┘

In both plans the PG optimizer is able to derive the predicate 1=2 from the equivalence class { key, 1, 2 } and then constant-fold it to false. However, in the PG16 plan the Result node has an input node (a sequential scan on test_table), while in the PG17 plan the Result node does not have any input. This is because PG17 recognizes that when the Result filter resolves to false at compile time, it is not necessary to set an input on the Result. I think this is a consequence of this PG17 commit:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b262ad440
which handles redundant IS [NOT] NULL predicates, but also refactored the evaluation of predicates to true/false at compile time, enabling optimizations such as those seen here.
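
A minimal sketch for reproducing the two example plans on a plain PostgreSQL 16 or 17 server (Citus is not needed for this part). The table definition is an assumption; only the key column matters for the example, and the EXPLAIN option is chosen just to keep the output short.

```sql
-- Hypothetical stand-in for the regression test's table; only "key" matters here.
CREATE TABLE test_table (key int, value text);

-- The contradictory predicates put key, 1 and 2 into one equivalence class,
-- so the planner can derive 1 = 2 and fold it to a constant-false filter.
EXPLAIN (COSTS OFF)
SELECT count(*), key
FROM test_table
WHERE key = 1 AND key = 2
GROUP BY key;

-- Clean up the scratch table.
DROP TABLE test_table;
```

Running the same statements on both versions should show whether the Result node keeps the sequential scan as its input (as described for PG16) or has no input at all (as described for PG17).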

Given the reason for the diff, the fix preserves the test output by modifying the query so the predicates are not contradictory when the CTEs are inlined.
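
To illustrate the idea, here is a hypothetical two-level reduction of the nested CTE pattern. It is not the actual test query, and the replacement predicate (key >= 1) is an assumption chosen for illustration rather than the change made in this PR. The table definition is the same assumed stand-in for the regression test's table.

```sql
-- Hypothetical stand-in for the regression test's table.
CREATE TABLE IF NOT EXISTS test_table (key int, value text);

-- Contradictory after inlining: key = 1 AND key = 2 can never hold,
-- so the planner can reduce the whole filter to constant false.
WITH cte_1 AS (
    WITH cte_1 AS (SELECT count(*), key FROM test_table GROUP BY key)
    SELECT * FROM cte_1 WHERE key = 1
)
SELECT * FROM cte_1 WHERE key = 2;

-- Not contradictory after inlining: key = 1 AND key >= 1 is satisfiable,
-- so the predicate on key is not folded away to constant false.
WITH cte_1 AS (
    WITH cte_1 AS (SELECT count(*), key FROM test_table GROUP BY key)
    SELECT * FROM cte_1 WHERE key = 1
)
SELECT * FROM cte_1 WHERE key >= 1;
```

With satisfiable predicates, the filter on the distribution column can remain visible in the plan, which is what the DEBUG message in the test output depends on.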

Fixes #7754

codecov · 2024-11-19T17:59:35Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (release-13.0@b29ecd1). Learn more about missing BASE report.

Additional details and impacted files

@@               Coverage Diff               @@
##             release-13.0    #7755   +/-   ##
===============================================
  Coverage                ?   89.65%           
===============================================
  Files                   ?      274           
  Lines                   ?    59584           
  Branches                ?     7436           
===============================================
  Hits                    ?    53418           
  Misses                  ?     4032           
  Partials                ?     2134

naisila

Nice fix, thanks.
Let's change base branch to release-13.0 and merge.


This is the final commit that adds PG17 compatibility with Citus's current capabilities. You can use Citus community, release-13.0 branch, with PG17.1.

---------

Specifically, this commit:
- Enables PG17 in the configure script.
- Adds PG17 tests to CI using test images that have 17.1
- Fixes an upgrade test: see below for details

In `citus_prepare_upgrade()`, don't drop any_value when upgrading from PG16+, because PG16+ has its own any_value function. Attempting to do so results in the error seen in [pg16-pg17 upgrade](https://github.com/citusdata/citus/actions/runs/11768444117/job/32778340003?pr=7661):

```
ERROR: cannot drop function any_value(anyelement) because it is required by the database system
CONTEXT: SQL statement "DROP AGGREGATE IF EXISTS pg_catalog.any_value(anyelement)"
```

When 16 becomes the minimum supported Postgres version, the drop statements can be removed.

---------

Several PG17 Compatibility commits have been merged before this final one. All these subtasks are done #7653

See the list below:
Compilation PR: #7699
Ruleutils PR: #7725
Sister PR for tests: citusdata/the-process#159

Helpful smaller PRs:
- #7714
- #7726
- #7731
- #7732
- #7733
- #7738
- #7745
- #7747
- #7748
- #7749
- #7752
- #7755
- #7757
- #7759
- #7760
- #7761
- #7762
- #7765
- #7766
- #7768
- #7769
- #7771
- #7774
- #7776
- #7780
- #7781
- #7785
- #7788
- #7793
- #7796

---------

Co-authored-by: Colm <colmmchugh@microsoft.com>

colm-mchugh requested review from onurctirtir, m3hm3t and naisila November 19, 2024 17:11

colm-mchugh self-assigned this Nov 19, 2024

colm-mchugh linked an issue Nov 19, 2024 that may be closed by this pull request

Fix missing DEBUG: query has a single distribution column value: 1, seen in cte_inline #7754

Closed

naisila changed the title ~~Preserve DEBUG output in cte_inline~~ PG17Preserve DEBUG output in cte_inline Nov 19, 2024

naisila changed the title ~~PG17Preserve DEBUG output in cte_inline~~ PG17 compatibility: Preserve DEBUG output in cte_inline Nov 19, 2024

naisila approved these changes Nov 19, 2024


naisila changed the base branch from naisila/pg17_support to release-13.0 November 19, 2024 20:24

PG17.0 Support - Regression tests sanity.

33b610c

Preserve DEBUG output in cte_inline

naisila force-pushed the cmchugh/pg17-cte_inline branch from 4bb05b9 to 33b610c Compare November 19, 2024 20:26

naisila merged commit 0fed87a into release-13.0 Nov 19, 2024
121 checks passed

naisila deleted the cmchugh/pg17-cte_inline branch November 19, 2024 21:14

This was referenced Nov 19, 2024

PG17.1 Support - Regression tests sanity #7653

Closed

Adds PG17.1 support - Regression tests sanity #7661

Merged

naisila added the pg17_support label Dec 5, 2024
