Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PG17 compatibility: fix plan diffs in multi_explain #7780

Merged
merged 1 commit into from
Dec 17, 2024

Conversation

colm-mchugh
Copy link
Contributor

@colm-mchugh colm-mchugh commented Dec 6, 2024

Regress test multi_explain has two queries that have a different query plan with PG17. Here is part of the plan diff for the query labelled Union and left join subquery pushdown in multi_explain.sql (for the complete diff, search for multi_explain here):

                                       ->  Sort
                                             Sort Key: ((users.composite_id).tenant_id), ((users.composite_id).user_id), subquery_2.hasdone, events.event_time
-                                            ->  Hash Left Join
-                                                  Hash Cond: (users.composite_id = subquery_2.composite_id)
-                                                  ->  HashAggregate
-                                                        Group Key: ((users.composite_id).tenant_id), ((users.composite_id).user_id), users.composite_id, ('action=>1'::text), events.event_time
+                                            ->  Nested Loop Left Join
+                                                  Join Filter: (users.composite_id = subquery_2.composite_id)
+                                                  ->  Unique
+                                                        ->  Sort
+                                                              Sort Key: ((users.composite_id).tenant_id), ((users.composite_id).user_id), users.composite_id, ('action=>1'::text), events.event_time
                                                               ->  Append

The change is the same in both queries; a hash left join with subquery_1 on the outer and subquery_2 on the inner side of the join is now a nested loop left join with subquery_1 on the outer and subquery_2 on the inner; additionally, the chosen method of uniquifying the UNION in subquery_1 has changed from hashed grouping to sort followed by unique, as shown in the diff above.

The PG17 commit that caused this plan change is likely Fix MergeAppend to more accurately compute the number of rows that need to be sorted because it impacts the estimated rows counts of UNION paths. Comparing a costed plan of the query between PG16 and PG17 I noticed that with PG16 the rows estimate for the UNION in subquery_1 is 4, whereas with PG17 the rows estimate is 2. A lower rows estimate in the outer side of the join may result in nested loop looking cheaper than hash join for the left outer join, hence the plan change in the two queries where there is a UNION on the outer side of a left outer join.

The proposed fix achieves a consistent plan across all supported postgres versions by temporarily disabling nested loop join and sort for the two impacted queries; the postgres optimizer selects hash join for the outer left join and hashed aggregation for the UNION operation. I investigated tweaking the queries, but was not able to arrive at a consistent plan, and I believe the SQL operator (e.g. join, group by, union) implementations are orthogonal to the intent of the test, so this should be a satisfactory solution, particularly as it avoids introducing a second alternative output file for multi_explain.

@colm-mchugh colm-mchugh self-assigned this Dec 6, 2024
Copy link

codecov bot commented Dec 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (release-13.0@1c2e784). Learn more about missing BASE report.

Additional details and impacted files
@@               Coverage Diff               @@
##             release-13.0    #7780   +/-   ##
===============================================
  Coverage                ?   89.65%           
===============================================
  Files                   ?      274           
  Lines                   ?    59592           
  Branches                ?     7436           
===============================================
  Hits                    ?    53427           
  Misses                  ?     4035           
  Partials                ?     2130           

Copy link
Member

@naisila naisila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix, given that we mostly care about the complex subquery being pushed down in these queries.
Yet again avoiding the alternative test output, so great!

Reminder to rebase to release-13.0, and drop the enable configure commit before merging.

@colm-mchugh colm-mchugh changed the base branch from naisila/pg17_support to release-13.0 December 17, 2024 18:14
@colm-mchugh colm-mchugh force-pushed the cmchugh/pg17-multi_explain-plan branch from c0baef0 to 7f7af62 Compare December 17, 2024 18:15
@colm-mchugh colm-mchugh merged commit 89e0b53 into release-13.0 Dec 17, 2024
120 of 121 checks passed
@colm-mchugh colm-mchugh deleted the cmchugh/pg17-multi_explain-plan branch December 17, 2024 21:42
naisila added a commit that referenced this pull request Dec 24, 2024
This is the final commit that adds
PG17 compatibility with Citus's current capabilities.

You can use Citus community, release-13.0 branch, with PG17.1.

---------

Specifically, this commit:

- Enables PG17 in the configure script.

- Adds PG17 tests to CI using test images that have 17.1

- Fixes an upgrade test: see below for details
In `citus_prepare_upgrade()`, don't drop any_value when upgrading from
PG16+, because PG16+ has its own any_value function. Attempting to do so
results in the error seen in [pg16-pg17
upgrade](https://github.com/citusdata/citus/actions/runs/11768444117/job/32778340003?pr=7661):
```
ERROR:  cannot drop function any_value(anyelement) because it is required by the database system
CONTEXT:  SQL statement "DROP AGGREGATE IF EXISTS pg_catalog.any_value(anyelement)"
```
When 16 becomes the minimum supported Postgres version, the drop
statements can be removed.

---------

Several PG17 Compatibility commits have been merged before this final one.
All these subtasks are done #7653

See the list below:

Compilation PR: #7699
Ruleutils PR: #7725
Sister PR for tests: citusdata/the-process#159

Helpful smaller PRs:
- #7714
- #7726
- #7731
- #7732
- #7733
- #7738
- #7745
- #7747
- #7748
- #7749
- #7752
- #7755
- #7757
- #7759
- #7760
- #7761
- #7762
- #7765
- #7766
- #7768
- #7769
- #7771
- #7774
- #7776
- #7780
- #7781
- #7785
- #7788
- #7793
- #7796

---------

Co-authored-by: Colm <colmmchugh@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants