Pick Bring back cdbsubselect_drop_distinct. etc #821

yjhjstz · 2024-12-30T01:12:41Z

Fixes #ISSUE_Number

What does this PR do?

Type of Change

Bug fix (non-breaking change)
New feature (non-breaking change)
Breaking change (fix or feature with breaking changes)
Documentation update

Breaking Changes

Test Plan

Unit tests added/updated
Integration tests added/updated
Passed make installcheck
Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Followed contribution guide
Added/updated documentation
Reviewed code for security implications
Requested review from cloudberry committers

Additional Context

CI Skip Instructions

This commit includes a couple improvements when dumping minirepros and gpsd: 1. Dump the number of segments. This is needed to replicate the plan, and often we have to ask users/customers this separately. Instead just retrieve it during the minirepro 2. Output a `set optimizer=off` for when the minirepro is being restored. The short insert statements are much faster (~10X) using planner than Orca, especially if done in debug buiild. Output this by default so I don't have to modify this manually.

The QE notice message might be truncated because MPPnoticeReceiver() limited message length about 800 bytes sent to the client. The client should not see an incomplete notice message. This commit changes its behaviour, it allows QE messages of any length (include QE identifier) to send to the client.

…. (#14108) In GPDB, expression subquery is promoted to join, e.g. select * from T where a > (select 10*avg(x) from R where T.b=R.y); in postgres, the plan is: QUERY PLAN ----------------------------------------------------------------- Seq Scan on t (cost=0.00..80364.30 rows=753 width=8) Filter: ((a)::numeric > (SubPlan 1)) SubPlan 1 -> Aggregate (cost=35.53..35.54 rows=1 width=32) -> Seq Scan on r (cost=0.00..35.50 rows=10 width=4) Filter: (t.b = y) while in GPDB, the subquery is promoted to join: QUERY PLAN --------------------------------------------------------------------------------------------- Gather Motion 3:1 (slice1; segments: 3) -> Hash Join (cost=477.00..969.11 rows=9567 width=8) Hash Cond: (t.b = "Expr_SUBQUERY".csq_c0) Join Filter: ((t.a)::numeric > "Expr_SUBQUERY".csq_c1) -> Seq Scan on t (cost=0.00..321.00 rows=28700 width=8) -> Hash (cost=472.83..472.83 rows=333 width=36) -> Subquery Scan on "Expr_SUBQUERY" -> HashAggregate Group Key: r.y -> Seq Scan on r Optimizer: Postgres query optimizer But if the T.b and R.y are different type, it will report error like: ERROR: operator 37 is not a valid ordering operator (pathkeys.c:579) The root cause is that IsCorrelatedEqualityOpExpr may generate incorrect sort operator and equal operator for SortGroupClause when the types of left and right operands of OpExpr are different. Since get_ordering_op_properties() requires the type of the left and right operands being consistent and SortGroupClause->tleSortGroupRef is referencing the target entry created from innerExpr, we teach IsCorrelatedEqualityOpExpr() construct appropriate eqOp and sortOp according to the type of innerExpr.

Previously, when using minirepro on a query that accessed materialized views, the object that the materialized view used/accessed were not dumped with the minirepro. This is because queries themselves treat materialized views as tables, not views. However, when enumerating the dependent objects for dumping the relevant DDL, we need to treat materialized views as regular views. This commit sets a flag when dumping from gp_dump_query_oids so that materialized views are treated as regular views when parsing the query.

This commit adds calls to `RelationOpenSmgr()` in `open_all_datumstreamread_segfiles()` and `openFetchSegmentFile()` to ensure that the storage manager (Smgr) relation is opened before accessing Append-Optimized (AO) segment files. Without explicitly opening the Smgr relation, accessing `rel->rd_smgr->smgr_ao` could lead to undefined behavior or errors when the Smgr is not initialized. This change ensures stability and correctness when working with AO segment files.

For GPDB5 pg_tablespace.spclocation is used For GPDB6+ pg_catalog.pg_tablespace_location(oid) is used Authored-by: Brent Doil <bdoil@vmware.com>

We don't drop partitioned tables with indexes for the pg_upgrade regression test on GPDB7. Remove the commented out commands and FIXME. Authored-by: Brent Doil <bdoil@vmware.com>

The same value is assigned to the variable at the moment of declaration (some lines before)

Co-authored-by: CharlieTT <chaotian@vmware.com> Fix crash of PG_TRY which caused by return branch in PG_TRY block. When we return value in PG_TRY, some setjump/longjump context has been messed up so that we cannot longjump to right address once ereport(ERROR) exists later, so crash will happen.

On PG12 an option SKIP_LOCKED was added to VACUUM and ANALYZE to indicate whether we need to skip tables when a lock cannot be acquired at the time. However, relevant tests don't cover all the scenarios on GPDB. This PR is to add more tests to cover the scenarios.

In GP's convert_ANY_sublink_to_join() now, there is no simplification of origin subquery, as a result sometimes the sublink cannot be pulled up, which produced a bad query plan. In GP5, there is a subquery simplify step which helped sublink pull-up, but we lose this simplification since version 6X until master, which caused the sublink pull-up doesn't work as only simple subquery is allowed to be pulled up. By this commit we brought back cdbsubselect_drop_distinct and cdbsubselect_drop_orderby, then we can got simplified subquery again and do the sublink pull-up optimization on it which produced a better plan.

The commit faa7dc0713 fix the fallback but forget to remove FIXME.

The case is to test some code path during parallel hash join but Greenplum does not support Postgres' parallel plan. Changes the FIXME to a NOTE.

Historically in GPDB6, when executing GRANT/REVOKE on a partition root, we will recurse into its child tables. We have decided to keep this behavior for 7X, but to also support the ONLY keyword so that users can choose to grant/revoke just the partition root, without recursion. The syntax is e.g.: GRANT SELECT ON ONLY foo_par TO role; Discussion: https://groups.google.com/a/greenplum.org/g/gpdb-dev/c/ImJrvQEECwA

Now the SQL does not fallback to planner.

The name of UpdateSerializableCommandId is a bit wrong, now that SERIALIZABLE and REPEATABLE READ are not the same. From comparison, the if-check above was changed from checking IsXactIsoLevelSerializable to IsolationUsesXactSnapshot(). This commit renames it and resolves that GPDB_91_MERGE_FIXME. commit 5eb15c9 Author: Joe Conway <mail@joeconway.com> Date: Sat Sep 11 18:38:58 2010 +0000 SERIALIZABLE transactions are actually implemented beneath the covers with transaction snapshots, i.e. a snapshot registered at the beginning of a transaction. Change variable naming and comments to reflect this reality in preparation for a future, truly serializable mode, e.g. Serializable Snapshot Isolation (SSI). For the moment transaction snapshots are still used to implement SERIALIZABLE, but hopefully not for too much longer. Patch by Kevin Grittner and Dan Ports with review and some minor wording changes by me.

Historically in GPDB6, when executing ALTER TABLE SET TABLESPACE on a partition root, we will recurse into its child tables. This behavior differs from the upstream where it is not recursed. We have decided to keep this behavior for 7X, but to also support the ONLY keyword so that users can choose to alter just the partition root. Similar to other ALTER TABLE ONLY statements, the syntax is e.g.: ALTER TABLE ONLY foopart SET TABLESPACE myts Discussion: https://groups.google.com/a/greenplum.org/g/gpdb-dev/c/ImJrvQEECwA

Gpload regression case apache#50 would fail, if OS user that installed gpdb cluster is not gpadmin. Since it specified a hardcode user gpadmin, we need to use PGUSER instead to fix it. Co-authored-by: wuchengwen <wcw190496@alibaba-inc.com>

Previous commit ebb691a03 is to fix ANY sublink contains distinct-on syntax on partial exprs in the target list of subselect. That fix method simply not to drop distinct or order-by clause and might lead to some sublink cannot be pulled up and bad performance, below is an example of such regression: select a from t where a in (select b from t1 group by b) For the above SQL, if we do not drop the group-by clause, then later planner will take the subselect as not simple and fail to pull it up to a join. This commit fixes the regression by: 1. revert the logic of ebb691a03 2. fix the issue #12656 using a correct way: if the distinct on clause's length is not the same as the length of subselect's targetlist, we do not drop the distinct clause. Authored-by: Jian Guo <gjian@vmware.com> Co-authored-by: Zhenghua Lyu <kainwen@gmail.com>

yjhjstz · 2024-12-31T08:49:49Z

move to #827

chrishajas and others added 9 commits December 26, 2024 16:49

pg_upgrade: Resolve FIXME for spclocation column

df72e44

For GPDB5 pg_tablespace.spclocation is used For GPDB6+ pg_catalog.pg_tablespace_location(oid) is used Authored-by: Brent Doil <bdoil@vmware.com>

pg_upgrade: Resolve test FIXME

d5b451c

We don't drop partitioned tables with indexes for the pg_upgrade regression test on GPDB7. Remove the commented out commands and FIXME. Authored-by: Brent Doil <bdoil@vmware.com>

Removed unnecessary variable assignment.

60a26f7

The same value is assigned to the variable at the moment of declaration (some lines before)

yjhjstz added the cherry-pick cherry-pick upstream commts label Dec 30, 2024

yaowangm and others added 8 commits December 30, 2024 09:29

Remove fixme in qp_correlated_query. (#14314)

c0d0585

The commit faa7dc0713 fix the fallback but forget to remove FIXME.

Change FIXME to note in join_hash case.

9f41153

The case is to test some code path during parallel hash join but Greenplum does not support Postgres' parallel plan. Changes the FIXME to a NOTE.

Remove FIXME in bfv_olap_optimizer.out. (#14319)

1c0a2ac

Now the SQL does not fallback to planner.

yjhjstz force-pushed the pick_1227 branch from cce0937 to f74bcbf Compare December 30, 2024 01:29

dreamedcheng and others added 2 commits December 30, 2024 10:48

yjhjstz force-pushed the pick_1227 branch from 0c07ad7 to 94e2d60 Compare December 30, 2024 02:50

yjhjstz closed this Dec 31, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pick Bring back cdbsubselect_drop_distinct. etc #821

Pick Bring back cdbsubselect_drop_distinct. etc #821

yjhjstz commented Dec 30, 2024

yjhjstz commented Dec 31, 2024

Pick Bring back cdbsubselect_drop_distinct. etc #821

Pick Bring back cdbsubselect_drop_distinct. etc #821

Conversation

yjhjstz commented Dec 30, 2024

What does this PR do?

Type of Change

Breaking Changes

Test Plan

Impact

Checklist

Additional Context

CI Skip Instructions

yjhjstz commented Dec 31, 2024