Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pick Bring back cdbsubselect_drop_distinct. etc #821

Closed
wants to merge 19 commits into from

Conversation

yjhjstz
Copy link
Member

@yjhjstz yjhjstz commented Dec 30, 2024

Fixes #ISSUE_Number

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


chrishajas and others added 9 commits December 26, 2024 16:49
This commit includes a couple improvements when dumping minirepros and gpsd:

1. Dump the number of segments. This is needed to replicate the plan,
and often we have to ask users/customers this separately. Instead just
retrieve it during the minirepro
2. Output a `set optimizer=off` for when the minirepro is being
restored. The short insert statements are much faster (~10X) using
planner than Orca, especially if done in debug buiild. Output this by
default so I don't have to modify this manually.
The QE notice message might be truncated because MPPnoticeReceiver()
limited message length about 800 bytes sent to the client. The client
should not see an incomplete notice message.

This commit changes its behaviour, it allows QE messages of any length
(include QE identifier) to send to the client.
…. (#14108)

In GPDB, expression subquery is promoted to join, e.g.
select * from T where a > (select 10*avg(x) from R where T.b=R.y);
in postgres, the plan is:

                          QUERY PLAN
-----------------------------------------------------------------
 Seq Scan on t  (cost=0.00..80364.30 rows=753 width=8)
   Filter: ((a)::numeric > (SubPlan 1))
   SubPlan 1
     ->  Aggregate  (cost=35.53..35.54 rows=1 width=32)
           ->  Seq Scan on r  (cost=0.00..35.50 rows=10 width=4)
                 Filter: (t.b = y)

while in GPDB, the subquery is promoted to join:

                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)
   ->  Hash Join  (cost=477.00..969.11 rows=9567 width=8)
         Hash Cond: (t.b = "Expr_SUBQUERY".csq_c0)
         Join Filter: ((t.a)::numeric > "Expr_SUBQUERY".csq_c1)
         ->  Seq Scan on t  (cost=0.00..321.00 rows=28700 width=8)
         ->  Hash  (cost=472.83..472.83 rows=333 width=36)
               ->  Subquery Scan on "Expr_SUBQUERY"
                     ->  HashAggregate
                           Group Key: r.y
                           ->  Seq Scan on r
 Optimizer: Postgres query optimizer

But if the T.b and R.y are different type, it will report error like:
ERROR:  operator 37 is not a valid ordering operator (pathkeys.c:579)

The root cause is that IsCorrelatedEqualityOpExpr may generate incorrect sort
operator and equal operator for SortGroupClause when the types of left and right
operands of OpExpr are different.

Since get_ordering_op_properties() requires the type of the left and right
operands being consistent and SortGroupClause->tleSortGroupRef is referencing
the target entry created from innerExpr, we teach IsCorrelatedEqualityOpExpr()
construct appropriate eqOp and sortOp according to the type of innerExpr.
Previously, when using minirepro on a query that accessed materialized
views, the object that the materialized view used/accessed were not
dumped with the minirepro. This is because queries themselves treat
materialized views as tables, not views. However, when enumerating the
dependent objects for dumping the relevant DDL, we need to treat
materialized views as regular views.

This commit sets a flag when dumping from gp_dump_query_oids so that
materialized views are treated as regular views when parsing the query.
This commit adds calls to `RelationOpenSmgr()` in `open_all_datumstreamread_segfiles()` and `openFetchSegmentFile()` to ensure that the storage manager (Smgr) relation is opened before accessing Append-Optimized (AO) segment files.

Without explicitly opening the Smgr relation, accessing `rel->rd_smgr->smgr_ao` could lead to undefined behavior or errors when the Smgr is not initialized. This change ensures stability and correctness when working with AO segment files.
For GPDB5 pg_tablespace.spclocation is used
For GPDB6+ pg_catalog.pg_tablespace_location(oid) is used

Authored-by: Brent Doil <bdoil@vmware.com>
We don't drop partitioned tables with indexes for the pg_upgrade regression test on GPDB7.
Remove the commented out commands and FIXME.

Authored-by: Brent Doil <bdoil@vmware.com>
The same value is assigned to the variable at the moment
of declaration (some lines before)
Co-authored-by: CharlieTT <chaotian@vmware.com>

Fix crash of PG_TRY which caused by return branch in PG_TRY block. When we return value in PG_TRY, some
setjump/longjump context has been messed up so that we cannot longjump to right address once ereport(ERROR)
exists later, so crash will happen.
@yjhjstz yjhjstz added the cherry-pick cherry-pick upstream commts label Dec 30, 2024
yaowangm and others added 8 commits December 30, 2024 09:29
On PG12 an option SKIP_LOCKED was added to VACUUM and ANALYZE to indicate
whether we need to skip tables when a lock cannot be acquired at the time.
However, relevant tests don't cover all the scenarios on GPDB. This PR is
to add more tests to cover the scenarios.
In GP's convert_ANY_sublink_to_join() now, there is no simplification of
origin subquery, as a result sometimes the sublink cannot be pulled up,
which produced a bad query plan. In GP5, there is a subquery simplify
step which helped sublink pull-up, but we lose this simplification since
version 6X until master, which caused the sublink pull-up doesn't work as
only simple subquery is allowed to be pulled up. By this commit we brought
back cdbsubselect_drop_distinct and cdbsubselect_drop_orderby, then we can
got simplified subquery again and do the sublink pull-up optimization on it
which produced a better plan.
The commit faa7dc0713 fix the fallback but forget to remove
FIXME.
The case is to test some code path during parallel hash join but
Greenplum does not support Postgres' parallel plan. Changes the FIXME
to a NOTE.
Historically in GPDB6, when executing GRANT/REVOKE on a partition
root, we will recurse into its child tables. We have decided to
keep this behavior for 7X, but to also support the ONLY keyword
so that users can choose to grant/revoke just the partition root,
without recursion. The syntax is e.g.:

GRANT SELECT ON ONLY foo_par TO role;

Discussion: https://groups.google.com/a/greenplum.org/g/gpdb-dev/c/ImJrvQEECwA
Now the SQL does not fallback to planner.
The name of UpdateSerializableCommandId is a bit wrong, now that SERIALIZABLE
and REPEATABLE READ are not the same. From comparison, the if-check above was
changed from checking IsXactIsoLevelSerializable to IsolationUsesXactSnapshot().

This commit renames it and resolves that GPDB_91_MERGE_FIXME.

	commit 5eb15c9
	Author: Joe Conway <mail@joeconway.com>
	Date:   Sat Sep 11 18:38:58 2010 +0000

	    SERIALIZABLE transactions are actually implemented beneath the covers with
	    transaction snapshots, i.e. a snapshot registered at the beginning of
	    a transaction. Change variable naming and comments to reflect this reality
	    in preparation for a future, truly serializable mode, e.g.
	    Serializable Snapshot Isolation (SSI).

	    For the moment transaction snapshots are still used to implement
	    SERIALIZABLE, but hopefully not for too much longer. Patch by Kevin
	    Grittner and Dan Ports with review and some minor wording changes by me.
Historically in GPDB6, when executing ALTER TABLE SET TABLESPACE
on a partition root, we will recurse into its child tables. This
behavior differs from the upstream where it is not recursed. We
have decided to keep this behavior for 7X, but to also support
the ONLY keyword so that users can choose to alter just the
partition root. Similar to other ALTER TABLE ONLY statements,
the syntax is e.g.:

ALTER TABLE ONLY foopart SET TABLESPACE myts

Discussion: https://groups.google.com/a/greenplum.org/g/gpdb-dev/c/ImJrvQEECwA
dreamedcheng and others added 2 commits December 30, 2024 10:48
Gpload regression case apache#50 would fail,
if OS user that installed gpdb cluster is not gpadmin.
Since it specified a hardcode user gpadmin,
we need to use PGUSER instead to fix it.

Co-authored-by: wuchengwen <wcw190496@alibaba-inc.com>
Previous commit ebb691a03 is to fix ANY sublink contains distinct-on
syntax on partial exprs in the target list of subselect. That fix
method simply not to drop distinct or order-by clause and might lead
to some sublink cannot be pulled up and bad performance, below is an
example of such regression:

  select a from t where a in (select b from t1 group by b)

For the above SQL, if we do not drop the group-by clause, then later
planner will take the subselect as not simple and fail to pull it up
to a join.

This commit fixes the regression by:

1. revert the logic of ebb691a03
2. fix the issue #12656 using a correct way: if
   the distinct on clause's length is not the
   same as the length of subselect's targetlist,
   we do not drop the distinct clause.

Authored-by: Jian Guo <gjian@vmware.com>

Co-authored-by: Zhenghua Lyu <kainwen@gmail.com>
@yjhjstz
Copy link
Member Author

yjhjstz commented Dec 31, 2024

move to #827

@yjhjstz yjhjstz closed this Dec 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-pick cherry-pick upstream commts
Projects
None yet
Development

Successfully merging this pull request may close these issues.