-
Notifications
You must be signed in to change notification settings - Fork 127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pick Bring back cdbsubselect_drop_distinct. etc #821
Closed
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This commit includes a couple improvements when dumping minirepros and gpsd: 1. Dump the number of segments. This is needed to replicate the plan, and often we have to ask users/customers this separately. Instead just retrieve it during the minirepro 2. Output a `set optimizer=off` for when the minirepro is being restored. The short insert statements are much faster (~10X) using planner than Orca, especially if done in debug buiild. Output this by default so I don't have to modify this manually.
The QE notice message might be truncated because MPPnoticeReceiver() limited message length about 800 bytes sent to the client. The client should not see an incomplete notice message. This commit changes its behaviour, it allows QE messages of any length (include QE identifier) to send to the client.
…. (#14108) In GPDB, expression subquery is promoted to join, e.g. select * from T where a > (select 10*avg(x) from R where T.b=R.y); in postgres, the plan is: QUERY PLAN ----------------------------------------------------------------- Seq Scan on t (cost=0.00..80364.30 rows=753 width=8) Filter: ((a)::numeric > (SubPlan 1)) SubPlan 1 -> Aggregate (cost=35.53..35.54 rows=1 width=32) -> Seq Scan on r (cost=0.00..35.50 rows=10 width=4) Filter: (t.b = y) while in GPDB, the subquery is promoted to join: QUERY PLAN --------------------------------------------------------------------------------------------- Gather Motion 3:1 (slice1; segments: 3) -> Hash Join (cost=477.00..969.11 rows=9567 width=8) Hash Cond: (t.b = "Expr_SUBQUERY".csq_c0) Join Filter: ((t.a)::numeric > "Expr_SUBQUERY".csq_c1) -> Seq Scan on t (cost=0.00..321.00 rows=28700 width=8) -> Hash (cost=472.83..472.83 rows=333 width=36) -> Subquery Scan on "Expr_SUBQUERY" -> HashAggregate Group Key: r.y -> Seq Scan on r Optimizer: Postgres query optimizer But if the T.b and R.y are different type, it will report error like: ERROR: operator 37 is not a valid ordering operator (pathkeys.c:579) The root cause is that IsCorrelatedEqualityOpExpr may generate incorrect sort operator and equal operator for SortGroupClause when the types of left and right operands of OpExpr are different. Since get_ordering_op_properties() requires the type of the left and right operands being consistent and SortGroupClause->tleSortGroupRef is referencing the target entry created from innerExpr, we teach IsCorrelatedEqualityOpExpr() construct appropriate eqOp and sortOp according to the type of innerExpr.
Previously, when using minirepro on a query that accessed materialized views, the object that the materialized view used/accessed were not dumped with the minirepro. This is because queries themselves treat materialized views as tables, not views. However, when enumerating the dependent objects for dumping the relevant DDL, we need to treat materialized views as regular views. This commit sets a flag when dumping from gp_dump_query_oids so that materialized views are treated as regular views when parsing the query.
This commit adds calls to `RelationOpenSmgr()` in `open_all_datumstreamread_segfiles()` and `openFetchSegmentFile()` to ensure that the storage manager (Smgr) relation is opened before accessing Append-Optimized (AO) segment files. Without explicitly opening the Smgr relation, accessing `rel->rd_smgr->smgr_ao` could lead to undefined behavior or errors when the Smgr is not initialized. This change ensures stability and correctness when working with AO segment files.
For GPDB5 pg_tablespace.spclocation is used For GPDB6+ pg_catalog.pg_tablespace_location(oid) is used Authored-by: Brent Doil <bdoil@vmware.com>
We don't drop partitioned tables with indexes for the pg_upgrade regression test on GPDB7. Remove the commented out commands and FIXME. Authored-by: Brent Doil <bdoil@vmware.com>
The same value is assigned to the variable at the moment of declaration (some lines before)
Co-authored-by: CharlieTT <chaotian@vmware.com> Fix crash of PG_TRY which caused by return branch in PG_TRY block. When we return value in PG_TRY, some setjump/longjump context has been messed up so that we cannot longjump to right address once ereport(ERROR) exists later, so crash will happen.
On PG12 an option SKIP_LOCKED was added to VACUUM and ANALYZE to indicate whether we need to skip tables when a lock cannot be acquired at the time. However, relevant tests don't cover all the scenarios on GPDB. This PR is to add more tests to cover the scenarios.
In GP's convert_ANY_sublink_to_join() now, there is no simplification of origin subquery, as a result sometimes the sublink cannot be pulled up, which produced a bad query plan. In GP5, there is a subquery simplify step which helped sublink pull-up, but we lose this simplification since version 6X until master, which caused the sublink pull-up doesn't work as only simple subquery is allowed to be pulled up. By this commit we brought back cdbsubselect_drop_distinct and cdbsubselect_drop_orderby, then we can got simplified subquery again and do the sublink pull-up optimization on it which produced a better plan.
The commit faa7dc0713 fix the fallback but forget to remove FIXME.
The case is to test some code path during parallel hash join but Greenplum does not support Postgres' parallel plan. Changes the FIXME to a NOTE.
Historically in GPDB6, when executing GRANT/REVOKE on a partition root, we will recurse into its child tables. We have decided to keep this behavior for 7X, but to also support the ONLY keyword so that users can choose to grant/revoke just the partition root, without recursion. The syntax is e.g.: GRANT SELECT ON ONLY foo_par TO role; Discussion: https://groups.google.com/a/greenplum.org/g/gpdb-dev/c/ImJrvQEECwA
Now the SQL does not fallback to planner.
The name of UpdateSerializableCommandId is a bit wrong, now that SERIALIZABLE and REPEATABLE READ are not the same. From comparison, the if-check above was changed from checking IsXactIsoLevelSerializable to IsolationUsesXactSnapshot(). This commit renames it and resolves that GPDB_91_MERGE_FIXME. commit 5eb15c9 Author: Joe Conway <mail@joeconway.com> Date: Sat Sep 11 18:38:58 2010 +0000 SERIALIZABLE transactions are actually implemented beneath the covers with transaction snapshots, i.e. a snapshot registered at the beginning of a transaction. Change variable naming and comments to reflect this reality in preparation for a future, truly serializable mode, e.g. Serializable Snapshot Isolation (SSI). For the moment transaction snapshots are still used to implement SERIALIZABLE, but hopefully not for too much longer. Patch by Kevin Grittner and Dan Ports with review and some minor wording changes by me.
Historically in GPDB6, when executing ALTER TABLE SET TABLESPACE on a partition root, we will recurse into its child tables. This behavior differs from the upstream where it is not recursed. We have decided to keep this behavior for 7X, but to also support the ONLY keyword so that users can choose to alter just the partition root. Similar to other ALTER TABLE ONLY statements, the syntax is e.g.: ALTER TABLE ONLY foopart SET TABLESPACE myts Discussion: https://groups.google.com/a/greenplum.org/g/gpdb-dev/c/ImJrvQEECwA
Gpload regression case apache#50 would fail, if OS user that installed gpdb cluster is not gpadmin. Since it specified a hardcode user gpadmin, we need to use PGUSER instead to fix it. Co-authored-by: wuchengwen <wcw190496@alibaba-inc.com>
Previous commit ebb691a03 is to fix ANY sublink contains distinct-on syntax on partial exprs in the target list of subselect. That fix method simply not to drop distinct or order-by clause and might lead to some sublink cannot be pulled up and bad performance, below is an example of such regression: select a from t where a in (select b from t1 group by b) For the above SQL, if we do not drop the group-by clause, then later planner will take the subselect as not simple and fail to pull it up to a join. This commit fixes the regression by: 1. revert the logic of ebb691a03 2. fix the issue #12656 using a correct way: if the distinct on clause's length is not the same as the length of subselect's targetlist, we do not drop the distinct clause. Authored-by: Jian Guo <gjian@vmware.com> Co-authored-by: Zhenghua Lyu <kainwen@gmail.com>
move to #827 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fixes #ISSUE_Number
What does this PR do?
Type of Change
Breaking Changes
Test Plan
make installcheck
make -C src/test installcheck-cbdb-parallel
Impact
Performance:
User-facing changes:
Dependencies:
Checklist
Additional Context
CI Skip Instructions