
CherryPick Resolve a FIXME in merge_leaf_stats() etc #827

Merged: 41 commits, Jan 2, 2025

Conversation

yjhjstz

@yjhjstz yjhjstz commented Dec 31, 2024

Fixes #ISSUE_Number

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


chrishajas and others added 30 commits December 31, 2024 12:52
This commit includes a couple of improvements when dumping minirepros and gpsd:

1. Dump the number of segments. This is needed to replicate the plan,
and we often have to ask users/customers for it separately. Instead, just
retrieve it during the minirepro.
2. Output a `set optimizer=off` for when the minirepro is being
restored. The short insert statements are much faster (~10X) using the
planner than Orca, especially in a debug build. Output this by
default so I don't have to modify this manually.
The QE notice message might be truncated because MPPnoticeReceiver()
limits the message length sent to the client to about 800 bytes. The
client should not see an incomplete notice message.

This commit changes that behaviour: it allows QE messages of any length
(including the QE identifier) to be sent to the client.
…. (#14108)

In GPDB, expression subquery is promoted to join, e.g.
select * from T where a > (select 10*avg(x) from R where T.b=R.y);
in postgres, the plan is:

                          QUERY PLAN
-----------------------------------------------------------------
 Seq Scan on t  (cost=0.00..80364.30 rows=753 width=8)
   Filter: ((a)::numeric > (SubPlan 1))
   SubPlan 1
     ->  Aggregate  (cost=35.53..35.54 rows=1 width=32)
           ->  Seq Scan on r  (cost=0.00..35.50 rows=10 width=4)
                 Filter: (t.b = y)

while in GPDB, the subquery is promoted to join:

                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)
   ->  Hash Join  (cost=477.00..969.11 rows=9567 width=8)
         Hash Cond: (t.b = "Expr_SUBQUERY".csq_c0)
         Join Filter: ((t.a)::numeric > "Expr_SUBQUERY".csq_c1)
         ->  Seq Scan on t  (cost=0.00..321.00 rows=28700 width=8)
         ->  Hash  (cost=472.83..472.83 rows=333 width=36)
               ->  Subquery Scan on "Expr_SUBQUERY"
                     ->  HashAggregate
                           Group Key: r.y
                           ->  Seq Scan on r
 Optimizer: Postgres query optimizer

But if T.b and R.y have different types, it reports an error like:
ERROR:  operator 37 is not a valid ordering operator (pathkeys.c:579)

The root cause is that IsCorrelatedEqualityOpExpr may generate an incorrect
sort operator and equality operator for the SortGroupClause when the types of
the left and right operands of the OpExpr differ.

Since get_ordering_op_properties() requires the types of the left and right
operands to be consistent, and SortGroupClause->tleSortGroupRef references
the target entry created from innerExpr, we teach IsCorrelatedEqualityOpExpr()
to construct the appropriate eqOp and sortOp according to the type of innerExpr.
Previously, when using minirepro on a query that accessed materialized
views, the objects that the materialized view used/accessed were not
dumped with the minirepro. This is because queries themselves treat
materialized views as tables, not views. However, when enumerating the
dependent objects for dumping the relevant DDL, we need to treat
materialized views as regular views.

This commit sets a flag when dumping from gp_dump_query_oids so that
materialized views are treated as regular views when parsing the query.
This commit adds calls to `RelationOpenSmgr()` in `open_all_datumstreamread_segfiles()` and `openFetchSegmentFile()` to ensure that the storage manager (Smgr) relation is opened before accessing Append-Optimized (AO) segment files.

Without explicitly opening the Smgr relation, accessing `rel->rd_smgr->smgr_ao` could lead to undefined behavior or errors when the Smgr is not initialized. This change ensures stability and correctness when working with AO segment files.
For GPDB5 pg_tablespace.spclocation is used
For GPDB6+ pg_catalog.pg_tablespace_location(oid) is used

Authored-by: Brent Doil <bdoil@vmware.com>
We don't drop partitioned tables with indexes for the pg_upgrade regression test on GPDB7.
Remove the commented out commands and FIXME.

Authored-by: Brent Doil <bdoil@vmware.com>
The same value is already assigned to the variable at the moment
of declaration (a few lines earlier).
Co-authored-by: CharlieTT <chaotian@vmware.com>

Fix a crash of PG_TRY caused by a return statement in the PG_TRY block. When we
return a value inside PG_TRY, the setjmp/longjmp context is left inconsistent,
so we cannot longjmp to the right address once an ereport(ERROR) occurs later,
and a crash follows.
On PG12, a SKIP_LOCKED option was added to VACUUM and ANALYZE to indicate
whether to skip tables when a lock cannot be acquired at the time.
However, the relevant tests don't cover all the scenarios on GPDB. This PR
adds more tests to cover them.
In GP's convert_ANY_sublink_to_join() there is currently no simplification of
the original subquery, so sometimes the sublink cannot be pulled up,
which produces a bad query plan. GP5 had a subquery-simplification
step that helped sublink pull-up, but we lost it from 6X through master,
so the pull-up stops working, since only simple subqueries are allowed to
be pulled up. This commit brings back cdbsubselect_drop_distinct and
cdbsubselect_drop_orderby, so we get the simplified subquery again and can
perform the sublink pull-up optimization on it, producing a better plan.
Commit faa7dc0713 fixed the fallback but forgot to remove the
FIXME.
The case tests some code paths during parallel hash join, but
Greenplum does not support Postgres' parallel plans. Change the FIXME
to a NOTE.
Historically in GPDB6, when executing GRANT/REVOKE on a partition
root, we will recurse into its child tables. We have decided to
keep this behavior for 7X, but to also support the ONLY keyword
so that users can choose to grant/revoke just the partition root,
without recursion. The syntax is e.g.:

GRANT SELECT ON ONLY foo_par TO role;

Discussion: https://groups.google.com/a/greenplum.org/g/gpdb-dev/c/ImJrvQEECwA
Now the SQL does not fall back to the planner.
The name of UpdateSerializableCommandId is a bit wrong, now that SERIALIZABLE
and REPEATABLE READ are not the same. For comparison, the if-check above was
changed from checking IsXactIsoLevelSerializable to IsolationUsesXactSnapshot().

This commit renames it and resolves that GPDB_91_MERGE_FIXME.

	commit 5eb15c9
	Author: Joe Conway <mail@joeconway.com>
	Date:   Sat Sep 11 18:38:58 2010 +0000

	    SERIALIZABLE transactions are actually implemented beneath the covers with
	    transaction snapshots, i.e. a snapshot registered at the beginning of
	    a transaction. Change variable naming and comments to reflect this reality
	    in preparation for a future, truly serializable mode, e.g.
	    Serializable Snapshot Isolation (SSI).

	    For the moment transaction snapshots are still used to implement
	    SERIALIZABLE, but hopefully not for too much longer. Patch by Kevin
	    Grittner and Dan Ports with review and some minor wording changes by me.
Historically in GPDB6, when executing ALTER TABLE SET TABLESPACE
on a partition root, we will recurse into its child tables. This
behavior differs from the upstream where it is not recursed. We
have decided to keep this behavior for 7X, but to also support
the ONLY keyword so that users can choose to alter just the
partition root. Similar to other ALTER TABLE ONLY statements,
the syntax is e.g.:

ALTER TABLE ONLY foopart SET TABLESPACE myts

Discussion: https://groups.google.com/a/greenplum.org/g/gpdb-dev/c/ImJrvQEECwA
Gpload regression case apache#50 would fail
if the OS user that installed the gpdb cluster is not gpadmin.
Since it specified a hardcoded user gpadmin,
we use PGUSER instead to fix it.

Co-authored-by: wuchengwen <wcw190496@alibaba-inc.com>
Previous commit ebb691a03 fixed ANY sublinks containing distinct-on
syntax on partial exprs in the target list of the subselect. That fix
simply avoided dropping the distinct or order-by clause, which may leave
some sublinks unable to be pulled up and cause bad performance; below is an
example of such a regression:

  select a from t where a in (select b from t1 group by b)

For the above SQL, if we do not drop the group-by clause, the planner
will later consider the subselect not simple and fail to pull it up
to a join.

This commit fixes the regression by:

1. reverting the logic of ebb691a03
2. fixing issue #12656 the correct way: if
   the length of the distinct-on clause is not
   the same as the length of the subselect's
   targetlist, we do not drop the distinct clause.

Authored-by: Jian Guo <gjian@vmware.com>

Co-authored-by: Zhenghua Lyu <kainwen@gmail.com>
Added code to print thread id for passed or failed status to easily
correlate the message result with the data directory. Some minor
formatting along the way.
FIXMEs were related to adding support for checking dynamic access
methods. Given we are far from supporting all built-in access
methods, dynamic support is not on the radar at all. Plus, adding a
new access method is not a common act. So we can cross the
bridge of converting RelationTypeData to a dynamic array instead of
the static array when we get there. Hence, for now, resolve the
FIXME by manually updating the static array to reflect all the current
access methods.

Note: with this change, "ao" now gets split into ao_row and
ao_column. This is nice, as we can run more granular checking if desired.

Also, converted GPDB_94_MERGE_FIXME to GPDB_FEATURE_NOT_SUPPORTED in the
Makefile, since we have yet to test/validate/enable this tool for various
index access methods; that should be done separately.
Previously, gp_gettmid() returned PgStartTime as the start time of gpdb ("tmid") and saved it into instrument shared memory, but the value is wrong after converting to `TimestampTz` (which is int64). Since the value is saved in shared memory, and GPCC has been using a 32-bit integer for tmid, adapting to int64 would require a lot of changes in both GPDB's and GPCC's code. Therefore, in this commit, gp_gettmid() converts PgStartTime to a correct int32 value instead.

Co-authored-by: Violet Cheng <vcheng@vmware.com>
The effort was duplicated with set_append_path_locus(), which is called
from create_append_path(). Let's just let set_append_path_locus() deal
with it.

Reviewed-by: Haotian Chen <chaotian@vmware.com>
Reviewed-by: Zhenghua Lyu <kainwen@gmail.com>
External tables with dropped columns failed to upgrade with:
```
pg_restore: creating EXTERNAL TABLE ext_upgrade_failure_test
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 237; 1259 18142 EXTERNAL TABLE ext_upgrade_failure_test gpadmin
pg_restore: [archiver (db)] could not execute query: ERROR:  syntax error at or near ","
LINE 14:     "........pg.dropped.1........" ,
                                            ^
```

Fix pg_dump to output the ALTER TABLE DROP COLUMN DDL for external
tables with dropped columns.
This commit makes the DML state nomenclature more canonical, cleans up
struct declarations etc.
The backend local hash table mechanism works well enough for storing
DML/DML-like command-bound state. Lift the FIXME that questions its
applicability.

Discussion: https://groups.google.com/a/greenplum.org/g/gpdb-dev/c/g_RG4r_9W3s/m/VJNafJibEQAJ
Grouping them with the other uao_dml tests led to flakes. Move them to a
new group.

Heed the words of wisdom:

"These tests use gp_select_invisible and VACUUM, and will get confused
if there are concurrent transactions holding back the global xmin."
Fixes https://github.com/greenplum-db/gpdb/issues/13819 and also aligns with upstream.

We don't need to set up the tablespace when calling `make` or `make
install`; it should be set up only for the right target, like `check` or
`check-tests`.
zhjwpku and others added 7 commits December 31, 2024 12:54
The `SET_VAR` function is identical in gpcreateseg.sh and
gpinitsystem, since they both source the file gp_bash_functions.sh.
Move `SET_VAR` into gp_bash_functions.sh.
Test case "dispatch" started failing on PLANNER CI after commit a7ca9e27.
Using GitHub issue #14428 for tracking. It seems somehow triggered by
the test added in bfv_partition in the aforementioned commit.
…. (#14418)

As #14417 reported, there will be assertion failures in some cases when
executing ModifyTable.

This is because we opened all the range table relations in NoLock mode,
but sometimes with no lock held by us.

We fixed this by using GpPolicyFetch instead of opening relations since we
only need to judge if the range table is a partitioned table. This fix also
resolved a GPDB_91_MERGE FIXME left around range table locking code.
…d unchanged. (#13499)

CLUSTER operation on append-optimized tables does not compute freeze limit
(frozenXid) because AO tables do not have relfrozenxid.  The toast tables
need to keep existing relfrozenxid value unchanged in this case.
Earlier, gpcheckcat used to error out when a root partition has a different distribution policy than a child partition. But commit a45be43 introduced a new statement, `ALTER TABLE EXPAND PARTITION PREPARE`, which modifies the distribution policy of the leaf partitions to be randomly distributed. Based on this, a policy is good if the following conditions are satisfied:

- The numsegments value is equal for all root, middle, and leaf level partitions.
- If a root is randomly distributed, then all middle and leaf level partitions must also be randomly distributed.
- If a root is hash distributed, then the middle level should be the same as the root, and leaf level partitions can be hashed on the same key or randomly distributed.

This commit modifies the existing SQL query so that whenever a parent is hash distributed, the child can be randomly distributed if and only if it is a leaf level partition.

Also added a few behave test cases for the same.
For a partitioned table to have a correct distribution policy, the numsegments value for all root, middle and leaf level partitions must be equal. This commit adds this check for gpcheckcat. Readable external partitions are excluded from this check.

Sample log output for numsegments check failure:
20221103:11:36:30:098659 gpcheckcat:jnihal-a01:jnihal-[INFO]:-[FAIL] partition numsegments check
20221103:11:36:30:098659 gpcheckcat:jnihal-a01:jnihal-[ERROR]:-partition numsegments check found 1 issue(s)
[ERROR]: child partition(s) have different numsegments value from the root partition. Check the gpcheckcat log for details.
20221103:11:36:30:098659 gpcheckcat:jnihal-a01:jnihal-[ERROR]:-The following tables have different numsegments value:
20221103:11:36:30:098659 gpcheckcat:jnihal-a01:jnihal-[ERROR]:---------
20221103:11:36:30:098659 gpcheckcat:jnihal-a01:jnihal-[ERROR]:-  table | affected child | parent numsegments value | child numsegments value
20221103:11:36:30:098659 gpcheckcat:jnihal-a01:jnihal-[ERROR]:-  ------+----------------+--------------------------+------------------------
20221103:11:36:30:098659 gpcheckcat:jnihal-a01:jnihal-[ERROR]:-  public.sales_1_prt_10 | public.sales_1_prt_10_2_prt_asia | 3 | 1
The pg_ctl promote wait logic checks for IN_PRODUCTION state in the
promoted coordinator's control file to determine if the standby
coordinator has successfully been promoted or not. Unfortunately, that
IN_PRODUCTION state does not guarantee that the promoted coordinator
is ready to accept database connections yet. Make gpactivatestandby do
a retry loop on its forced CHECKPOINT logic to guarantee that the
CHECKPOINT goes through and immediately reflects the new timeline
id. With this patch, intermittent gpactivatestandby Behave test
flakes seen frequently in the Concourse pipeline are resolved.

The logic in this patch is also actually somewhat of a revert to
previous logic as we used to do the very same retry loop. The retry
loop was removed because we wrongly assumed pg_ctl promote wait logic
would wait until database connections were ready to be accepted. The
try/catch would actually catch the error but it was being ignored
(which was intentional from the previous retry logic but wrong in the
non-retry logic).

GPDB commit references:
https://github.com/greenplum-db/gpdb/commit/1fe901d4bb9b141fd6931977eb93515982ac6edd
https://github.com/greenplum-db/gpdb/commit/5868160a73ba9cfcebdb61820635301bfb8cc0c1
@yjhjstz yjhjstz added the cherry-pick (cherry-pick upstream commits) label Dec 31, 2024
yjhjstz and others added 3 commits December 31, 2024 15:59
After we reload the config using pg_reload_conf, the ic-proxy process calls
ic_proxy_server_on_signal to process it. That function compares the new addrs
with the old addrs; both the old and new lists must be sorted by dbid, and the
caller is responsible for ensuring this. If an addr is removed, it is
disconnected; if some addrs are added, we establish the connections. However,
the old addrs may not be sorted by dbid: even though the GUC
gp_interconnect_proxy_addresses is sorted, uv_getaddrinfo is asynchronous and
cannot guarantee that the addrs parsed from gp_interconnect_proxy_addresses
are added to ic_proxy_addrs sequentially. So some addr may be mis-disconnected,
leading to `select gp_segment_id, pg_reload_conf() from gp_dist_random('gp_id')`
failing with "interconnect error: connection closed prematurely".

Before reloading the config file, we should sort ic_proxy_addrs by dbid to
avoid mis-disconnecting addrs.
hash_search(,,HASH_REMOVE,) returns a dangling pointer that shouldn't be
dereferenced.
A FIXME was left questioning whether to use AccessShareLock or NoLock
when opening child tables to merge their stats. Since we have already
acquired ShareUpdateExclusiveLock on the parent table by the time we
are in merge_leaf_stats(), we don't need an extra lock to guard
against a concurrent DROP of either the parent or the child (which
requires AccessExclusiveLock on the parent).
Concurrent UPDATEs are possible, but because we are not updating the table
ourselves, NoLock is sufficient here.
@my-ship-it (Contributor) previously approved these changes Dec 31, 2024

LGTM

@yjhjstz yjhjstz merged commit 4bd2d57 into apache:main Jan 2, 2025
22 checks passed
Labels
cherry-pick (cherry-pick upstream commits)