
Cherry-Pick Escape database name for dbconn etc #809

Merged
merged 24 commits into apache:main on Dec 25, 2024
Conversation


@yjhjstz yjhjstz commented Dec 24, 2024

Fixes #ISSUE_Number

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


@yjhjstz added the cherry-pick (cherry-pick upstream commits) label Dec 24, 2024
my-ship-it previously approved these changes Dec 24, 2024
yinil-hello and others added 23 commits December 24, 2024 17:42
…as extern functions. (#13908)

GPCC's metrics collector needs these two functions to look up the priority of the resource queue a query belongs to.
When an LDAP user attempts to log in to the GPDB database with bad credentials, the message below is "leaked" to the log. It includes the LDAP server address, bind user (distinguished name), and bind password.
Something like this:

> 2021-11-23 19:43:46.528056 UTC,"ajones2","ajones2",p12654,th-1991804800,"127.0.0.1","53756",2021-11-23 19:43:42 UTC,0,con17,,seg-1,,,,sx1,"LOG","00000","LDAP login failed for user ""uid=ajones2,ou=people,dc=dc1,dc=nebula,dc=local"" on server ""192.168.1.82"": Invalid credentials",,,,,,,0,,"auth.c",2384,
> 
> 2021-11-23 19:43:46.528139 UTC,"ajones2","ajones2",p12654,th-1991804800,"127.0.0.1","53756",2021-11-23 19:43:42 UTC,0,con17,,seg-1,,,,sx1,"FATAL","28000","LDAP authentication failed for user ""ajones2""","Connection matched pg_hba.conf line 92: ""host     all     ajones2 0.0.0.0/0       ldap ldapserver=192.168.1.82 ldapbasedn=""ou=people,dc=dc1,dc=nebula,dc=local"" ldapbinddn=""cn=admin,dc=dc1,dc=nebula,dc=local"" ldapbindpasswd=""SuperSecretPassword"" ldapsearchattribute=""uid"" """,,,,,,0,,"auth.c",318, 

The reason is that when we connect to the database via LDAP and authentication fails, the LDAP server returns sensitive details such as the bind password and server IP address, so we need to hide these details from database users and keep them out of the pg_log file.

In this case, we don't need to add a regression test. First, this is a security issue that only happens with LDAP and does not cause other problems in the database kernel. Second, adding regression cases that check for leaked information details would be hard work.

Co-authored-by: CharlieTT <chaotian@vmware.com>
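For illustration only, the shape of such a fix is to report a generic failure instead of echoing the matched pg_hba.conf line; this is a minimal sketch using PostgreSQL's standard error-reporting API, not the actual patch:

	/* Hypothetical sketch, not the actual change: report a generic
	 * failure instead of echoing the server address, bind DN, and
	 * bind password from the matched pg_hba.conf line. */
	ereport(FATAL,
	        (errcode(ERRCODE_INVALID_PASSWORD),
	         errmsg("LDAP authentication failed for user \"%s\"",
	                port->user_name)));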
Generated columns are computed after distribution. If a generated column is used as a distribution key, the distribution key value will always be computed from a null value, which causes wrong query results.
The semantics of a gp_fastsequence entry are that it only moves forward. Hence, add a check to catch cases where this assumption is broken. There was a bug (now fixed) during AO table truncate where this assumption was violated, and the root cause was very hard to trace in the absence of this check.
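A minimal sketch of the kind of guard this describes; the function and variable names are illustrative, not the actual Greenplum code:

	/* Illustrative guard: a gp_fastsequence entry must never move
	 * backwards, so fail loudly if an update would decrease it. */
	static void
	update_fastsequence(int64 old_lastsequence, int64 new_lastsequence)
	{
	    if (new_lastsequence < old_lastsequence)
	        elog(ERROR,
	             "gp_fastsequence moved backwards: old " INT64_FORMAT
	             ", new " INT64_FORMAT,
	             old_lastsequence, new_lastsequence);

	    /* ... proceed to write the new value ... */
	}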
The output of gplogfilter is parsed ambiguously by CSV parsers. To fix this, the standard csv.writer class is used to generate the CSV.

Reviewed-by: Jamie McAtamney <jmcatamney@vmware.com>
Reviewed-by: SmartKeyerror <SmartKeyerror@gmail.com>
In function `GetAllFileSegInfo_pg_aoseg_rel`, when handling each tuple, we `palloc0` a FileSegInfo, so `oneseginfo` is just a pointer to zero-filled memory. Using `+=` with a left operand of 0 has the same effect as `=`, but `+=` should burn more CPU cycles than `=`.

I'm not sure the compiler will optimize this kind of `+=` into `=`; even if it does, `=` is more accurate, since there is no accumulation semantics here.

Signed-off-by: Junwang Zhao <zhjwpku@gmail.com>
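A condensed illustration of the pattern being described; the field and variable names below are stand-ins, not the actual code:

	/* oneseginfo points at zero-filled memory from palloc0. */
	FileSegInfo *oneseginfo = (FileSegInfo *) palloc0(sizeof(FileSegInfo));

	/* Before: adds to a field that palloc0 already set to 0. */
	oneseginfo->total_tupcount += DatumGetInt64(tupcount_datum);

	/* After: plain assignment, since nothing is being accumulated. */
	oneseginfo->total_tupcount = DatumGetInt64(tupcount_datum);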
The first FIXME was introduced by merge commit 002f61d8. Remove it; with it removed, the full pipelines pass the tests.

The second FIXME asks whether it is necessary to test that the GUC gp_default_storage_options stays consistent between master and segments. Although gpconfig's unit tests might cover this, it does no harm to also test this important GUC here.
This is really a bug in the Python package PyGreSQL,
see issue: PyGreSQL/PyGreSQL#77

As a workaround, let's modify GPDB's code to manually escape arguments before passing them to pgdb.
Change sprintf calls to snprintf to guard against overflowing the resulting buffer.
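For illustration, the general shape of such a change; the buffer and format string are placeholders, not the actual call sites:

	char	buf[NAMEDATALEN];

	/* Before: can write past the end of buf if the name is long. */
	sprintf(buf, "%s_idx", relname);

	/* After: output is truncated at the buffer size, never overflows. */
	snprintf(buf, sizeof(buf), "%s_idx", relname);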
We don't need to execute it on QEs; the QD and utility mode can benefit from this optimization.
We have these checks as part of ATPrepCmd for individual AT commands. These checks allow AT on external partitions iff ATT_FOREIGN_TABLE is permitted. For AT_SetDistributedBy, we only allow ATT_TABLE, so external partitions always error out based on this check.

Added a test for this behavior.
Change the table name in alter_table_aocs2 to avoid a collision with the other test alter_table_aocs, which uses the same table name.
We weren't doing so before, and we recently got a complaint about a double-free situation with DatumStreamBlockWrite->datum_buffer. Snuff out that possibility and similar ones.

Co-authored-by: Lei (Alexandra) Wang <alexandra.wanglei@gmail.com>
Co-authored-by: Ashwin Agrawal <aashwin@vmware.com>
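The usual C idiom for ruling out a double free, sketched here with the field named in the complaint; the surrounding code is hypothetical:

	/* Defensive pattern: clear the pointer right after freeing it, so
	 * a repeated cleanup pass sees NULL and skips the second pfree. */
	if (write->datum_buffer != NULL)
	{
	    pfree(write->datum_buffer);
	    write->datum_buffer = NULL;
	}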
When we bring a path to OuterQuery locus, we need to keep its param info,
because this path may further join with other paths. For instance,

	select * from a where a.i in
		(select count(b.j) from b, c,
			lateral (select * from d where d.j = c.j limit 10) s
		where s.i = a.i
		);

The path for subquery 's' requires parameter from 'c'. When we bring this path
to OuterQuery locus, its param info needs to be preserved, so that when joining
's' with 'b' we can have correct param info.
The commit below added a case to test the GUC gp_workfile_limit_files_per_query, but there is already one, and the newly added one takes far too much time.

	commit 209694154bdc5797ea66b2116dcd82fc9454e593
	Author: zwenlin <zwenlin@vmware.com>
	Date:   Fri Mar 25 19:14:13 2022 +0800

	    Remove gpdb_12_merge_fixme in buffile.c.

	    PostgreSQL breaks temporary files into 1 GB segments. Greenplum didn't
	    do that until v12 merge greenplum-db/gpdb@19cd1cf breaks BufFiles into
	    segments and counts each segment file as one work file.

	    The GUC gp_workfile_limit_files_per_query is used to control the maximum
	    number of spill files for a given query, to prevent runaway queries from
	    destroying the entire system. Counting each segment file is reasonable
	    for this scenario.

	    This PR removes the FIXME of worrying about the count method and adds a test.

This commit removes the newly added case.
This case does not need to be under isolation2; also modify the ansfile in this commit.
@my-ship-it my-ship-it requested a review from avamingli December 25, 2024 09:49
@yjhjstz yjhjstz merged commit b50e6d1 into apache:main Dec 25, 2024
16 checks passed