
Cherry-Pick Escape database name for dbconn etc #809

Merged
merged 24 commits into apache:main on Dec 25, 2024
Conversation


@yjhjstz yjhjstz commented Dec 24, 2024

Fixes #ISSUE_Number

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


@yjhjstz added the cherry-pick (cherry-pick upstream commits) label Dec 24, 2024
my-ship-it previously approved these changes Dec 24, 2024
yinil-hello and others added 23 commits December 24, 2024 17:42
…as extern functions. (#13908)

GPCC's metrics collector needs these two functions to look up the priority of the resource queue a query belongs to.
When an LDAP user attempts to log in to the GPDB database with bad credentials, the message below is "leaked" to the log. It includes the LDAP server address, bind user (distinguished name), and bind password.
Something like this:

> 2021-11-23 19:43:46.528056 UTC,"ajones2","ajones2",p12654,th-1991804800,"127.0.0.1","53756",2021-11-23 19:43:42 UTC,0,con17,,seg-1,,,,sx1,"LOG","00000","LDAP login failed for user ""uid=ajones2,ou=people,dc=dc1,dc=nebula,dc=local"" on server ""192.168.1.82"": Invalid credentials",,,,,,,0,,"auth.c",2384,
> 
> 2021-11-23 19:43:46.528139 UTC,"ajones2","ajones2",p12654,th-1991804800,"127.0.0.1","53756",2021-11-23 19:43:42 UTC,0,con17,,seg-1,,,,sx1,"FATAL","28000","LDAP authentication failed for user ""ajones2""","Connection matched pg_hba.conf line 92: ""host     all     ajones2 0.0.0.0/0       ldap ldapserver=192.168.1.82 ldapbasedn=""ou=people,dc=dc1,dc=nebula,dc=local"" ldapbinddn=""cn=admin,dc=dc1,dc=nebula,dc=local"" ldapbindpasswd=""SuperSecretPassword"" ldapsearchattribute=""uid"" """,,,,,,0,,"auth.c",318, 

The reason is that when we connect to the database via LDAP and authentication fails, the LDAP server returns sensitive details such as the bind password and server IP address, so we need to hide these details from database users and keep them out of the pg_log file.

In this case, we don't need to add a regression test. First, this is a security issue that only happens with LDAP and does not cause other problems in the database kernel. Second, adding regression cases that check for leaked information details would be hard work.

Co-authored-by: CharlieTT <chaotian@vmware.com>
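For illustration only, the shape of such a fix is to report a generic failure instead of echoing the matched pg_hba.conf line; this is a minimal sketch using PostgreSQL's standard error-reporting API, not the actual patch:

	/* Hypothetical sketch, not the actual change: report a generic
	 * failure instead of echoing the server address, bind DN, and
	 * bind password from the matched pg_hba.conf line. */
	ereport(FATAL,
	        (errcode(ERRCODE_INVALID_PASSWORD),
	         errmsg("LDAP authentication failed for user \"%s\"",
	                port->user_name)));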
Generated columns are computed after distribution. If a generated column is used as a distribution key, the distribution key value will always be computed from a null value, which causes wrong query results.
The semantics of a gp_fastsequence entry are that it only moves forward. Hence, add a check to catch cases where this assumption is broken. There was a bug (now fixed) during AO table truncate where this assumption was violated, and the root cause was very hard to trace in the absence of this check.
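A minimal sketch of the kind of guard this describes; the function and variable names are illustrative, not the actual Greenplum code:

	/* Illustrative guard: a gp_fastsequence entry must never move
	 * backwards, so fail loudly if an update would decrease it. */
	static void
	update_fastsequence(int64 old_lastsequence, int64 new_lastsequence)
	{
	    if (new_lastsequence < old_lastsequence)
	        elog(ERROR,
	             "gp_fastsequence moved backwards: old " INT64_FORMAT
	             ", new " INT64_FORMAT,
	             old_lastsequence, new_lastsequence);

	    /* ... proceed to write the new value ... */
	}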
The output of gplogfilter is parsed ambiguously by CSV parsers. To fix this, the standard csv.writer class is used to generate the CSV.

Reviewed-by: Jamie McAtamney <jmcatamney@vmware.com>
Reviewed-by: SmartKeyerror <SmartKeyerror@gmail.com>
In function `GetAllFileSegInfo_pg_aoseg_rel`, when handling each tuple, we `palloc0` a FileSegInfo, so `oneseginfo` is just a pointer to zero-filled memory. Using `+=` with a left operand of 0 has the same effect as `=`, but `+=` should burn more CPU cycles than `=`.

I'm not sure the compiler will optimize this kind of `+=` into `=`; even if it does, `=` is more accurate, since there is no accumulation semantics here.

Signed-off-by: Junwang Zhao <zhjwpku@gmail.com>
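A condensed illustration of the pattern being described; the field and variable names below are stand-ins, not the actual code:

	/* oneseginfo points at zero-filled memory from palloc0. */
	FileSegInfo *oneseginfo = (FileSegInfo *) palloc0(sizeof(FileSegInfo));

	/* Before: adds to a field that palloc0 already set to 0. */
	oneseginfo->total_tupcount += DatumGetInt64(tupcount_datum);

	/* After: plain assignment, since nothing is being accumulated. */
	oneseginfo->total_tupcount = DatumGetInt64(tupcount_datum);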
The first FIXME was introduced by merge commit 002f61d8. Remove it; with it removed, the full pipelines pass the tests.

The second FIXME asks whether it is necessary to test that the GUC gp_default_storage_options stays consistent between master and segments. Although gpconfig's unit tests might cover this, it does no harm to also test this important GUC here.
This is really a bug in the Python package PyGreSQL,
see issue: PyGreSQL/PyGreSQL#77

As a workaround, let's modify GPDB's code to manually escape arguments before passing them to pgdb.
Change sprintf calls to snprintf to guard against overflowing the resulting buffer.
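For illustration, the general shape of such a change; the buffer and format string are placeholders, not the actual call sites:

	char	buf[NAMEDATALEN];

	/* Before: can write past the end of buf if the name is long. */
	sprintf(buf, "%s_idx", relname);

	/* After: output is truncated at the buffer size, never overflows. */
	snprintf(buf, sizeof(buf), "%s_idx", relname);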
We don't need to execute it on QEs; the QD and utility mode can benefit from this optimization.
We have these checks as part of ATPrepCmd for individual AT commands. These checks allow AT on external partitions iff ATT_FOREIGN_TABLE is permitted. For AT_SetDistributedBy, we only allow ATT_TABLE, so external partitions always error out based on this check.

Added a test for this behavior.
Change the table name in alter_table_aocs2 to avoid a collision with the other test alter_table_aocs, which uses the same table name.
We weren't doing so before, and we recently got a complaint about a double-free situation with DatumStreamBlockWrite->datum_buffer. Snuff out that possibility and similar ones.

Co-authored-by: Lei (Alexandra) Wang <alexandra.wanglei@gmail.com>
Co-authored-by: Ashwin Agrawal <aashwin@vmware.com>
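The usual C idiom for ruling out a double free, sketched here with the field named in the complaint; the surrounding code is hypothetical:

	/* Defensive pattern: clear the pointer right after freeing it, so
	 * a repeated cleanup pass sees NULL and skips the second pfree. */
	if (write->datum_buffer != NULL)
	{
	    pfree(write->datum_buffer);
	    write->datum_buffer = NULL;
	}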
When we bring a path to OuterQuery locus, we need to keep its param info,
because this path may further join with other paths. For instance,

	select * from a where a.i in
		(select count(b.j) from b, c,
			lateral (select * from d where d.j = c.j limit 10) s
		where s.i = a.i
		);

The path for subquery 's' requires parameter from 'c'. When we bring this path
to OuterQuery locus, its param info needs to be preserved, so that when joining
's' with 'b' we can have correct param info.
The commit below added a case to test the GUC gp_workfile_limit_files_per_query, but there is already one, and the newly added one takes far too much time.

	commit 209694154bdc5797ea66b2116dcd82fc9454e593
	Author: zwenlin <zwenlin@vmware.com>
	Date:   Fri Mar 25 19:14:13 2022 +0800

	    Remove gpdb_12_merge_fixme in buffile.c.

	    PostgreSQL breaks temporary files into 1 GB segments. Greenplum didn't
	    do that until v12 merge greenplum-db/gpdb@19cd1cf breaks BufFiles into
	    segments and counts each segment file as one work file.

	    The GUC gp_workfile_limit_files_per_query is used to control the maximum
	    number of spill files for a given query, to prevent runaway queries from
	    destroying the entire system. Counting each segment file is reasonable
	    for this scenario.

	    This PR removes the FIXME of worrying about the count method and adds a test.

This commit removes the newly added case.
This case does not need to be under isolation2; also modify the ansfile in this commit.
@my-ship-it my-ship-it requested a review from avamingli December 25, 2024 09:49
@yjhjstz yjhjstz merged commit b50e6d1 into apache:main Dec 25, 2024
16 checks passed