Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix wrong results of Left Anti Semi (Not-In) Join #130

Merged
merged 2 commits into from
Oct 9, 2023

Conversation

avamingli
Copy link
Contributor

This pr has 2 commits, please review by commits.

See more details and discussion in https://github.com/greenplum-db/gpdb/pull/15663, https://github.com/greenplum-db/gpdb/issues/15662.

Most codes are same with some refinement & conflicts resolved for CBDB.

Fix wrong results of Left Anti Semi (Not-In) Join

CBDB will try to convert a NOT IN sql into a Left Anti Semi (Not-In) Join
by using cdb_find_nonnullable_vars_walker() to find if there might be nullable
values from inner or outer side.

If there is NullTest, expression_tree_walker will iterate the tree using
(NullTest*)Node->arg.
Example:

select c1 from t1_lasj where c1 not in (
	select c1n from t2_lasj_has_null where c1n is null or c1n is null)
	and c1 is not null;

Expression c1n is null would be like:
NullTest [nulltesttype=IS_NULL argisrow=false location=102]
[arg] Var [varno=1 varattno=1 vartype=23 varnoold=1 varoattno=1]

Recursive cdb_find_nonnullable_vars_walker will first check the node->arg Var and
insert into nonNullableVars.
That's incorrect for NullTest type IS_NULL.
We should consifer the NullTest under OR expression and recursive OR expression.
Add a field nullableVars to indentify the vars might be nullable, and must be
eliminated from nonNullableVars finally.
This is more strict, but ensure right results at least.

Correct comments in convert_IN_to_antijoin()

Incorrect example comments:
The transformation is to rewrite a query of the form:

	select c1 from t1 where c1 NOT IN (select c2 from t2);
			(to)
	select c1 from t1 left anti semi join (select 0 as zero, c2 from t2) foo
		ON (c1 = c2) IS NOT FALSE where zero is NULL;

Correct it to:
The transformation is to rewrite a query of the form:

	select c1 from t1 where c1 NOT IN (select c2 from t2);
			(to)
	select c1 from t1 left anti semi join (select 0 as zero, c2 from t2) foo
		ON (c1 != c2) IS NOT FALSE where zero is NULL;

SQL NOT IN should be converted to Left Anti Semi (not-in) Join
with join condition c1 != c2.

Any non-null values from t1 don't match values from t2 shoule be kept and
IS NOT FALSE will return TRUE.

GDB checks after function make_lasj_quals()
join_expr->quals:

BoolExpr [boolop=NOT_EXPR]
        OpExpr [opno=518 opfuncid=144 opresulttype=16 opretset=false]
                Var [varno=1 varattno=1 vartype=23 varnoold=1 varoattno=1]
                Var [varno=2 varattno=1 vartype=23 varnoold=2 varoattno=1]

And oid 518 is a '<>' operator in pg_operator.

select oprname from pg_operator where oid = 518;
-[ RECORD 1 ]
oprname | <>

closes: #50


Change logs

Describe your change clearly, including what problem is being solved or what feature is being added.

If it has some breaking backward or forward compatibility, please clary.

Why are the changes needed?

Describe why the changes are necessary.

Does this PR introduce any user-facing change?

If yes, please clarify the previous behavior and the change this PR proposes.

How was this patch tested?

Please detail how the changes were tested, including manual tests and any relevant unit or integration tests.

Contributor's Checklist

Here are some reminders and checklists before/when submitting your pull request, please check them:

  • Make sure your Pull Request has a clear title and commit message. You can take git-commit template as a reference.
  • Sign the Contributor License Agreement as prompted for your first-time contribution.
  • List your communication in the GitHub Issues or Discussions (if has or needed).
  • Document changes.
  • Add tests for the change
  • Pass make installcheck
  • Pass make -C src/test installcheck-cbdb-parallel
  • Feel free to @cloudberrydb/dev team for review and approval when your PR is ready🥳

CBDB will try to convert a NOT IN sql into a Left Anti Semi (Not-In) Join
by using cdb_find_nonnullable_vars_walker() to find if there might be nullable
values from inner or outer side.

If there is  NullTest, expression_tree_walker will iterate the tree using
(NullTest*)Node->arg.
Example:
select c1 from t1_lasj where c1 not in (
	select c1n from t2_lasj_has_null where c1n is null or c1n is null)
	and c1 is not null;
Expression c1n is null would be like:
NullTest [nulltesttype=IS_NULL argisrow=false location=102]
	[arg] Var [varno=1 varattno=1 vartype=23 varnoold=1 varoattno=1]

Recursive cdb_find_nonnullable_vars_walker will first check the node->arg Var and
insert into nonNullableVars.
That's incorrect for NullTest type IS_NULL.
We should consifer the NullTest under OR expression and recursive OR expression.
Add a field nullableVars to indentify the vars might be nullable, and must be
eliminated from nonNullableVars finally.
This is more strict, but ensure right results at least.

Authored-by: Zhang Mingli avamingli@gmail.com
Incorrect example comments:
The transformation is to rewrite a query of the form:
	select c1 from t1 where c1 NOT IN (select c2 from t2);
			(to)
	select c1 from t1 left anti semi join (select 0 as zero, c2 from t2) foo
		ON (c1 = c2) IS NOT FALSE where zero is NULL;
Correct it to:
The transformation is to rewrite a query of the form:
	select c1 from t1 where c1 NOT IN (select c2 from t2);
			(to)
	select c1 from t1 left anti semi join (select 0 as zero, c2 from t2) foo
		ON (c1 != c2) IS NOT FALSE where zero is NULL;

SQL NOT IN should be converted to Left Anti Semi (not-in) Join
with join condition c1 != c2.

Any non-null values from t1 don't match values from t2 shoule be kept and
IS NOT FALSE will return TRUE.

GDB checks after function make_lasj_quals()
join_expr->quals:
BoolExpr [boolop=NOT_EXPR]
        OpExpr [opno=518 opfuncid=144 opresulttype=16 opretset=false]
                Var [varno=1 varattno=1 vartype=23 varnoold=1 varoattno=1]
                Var [varno=2 varattno=1 vartype=23 varnoold=2 varoattno=1]

And oid 518 is a '<>' operator in pg_operator.
select oprname from pg_operator where oid = 518;
-[ RECORD 1 ]
oprname | <>

Authored-by: Zhang Mingli avamingli@gmail.com
@avamingli avamingli force-pushed the fix_wrong_results_not_in branch from 6f32740 to df3ff5f Compare October 8, 2023 08:57
Copy link
Contributor

@kongfanshen-0801 kongfanshen-0801 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@avamingli avamingli merged commit 22cbc25 into apache:main Oct 9, 2023
5 checks passed
@avamingli avamingli deleted the fix_wrong_results_not_in branch October 9, 2023 00:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wrong results of Left Anti Semi (Not-In) Join in nestloop and hashjoin
3 participants