Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(frontend): add shrink cast for condition.split_to_scan_ranges() #7962

Merged
merged 9 commits into from
Feb 28, 2023

Conversation

ZENOTME
Copy link
Contributor

@ZENOTME ZENOTME commented Feb 16, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

related issue: #7916

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features).
  • I have demonstrated that backward compatibility is not broken by breaking changes and created issues to track deprecated features to be removed in the future. (Please refer to the issue)
  • All checks passed in ./risedev check (or alias, ./risedev c)

Documentation

  • My PR DOES NOT contain user-facing changes.
Click here for Documentation

Types of user-facing changes

Please keep the types that apply to your changes, and remove the others.

  • Installation and deployment
  • Connector (sources & sinks)
  • SQL commands, functions, and operators
  • RisingWave cluster configuration changes
  • Other (please specify in the release note below)

Release note

@ZENOTME
Copy link
Contributor Author

ZENOTME commented Feb 16, 2023

Do u think this implementation works? @xiangjinwu @chenzl25 PTAL

If it works, I can complete the rest todo in this PR or later PR.

TODO:

  • support shrink cast in as_in_const_list
  • support more type shrink cast

@ZENOTME ZENOTME marked this pull request as ready for review February 16, 2023 07:18
Copy link
Contributor

@chenzl25 chenzl25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @xiangjinwu PTAL. I am not quite sure whether it is the best way to deal with the implicit cast and 'shrink' cast.

_ => return Ok(CastResult::Failed),
};
if let Some(scalar) = self.eval_row_const()? {
let value = scalar.as_integral();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, will we be able to handle float, decimal as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Contributor

@xiangjinwu xiangjinwu Feb 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question... So it is not enough to only check for minimum and maximum. For where id < 2.5 and where id > 2.5, we need to round in different directions: ceil to id < 3 and floor to id > 2. And when it is id = 2.5, this should evaluate to false. And where id <= 2.5 to where id <= 2, where id >= 2.5 to where id >= 3.

There may be more complicated edge cases we are missing. Without optimization, the system always casts to a "larger" type and compare in that type. But in this optimization, we always want to compare in the type of order column, which may not be the "larger" type. Mimicking this CMP_B(A_TO_B(x), A_TO_B(y)) via CMP_A(B_TO_A(x), B_TO_A(y)) requires us to find the proper B_TO_A function given existing A_TO_B, CMP_B and CMP_A for all possible inputs.

More edge cases (PostgreSQL):

test=# select bigint '9223372036854775807' = double precision '9223372036854775296';
 ?column? 
----------
 t
(1 row)
test=# select (double precision '9223372036854775296')::bigint;
ERROR:  bigint out of range
test=# select (double precision '9223372036854775295')::bigint;
        int8         
---------------------
 9223372036854774784
(1 row)

Given it is an optimization, my suggestions is just to do it with best effort, and remain unoptimized for cases hard to handle correctly. int16/int32/int64 may be a good subset.

Also given its best-effort nature, the "cast_shrink" series helper functions should be limited to a mod for this special optimization, rather than on ExprImpl.

/// Shorthand to create cast expr to `target` type in implicit context.
pub fn cast_implicit(self, target: DataType) -> Result<ExprImpl> {
pub fn cast_implicit(self, target: DataType) -> std::result::Result<ExprImpl, ErrorCode> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, why is it necessary to change the error type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For RwError, it will capture the backtrace so the cost of constructing the RwError is high. There is a known issue #6131.

In cast_implicit_or_shrink, We will store Error return form cast_implicit before call cast_shrink, so if cast_implicit return a RwError, the performance is unacceptable.

Comment on lines 784 to 788
_ => Err(ErrorCode::BindError(format!(
"Cannot cast type \"{}\" to \"{}\".",
const_expr.return_type(),
target
))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error is now thrown all the way up to the user:

dev=> create table t (v1 int primary key);
CREATE_TABLE
dev=> select * from t where v1 < 1.5;
ERROR:  QueryError: Bind error: Cannot cast type "numeric" to "integer".

But it should just be a signal to skip the scan range optimization. Talking about its caller analyze_eq_const_expr behavior, it should do one of the 3 things:

  • return Ok(true) so that the upper layer return Ok(false_cond())
  • assign eq_conds and return Ok(false)
  • signal the upper layer it cannot match a known optimization and it should just other_conds.push(expr);

We are lacking the last possibility right now.

@ZENOTME
Copy link
Contributor Author

ZENOTME commented Feb 24, 2023

Summarize the todo work:

  • handle float, decimal and more type
  • there are some todo work in split_to_scan_range, like merge upperbound and lowerbound

For these todo work, I add the related plan test named Cannot do sth and I will fix them in later PR.

@codecov
Copy link

codecov bot commented Feb 24, 2023

Codecov Report

Merging #7962 (588ddac) into main (9db3773) will increase coverage by 0.00%.
The diff coverage is 87.09%.

@@           Coverage Diff           @@
##             main    #7962   +/-   ##
=======================================
  Coverage   71.57%   71.58%           
=======================================
  Files        1132     1132           
  Lines      182874   182962   +88     
=======================================
+ Hits       130897   130976   +79     
- Misses      51977    51986    +9     
Flag Coverage Δ
rust 71.58% <87.09%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/common/src/types/mod.rs 73.31% <50.00%> (-0.25%) ⬇️
src/frontend/src/utils/condition.rs 95.19% <87.50%> (-1.21%) ⬇️
src/utils/pgwire/src/pg_extended.rs 90.90% <100.00%> (+0.03%) ⬆️
src/storage/src/hummock/sstable/xor_filter.rs 97.53% <0.00%> (-1.24%) ⬇️
src/batch/src/task/task_execution.rs 51.62% <0.00%> (-0.51%) ⬇️
src/common/src/types/ordered_float.rs 30.87% <0.00%> (-0.20%) ⬇️
src/storage/src/hummock/compactor/mod.rs 80.38% <0.00%> (-0.20%) ⬇️
src/batch/src/executor/group_top_n.rs 74.70% <0.00%> (+6.47%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@ZENOTME ZENOTME requested a review from xiangjinwu February 24, 2023 07:21
Copy link
Contributor

@xiangjinwu xiangjinwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation looks correct, but I would like to try harder and see if the code structure, function names and doc comments can be easier to understand 🤯

src/frontend/planner_test/tests/testdata/range_scan.yaml Outdated Show resolved Hide resolved
@ZENOTME
Copy link
Contributor Author

ZENOTME commented Feb 27, 2023

The interface and process before is indeed complex.😭 I have try to make it clear. Please let me know if there are still change needed.

Copy link
Contributor

@xiangjinwu xiangjinwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the refactor!

src/frontend/src/utils/condition.rs Outdated Show resolved Hide resolved
src/frontend/src/utils/condition.rs Outdated Show resolved Hide resolved
};

let Some(new_cond) = new_expr.eval_row_const()? else {
// column = NULL, PK column never be NULL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI even NULL = NULL is not true. The result is NULL. Anything = NULL is always NULL.

src/frontend/src/utils/condition.rs Outdated Show resolved Hide resolved
src/frontend/src/utils/condition.rs Outdated Show resolved Hide resolved
src/frontend/src/utils/condition.rs Outdated Show resolved Hide resolved
@xiangjinwu xiangjinwu enabled auto-merge February 28, 2023 08:50
@xiangjinwu xiangjinwu added this pull request to the merge queue Feb 28, 2023
@mergify
Copy link
Contributor

mergify bot commented Feb 28, 2023

Hey @ZENOTME, this pull request failed to merge and has been dequeued from the merge train. If you believe your PR failed in the merge train because of a flaky test, requeue it by clicking "Update branch" or pushing an empty commit with git commit --allow-empty -m "rerun" && git push.

Merged via the queue into main with commit c0b48bf Feb 28, 2023
@xiangjinwu xiangjinwu deleted the zj/shrink_cast branch February 28, 2023 09:34
Li0k pushed a commit that referenced this pull request Feb 28, 2023
…7962)

* specify integral type in pgwire-extended

* support mismatch process

* support more plan test

* refine implementation

* refine plan test

* refine implementation

---------

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
@ZENOTME ZENOTME linked an issue Mar 3, 2023 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: prepared statement ignores explicit types for actual query results
4 participants