Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement short circuit function evaluation #23367

Merged
merged 66 commits into from
Aug 17, 2021

Conversation

Avogar
Copy link
Member

@Avogar Avogar commented Apr 20, 2021

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Implementation of short circuit function evaluation, closes #12587.
Add settings short_circuit_function_evaluation to configure short circuit function evaluation.

Detailed description / Documentation draft:

Short circuit function evaluation

Now for functions if, multiIf, and, or values of their arguments (except the first one) are executed only if they are needed in the result.

1) if(cond, expr_1, expr_2)

expr_1 will be evaluated only on rows where cond is true, expr_2 - where condition is false.

Example:

SELECT if(number = 0, 0, intDiv(42, number)) FROM numbers(10)

In this example we won't get an exception about division by zero because intDiv(42, number) will be evaluated only for numbers that doesn't satisfy condition number = 0.

2) multiIf(cond_1, expr_1, cond_2, expr_2, ..., cond_{n-1}, expr_{n-1}, )

expr_i will be evaluated only on rows where (not cond_1) and (NOT cond_2) and ... and (not cond_{i-1}) and cond_i is true.
cond_i will be evaluated only on rows where (not cond_1) and (not cond_2) and ... and (not cond_{i-1}) is true.

Example:

SELECT multiIf(number == 0, 0, number == 1, intDiv(1, number), number == 2, intDiv(1, number - 1), number == 3, intDiv(1, number - 2), intDiv(1, number - 3)) FROM numbers(10);

In this example we won't get an exception about division by zero.

3) expr_1 and expr_2 and ... and expr_n

expr_i will be evaluated only on rows where expr_1 and expr_2 and ... and expr_{i-1} is true.

Example:

SELECT number != 0 and intDiv(1, number) == 0 and number != 2 and intDiv(1, number - 2) == 0 FROM numbers(10);

In this example we won't get an exception about division by zero.

4) expr_1 or expr_2 or ... or exp_n

expr_i will be evaluated only on rows where (not expr_1) and (not expr_2) and ... and (not expr_{i-1}) is true.

Example:

SELECT number == 0 or intDiv(1, number) != 0 or number == 2 or intDiv(1, number - 2) != 0 FROM numbers(10);

In this example we won't get an exception about division by zero.

User setting

There is a new setting for users: short_circuit_function_evaluation. Possible values:

  • enable Enables short-circuit function evaluation for functions that are suitable for it (can throw or computationally heavy).
  • force_enable Enables short-circuit function evaluation for all functions.
  • disable Disables short-circuit function evaluation.

@robot-clickhouse robot-clickhouse added doc-alert pr-feature Pull request with new product feature labels Apr 20, 2021
@Avogar Avogar changed the title [Draft] Implement short circuit function evaluation, first attempt [Draft] Implement short circuit function evaluation Apr 20, 2021
@Avogar Avogar marked this pull request as draft April 20, 2021 16:03
@Avogar Avogar force-pushed the short-circuit branch 3 times, most recently from 809858c to 4c63a51 Compare April 27, 2021 22:36
@Avogar Avogar force-pushed the short-circuit branch 3 times, most recently from aaa9877 to 1549d3e Compare May 18, 2021 13:08
@Avogar Avogar marked this pull request as ready for review May 18, 2021 13:09
@Avogar Avogar changed the title [Draft] Implement short circuit function evaluation Implement short circuit function evaluation May 18, 2021
@KochetovNicolai KochetovNicolai self-assigned this May 18, 2021
case ActionsDAG::ActionType::ALIAS:
rewriteShortCircuitArguments(child->children, need_outside, force_rewrite);
break;
default:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose arrayJoin also should be supported.
E.g.: select if(number != 0, arrayJoin([1, intDiv(42, number), 2]), 0)
Here, arrayJoin for short-circuit column will work as if array is empty for mask = 0.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose arrayJoin also should be supported

I spent a lot of time thinking about how to support it for arrayJoin, but I considered that it's almost impossible. I wrote a comment about it:
https://github.com/ClickHouse/ClickHouse/blob/591cecc66499e99948001a82b0f03e1ced4c7763/src/Interpreters/ExpressionActions.cpp#L81-L87

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I think it is ok to get offsets after filtering.
select if (number % 2 = 0, [], arrayJoin(range(number))) from numbers(4);
Ordinary evaluation
[0, 1, 2, 3] -> range -> [[], [0], [0, 1], [0, 1, 2]] -> arrayJoin -> [0, 0, 1, 0, 1, 2] -> if -> [0, 0, 1, 2]
[1, 0, 1, 0] -> arrayJoin -> [0, 1, 1, 0, 0, 0]

short-circuit evaluation
if -> [?, 1, ?, 3] -> range -> [?, [0], ?, [0, 1, 2]] -> arrayJoin -> [0, 0, 1, 2]
if -> [1, 0, 1, 0]

But we can support it later (if needed)

@Avogar
Copy link
Member Author

Avogar commented Jun 7, 2021

@Mergifyio update

@mergify
Copy link
Contributor

mergify bot commented Jun 7, 2021

Command update: success

Branch has been successfully updated

@KochetovNicolai
Copy link
Member

:) select if(number > 0, intDiv(42, number), 0) from numbers(4)

SELECT if(number > 0, intDiv(42, number), 0)
FROM numbers(4)

Query id: 9d6e83c0-b3e9-4a71-b03a-d03307aafe8d

┌─if(greater(number, 0), intDiv(42, number), 0)─┐
│                                             0 │
│                                            42 │
│                                            21 │
│                                            14 │
└───────────────────────────────────────────────┘

4 rows in set. Elapsed: 0.003 sec. 

 :) select if(number = 0, 0, intDiv(42, number)) from numbers(4)

SELECT if(number = 0, 0, intDiv(42, number))
FROM numbers(4)

Query id: 27a0f2c2-a395-4346-a332-2a41b666072d

┌─if(equals(number, 0), 0, intDiv(42, number))─┐
│                                            0 │
│                                           42 │
│                                           21 │
│                                           14 │
└──────────────────────────────────────────────┘

4 rows in set. Elapsed: 0.002 sec. 


 :) select if(number > 0, intDiv(42, number), 0), if(number = 0, 0, intDiv(42, number)) from numbers(4)

SELECT
    if(number > 0, intDiv(42, number), 0),
    if(number = 0, 0, intDiv(42, number))
FROM numbers(4)

Query id: 2219d4ed-5dcc-4906-9b37-0b0c6093802f


0 rows in set. Elapsed: 0.004 sec. 

Received exception from server (version 21.7.1):
Code: 153. DB::Exception: Received from localhost:9000. DB::Exception: Division by zero: while executing 'FUNCTION intDiv(42 :: 2, number :: 0) -> intDiv(42, number) UInt8 : 5'. 

@KochetovNicolai
Copy link
Member

:) select if(if(number > 0, intDiv(42, number), 7), intDiv(42, number), 8) from numbers(4) format TSV

SELECT if(if(number > 0, intDiv(42, number), 7), intDiv(42, number), 8)
FROM numbers(4)
FORMAT TSV

Query id: d55177ae-ccb0-4ba9-9194-66781b257493


0 rows in set. Elapsed: 0.004 sec. 

Received exception from server (version 21.7.1):
Code: 153. DB::Exception: Received from localhost:9000. DB::Exception: Division by zero: while executing 'FUNCTION intDiv(42 :: 2, number :: 0) -> intDiv(42, number) UInt8 : 1'. 

:) select if(number > 0, intDiv(42, number), 7) from numbers(4) format TSV

SELECT if(number > 0, intDiv(42, number), 7)
FROM numbers(4)
FORMAT TSV

Query id: 0d7e3fb5-7f18-4297-a1e1-01e66dfbdc3a

7
42
21
14

4 rows in set. Elapsed: 0.003 sec. 

:) select if(number > 0, intDiv(42, number), 8) from numbers(4) format TSV

SELECT if(number > 0, intDiv(42, number), 8)
FROM numbers(4)
FORMAT TSV

Query id: bae9fdd8-97b5-4d9a-b31b-429b0936f04d

8
42
21
14

4 rows in set. Elapsed: 0.003 sec. 

@KochetovNicolai
Copy link
Member

I checked everything except MaskOperations.cpp and IColumn::expand implementation for different columns (btw, do we still need it?). What I have checked looks almost perfect. I think it is 99% ready to be merged.

I see that some binary .arrow and .parquet files are added here. Was it by mistake? Looks like we need to remove them.

if constexpr (column_is_short)
{
if (data_index >= data.size())
throw Exception("Amount of ones in the mask doesn't equal short column size", ErrorCodes::LOGICAL_ERROR);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe exception from the main loop may prevent vectorisation ...
But I am ok with such implementation for now

Copy link
Member

@KochetovNicolai KochetovNicolai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM to merge it

@KochetovNicolai
Copy link
Member

But also let's wait for perftest.

@Avogar
Copy link
Member Author

Avogar commented Aug 13, 2021

@Mergifyio update

@mergify
Copy link
Contributor

mergify bot commented Aug 13, 2021

Command update: success

Branch has been successfully updated

@sevirov
Copy link
Contributor

sevirov commented Aug 19, 2021

Internal documentation ticket: DOCSUP-13429

@azat
Copy link
Collaborator

azat commented Sep 11, 2021

I found it interesting that even with short circuit evaluation disabled, there is still huge overhead:

SELECT count() FROM zeros(1000000000) WHERE NOT ignore(if(rand() % 2, toDateTime(rand()), toDate(rand()))) settings max_threads=1, short_circuit_function_evaluation='disable'
-- 11.8sec

SELECT count() FROM zeros(1000000000) WHERE NOT ignore(if(rand() % 2, toDateTime(rand()), toDate(rand()))) settings max_threads=1
-- 11.88sec

SELECT count() FROM zeros(1000000000) WHERE NOT ignore(if(rand() % 2, toDateTime(rand()), toDate(rand()))) settings max_threads=1
-- 8.2sec on 21.7 w/o https://github.com/ClickHouse/ClickHouse/pull/23367

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-feature Pull request with new product feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Short circuit function evaluation
5 participants