-
Notifications
You must be signed in to change notification settings - Fork 7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Nullable primary key with correct KeyCondition #12455
Conversation
@zlobober btw, your NULL_LAST logic has gone (I don't know why). I've added it back https://github.com/ClickHouse/ClickHouse/pull/12455/files#diff-043ede2be3d33df96a1f9b9b355678ebL89 |
abc2ba5
to
cbafe08
Compare
After futher thoughts, two special Field types are added (-Inf and +Inf) which explicitly helps special comparisons such as NULL_LAST. Field Null should not be used as comparison argument. |
f081a12
to
345db99
Compare
bfb1212
to
b820ace
Compare
e257ee2
to
0824256
Compare
@amosbird, Sorry for the late reply. I like idea of introducing special fields '-inf' and '+inf' very much, it would make much easier expressing open/half-open intervals and rays in some situations. Do you have any plans on finishing this PR? To make it clear: I am not a ClickHouse maintainer, I am developing an internal Yandex software using ClickHouse as computational engine. Your proposed change could actually help in my case :) |
@zlobober Hi, this pr is already finished. I've covered almost all cases related to Nullable primary keys. It's currently waiting for review. |
FWIW I've applied this PR into my build and I did not find any problems yet |
d1ac83f
to
c9e2f2d
Compare
src/Core/Field.h
Outdated
NegativeInfinity = 1, | ||
PositiveInfinity = 2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amosbird can you update your PR after recent changes?
2020-09-03 05:55:34 /ClickHouse/src/Core/Field.h:462:17: error: enumeration values 'Null', 'NegativeInfinity', and 'PositiveInfinity' not handled in switch [-Werror,-Wswitch]
2020-09-03 05:55:34 switch (which)
2020-09-03 05:55:34 ^
2020-09-03 05:55:34 /ClickHouse/src/Core/Field.h:497:17: error: enumeration values 'Null', 'NegativeInfinity', and 'PositiveInfinity' not handled in switch [-Werror,-Wswitch]
2020-09-03 05:55:34 switch (which)
2020-09-03 05:55:34 ^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@alexey-milovidov what is the status of this PR? Are there some concerns about it? |
42414c9
to
fabefff
Compare
@azat I did not review this PR and no one from my team decided to take it. |
Yandex projects require manual fixing from our side due to changed interfaces... |
Weird CI errors... |
@amosbird BTW right now if you will run newer clickhouse and then rollback to the version w/o this patch you will not get any errors, but simply index analysis will work incorrectly (for Nullable), maybe it worth adding some guard against this? |
Hmm, what kind of guards do you suggest? |
I guess new setting for MergeTree will be enough |
|
||
if (!type->isNullable()) | ||
{ | ||
serialization->serializeBinary(hyperrectangle[i].left, ostr); | ||
serialization->serializeBinary(hyperrectangle[i].right, ostr); | ||
} | ||
else | ||
{ | ||
bool is_null = hyperrectangle[i].left.isNull() || hyperrectangle[i].right.isNull(); // one is enough | ||
writeBinary(is_null, ostr); | ||
if (!is_null) | ||
{ | ||
serialization->serializeBinary(hyperrectangle[i].left, ostr); | ||
serialization->serializeBinary(hyperrectangle[i].right, ostr); | ||
} | ||
} | ||
serialization->serializeBinary(hyperrectangle[i].left, ostr); | ||
serialization->serializeBinary(hyperrectangle[i].right, ostr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change breaks on disk format (and the relevant change in the MergeTreeIndexGranuleMinMax::deserializeBinary
)
Too bad that I checked this patchset only for compatibility after reverting this patch - #12455 (comment) (use case: I've applied it manually, then revert it, and data skipping indexes over Nullable column had been broken)
But this patchset actually breaks compatibility with older versions of clickhouse for Nullable data skipping indexes after simple upgrade:
Here is a simple reproducer:
--
-- run this with 21.6 or similar (i.e. w/o this patch)
--
CREATE TABLE data
(
`key` Int,
`value` Nullable(Int),
INDEX value_index value TYPE minmax GRANULARITY 1
)
ENGINE = MergeTree
ORDER BY key;
INSERT INTO data SELECT
number,
number
FROM numbers(10000);
SELECT * FROM data WHERE value = 20000 SETTINGS force_data_skipping_indices = 'value_index' SETTINGS force_data_skipping_indices = 'value_index', max_rows_to_read=1;
-- now upgrade and run the query again
SELECT * FROM data WHERE value = 20000 SETTINGS force_data_skipping_indices = 'value_index' SETTINGS force_data_skipping_indices = 'value_index', max_rows_to_read=1;
And it will fail because of on disk format changes:
$ ll --time-style=+ data/*/data/all_1_1_0/skp*.idx
-rw-r----- 1 azat azat 36 data/with_nullable_patch/data/all_1_1_0/skp_idx_value_index.idx
-rw-r----- 1 azat azat 37 data/without_nullable_patch/data/all_1_1_0/skp_idx_value_index.idx
$ md5sum data/*/data/all_1_1_0/skp*.idx
a19c95c4a14506c65665a1e30ab404bf data/with_nullable_patch/data/all_1_1_0/skp_idx_value_index.idx
e50e2fcfa873b232196623d56ab26105 data/without_nullable_patch/data/all_1_1_0/skp_idx_value_index.idx
Nice that there is no stable release with this patch included yet.
But personally I, don't use stable releases and how I wish this patchset created new on disk format...
Note that you may create data skipping indexes over Nullable column even before this patchset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind sending a PR restoring old on disk format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[1] breaks on disk format (and the relevant change in the: [1]: ClickHouse#12455 (comment) Too bad that I checked this patchset only for compatibility after reverting this patch [2] (use case: I've applied it manually, then revert it, and data skipping indexes over Nullable column had been broken) [2]: ClickHouse#12455 (comment) But this patchset actually breaks compatibility with older versions of clickhouse for Nullable data skipping indexes after simple upgrade: Here is a simple reproducer: -- -- run this with 21.6 or similar (i.e. w/o this patch) -- CREATE TABLE data ( `key` Int, `value` Nullable(Int), INDEX value_index value TYPE minmax GRANULARITY 1 ) ENGINE = MergeTree ORDER BY key; INSERT INTO data SELECT number, number FROM numbers(10000); SELECT * FROM data WHERE value = 20000 SETTINGS force_data_skipping_indices = 'value_index' SETTINGS force_data_skipping_indices = 'value_index', max_rows_to_read=1; Now upgrade and run the query again: SELECT * FROM data WHERE value = 20000 SETTINGS force_data_skipping_indices = 'value_index' SETTINGS force_data_skipping_indices = 'value_index', max_rows_to_read=1; And it will fail because of on disk format changes: $ ll --time-style=+ data/*/data/all_1_1_0/skp*.idx -rw-r----- 1 azat azat 36 data/with_nullable_patch/data/all_1_1_0/skp_idx_value_index.idx -rw-r----- 1 azat azat 37 data/without_nullable_patch/data/all_1_1_0/skp_idx_value_index.idx $ md5sum data/*/data/all_1_1_0/skp*.idx a19c95c4a14506c65665a1e30ab404bf data/with_nullable_patch/data/all_1_1_0/skp_idx_value_index.idx e50e2fcfa873b232196623d56ab26105 data/without_nullable_patch/data/all_1_1_0/skp_idx_value_index.idx Note, that there is no stable release with this patch included yet, so no need to backport. Also note that you may create data skipping indexes over Nullable column even before [3]. [3]: ClickHouse#12455 v2: break cases when granulas has Null in values due to backward compatibility
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Now KeyConditions can correctly skip nullable keys, including isNull and isNotNull. #12433 Cond.
PS. this pr also piggybacks a test for Field comparison. Let's see if CI complains.
-- Update
It seems Null comparison in Field.h can have the same semantic as in SQL.
Detailed description / Documentation draft:
Nothing special. NOTE, after creating a MergeTree table with nullable primary keys, ClickHouse won't be able to rollabck to previous versions. (potential backward incompatibility)