Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nullable primary key with correct KeyCondition #12455

Merged
merged 3 commits into from
Jul 18, 2021

Conversation

amosbird
Copy link
Collaborator

@amosbird amosbird commented Jul 13, 2020

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Now KeyConditions can correctly skip nullable keys, including isNull and isNotNull. #12433 Cond.

PS. this pr also piggybacks a test for Field comparison. Let's see if CI complains.

-- Update

It seems Null comparison in Field.h can have the same semantic as in SQL.

Detailed description / Documentation draft:

Nothing special. NOTE, after creating a MergeTree table with nullable primary keys, ClickHouse won't be able to rollabck to previous versions. (potential backward incompatibility)

@amosbird
Copy link
Collaborator Author

@zlobober btw, your NULL_LAST logic has gone (I don't know why). I've added it back https://github.com/ClickHouse/ClickHouse/pull/12455/files#diff-043ede2be3d33df96a1f9b9b355678ebL89

@robot-clickhouse robot-clickhouse added the pr-improvement Pull request with some product improvements label Jul 13, 2020
@amosbird amosbird force-pushed the npc branch 3 times, most recently from abc2ba5 to cbafe08 Compare July 14, 2020 02:57
@amosbird
Copy link
Collaborator Author

After futher thoughts, two special Field types are added (-Inf and +Inf) which explicitly helps special comparisons such as NULL_LAST. Field Null should not be used as comparison argument.

@amosbird amosbird force-pushed the npc branch 2 times, most recently from f081a12 to 345db99 Compare July 14, 2020 13:37
@Akazz Akazz self-assigned this Jul 14, 2020
@amosbird amosbird force-pushed the npc branch 4 times, most recently from bfb1212 to b820ace Compare July 21, 2020 02:35
@alexey-milovidov alexey-milovidov added the st-waiting-for-fix We are waiting for fixes in this issue or pull request label Jul 21, 2020
@amosbird amosbird force-pushed the npc branch 2 times, most recently from e257ee2 to 0824256 Compare July 28, 2020 15:37
@zlobober
Copy link
Contributor

@amosbird, Sorry for the late reply. I like idea of introducing special fields '-inf' and '+inf' very much, it would make much easier expressing open/half-open intervals and rays in some situations. Do you have any plans on finishing this PR?

To make it clear: I am not a ClickHouse maintainer, I am developing an internal Yandex software using ClickHouse as computational engine. Your proposed change could actually help in my case :)

@amosbird
Copy link
Collaborator Author

amosbird commented Jul 31, 2020

@zlobober Hi, this pr is already finished. I've covered almost all cases related to Nullable primary keys. It's currently waiting for review.

@azat
Copy link
Collaborator

azat commented Aug 1, 2020

FWIW I've applied this PR into my build and I did not find any problems yet

src/Core/Field.h Outdated
Comment on lines 244 to 257
NegativeInfinity = 1,
PositiveInfinity = 2,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amosbird can you update your PR after recent changes?

2020-09-03 05:55:34 /ClickHouse/src/Core/Field.h:462:17: error: enumeration values 'Null', 'NegativeInfinity', and 'PositiveInfinity' not handled in switch [-Werror,-Wswitch]
2020-09-03 05:55:34         switch (which)
2020-09-03 05:55:34                 ^
2020-09-03 05:55:34 /ClickHouse/src/Core/Field.h:497:17: error: enumeration values 'Null', 'NegativeInfinity', and 'PositiveInfinity' not handled in switch [-Werror,-Wswitch]
2020-09-03 05:55:34         switch (which)
2020-09-03 05:55:34                 ^

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@azat
Copy link
Collaborator

azat commented Oct 13, 2020

@alexey-milovidov what is the status of this PR? Are there some concerns about it?

@alexey-milovidov
Copy link
Member

@azat I did not review this PR and no one from my team decided to take it.

@alexey-milovidov
Copy link
Member

Yandex projects require manual fixing from our side due to changed interfaces...

@amosbird
Copy link
Collaborator Author

Weird CI errors...

@azat
Copy link
Collaborator

azat commented Apr 20, 2021

after creating a MergeTree table with nullable primary keys, ClickHouse won't be able to rollabck to previous versions. (potential backward incompatibility)

@amosbird BTW right now if you will run newer clickhouse and then rollback to the version w/o this patch you will not get any errors, but simply index analysis will work incorrectly (for Nullable), maybe it worth adding some guard against this?

@amosbird
Copy link
Collaborator Author

after creating a MergeTree table with nullable primary keys, ClickHouse won't be able to rollabck to previous versions. (potential backward incompatibility)

@amosbird BTW right now if you will run newer clickhouse and then rollback to the version w/o this patch you will not get any errors, but simply index analysis will work incorrectly (for Nullable), maybe it worth adding some guard against this?

Hmm, what kind of guards do you suggest?

@azat
Copy link
Collaborator

azat commented Apr 21, 2021

Hmm, what kind of guards do you suggest?

I guess new setting for MergeTree will be enough

@alexey-milovidov alexey-milovidov self-assigned this Jun 21, 2021
@alexey-milovidov alexey-milovidov merged commit b52411a into ClickHouse:master Jul 18, 2021
Comment on lines -41 to +43

if (!type->isNullable())
{
serialization->serializeBinary(hyperrectangle[i].left, ostr);
serialization->serializeBinary(hyperrectangle[i].right, ostr);
}
else
{
bool is_null = hyperrectangle[i].left.isNull() || hyperrectangle[i].right.isNull(); // one is enough
writeBinary(is_null, ostr);
if (!is_null)
{
serialization->serializeBinary(hyperrectangle[i].left, ostr);
serialization->serializeBinary(hyperrectangle[i].right, ostr);
}
}
serialization->serializeBinary(hyperrectangle[i].left, ostr);
serialization->serializeBinary(hyperrectangle[i].right, ostr);
Copy link
Collaborator

@azat azat Aug 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change breaks on disk format (and the relevant change in the MergeTreeIndexGranuleMinMax::deserializeBinary)

Too bad that I checked this patchset only for compatibility after reverting this patch - #12455 (comment) (use case: I've applied it manually, then revert it, and data skipping indexes over Nullable column had been broken)

But this patchset actually breaks compatibility with older versions of clickhouse for Nullable data skipping indexes after simple upgrade:

Here is a simple reproducer:

--
-- run this with 21.6 or similar (i.e. w/o this patch)
--

CREATE TABLE data
(
    `key` Int,
    `value` Nullable(Int),
    INDEX value_index value TYPE minmax GRANULARITY 1
)
ENGINE = MergeTree
ORDER BY key;

INSERT INTO data SELECT
    number,
    number
FROM numbers(10000);

SELECT * FROM data WHERE value = 20000 SETTINGS force_data_skipping_indices = 'value_index' SETTINGS force_data_skipping_indices = 'value_index', max_rows_to_read=1; 
-- now upgrade and run the query again
SELECT * FROM data WHERE value = 20000 SETTINGS force_data_skipping_indices = 'value_index' SETTINGS force_data_skipping_indices = 'value_index', max_rows_to_read=1;

And it will fail because of on disk format changes:

$ ll --time-style=+ data/*/data/all_1_1_0/skp*.idx
-rw-r----- 1 azat azat 36  data/with_nullable_patch/data/all_1_1_0/skp_idx_value_index.idx
-rw-r----- 1 azat azat 37  data/without_nullable_patch/data/all_1_1_0/skp_idx_value_index.idx

$ md5sum data/*/data/all_1_1_0/skp*.idx
a19c95c4a14506c65665a1e30ab404bf  data/with_nullable_patch/data/all_1_1_0/skp_idx_value_index.idx
e50e2fcfa873b232196623d56ab26105  data/without_nullable_patch/data/all_1_1_0/skp_idx_value_index.idx

Nice that there is no stable release with this patch included yet.
But personally I, don't use stable releases and how I wish this patchset created new on disk format...

Note that you may create data skipping indexes over Nullable column even before this patchset

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind sending a PR restoring old on disk format?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

azat added a commit to azat/ClickHouse that referenced this pull request Aug 4, 2021
[1] breaks on disk format (and the relevant change in the:

  [1]: ClickHouse#12455 (comment)

Too bad that I checked this patchset only for compatibility after
reverting this patch [2] (use case: I've applied it manually, then
revert it, and data skipping indexes over Nullable column had been
broken)

  [2]: ClickHouse#12455 (comment)

But this patchset actually breaks compatibility with older versions of
clickhouse for Nullable data skipping indexes after simple upgrade:

Here is a simple reproducer:

    --
    -- run this with 21.6 or similar (i.e. w/o this patch)
    --

    CREATE TABLE data
    (
        `key` Int,
        `value` Nullable(Int),
        INDEX value_index value TYPE minmax GRANULARITY 1
    )
    ENGINE = MergeTree
    ORDER BY key;

    INSERT INTO data SELECT
        number,
        number
    FROM numbers(10000);

    SELECT * FROM data WHERE value = 20000 SETTINGS force_data_skipping_indices = 'value_index' SETTINGS force_data_skipping_indices = 'value_index', max_rows_to_read=1;

Now upgrade and run the query again:

    SELECT * FROM data WHERE value = 20000 SETTINGS force_data_skipping_indices = 'value_index' SETTINGS force_data_skipping_indices = 'value_index', max_rows_to_read=1;

And it will fail because of on disk format changes:

    $ ll --time-style=+ data/*/data/all_1_1_0/skp*.idx
    -rw-r----- 1 azat azat 36  data/with_nullable_patch/data/all_1_1_0/skp_idx_value_index.idx
    -rw-r----- 1 azat azat 37  data/without_nullable_patch/data/all_1_1_0/skp_idx_value_index.idx

    $ md5sum data/*/data/all_1_1_0/skp*.idx
    a19c95c4a14506c65665a1e30ab404bf  data/with_nullable_patch/data/all_1_1_0/skp_idx_value_index.idx
    e50e2fcfa873b232196623d56ab26105  data/without_nullable_patch/data/all_1_1_0/skp_idx_value_index.idx

Note, that there is no stable release with this patch included yet, so
no need to backport.

Also note that you may create data skipping indexes over Nullable
column even before [3].

  [3]: ClickHouse#12455

v2: break cases when granulas has Null in values due to backward
compatibility
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-improvement Pull request with some product improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants