Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-38457: [C++] Support LogicalNullCount for DictionaryArray #38681

Merged
merged 27 commits into from
Nov 27, 2023

Conversation

R-JunmingChen
Copy link
Contributor

@R-JunmingChen R-JunmingChen commented Nov 12, 2023

Rationale for this change

Currently, the null_count only returns the number of null values in indices. We want to count the real nulls as the dictionary array is decoded.

What changes are included in this PR?

Add DictionaryMayHaveLogicalNulls and ComputeLogicalNullCount for DictionaryArray

Are these changes tested?

Yes

Are there any user-facing changes?

No.

Copy link

⚠️ GitHub issue #38457 has been automatically assigned in GitHub to PR creator.

@R-JunmingChen R-JunmingChen marked this pull request as ready for review November 12, 2023 16:27
@kou kou changed the title GH-38457: [C++]Real null count for DictionaryArray GH-38457: [C++] Real null count for DictionaryArray Nov 12, 2023
cpp/src/arrow/array/array_dict_test.cc Outdated Show resolved Hide resolved
cpp/src/arrow/array/array_dict_test.cc Outdated Show resolved Hide resolved
cpp/src/arrow/array/array_dict.cc Outdated Show resolved Hide resolved
cpp/src/arrow/array/array_dict.cc Outdated Show resolved Hide resolved
cpp/src/arrow/array/array_dict.cc Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Nov 12, 2023
R-JunmingChen and others added 4 commits November 13, 2023 09:02
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Nov 13, 2023
@R-JunmingChen R-JunmingChen requested a review from kou November 13, 2023 01:25
Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

cpp/src/arrow/array/array_dict.cc Outdated Show resolved Hide resolved
cpp/src/arrow/array/array_dict.cc Outdated Show resolved Hide resolved
cpp/src/arrow/array/array_dict.cc Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Nov 13, 2023
@R-JunmingChen
Copy link
Contributor Author

Hi, @pitrou @bkietz, please review this PR when you have time

@jorisvandenbossche
Copy link
Member

#34408 added a ComputeLogicalNullCount method on the base Array, and implemented that for Union / REE. Maybe that could be expanded for dictionary types as well, so we can reuse that existing method here?

@R-JunmingChen
Copy link
Contributor Author

R-JunmingChen commented Nov 13, 2023

#34408 added a ComputeLogicalNullCount method on the base Array, and implemented that for Union / REE. Maybe that could be expanded for dictionary types as well, so we can reuse that existing method here?

Hi, it's good to merge this to ComputeLogicalNullCount for user experiences. But there is also a setback. If we do that, we can't return IndexError when the bound check fails since the ComputeLogicalNullCount returns int64_t instead of Result<int64_t>. Is that ok we just skip instead of throwing when the bound check fails?

@pitrou
Copy link
Member

pitrou commented Nov 13, 2023

Well, bounds checking is not necessary since the dictionary array is presumed to be valid already.

@R-JunmingChen
Copy link
Contributor Author

Ok, let's do this. I will implement it ASAP

@R-JunmingChen R-JunmingChen changed the title GH-38457: [C++] Real null count for DictionaryArray GH-38457: [C++] Support LogicCount for DictionaryArray Nov 15, 2023
@R-JunmingChen
Copy link
Contributor Author

R-JunmingChen commented Nov 15, 2023

Hi, I have updated to LogicalCount version. Please review the changes when you have time.

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a few changes

cpp/src/arrow/array/array_test.cc Outdated Show resolved Hide resolved
cpp/src/arrow/util/dict_util.cc Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting merge Awaiting merge labels Nov 15, 2023
Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Nov 16, 2023
@R-JunmingChen R-JunmingChen requested a review from bkietz November 16, 2023 01:47
@jorisvandenbossche jorisvandenbossche changed the title GH-38457: [C++] Support LogicCount for DictionaryArray GH-38457: [C++] Support LogicalNullCount for DictionaryArray Nov 16, 2023
@R-JunmingChen
Copy link
Contributor Author

ping @bkietz and @pitrou, please review when you have time

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Nov 27, 2023
@bkietz bkietz merged commit b1f1ef4 into apache:main Nov 27, 2023
1 check passed
@bkietz bkietz removed the awaiting merge Awaiting merge label Nov 27, 2023
Copy link

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit b1f1ef4.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them.

dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…pache#38681)

### Rationale for this change
Currently, the `null_count` only returns the number of null values in `indices`. We want to count the real nulls as the dictionary array is decoded.

### What changes are included in this PR?
Add `DictionaryMayHaveLogicalNulls` and `ComputeLogicalNullCount` for DictionaryArray

### Are these changes tested?
Yes

### Are there any user-facing changes?
No.
* Closes: apache#38457 

Lead-authored-by: Junming Chen <junming.chen.r@outlook.com>
Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C++] Real null count for DictionaryArray
5 participants