add missing parameter to term agg #2103

Merged
merged 6 commits into from
Aug 14, 2023

PSeitz (Contributor) commented Jun 26, 2023

Adds the missing parameter to the term aggregation.

Limitations:
- Columns with mixed types are not supported.
- A missing key of type string is not supported on numerical fields.

Addressing these limitations requires a dedicated missing aggregation.

#1913
#1789
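
As a rough illustration of the behaviour described above, here is a minimal sketch of a term aggregation with a missing key over a single string column. It uses only the standard library; `term_counts` and its signature are hypothetical and are not part of tantivy's API.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper (not tantivy code): counts the terms of a single
/// string column. Documents without a value are counted under `missing`
/// when a missing key is given, and skipped otherwise.
fn term_counts(docs: &[Option<&str>], missing: Option<&str>) -> BTreeMap<String, u64> {
    let mut buckets = BTreeMap::new();
    for &doc in docs {
        // Use the document's own term, or fall back to the missing key.
        if let Some(term) = doc.or(missing) {
            *buckets.entry(term.to_string()).or_insert(0) += 1;
        }
    }
    buckets
}
```

Calling `term_counts(&docs, Some("N/A"))` puts every document without a value into an "N/A" bucket, while passing `None` leaves those documents out of the result. The sketch covers one string column only, in line with the limitations listed above.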

codecov-commenter commented Jun 26, 2023

Codecov Report

Merging #2103 (ca40f57) into main (3b0cbf8) will decrease coverage by 0.08%.
The diff coverage is 82.61%.


@@            Coverage Diff             @@
##             main    #2103      +/-   ##
==========================================
- Coverage   94.39%   94.31%   -0.08%     
==========================================
  Files         321      321              
  Lines       60612    61262     +650     
==========================================
+ Hits        57213    57782     +569     
- Misses       3399     3480      +81     
| Impacted Files | Coverage Δ |
|---|---|
| columnar/src/column/mod.rs | 82.30% <ø> (ø) |
| src/core/segment_reader.rs | 90.57% <0.00%> (ø) |
| src/lib.rs | 99.05% <ø> (ø) |
| columnar/src/column_index/merge/mod.rs | 98.75% <50.00%> (ø) |
| columnar/src/columnar/merge/tests.rs | 98.44% <50.00%> (-1.34%) ⬇️ |
| columnar/src/tests.rs | 97.89% <50.00%> (-1.23%) ⬇️ |
| src/query/range_query/range_query_ip_fastfield.rs | 97.23% <66.66%> (ø) |
| columnar/src/column_index/merge/shuffled.rs | 99.12% <75.00%> (-0.88%) ⬇️ |
| columnar/src/columnar/writer/mod.rs | 99.12% <75.00%> (+<0.01%) ⬆️ |
| src/aggregation/bucket/term_agg.rs | 94.66% <82.31%> (-3.62%) ⬇️ |
| ... and 9 more | |

... and 25 files with indirect coverage changes

let new_col = Column {
    index: columnar::ColumnIndex::Full,
    // Replace u64::MAX later with actual term
    values: col.0.first_or_default_col(u64::MAX),

Collaborator:

Is u64::MAX mapped to NaN? Why does this work?
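
The thread does not record an answer to this question. One possible reading of the quoted code comment ("Replace u64::MAX later with actual term") is that u64::MAX acts as a sentinel term ordinal that is swapped for the missing key once the buckets are built, so no float conversion is involved. The sketch below illustrates that idea with the standard library only; `MISSING_ORD`, `count_with_sentinel`, and the `dict` slice are hypothetical names, and the code is an assumption, not the PR's implementation.

```rust
use std::collections::HashMap;

/// Sentinel ordinal standing in for "no value" (assumption: no real term
/// ordinal ever equals u64::MAX).
const MISSING_ORD: u64 = u64::MAX;

/// Hypothetical sketch: count per term ordinal first, then resolve ordinals
/// to strings. `dict` plays the role of a term dictionary (ordinal -> term).
fn count_with_sentinel(
    ords: &[Option<u64>],
    dict: &[&str],
    missing_key: &str,
) -> HashMap<String, u64> {
    // Pass 1: documents without a value are counted under the sentinel.
    let mut by_ord: HashMap<u64, u64> = HashMap::new();
    for ord in ords {
        *by_ord.entry(ord.unwrap_or(MISSING_ORD)).or_insert(0) += 1;
    }
    // Pass 2: replace the sentinel with the actual missing key.
    let mut out = HashMap::new();
    for (ord, count) in by_ord {
        let key = if ord == MISSING_ORD {
            missing_key
        } else {
            dict[ord as usize]
        };
        // Add rather than overwrite, in case the missing key equals a real term.
        *out.entry(key.to_string()).or_insert(0) += count;
    }
    out
}
```

Because the sentinel lives only in ordinal space in this sketch, it is resolved before any key reaches the caller.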

bucket_agg_accessor
    .column_block_accessor
    .fetch_block(docs, &bucket_agg_accessor.accessor);
if let Some(missing) = bucket_agg_accessor.missing_accessor1 {

Collaborator:

I don't understand what happens when we have two columns, one of type string and one of type int.
For instance, suppose we have 3 docs:

doc1 -> "hello"
doc2 -> 3
doc3 -> missing

And we want a term aggregation with missing = "N/A".
With the logic below, don't we end up with:

"hello" -> 1
3 -> 1
"N/A" -> 3 (instead of 1)
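
To make the expected result concrete, the sketch below counts a document as missing only when it has no value in any of the field's columns, which gives "N/A" -> 1 for the three docs above. It is an illustration of the semantics the reviewer expects, not the PR's code; the `Doc` struct and `term_counts_mixed` are hypothetical. Filling each column independently would also treat doc1 and doc2 as missing in the column they lack, which is the over-counting raised here.

```rust
use std::collections::BTreeMap;

/// Hypothetical document whose field values are split across a string
/// column and an integer column.
struct Doc<'a> {
    text: Option<&'a str>,
    int: Option<i64>,
}

/// Hypothetical sketch: a doc goes into the missing bucket only if it has
/// no value in either column.
fn term_counts_mixed(docs: &[Doc<'_>], missing: &str) -> BTreeMap<String, u64> {
    let mut buckets: BTreeMap<String, u64> = BTreeMap::new();
    for doc in docs {
        let mut has_value = false;
        if let Some(t) = doc.text {
            *buckets.entry(t.to_string()).or_insert(0) += 1;
            has_value = true;
        }
        if let Some(i) = doc.int {
            *buckets.entry(i.to_string()).or_insert(0) += 1;
            has_value = true;
        }
        if !has_value {
            *buckets.entry(missing.to_string()).or_insert(0) += 1;
        }
    }
    buckets
}
```

Integer terms are rendered as strings here purely to keep the bucket map simple.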


Contributor Author:

I added a comment about this on the aggregation request.

@PSeitz PSeitz merged commit 2e10901 into main Aug 14, 2023
@PSeitz PSeitz deleted the missing_agg branch August 14, 2023 12:22