-
Notifications
You must be signed in to change notification settings - Fork 422
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix agg split_size parameter, add docs and test #4627
Conversation
// Forward split_size (also shard_size) parameter to segment_size, as | ||
// this is the same in context of Quickwit | ||
if let Some(split_size) = &terms_agg.split_size { | ||
terms_agg.segment_size = Some(*split_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand. So we have an argument called split_size
introduced in tantivy even though the parameter makes no sense there.
And it is unused... So you are adding glue code to actually have us write split_size
into segment_size
?
Can't we just remove the split_size
property in tantivy and populate the segment_size
thing directly?
Is this a serialization issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
split_size
which aliases to shard_size
is supposed to set the number of terms that get send from a node to the root node. That part is not handled in tantivy and therefore unused, but would still make sense in the request structure.
Since we have just one segment in Quickwit currently, we can set it to segment_size
.
We could alias shard_size
directly to segment_size
, but that would be kind of incorrect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you think of a better solution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could truncate the results to split_size
before sending the results between nodes, independently of segment_size
. But I think we can just forward the parameter for now, until we have more insight how split_size
and segment_size
should differ.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
split_size which aliases to shard_size is supposed to set the number of terms that get send from a node to the root node. That part is not handled in tantivy and therefore unused, but would still make sense in the request structure.
Since we have just one segment in Quickwit currently, we can set it to segment_size.
That makes no sense at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sending data between nodes is not handled in tantivy, it's still obviously useful to control that with a parameter.
No idea why you would think that is nonsense.
docs/reference/aggregation.md
Outdated
@@ -532,15 +532,22 @@ By default, the top 10 terms with the most documents are returned. Larger values | |||
|
|||
###### **split_size** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should only call it shard_size
?
docs/reference/aggregation.md
Outdated
@@ -532,15 +532,22 @@ By default, the top 10 terms with the most documents are returned. Larger values | |||
|
|||
###### **split_size** | |||
|
|||
The get more accurate results, we fetch more than size from each segment/split. | |||
The get more accurate results, we fetch more than size from each segment/split. Aliases to `shard_size`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The get more accurate results, we fetch more than size from each segment/split. Aliases to `shard_size`. | |
The get more accurate results, we fetch more than size from each segment/split. | |
The parameter also admits `split_size` as an alias for opensearch/elasticsearch compatibility reasons. |
fix aggregation parameter split_size propagation improve split_size docs add split_size parameter test
rename split_size to shard_size update docs remove variable propagation
key: nice | ||
- doc_count: 2 | ||
key: cool | ||
doc_count_error_upper_bound: 0 | ||
sum_other_doc_count: 0 | ||
--- | ||
# Test term aggs with split_size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shard
# Test term aggs with split_size | |
# Test term aggs with shard_size |
@PSeitz Don't we need to update the tantivy version for the test to pass, yet the tests are passing how did that work? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to update the tantivy version, or was this done in a different PR?
Yes, that was done in a different PR |
fix aggregation parameter
split_size
propagation tosegment_size
improve split_size docs
add split_size parameter test