-
Notifications
You must be signed in to change notification settings - Fork 429
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add support for concatenated field type #4773
Conversation
docs/configuration/index-config.md
Outdated
Concatenate fields don't support fast fields, and are never stored. They uses their own tokenizer, independantly of the | ||
tokenizer configured on the individual fields. | ||
At query time, concatenate fields don't support range queries. | ||
Only the following types are supported inside a concatenate field: text, bool, i64, u64, json. Other types are reject |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only the following types are supported inside a concatenate field: text, bool, i64, u64, json. Other types are reject | |
Only the following types are supported inside a concatenate field: text, bool, i64, u64, json. Other types are rejected |
quickwit/quickwit-doc-mapper/src/default_doc_mapper/default_mapper.rs
Outdated
Show resolved
Hide resolved
quickwit/quickwit-doc-mapper/src/default_doc_mapper/default_mapper.rs
Outdated
Show resolved
Hide resolved
|
||
for value in json_obj_values { | ||
for concate_dynamic_field in &self.concate_dynamic_fields { | ||
document.add_field_value(*concate_dynamic_field, value.clone()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know a simple trick to remove the last clone?
I haven't found anything simple in the library.
It feels sad if we have a single concate_dynamic_fields (the most common case)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find a good way to do that, so I created a custom iterator which does the trick (ZipCloneable
). Let me know if you think that's better or not
quickwit/quickwit-doc-mapper/src/default_doc_mapper/field_mapping_entry.rs
Show resolved
Hide resolved
quickwit/quickwit-doc-mapper/src/default_doc_mapper/mapping_tree.rs
Outdated
Show resolved
Hide resolved
text: val.clone(), | ||
tokens: vec![Token { | ||
offset_from: 0, | ||
offset_to: 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i've set the offset_to to 1, is that alright, or should that be text.len()
?
@@ -54,6 +55,124 @@ pub enum LeafType { | |||
Text(QuickwitTextOptions), | |||
} | |||
|
|||
fn value_to_pretokenized<T: ToString>(val: T) -> PreTokenizedString { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a discussion somewhere about why we are going for the string route? I think we discussed it but forgot the rationale.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we had that discussion. I based my work on
For numeric types or dates, we should coerce them to string.
(from the ticket).
The rational I see would be that if we want to do point search (not range &co), casting everything to string is the simplest way, and work rather well. Maybe we could leverange json field instead, but I don't think there is much benefit, and you end up doing twice as many requests to storage when you don't know if a term is an integer or a string.
quickwit/quickwit-doc-mapper/src/default_doc_mapper/mapping_tree.rs
Outdated
Show resolved
Hide resolved
quickwit/quickwit-doc-mapper/src/default_doc_mapper/default_mapper.rs
Outdated
Show resolved
Hide resolved
quickwit/quickwit-doc-mapper/src/default_doc_mapper/field_mapping_entry.rs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add unit tests.
@trinity-1686a I don't have more comments apart from the need for unit test. integration tests are not a replacement for UT. @PSeitz can you have a look at the CR? If you see optimization hanging fruit, and they can be shipped separately, I think we can revisit that later, but I want you to be aware of the addition of this PR. |
record: Option<IndexRecordOption>, | ||
) -> anyhow::Result<Self> { | ||
let text_index_options_opt = Self::from_parts_text(true, tokenizer, record, false)?; | ||
let text_index_options = text_index_options_opt.expect("concatenate field must be indexed"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be triggered by user input?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Self::from_parts_text
the line above will always return Ok(Some(_)) if it's first parameter is true
quickwit/quickwit-doc-mapper/src/default_doc_mapper/mapping_tree.rs
Outdated
Show resolved
Hide resolved
for sub_field in &options.concatenate_fields { | ||
for matched_field in | ||
mapping_node | ||
.find_field_mapping_leaf(sub_field) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens for json field?
If I have a json field called attributes and I concat
attributes.color
, is this an error, or is not an error but working in strange ways?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the spec is acceptable, can we have a unit test for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right now, it's an error at index creation time. In the future, I hope it may become possible to implement. I've added a commented-out test
quickwit/quickwit-doc-mapper/src/default_doc_mapper/default_mapper.rs
Outdated
Show resolved
Hide resolved
quickwit/quickwit-doc-mapper/src/default_doc_mapper/default_mapper.rs
Outdated
Show resolved
Hide resolved
quickwit/quickwit-doc-mapper/src/default_doc_mapper/default_mapper.rs
Outdated
Show resolved
Hide resolved
quickwit/quickwit-doc-mapper/src/default_doc_mapper/mapping_tree.rs
Outdated
Show resolved
Hide resolved
quickwit/quickwit-doc-mapper/src/default_doc_mapper/mapping_tree.rs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see minor comments
Description
fix #4548
implement concatenated field type: a field which search inside other fields, in a more efficient way than querying each sub-field independently. Currently doesn't support having date and ip sub-fields. These can be added later, but may require some trickery on the search path to get working properly.
How was this PR tested?
added integration test testing we use the right tokenizer, and are able to search for supported types (int, bool, text, things inside json field, and in dynamic field)