Searching in a json field path1.path2.path3:value should return a result #2312

Closed
fmassot opened this issue Nov 15, 2022 · 4 comments · Fixed by #2329

@fmassot
Collaborator

fmassot commented Nov 15, 2022

Currently this does not work because tantivy handles dots in JSON keys differently when writing and when searching.

Let's take a document to index that is very common in the OpenTelemetry world:

{"resource": {"k8s.container.name": "prometheus"}}

The doc_mapping is as follows:

doc_mapping:
  field_mappings:
    - name: resource
      type: json

When writing the index, tantivy interprets k8s.container.name as the field name and stores it as is, along with the value (and the type).
When searching for documents matching the query resource.k8s.container.name:prometheus, quickwit removes the resource part and gives tantivy the term k8s.container.name:prometheus to match. The issue is that tantivy interprets the dots as separators and builds a term with internal separators, which does not match what was written at indexing time.
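To make the mismatch concrete, here is a minimal sketch in plain Rust. It is not tantivy's code: index_time_path and query_time_path are hypothetical helpers that only model the behavior described above, where the writer keeps each key as a single path segment and the search side splits the path on every dot.

```rust
/// Hypothetical model of the write side: each JSON object key becomes
/// exactly one path segment, dots included.
fn index_time_path(keys: &[&str]) -> Vec<String> {
    keys.iter().map(|k| k.to_string()).collect()
}

/// Hypothetical model of the search side: the JSON path taken from the
/// query is split on every dot.
fn query_time_path(json_path: &str) -> Vec<String> {
    json_path.split('.').map(|s| s.to_string()).collect()
}
```

For the example document, index_time_path(&["k8s.container.name"]) gives one segment, while query_time_path("k8s.container.name") gives three, so the two sides never produce the same term.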

I suggest slightly changing how tantivy writes JSON terms, so that dots in field names are used to define path segments.

This approach is not perfect, as it makes {"resource": {"k8s.container.name": "prometheus"}} and {"resource": {"k8s": {"container": {"name": "prometheus"}}}} indistinguishable once indexed. But dots in field names are very common in the log world, and having to escape them would be very painful.
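The trade-off can be sketched as follows. proposed_write_path is a hypothetical helper, not an existing tantivy function; it assumes the proposed rule is simply "split every object key on dots when writing".

```rust
/// Hypothetical write-side rule from the proposal: split every object key
/// on dots, so dotted keys and nested objects share one representation.
fn proposed_write_path(keys: &[&str]) -> Vec<String> {
    keys.iter()
        .flat_map(|k| k.split('.'))
        .map(|s| s.to_string())
        .collect()
}
```

Under this rule, proposed_write_path(&["k8s.container.name"]) and proposed_write_path(&["k8s", "container", "name"]) return the same segments. That is what makes the unescaped query work, and also why the two document shapes can no longer be told apart.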

@fulmicoton what do you think?

@fmassot fmassot added the enhancement New feature or request label Nov 15, 2022
@fulmicoton
Collaborator

fulmicoton commented Nov 15, 2022

resource.k8s\.container\.name:prometheus should work.

I agree it is tempting to have resource.k8s.container.name:prometheus work, but it has some downsides (ambiguity/shadowing).

@fmassot
Collaborator Author

fmassot commented Nov 15, 2022

I tried escaping the dots, but it does not seem to work. Looking at the tantivy code, I saw json_path.split('.'), so the path is still split on every dot, escaped or not. Or I may be missing something.

https://github.com/quickwit-oss/tantivy/blob/9a090ed994708c220e574f1bf9ad05ce419c10a8/src/indexer/json_term_writer.rs#L272
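For illustration, the sketch below contrasts a plain split on dots with an escape-aware split. Both functions are hypothetical and written for this example; the second one is only one possible way to honor a backslash escape, and is not the code from the linked file or from any later fix.

```rust
/// Plain split on every dot, mirroring the json_path.split('.') call
/// quoted above. Escaped dots are still treated as separators.
fn naive_split(json_path: &str) -> Vec<String> {
    json_path.split('.').map(String::from).collect()
}

/// Hypothetical escape-aware split: a backslash makes the next character
/// literal, so `\.` stays inside the current segment.
/// A trailing lone backslash is silently dropped in this sketch.
fn split_json_path(json_path: &str) -> Vec<String> {
    let mut segments = Vec::new();
    let mut current = String::new();
    let mut escaped = false;
    for c in json_path.chars() {
        if escaped {
            // Previous character was a backslash: keep this one verbatim.
            current.push(c);
            escaped = false;
        } else if c == '\\' {
            escaped = true;
        } else if c == '.' {
            // Unescaped dot: close the current segment.
            segments.push(std::mem::take(&mut current));
        } else {
            current.push(c);
        }
    }
    segments.push(current);
    segments
}
```

With the input k8s\.container\.name, naive_split returns three segments, the first two of which keep their trailing backslash, whereas split_json_path returns the single segment k8s.container.name, which is what was written at indexing time.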

@fulmicoton
Collaborator

we can call it a bug then :)

@fulmicoton fulmicoton self-assigned this Nov 15, 2022
@fulmicoton fulmicoton added the bug Something isn't working label Nov 15, 2022
@fulmicoton
Collaborator

Fix in quickwit-oss/tantivy#1682

fulmicoton added a commit that referenced this issue Nov 16, 2022
Also updated documentation, to explain how nested structure can be
searched.

Closes #2312
fulmicoton added a commit that referenced this issue Nov 17, 2022
* Update tantivy to fix json path search escaping '.'

Also updated documentation, to explain how nested structure can be
searched.

Closes #2312

* Update docs/reference/query-language.md

Co-authored-by: François Massot <francois.massot@gmail.com>