Queryable object fields #25312
Comments
We could also index terms with and without the field prefix, (eg |
I wonder if you could make this more transparent so the normal thing works:
The prefixing would be an implementation detail of |
I think we should not make it complicated or restrictive. It's very simple: we decide how to index and make things work without any special type. You just mark the mapping as
I don't get what you mean here. I assume you want to barf on numerics? I think we should just make text out of it and be done with it. We can add some position increment gaps to make phrases work, but from this perspective it's really just one big pile of text.
You know they can't use any kind of numeric special things here since we won't support aggregations etc. Let's keep it simple and not have many exceptions. That's what I'd like though. |
That is what I mean. The way I read the original proposal I thought that we would automatically give numerics their own Lucene field somehow and I didn't like that. I agree that we shouldn't do numerics. In Clint's example you can pull out numeric fields. I wonder if we can exclude those fields from the indexed object fields. Could we make queries to the indexed objects look "normal" like I had in my example? I think it'd be nice to have an example of configuring the field type of the index - whether the strings are analyzed like text or keyword and how you'd set up multifields. And setting up the analyzer/normalizer/etc. |
We are on the same page here. Let's not be smart but simple. |
Discussed in FixItFriday. We're going to start simple and see what feedback we get. We will index each term alone plus with a path prefix (eg |
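As a rough illustration of the indexing scheme described in this comment (each term on its own, plus the same term with a path prefix), here is a minimal Python sketch. The function name `object_terms`, the lowercase/whitespace tokenization, the dot-joined key path, and the `:` separator are assumptions made for illustration; the thread does not settle these details.

```python
from typing import Any, Iterator


def object_terms(obj: Any, path: str = "") -> Iterator[str]:
    """Yield index terms for a JSON-like value.

    Each leaf token is emitted twice: once bare, and once prefixed with
    the path of keys leading to it (hypothetical "path:token" format).
    """
    if isinstance(obj, dict):
        for key, value in obj.items():
            # Assumption: nested keys are joined with dots.
            child = f"{path}.{key}" if path else key
            yield from object_terms(value, child)
    elif isinstance(obj, list):
        # Array elements share the path of the array itself.
        for item in obj:
            yield from object_terms(item, path)
    else:
        # Everything is treated as text; numerics get no special handling.
        for token in str(obj).lower().split():
            yield token
            if path:
                yield f"{path}:{token}"


doc = {"status": "active", "address": {"city": "New York"}}
terms = list(object_terms(doc))
```

All terms would go into a single underlying field, so the number of mapped fields stays constant no matter how many distinct keys the object contains.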
This possibly has a nice use-case in APM. We allow users to send up big blobs of custom objects which are not currently indexed. This would be a nice way to make those documents searchable without running the risk of field explosion (Thanks for the ping @ruflin!) |
cc @elastic/es-search-aggs |
+1 to @roncohen's comment about handling a potentially unbounded number of unique field names. I've come across a related use case in previous experience: a spreadsheet program where users can create sheet templates with arbitrary column names, and want to be able to search within columns by name. |
I’m now getting started on this in earnest. My main open question is whether it makes sense to add this functionality to objects, as opposed to creating a new data type as @nik9000 alluded to. Under the current proposal, an object field would be made queryable as follows:
There are a few issues to ponder with this approach. First, it’s a bit subtle that setting Additionally, mixing in concrete field mappings can make the behavior less clear:
Do we still index the un-prefixed values for Finally, this syntax looks tricky to support given how the mapping + document parsing code is currently designed. In particular, an object mapper must now also function as a field mapper in certain contexts. To avoid these problems, I wonder if it would be better to create a new field type, something similar to the following:
This directly covers the use cases around handling opaque blobs of data. If certain important keys are known in advance (and should be made available for aggregations, etc.), they can be pulled into a separate field, with no special relation to the object field. We could maybe provide a mechanism similar to |
These are compelling arguments towards a dedicated indexed object field indeed. I guess we could still make it work on |
Or alternatively, we could prevent (both dynamic and explicit) mapping updates to |
We had a discussion offline and decided to create a new leaf field type for the reasons outlined above. As @jpountz mentioned we didn't think it made sense to add a new field mapping for each key, as this would not solve a major use case of the feature, which is to prevent mapping explosion. Other conclusions from the discussion can be found on the meta-issue: #33003 (comment) |
Would this allow storing objects with dots in their field names? On beats we have some cases where we would benefit from being able to store key-value objects (string to string) with dots in the keys, like in the subfields of
On queries, in principle, these fields would be used only for filtering. At the moment users face mapping errors when they try to store data like this, and the only workarounds they have are to replace the dots with other characters (we offer this "dedotting" in some places), or to rename and/or drop the conflicting fields. This is not a very good user experience (see this topic in discuss for example), and makes them lose the original names of these labels, or completely discard some of them. It'd be great if this new type could cover this case 🙂 |
Hi @jsoriano, as currently designed the new field type would support this sort of data. For example, Also, you're not suggesting this in your comment, but just to be really clear -- this field type shouldn't be used as a general approach to handling dots in field names. It supports a much more restrictive set of search functionality than normal fields, and should only be used if it's the right fit for the particular data. |
Hi, an update about its possible use in Beats after some conversations offline. I have started a PR (elastic/beats#9286) with the changes that would be needed, and after trying it a little bit it works quite well for our case. There are two main cons about using this type already on 7.0 for labels:
If in the end we don't use it for labels, we could still consider using this type for Kubernetes annotations. We are not storing them by default now to avoid loads of dynamic field mappings; this type would help with this. And terms aggregation is less of a requirement there. I have also opened a discussion about the possible use of these fields in ECS (elastic/ecs#198). |
@jsoriano Aggregations will not be implemented for queryable object fields. I don't think you should make plans based on this field type, it is too different from normal fields and will never support a number of features that you would expect, eg discoverability of the existence of the field via an API. I think the correct way to deal with fields like:
is either to dedot them, or to rewrite That way, these fields end up benefiting from all the features already supported. |
Just a note that while aggregations are not planned for the first version of the feature, I don't think they're out of the question, and I'm investigating if it'd be possible to support some simple aggregation types like |
Often we have large object fields with many sub-fields, only a few of which are needed for aggregations, sorting, or highlighting. Today, we create fields for all sub-fields, but we could greatly reduce the number of required fields if we make object fields queryable.
We would need a specialized analyzer which can accept JSON and transform an object like:
into the following terms:
Then you could search for active statuses with:
or
We could possibly even support searching for "New York" vs "New City of York" with:
which would be rewritten as
my_object:"city:new city:york"
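The rewrite above can be sketched in Python as follows. The helper name `rewrite_phrase` and the lowercase/whitespace tokenization are assumptions for illustration only, not an existing Elasticsearch API.

```python
def rewrite_phrase(object_field: str, sub_field: str, phrase: str) -> str:
    """Rewrite a phrase query on a sub-field into a phrase query on the
    object field, prefixing every token with the sub-field name."""
    tokens = phrase.lower().split()
    prefixed = " ".join(f"{sub_field}:{token}" for token in tokens)
    return f'{object_field}:"{prefixed}"'


rewritten = rewrite_phrase("my_object", "city", "New York")
print(rewritten)
```

Assuming term positions are preserved at index time, the rewritten phrase would match "New York" but not "New City of York", because `city:new` and `city:york` are not adjacent in the latter.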
If we wanted to be able to aggregate on the age field, the object field could be mapped as:
With this mapping, only the my_object.age sub-field would have its own Lucene field (or Elasticsearch field) and the rest of the object would be queryable via the my_object field.
This could even be made to work on the whole document by allowing the _source field to be configurable.