[Feature Request] Add mapping information for single-/multi-valued fields #16420
Labels
enhancement
Enhancement or improvement to existing feature or request
Indexing
Indexing, Bulk Indexing and anything related to indexing
Search:Query Capabilities
Comments
msfroh added the enhancement and untriaged labels on Oct 22, 2024
github-actions bot added the Indexing label on Oct 22, 2024
@msfroh We evaluated this feature as part of the triage meeting, and it seems like a nice feature to add to OpenSearch. Looking forward to more discussion on this.
This would be useful for the SQL plugin.
normanj-bitquill added a commit to Bit-Quill/OpenSearch that referenced this issue on Nov 8, 2024:
* Can only be used for field types that support multiple values
* If a field has the multivalued property, then new documents must have an array for its value
Signed-off-by: Norman Jordan <norman.jordan@improving.com>
3 tasks
normanj-bitquill added a commit to Bit-Quill/OpenSearch that referenced this issue on Dec 5, 2024:
* Can only be used for field types that support multiple values
* If a field has the multivalued property, then new documents must have an array for its value
Signed-off-by: Norman Jordan <norman.jordan@improving.com>
PR for adding support for the multivalued flag.
@msfroh could you please take a look?
Is your feature request related to a problem? Please describe
Fields in an OpenSearch index are all allowed to be multivalued. Any `keyword` field will also accept an array of `keyword` values. This all works because, under the hood, Lucene doesn't really make a distinction between adding a field once and adding it multiple times.
Unfortunately, for cases that try to project a fixed schema (like the SQL plugin or the proposed join support in core), it's useful to make a distinction between a field that represents a single `keyword` and one that represents an array of `keyword`s. We could treat every field as an array, but a lot of fields would come out as arrays of length 1 (since, at least in my experience, the majority of fields are single-valued).
Describe the solution you'd like
It would be great if we could add a property in a mapping that conveys whether a field is single- or multi-valued. Unfortunately, from a backwards compatibility standpoint, we can't just add a new required property, since we would break all existing index mappings.
My suggestion is that we add an optional `multivalued` property for field mappings. Essentially, this property would have three possible values: `true`, meaning the field should be treated as an array; `false`, meaning that the field only has a single value (a document with multiple values for the field will be rejected); or `null`, meaning that we don't know. This means the field was dynamically added to the mapping or the field was specified in a mapping without a value for the `multivalued` property.
I would also suggest that if a document specifies multiple values for a field where `multivalued` is `null`, we should update the mapping to set `multivalued` to `true`. (Maybe we can't do that if dynamic mapping changes are disabled.)
Going forward, if we add this property in OpenSearch 2.x, maybe we can make it mandatory for new indices created in OpenSearch 3.0. (Of course, we would still need to support the OpenSearch 2.x `null` behavior, at least until OpenSearch 4.0 is released.) Starting in OpenSearch 3.0, we could dynamically infer the property from the first document containing a given field (which would require a bit of work, since we would need to distinguish between `"fieldA":"foo"` and `"fieldA":["foo"]`, where the former would be single-valued and the latter would be multivalued).
Related component
Indexing
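To make the proposed `multivalued` semantics a bit more concrete, here is a minimal Python sketch of the three-state check and the first-document inference described above. It is not OpenSearch code: the names `apply_multivalued`, `infer_from_first_value`, and `MultivaluedError` are hypothetical, and the choices to accept a scalar when the flag is `true` and a one-element array when it is `false` are assumptions that the proposal leaves open.

```python
from typing import Any, Optional


class MultivaluedError(ValueError):
    """Hypothetical error for a document that violates a field's declared arity."""


def has_multiple_values(value: Any) -> bool:
    # A JSON array with more than one element counts as "multiple values".
    return isinstance(value, list) and len(value) > 1


def apply_multivalued(
    multivalued: Optional[bool], value: Any, dynamic: bool = True
) -> Optional[bool]:
    """Check `value` against the flag and return the (possibly updated) flag.

    multivalued is True  -> field is treated as an array (assumption: scalars still accepted)
    multivalued is False -> documents with multiple values are rejected
    multivalued is None  -> unknown; upgraded to True on the first multi-value document,
                            but only if dynamic mapping changes are allowed
    """
    if multivalued is False and has_multiple_values(value):
        raise MultivaluedError("field is single-valued but the document has multiple values")
    if multivalued is None and has_multiple_values(value) and dynamic:
        return True
    return multivalued


def infer_from_first_value(value: Any) -> bool:
    # "fieldA": "foo" -> single-valued; "fieldA": ["foo"] -> multivalued,
    # so the check is on the JSON shape, not on the number of elements.
    return isinstance(value, list)
```

Note that the two functions look at different things on purpose: the `null` upgrade keys off the number of values, while first-document inference has to key off whether the source used array syntax at all.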
Describe alternatives you've considered
I was chatting with @anirudha today about an idea of making it a search-time problem, since it's at search time that knowing the schema is useful (since indexing "just works" right now). Essentially, you could take a hint at search time to force an interpretation for a field.
You could also make a best effort to guess whether a field has multiple values by inspecting a sample of documents (the first 500?). Since you may want the coordinator to get a response from each shard with the same interpretation, you could do a preliminary search phase (kind of like `can_match`) to ask each shard to vote on the arity of each field. If any shard says a field is multivalued, we would interpret it as multivalued.
Additional context
I'm categorizing this as "Indexing", but the property is mostly useful at search time. I think I'll add the "Search:Query Capabilities" label too.
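The shard-voting alternative mentioned under "Describe alternatives you've considered" could look roughly like the following Python sketch. It models shards as plain lists of document dicts, which is purely an assumption for illustration; `shard_vote` and `merge_votes` are hypothetical names, and the sample size of 500 is the tentative number floated above rather than a decided value.

```python
from itertools import islice
from typing import Any, Dict, Iterable, List

SAMPLE_SIZE = 500  # tentative figure from the discussion, not a settled default


def shard_vote(
    docs: Iterable[Dict[str, Any]], sample_size: int = SAMPLE_SIZE
) -> Dict[str, bool]:
    """Inspect the first `sample_size` docs on one shard; True means 'looks multivalued'."""
    votes: Dict[str, bool] = {}
    for doc in islice(docs, sample_size):
        for field, value in doc.items():
            multi = isinstance(value, list) and len(value) > 1
            votes[field] = votes.get(field, False) or multi
    return votes


def merge_votes(shard_votes: List[Dict[str, bool]]) -> Dict[str, bool]:
    """Coordinator side: a field is multivalued if any shard says so."""
    merged: Dict[str, bool] = {}
    for votes in shard_votes:
        for field, multi in votes.items():
            merged[field] = merged.get(field, False) or multi
    return merged


# Usage: two shards disagree on "tags", so the merged answer is multivalued.
shard_a = [{"tags": "x", "id": "1"}, {"tags": "y"}]
shard_b = [{"tags": ["x", "y"], "id": "2"}]
merged = merge_votes([shard_vote(shard_a), shard_vote(shard_b)])
```

Because this is a sample-based guess, a field can still turn out to be multivalued in documents outside the sample, which is part of why the mapping property is the preferred solution.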