Using runtime fields as fallback rather than shadowing mapped fields #86536
Comments
Pinging @elastic/es-search (Team:Search)
When we developed runtime fields, we envisioned adding the new indexed field at rollover, so that the new index always has it indexed while older indices expose its runtime field variant. We were planning on a high-level API that would allow users to define the new field in a single call at the data stream level, see #72142.

I believe what you are suggesting is a mixed approach within the same index: use the indexed field when available, otherwise load the runtime field. I can see where this comes from: expose the faster variant of the field if you have it, not the other way around! When a field is defined under properties, it still gets indexed, although it will always be shadowed at search time when a field with the same name is defined in the runtime section.

Rather than changing how shadowing works and adding complexity (we have already received feedback that shadowing is hard to follow), I would consider exposing the runtime field for all documents of the index, but with logic in its script that loads values from doc_values when they are available (the more recent documents that have the field) and computes them otherwise (the older documents that have no doc_values for the new field). In practice, a script that loads from doc_values is pretty fast, and in this specific case we can take advantage of the flexibility of a script and incorporate the loading logic in it.
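A minimal Python sketch of the per-document logic proposed here, modeled outside Elasticsearch (it is not Painless and not how Elasticsearch evaluates runtime fields). It assumes, hypothetically, that the script could read the indexed value of the field it shadows, which the following comment notes is not possible today. Documents are plain dicts, and the field names `myfield` and `legacy_field` and the milliseconds-to-seconds conversion are made up for illustration. The indexed value is used when present; the computation runs only for documents that lack it, and the sketch counts those runs.

```python
from typing import Any, Callable


def fallback_value(
    doc: dict[str, Any],
    field: str,
    compute: Callable[[dict[str, Any]], Any],
) -> tuple[Any, bool]:
    """Return (value, computed): the indexed value if present, else compute(doc)."""
    if doc.get(field) is not None:
        return doc[field], False  # fast path: the value was indexed
    return compute(doc), True  # slow path: derive it, like a runtime script would


def resolve_all(docs, field, compute):
    """Resolve `field` for every doc and count how many values had to be computed."""
    values = []
    computed_count = 0
    for doc in docs:
        value, computed = fallback_value(doc, field, compute)
        values.append(value)
        if computed:
            computed_count += 1
    return values, computed_count


# Hypothetical data: older docs only have "legacy_field" (milliseconds);
# newer docs also have "myfield" (seconds) indexed at ingest time.
docs = [
    {"legacy_field": 1500},
    {"legacy_field": 2500},
    {"legacy_field": 3000, "myfield": 3.0},
]

values, computed_count = resolve_all(
    docs, "myfield", lambda d: d["legacy_field"] / 1000
)
print(values, computed_count)  # [1.5, 2.5, 3.0] 2
```

Only the two older documents pay for the computation; the newer one is served from its indexed value.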
Now, one problem I see with what I suggested above is something we've been aware of for some time but I tend to forget: there is currently no way to access the indexed variant of a field from a script when that field is shadowed. Referring to it always loads the runtime field, and in this case it would even cause an error, because the script would be accessing the very field it defines. If the path I am suggesting makes sense, we need to make it possible to access a shadowed field explicitly.
I've been thinking about this and I have a couple of questions.

A runtime field can be defined in the mappings or in the search request. At query time, runtime fields defined in the search request take precedence over runtime fields or ordinary fields with the same name defined in the mappings. Likewise, runtime fields defined in the mappings take precedence over fields defined under properties. Runtime fields defined in the search request are transient and can only be defined globally, meaning they are applied to all indices that the query targets and end up shadowing any existing field with the same name defined in the mappings. Runtime fields defined in the mappings are permanent (although they can be removed) and are defined at the index level.

I made an assumption in my comment above, but maybe you can confirm it @Mpdreamz. When you opened this issue, did you have in mind a scenario where you wanted to create a transient runtime field fallback across multiple indices (presumably the most recent index has the field indexed, while it is computed as a runtime field for previous indices)? Or was the issue about returning the computed value for individual documents that don't have a value for a certain field, while using the indexed field when present, within the same index?

The high-level use case we had in mind for shadowing when we introduced it was correcting existing fields indexed with wrong values, which is why runtime fields currently take precedence. I am not sure we can simply change the precedence, as that would make correcting existing fields impossible, but there are other ways we could address the scenario you brought up, and I am keen on discussing them with the team.
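The precedence order described in this comment can be modeled as a simple lookup. This is a toy Python model, not Elasticsearch code: the source labels (`search_request_runtime`, `mapping_runtime`, `properties`) are made-up names, and each source is represented only by the set of field names it defines.

```python
from typing import Optional

# Highest precedence first, as described in the comment.
PRECEDENCE = ("search_request_runtime", "mapping_runtime", "properties")


def resolve_field(name: str, definitions: dict[str, set[str]]) -> Optional[str]:
    """Return the source whose definition of `name` a query would see."""
    for source in PRECEDENCE:
        if name in definitions.get(source, set()):
            return source
    return None  # the field is not defined anywhere


# Hypothetical index: "myfield" is mapped under properties AND as a runtime field.
index_defs = {
    "mapping_runtime": {"myfield"},
    "properties": {"myfield", "other"},
}
print(resolve_field("myfield", index_defs))  # mapping_runtime: indexed field is shadowed
print(resolve_field("other", index_defs))    # properties

# A search-request runtime field shadows both, in every index the query targets.
request_defs = {**index_defs, "search_request_runtime": {"myfield"}}
print(resolve_field("myfield", request_defs))  # search_request_runtime
```

In this model, falling back to runtime fields would mean consulting `properties` before `mapping_runtime` for some fields, which is the precedence change the comment is wary of.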
I think there are good use cases for both options: runtime fields taking precedence and indexed fields taking precedence.
What about a parameter for runtime fields that lets you define the precedence? The default would be the same behavior as today, but there would be a mode where indexed fields take precedence.
True, although I would like more context around real-life scenarios. The problem I currently see with shadowing is that it is hard for users to follow, and it is not clear whether anybody is taking advantage of it for the use case we originally had in mind. I am not convinced about making the API even more complex with an additional configurable option, but there are other ways to address the need to fall back to runtime fields. Before discussing how (with the team as well), I would like to better understand the need. It mostly boils down to: do you need to add a new field to the current index, or can you add it when you roll over to the next one?
The use cases that I, and AFAIK @Mpdreamz, have in mind are about introducing new fields that the UI can rely on in a backwards-compatible way. A concrete example: currently, in APM, we store the duration of a transaction in the field … I suppose a workaround would be to create a runtime field for …
Thanks for providing more context. Rolling over is what we initially envisioned for the "add a new field to a data stream" scenario. If you add the field to the index mappings, old indices can have a script that computes its value. A high-level API to do this easily is still missing, but Elasticsearch should already have the low-level primitives needed to get it done. Issues arise if you 1) try to create a transient runtime field fallback from the search request, as you can't control which indices the runtime field gets applied to, or 2) add a new field without rolling over, as the runtime field will override the indexed field. Both of these can be solved one way or another, but I want to first clarify what we want to achieve at a high level.
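A minimal Python sketch of the rollover approach as plain mapping bodies: the index created at rollover maps the field under `properties`, while existing indices expose a runtime field with the same name. The field names, the `double` type, and the Painless script (which derives `myfield` from a hypothetical numeric `legacy_field`) are assumptions for illustration. The first body would go under `mappings` when creating the new index (or in its index template); the second is shaped for the update mapping API (`PUT <index>/_mapping`) on the older indices.

```python
import json

# Hypothetical Painless script: derive "myfield" (seconds) from a numeric
# "legacy_field" (milliseconds), emitting nothing for docs that lack it.
SCRIPT = (
    "if (doc['legacy_field'].size() != 0) "
    "{ emit(doc['legacy_field'].value / 1000.0); }"
)


def new_index_mapping(field: str, field_type: str) -> dict:
    """Mapping for the index created at rollover: the field is indexed."""
    return {"properties": {field: {"type": field_type}}}


def old_index_mapping(field: str, field_type: str, script_source: str) -> dict:
    """Mapping update for older indices: same field name, computed at query time."""
    return {
        "runtime": {
            field: {"type": field_type, "script": {"source": script_source}}
        }
    }


print(json.dumps(new_index_mapping("myfield", "double"), indent=2))
print(json.dumps(old_index_mapping("myfield", "double", SCRIPT), indent=2))
```

Because each index resolves the name on its own, a query on `myfield` across all of them would read doc values in the new index and run the script only in the older ones.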
I opened #89093, which would make it possible for a runtime field to expose the original value of the indexed field that it shadows, so that its script can pull the indexed value when present and compute it otherwise. This should help with introducing a new indexed field without requiring a rollover. Based on the conversation in this issue, I believe that is all that's needed to introduce a new indexed field and leverage its speed while computing values for older documents that don't have the field, hence I am closing this issue in favour of #89093.
Description
Today, runtime fields defined at query time always override any mapped values.
https://www.elastic.co/guide/en/elasticsearch/reference/current/runtime-override-values.html
Quite often you might actually want to do the reverse:
Introduce a new field to be indexed, say myfield, in the mapping and start ingesting documents that contain myfield explicitly. We now want to query and aggregate based on myfield and use runtime fields to provide a fallback for myfield. This would allow us to phase in a new field and only pay for runtime field execution on older documents that do not contain myfield.