Using runtime fields as fallback rather than shadowing mapped fields #86536

Mpdreamz · 2022-05-08T12:17:37Z

Description

Today, runtime fields always override any mapped values at query time.

https://www.elastic.co/guide/en/elasticsearch/reference/current/runtime-override-values.html

Quite often you might actually want to do the reverse:

Introduce a new indexed field, say myfield, in the mapping and start ingesting documents that contain myfield explicitly.

We now want to query and aggregate based on myfield and use runtime fields to provide a fallback for myfield.

This would allow us to phase in a new field and only pay for the runtime field execution on older documents that do not contain myfield.


elasticmachine · 2022-05-12T16:38:04Z

Pinging @elastic/es-search (Team:Search)

javanna · 2022-06-15T08:44:29Z

When we developed runtime fields we envisioned adding the new indexed field at the rollover, so that the new index always has it indexed, while older indices expose its runtime field variant. We were planning on having a high-level API that would allow users to define the new field in a single call at the data streams level, see #72142.

I believe what you are suggesting is a mixed approach within the same index: use the indexed field when available, otherwise load the runtime field. I can see where this comes from: expose the faster variant of the field if you have it, not the other way around! When a field is defined under properties, it still gets indexed, although it will always be shadowed at search time when there's a field with the same name defined in the runtime section. Rather than changing how shadowing works and adding complexity (we already got feedback that shadowing is complicated to follow), I would consider exposing the runtime field for all documents of the index, but have logic in its script that loads values from doc_values when available for the more recent documents that have the field, and computes it otherwise for the older documents that don't have doc_values for the new field. In practice, a script that loads from doc_values is pretty fast, and in this specific case we can take advantage of the flexibility of a script and incorporate the loading logic in it.
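The per-document logic described above can be modelled outside Elasticsearch. What follows is a plain-Python sketch, not Painless and not an Elasticsearch API: documents are plain dicts, and `resolve_value`, `compute_from_legacy` and `legacy_field` are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Mapping, Optional


def resolve_value(
    doc: Mapping[str, Any],
    field: str,
    compute: Callable[[Mapping[str, Any]], Optional[Any]],
) -> Optional[Any]:
    """Return the stored value of `field` when the document has one,
    otherwise fall back to a value computed from the rest of the document."""
    stored = doc.get(field)
    if stored is not None:
        return stored  # newer documents: cheap lookup, no computation
    return compute(doc)  # older documents: pay for the computation


def compute_from_legacy(doc: Mapping[str, Any]) -> Optional[str]:
    # Hypothetical derivation of "myfield" from an older field.
    legacy = doc.get("legacy_field")
    return None if legacy is None else legacy.upper()


new_doc = {"myfield": "ABC"}       # ingested after myfield was introduced
old_doc = {"legacy_field": "abc"}  # ingested before myfield existed

print(resolve_value(new_doc, "myfield", compute_from_legacy))
print(resolve_value(old_doc, "myfield", compute_from_legacy))
```

In this model only documents without a stored value trigger the computation, which is the cost profile the issue asks for.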

javanna · 2022-06-15T08:55:07Z

Now, one problem I see with what I suggested above is something that we've been aware of for some time but I tend to forget: there is currently no way to access the indexed variant of the field from a script if it is shadowed, as referring to it will always load the runtime field, and in this case it would even cause an error around trying to access the same field that you're defining a script for. If the path I am suggesting makes sense, we need to work on making it possible to access a shadowed field explicitly.

javanna · 2022-07-05T15:23:11Z

I've been thinking about this and I have a couple of questions. A runtime field can be defined in the mappings or in the search request. At query time, runtime fields defined in the search request take precedence over runtime fields or ordinary fields with the same name defined in the mappings. At the same time, runtime fields defined in the mappings take precedence over fields defined under properties.

Runtime fields defined in the search request are transient and can only be defined globally, meaning they will be applied to all indices that the query targets and will end up shadowing any existing field with the same name defined in the mappings.

Runtime fields defined in the mappings are permanent (although they can be removed), and are defined at the index level.
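The precedence order described in this comment can be summarised as a small lookup. This is a simplified Python model rather than Elasticsearch code: `winning_definition` is a hypothetical helper, and field definitions are flat dicts keyed by field name.

```python
from typing import Dict, Optional, Tuple

Definitions = Dict[str, dict]


def winning_definition(
    field: str,
    search_runtime: Definitions,
    mapping_runtime: Definitions,
    properties: Definitions,
) -> Optional[Tuple[str, dict]]:
    """Return (origin, definition) for the definition that is used at query time."""
    layers = (
        ("search request", search_runtime),    # transient, applies to all targeted indices
        ("mapping runtime", mapping_runtime),  # per index, shadows properties
        ("mapping properties", properties),    # the indexed field
    )
    for origin, definitions in layers:
        if field in definitions:
            return origin, definitions[field]
    return None


indexed = {"myfield": {"type": "keyword"}}
runtime = {"myfield": {"type": "keyword", "script": {"source": "emit('computed')"}}}

print(winning_definition("myfield", {}, {}, indexed)[0])
print(winning_definition("myfield", {}, runtime, indexed)[0])
print(winning_definition("myfield", runtime, runtime, indexed)[0])
```

The second call shows the behaviour this issue is about: once a runtime field with the same name exists in the mappings, the indexed definition is no longer reachable at query time.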

I made an assumption in my comment above, but maybe you can confirm this, @Mpdreamz. When you opened this issue, did you have a scenario in mind where you wanted to create a transient runtime field fallback across multiple indices (presumably the most recent index has the field indexed, while it is computed as a runtime field for previous indices), or was the issue around returning the computed value for individual documents that don't have a value for a certain field, while using the indexed field when present, but within the same index?

The high-level use case that we had in mind for shadowing when we introduced it was correcting existing fields indexed with wrong values, which is why runtime fields currently take precedence. I am not sure that we can simply change the precedence as that would make correcting existing fields not possible, but there are other ways that we could address the scenario that you brought up, and I am keen on discussing that with the team.

felixbarny · 2022-07-06T06:29:14Z

I think there are good use cases for both options - runtime fields take precedence and indexed fields take precedence.

> I am not sure that we can simply change the precedence as that would make correcting existing fields not possible

What about a parameter for runtime fields that lets you define the precedence? The default would be the same behavior as today, but there would also be a mode where indexed fields take precedence.

javanna · 2022-07-06T08:07:19Z

> I think there are good use cases for both options

True, although I would like to get more context around real-life scenarios. The problem I currently see with shadowing is that it is hard for users to follow, and it is not clear if anybody is taking advantage of it for the use case that we originally had in mind. I am not convinced about making the API even more complex with an additional configurable option, but there are other ways to address the need for falling back to runtime fields. Before discussing how (with the team as well), I would like to better understand the need. It mostly boils down to: do you need to add a new field on the current index, or can you add it as you roll over to the next one?

felixbarny · 2022-07-06T08:23:30Z

The use case that I and, AFAIK, @Mpdreamz have in mind is introducing new fields that the UI can rely on in a backwards-compatible way.

A concrete example:

Currently, in APM, we store the duration of a transaction in the field transaction.duration.us. However, we'd like to transition to the official ECS field event.duration. We can start populating the new field in new versions of APM Server and make new versions of the APM UI/Kibana read from event.duration. However, there will still be a lot of historical data in the traces-* data stream that was ingested by older versions of the stack so that the documents only have the transaction.duration.us field, not event.duration. We were thinking of using runtime fields to add a layer of backwards compatibility: The UI would be able to always use the field event.duration for queries and to display the transaction duration. For new data, the indexed field event.duration would be used for that. For old data, we'd use a runtime field definition for event.duration that returns values from transaction.duration.us.
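A runtime field definition along these lines could look roughly like the body built below. This is a sketch under assumptions, not the APM team's actual definition: it assumes transaction.duration.us is a numeric field with doc values, and it assumes event.duration is expressed in nanoseconds, hence the multiplication by 1000. The helper name `event_duration_fallback` is made up for illustration.

```python
import json


def event_duration_fallback() -> dict:
    """Build a mapping update body that defines event.duration as a runtime field."""
    # Painless script: emit a value only for documents that have the legacy field.
    # Assumption: legacy value is in microseconds, target field is in nanoseconds.
    source = (
        "if (doc['transaction.duration.us'].size() != 0) { "
        "emit(doc['transaction.duration.us'].value * 1000L); "
        "}"
    )
    return {
        "runtime": {
            "event.duration": {
                "type": "long",
                "script": {"source": source},
            }
        }
    }


body = event_duration_fallback()
# The body could be sent with PUT /<older-index>/_mapping.
print(json.dumps(body, indent=2))
```

As discussed earlier in this thread, applying this body to an index that also has event.duration indexed would shadow the indexed values, so it only fits indices that contain old-format documents exclusively.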

I suppose a workaround would be to add a runtime field for event.duration to the existing backing indices of a data stream, then roll over the data stream after APM Server has been updated. The downside is that it would potentially create more shards due to the rollover. But if we only roll over at most once when updating the stack, and only if new fields have been introduced, it shouldn't be too frequent. @Mpdreamz do you think that would be feasible?
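The workaround can be written down as an ordered list of requests. The sketch below only builds that list and sends nothing; the data stream name, the backing index names and the `plan_requests` helper are hypothetical, and it assumes the index template of the data stream already maps event.duration as an indexed field, so that the backing index created by the rollover has it.

```python
from typing import List, Tuple

Request = Tuple[str, str, dict]  # (HTTP method, path, JSON body)


def plan_requests(
    data_stream: str, backing_indices: List[str], runtime_body: dict
) -> List[Request]:
    """Add the runtime field to every existing backing index, then roll over."""
    requests: List[Request] = [
        ("PUT", f"/{index}/_mapping", runtime_body) for index in backing_indices
    ]
    # After the rollover, new documents go to a fresh backing index
    # that does not carry the runtime field.
    requests.append(("POST", f"/{data_stream}/_rollover", {}))
    return requests


runtime_body = {
    "runtime": {
        "event.duration": {
            "type": "long",
            # Assumed unit conversion from microseconds to nanoseconds.
            "script": {
                "source": "if (doc['transaction.duration.us'].size() != 0) "
                "{ emit(doc['transaction.duration.us'].value * 1000L); }"
            },
        }
    }
}

plan = plan_requests(
    "traces-apm-default",  # hypothetical data stream name
    [".ds-traces-apm-default-000001", ".ds-traces-apm-default-000002"],  # hypothetical
    runtime_body,
)
for method, path, _ in plan:
    print(method, path)
```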

javanna · 2022-07-07T08:46:14Z

Thanks for providing more context. The need to roll over is what we initially envisioned when we were thinking of the "add a new field to a data stream" scenario. If you add the field to the mappings of the new index, older indices can have a script that computes its value. The lack of a high-level API to easily do this remains, but Elasticsearch should already have the low-level primitives needed to get this done.

Issues arise if you 1) try to create a transient runtime field fallback from the search request, as you can't control which indices the runtime field gets applied to, or 2) add a new field without rolling over, as the runtime field will override the indexed field. Both of these can be solved one way or another, but I first want to clarify what we want to achieve at a high level.

javanna · 2022-08-03T16:19:03Z

I opened #89093, which would make it possible to have a runtime field expose the original value of the indexed field that it shadows, so that its script can have logic that pulls the indexed value when present or computes it otherwise. This should help with introducing a new indexed field without requiring a rollover. Based on the conversation we had in this issue, I believe that is all that's needed to achieve the desired goal of introducing a new indexed field and leveraging its speed while computing values for older documents that don't have the field, hence I am closing this issue in favour of #89093.

Mpdreamz added the >enhancement and needs:triage labels May 8, 2022

Mpdreamz changed the title ~~Using runtime fields as fallback rather then shadowing mapped fields~~ Using runtime fields as fallback rather than shadowing mapped fields May 9, 2022

gwbrown added the :Search/Search label and removed the needs:triage label May 12, 2022

elasticmachine added the Team:Search label May 12, 2022

javanna mentioned this issue Jun 17, 2022

Allow composite runtime fields to add top level fields #87690

Open

felixbarny mentioned this issue Jul 5, 2022

Automatically load unmapped fields from _source? #81357

Open

javanna added the team-discuss label Jul 5, 2022

javanna closed this as completed Aug 3, 2022

javanna removed the team-discuss label Aug 3, 2022
