Using runtime fields as fallback rather than shadowing mapped fields #86536

Closed
Mpdreamz opened this issue May 8, 2022 · 9 comments
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team

Comments

@Mpdreamz
Member

Mpdreamz commented May 8, 2022

Description

Today, runtime fields at query time always override any mapped values.

https://www.elastic.co/guide/en/elasticsearch/reference/current/runtime-override-values.html

Quite often you might actually want to do the reverse:

Introduce a new field to be indexed, say myfield, in the mapping and start ingesting documents that contain myfield explicitly.

We now want to query and aggregate based on myfield and use runtime fields to provide a fallback for myfield.

This would allow us to phase in a new field and only pay for the runtime field execution on older documents that do not contain myfield.

@Mpdreamz Mpdreamz added >enhancement needs:triage Requires assignment of a team area label labels May 8, 2022
@Mpdreamz Mpdreamz changed the title Using runtime fields as fallback rather then shadowing mapped fields Using runtime fields as fallback rather than shadowing mapped fields May 9, 2022
@gwbrown gwbrown added :Search/Search Search-related issues that do not fall into other categories and removed needs:triage Requires assignment of a team area label labels May 12, 2022
@elasticmachine elasticmachine added the Team:Search Meta label for search team label May 12, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@javanna
Member

javanna commented Jun 15, 2022

When we developed runtime fields we envisioned adding the new indexed field at the rollover, so that the new index always has it indexed, while older indices expose its runtime field variant. We were planning on having a high-level API that would allow users to define the new field in a single call at the data streams level, see #72142.

I believe what you are suggesting is a mixed approach within the same index: use the indexed field when available, otherwise load the runtime field. I can see where this comes from: expose the faster variant of the field if you have it, not the other way around! When a field is defined under properties, it still gets indexed, although it will always be shadowed at search time when there's a field with the same name defined in the runtime section. Rather than changing how shadowing works and adding complexity (we already got feedback that shadowing is complicated to follow), I would consider exposing the runtime field for all documents of the index, but have logic in its script that loads values from doc_values when available for the more recent documents that have the field, and computes them otherwise for the older documents that don't have doc_values for the new field. In practice, a script that loads from doc_values is pretty fast, and in this specific case we can take advantage of the flexibility of a script and incorporate the loading logic in it.

@javanna
Member

javanna commented Jun 15, 2022

Now, one problem I see with what I suggested above is something that we've been aware of for some time but I tend to forget: there is currently no way to access the indexed variant of the field from a script if it is shadowed, as referring to it will always load the runtime field, and in this case it would even cause an error around trying to access the same field that you're defining a script for. If the path I am suggesting makes sense, we need to work on making it possible to access a shadowed field explicitly.

@javanna
Member

javanna commented Jul 5, 2022

I've been thinking about this and I have a couple of questions. A runtime field can be defined in the mappings or in the search request. At query time, runtime fields defined in the search request take precedence over runtime fields or ordinary fields with the same name defined in the mappings. Likewise, runtime fields defined in the mappings take precedence over fields defined under properties.

Runtime fields defined in the search request are transient and can only be defined globally, meaning they will be applied to all indices that the query targets and will end up shadowing any existing field with the same name defined in the mappings.

Runtime fields defined in the mappings are permanent (although they can be removed), and are defined at the index level.
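
To make the precedence rules above concrete, here is a minimal Python model of which definition wins for a given field name. It is not Elasticsearch code: the function name `resolve_field` and the dictionary layout are illustrative assumptions.

```python
from typing import Optional


def resolve_field(name: str,
                  request_runtime: dict,
                  mapping_runtime: dict,
                  mapping_properties: dict) -> Optional[str]:
    """Return which definition of `name` is used at search time.

    Models the order described above: runtime fields from the search
    request first, then runtime fields from the index mappings, then
    fields mapped under `properties`.
    """
    if name in request_runtime:
        return "search request runtime field"
    if name in mapping_runtime:
        return "mapping runtime field"
    if name in mapping_properties:
        return "indexed field"
    return None  # the field is not defined anywhere


# Hypothetical index: `myfield` is both indexed and shadowed by a runtime field.
properties = {"myfield": {"type": "keyword"}}
runtime = {"myfield": {"type": "keyword"}}

print(resolve_field("myfield", {}, runtime, properties))  # mapping runtime field
print(resolve_field("myfield", {}, {}, properties))       # indexed field
```

In this model the indexed field is only ever reached when no runtime field with the same name exists, which is exactly why a runtime field cannot currently act as a fallback within the same index.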

I made an assumption in my comment above, but maybe you can confirm this @Mpdreamz. When you opened this issue, did you have a scenario in mind where you wanted to create a transient runtime field fallback across multiple indices (presumably the most recent index has the field indexed, while it is computed as a runtime field for previous indices), or was the issue around returning the computed value for individual documents that don't have a value for a certain field, while using the indexed field when present, but within the same index?

The high-level use case that we had in mind for shadowing when we introduced it was correcting existing fields indexed with wrong values, which is why runtime fields currently take precedence. I am not sure that we can simply change the precedence as that would make correcting existing fields not possible, but there are other ways that we could address the scenario that you brought up and I am keen on discussing that with the team.

@felixbarny
Member

I think there are good use cases for both options - runtime fields take precedence and indexed fields take precedence.

I am not sure that we can simply change the precedence as that would make correcting existing fields not possible

What about a parameter for runtime fields that lets you define the precedence? The default would be the same behavior as today, but there would also be a mode where indexed fields take precedence.

@javanna
Member

javanna commented Jul 6, 2022

I think there are good use cases for both options

True, although I would like to get more context around real-life scenarios. The problem I currently see with shadowing is that it is hard for users to follow, and it is not clear whether anybody is taking advantage of it for the use case that we originally had in mind. I am not convinced about making the API even more complex with an additional configurable option, but there are other ways to address the need for falling back to runtime fields. Before discussing how (with the team as well) I would like to better understand the need. It mostly boils down to: do you need to add a new field on the current index, or can you add it as you roll over to the next one?

@felixbarny
Member

The use cases that I, and AFAIK @Mpdreamz, have in mind are to introduce new fields that the UI can rely on in a backwards-compatible way.

A concrete example:

Currently, in APM, we store the duration of a transaction in the field transaction.duration.us. However, we'd like to transition to the official ECS field event.duration. We can start populating the new field in new versions of APM Server and make new versions of the APM UI/Kibana read from event.duration. However, there will still be a lot of historical data in the traces-* data stream that was ingested by older versions of the stack so that the documents only have the transaction.duration.us field, not event.duration. We were thinking of using runtime fields to add a layer of backwards compatibility: The UI would be able to always use the field event.duration for queries and to display the transaction duration. For new data, the indexed field event.duration would be used for that. For old data, we'd use a runtime field definition for event.duration that returns values from transaction.duration.us.

I suppose a workaround would be to add a runtime field for event.duration to the existing backing indices of a data stream, then roll over the data stream after APM Server has been updated. The downside is that it would potentially create more shards due to the rollover. But if we only roll over at most once when updating the stack, and only if new fields have been introduced, it shouldn't be too frequent. @Mpdreamz do you think that would be feasible?
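
A rough sketch of that workaround, written as Python that only builds and prints the request bodies (it sends nothing). The data stream and backing index names are hypothetical, since the thread only mentions `traces-*`, and the microsecond-to-nanosecond conversion is an assumption based on ECS defining event.duration in nanoseconds.

```python
import json

# Hypothetical names: the thread only mentions the `traces-*` data stream.
DATA_STREAM = "traces-apm-default"
OLD_BACKING_INDEX = ".ds-traces-apm-default-2022.07.01-000001"

# Painless script for the runtime field on old backing indices.
# Old documents only carry transaction.duration.us, so event.duration is
# computed from it. Assumption: microseconds are converted to nanoseconds.
FALLBACK_SCRIPT = (
    "if (doc['transaction.duration.us'].size() != 0) {"
    " emit(doc['transaction.duration.us'].value * 1000)"
    " }"
)

# Body for a mapping update that adds the runtime field to an old index.
runtime_fallback = {
    "runtime": {
        "event.duration": {
            "type": "long",
            "script": {"source": FALLBACK_SCRIPT},
        }
    }
}

# Step 1: add the runtime field to each existing backing index.
# Step 2: roll over so the new write index maps event.duration as an
#         indexed field (via the updated index template, not shown here).
steps = [
    ("PUT", f"/{OLD_BACKING_INDEX}/_mapping", runtime_fallback),
    ("POST", f"/{DATA_STREAM}/_rollover", None),
]

for method, path, body in steps:
    print(method, path)
    if body is not None:
        print(json.dumps(body, indent=2))
```

Because the runtime field lives only in the old backing indices, it never shadows the indexed event.duration in the index created by the rollover.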

@javanna
Member

javanna commented Jul 7, 2022

Thanks for providing more context. The need to roll over was what we initially envisioned when we were thinking of the "add a new field to a data stream" scenario. If you add the field to the index mappings, old indices can have a script that computes its value. The lack of a high-level API to easily do this remains, but Elasticsearch should already have the needed low-level primitives to get this done.

Issues arise if you 1) try to create a transient runtime field fallback from the search request, as you can't control which indices the runtime field gets applied to, or 2) add a new field without rolling over, as the runtime field will override the indexed field. Both of these can be solved one way or another, but I want to first clarify what it is that we want to achieve at a high level.

@javanna
Member

javanna commented Aug 3, 2022

I opened #89093, which would make it possible to have a runtime field expose the original value of the indexed field that it shadows, so that its script can have logic that pulls the indexed value when present or computes it otherwise. This should help with introducing a new indexed field without requiring a rollover. Based on the conversation we had in this issue, I believe that is all that's needed to achieve the desired goal of introducing a new indexed field and leveraging its speed while computing values for older documents that don't have the field, hence I am closing this issue in favour of #89093.
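
The per-document logic described here (use the indexed value when present, compute it otherwise) can be modelled in plain Python. This only illustrates the intended behaviour and is not the API proposed in #89093: the function name and document shape are assumptions, as is the microsecond-to-nanosecond conversion borrowed from the APM example earlier in the thread.

```python
from typing import Optional


def event_duration(doc: dict) -> Optional[int]:
    """Model of a fallback script for event.duration.

    Newer documents carry the indexed value and return it directly.
    Older documents only have transaction.duration.us, so the value is
    computed from it (assumed conversion: microseconds to nanoseconds).
    """
    indexed = doc.get("event.duration")
    if indexed is not None:
        return indexed  # fast path: value was indexed at ingest time
    legacy_us = doc.get("transaction.duration.us")
    if legacy_us is not None:
        return legacy_us * 1000  # fallback: compute from the old field
    return None  # neither field is present


new_doc = {"event.duration": 5_000_000, "transaction.duration.us": 5_000}
old_doc = {"transaction.duration.us": 7_000}

print(event_duration(new_doc))  # 5000000
print(event_duration(old_doc))  # 7000000
```

The cost of computing the value is only paid for documents that lack the indexed field, which is the goal stated in the original description.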

5 participants