Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[data.search.searchSource] Update SearchSource to use Fields API. #82383

Merged
merged 18 commits into from
Dec 3, 2020

Conversation

lukeelmers
Copy link
Member

@lukeelmers lukeelmers commented Nov 2, 2020

Closes #77241
Fixes #22897
Fixes #75813
Unblocks #80517 & #83891
Relates to #69049
Prior experiments: #77348 #75407

Summary

This PR integrates the new search fields API with the high-level search service SearchSource.

One of the main problems encountered in #75407 was that there wasn't an ergonomic way to add support for passing an array of fields to the fields API as SearchSource.setField('fields', ...); already existed.

This PR updates the API for .setField('fields', [...]) to pass values to the search fields API instead of retrieving values from source. For apps which want to keep the legacy behavior, .setField('fieldsFromSource', [...]) has been added.

To make it easier to play around with this feature, the "Search Examples" sample plugin has been rewritten to incorporate SearchSource, and provides a way to easily configure your request in the UI and then see the query DSL & response that is returned based on your configuration.

Screen Shot 2020-11-02 at 3 35 02 PM

Behavior

Here's a rough summary of the new behavior:

  • setField('fields', [...])
    • Source filters are applied first to remove any unneeded fields. Filters also get applied to stored/scripted fields.
    • Calls the search fields API
    • If you request any fields which are in the docvalues for indexPattern.getComputedFields, they will automatically be merged with the formats from the index pattern, unless you explicitly provide one.
    • You can pass ['*'] to get all fields
  • setField('fieldsFromSource', [...])
    • Mimics legacy behavior in that it will request the specified fields from source via _source.includes: [...]
    • Note that docvalue fields have still been removed: those will now be requested from the fields API even when using this legacy behavior, and if you include a field that's in the index pattern's computed docvalues, it will be added to the fields API. But any other fields provided in the list will be pulled from source.

Changes

  • Added fieldsFromSource to SearchSource & refactored existing fields logic to use fields API
  • I've updated apps that were previously using fields to use fieldsFromSource (Discover & Maps).
  • Updated "source filters" UI in index patterns to call them "field filters" as they are no longer exclusively applied to _source
  • Remove the usage of docvalue_fields for formatting date fields in favor of using the fields API, which doesn't have the same limits as docvalues. While the changes to the request format are handled internally, the response that comes back from ES will be different, so this is still a breaking change for apps that use search source and rely on the docvalue_fields from the response. At this point Discover is the only app that I'm aware of which does this. The maps app is also using docvalues, but AFAICT they are manually adding them to the request & not relying on searchsource to "magically" handle it for them.

Usage & Testing

  1. Check out this PR
  2. yarn start --run-examples & load some sample data
  3. Navigate to the example plugin: Developer examples > Search examples
  4. Specify desired query, filters, timerange, index pattern, & fields in the UI
  5. Click "Request from high-level client"

Open Questions

  1. Do we want to apply source filters to the fields API at all? (The name implies the filters only get applied to _source itself so I'm worried this is confusing?)
    • I applied them to the fields API too and renamed source filters to "Field filters" in the UI. We should discuss if this is something that we are all comfortable with. (Edit: Agreed this makes the most sense and is what most users would expect)
  2. Any other missing functionality that Discover would need to make this work? (So far the answer is "no")
  3. Should we make the fields API opt-in/out, or always use it under the hood?
    • Decided to always use the fields API, and you must opt out with "fieldsFromSource"

cc @majagrubic @kertal @timroes @ppisljar @stacey-gammon

Plugin API Changes

SearchSource now uses the search fields param by default

The data plugin's high-level search API, SearchSource, has migrated to use the Elasticsearch search fields param as the default when constructing a search request body with specific fields. In order to make it as easy as possible for plugins to migrate to the new behavior, we've preserved a way for plugins to use the legacy behavior of requesting fields from _source:

class MyPlugin {
  start(core, { data }) {
    const searchSource = data.search.searchSource.create();

    // Deprecated. Legacy behavior from 'fields' has been moved to 'fieldsFromSource'.
    // This is now the only way to search for fields directly from `_source`:
    searchSource.setField('fieldsFromSource', ['fieldA', 'fieldB']);

    // The old 'fields' syntax now uses the search fields API under the hood, and accepts
    // an array of fields that are passed straight through to the fields API.
    searchSource.setField('fields', ['fieldC', { field: 'fieldD', format: 'date_time' });

    ...etc
  }
}

If your plugin relies on calling setField('fields', [...]), you should update it to use fieldsFromSource until you are able to adapt your plugin to the new fields behavior.

SearchSource has stopped using docvalue_fields by default

Previously SearchSource would automatically request docvalue_fields for any date fields in an index pattern in order to avoid a situation where Kibana might receive a date field from Elasticsearch that it doesn't know how to format. With the introduction of the Elasticsearch search fields param which supports requesting fields in a particular format, we no longer need to rely on docvalue_fields for this behavior. SearchSource will now automatically request any date fields via the fields API, unless you provide specific ones via setField('fields', [...]), in which case only the relevant ones will be requested. If you do not provide a format for the fields you are requesting, one will automatically be added for you.

@lukeelmers lukeelmers added Feature:Search Querying infrastructure in Kibana WIP Work in progress Team:AppArch Project:RuntimeFields Project:Fields Access field values from fields response, not source labels Nov 2, 2020
@lukeelmers lukeelmers self-assigned this Nov 2, 2020
Copy link
Member

@ppisljar ppisljar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks the new method is very similar to the old one, could we rather cleanup original method and add this to it directly instead of having two separate functions, which look very different but behave very alike ?

also, i think ui setting is not the right way to configure this. We don't need this configuration per space or per user, we need the ability for solution developers to opt out of using fields API (for legacy/migration reasons). So i think we need to have an option on searchsource so you can request this on use-by-use basis. Also i am not sure if a general option (fetchFromSource) would work as we might need to select this for specific fields only. So i would vote for adding an option like fieldsFromSource to searchSource which would be array of field names i want to fetch from _source instead from fields

@lukeelmers
Copy link
Member Author

it looks the new method is very similar to the old one, could we rather cleanup original method and add this to it directly instead of having two separate functions, which look very different but behave very alike ?

Yeah agreed, for the purposes of this POC I wanted to avoid touching any of the current flatten logic so that we could clearly compare the two implementations (current & new one with Fields API). The idea would be that once we align on the right approach, these get combined into a single rewritten flatten method... This would definitely not be the final implementation.

also, i think ui setting is not the right way to configure this. We don't need this configuration per space or per user, we need the ability for solution developers to opt out of using fields API (for legacy/migration reasons). So i think we need to have an option on searchsource so you can request this on use-by-use basis.

++ Yeah also agreed that UI setting is not the way to go if we want folks to be able to opt in/out. I'm using it here basically as a feature flag for demo purposes only. I do not intend to actually merge this PR, but rather will open a new one (or rewrite this one) for the final implementation. Mainly this is to give the app team a way to play around with the changes in Discover so we can get feedback and make sure we are addressing any concerns.

I like the idea of defaulting to fields API and including an explicit fieldsFromSource list for folks who want to opt out... I'll look into that.

@lukeelmers
Copy link
Member Author

lukeelmers commented Nov 18, 2020

Okay, so here's my proposal for the final implementation on this (once we remove the uiSettings that's temporarily added as a feature flag). My goal was to figure out how we could do this so that the default experience with search source is the fields API, but anybody could still opt-out and use the legacy behavior of reading fields from source.

// 1. With no second argument, this will use fields API to retrieve fields that are stored/docvalue from
// IndexPattern.getComputedFields(). This isn't super useful anymore without an argument, since you
// won't get anything from _source by default:

searchSource.setField('fields');

// 2. Retrieves specified fields from fields API, extracts stored fields based on index pattern since they
// aren't supported by fields API, but otherwise doesn't do any magic. If you want formatted timestamp
// fields (like the ones we currently request via docvalues), you'd still need to explicitly list them here,
// however we can automatically inject a `format` if none is explicitly provided to ensure the returned
// format is something Kibana-friendly. In the example below, "time" will have format "test", but "@timestamp"
// will be formatted as we currently do with docvalues since it can be found in
// `indexPatterns.getComputedFields`:

searchSource.setField('fields', ['a', 'b', { field: 'time', format: 'test' }, '@timestamp']);

// 3. You can also still pass a function as is supported by the current search source implementation:

searchSource.setField('fields', (source) => ['a', 'b']);

// 4. This adds a new options object to `setField` which for now would only be used when setting the "fields".
// `fieldsFromSource` would request whatever you pass directly from source instead of fields API:

searchSource.setField('fields', ['a', 'b'], { fieldsFromSource: ['c'] });

// 5. Or you could use `['*']` to completely enable source and replicate the legacy behavior.
// So this is what you'd have to do to completely "opt-out" of the new fields API, otherwise
// you'll get the new fields API by default:

searchSource.setField('fields', [], { fieldsFromSource: ['*'] });

So TLDR on the above: The behavior changes behind the scenes to use fields API by default, but you can opt-out using the new option in example 4 & 5.


Some pros & cons to this approach:

Pros

  • Fine grained control at the field-level of what comes from source vs fields API
  • No breaking changes to how you initiate a request with Search Source
    • Edit: Technically this is still breaking in that you will get a different response back, however this is unavoidable unless we want to make fields API opt-in instead of opt-out. Regardless, my plan is to update all downstream uses of SearchSource in the PR to ensure things aren't breaking.
  • Less magic as there's a clear line between what comes

Cons

  • Apps need to do the extra work of requesting any needed time fields since they won't always get them all back like we did previously with docvalues
    • This is sort of a Pro too, since it means a more efficient query
  • Syntax for opting-out is a little clunky. It was hard to figure out how to do this elegantly in a non-breaking minimally-breaking way; open to suggestions.

Open to any feedback folks have / cc @ppisljar @majagrubic @timroes @stacey-gammon

My plan is to move forward with this design starting in the next few days so we can get this in and unblock the rest of the Discover work, but please do chime in if you find any red flags.

@ppisljar
Copy link
Member

the proposal looks good to me. the only thing i am wondering is if we should rather have
searchSource.setField('fieldsFromSource', [...]) instead of that third option paramater on the setField('fields' but no strong opinion about it.

@lukeelmers
Copy link
Member Author

the only thing i am wondering is if we should rather have
searchSource.setField('fieldsFromSource', [...]) instead of that third option paramater on the setField('fields' but no strong opinion about it.

Well this would certainly be easier from an implementation perspective, so I have no strong opinion on that if that seems clearer to folks.

In that case the existing behavior of setField('fields') would simply change to the fields API, and you'd need to do setField('fieldsFromSource') to opt-out and go back to the old (current) behavior.

Comment on lines +39 to +41
defaultMessage="Field filters can be used to exclude one or more fields when fetching a document. This happens when
viewing a document in the Discover app, or with a table displaying results from a saved search in the Dashboard app.
If you have documents with large or unimportant fields you may benefit from filtering those out at this lower level."
Copy link
Member Author

@lukeelmers lukeelmers Nov 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattkime @timroes @ppisljar @majagrubic

We need to discuss if we are comfortable with this change. Also there is a note a few lines below this (L47-48) that reads:

"Note that multi-fields will incorrectly appear as matches in the table below. These filters only actually apply to fields in the original source document, so matching multi-fields are not actually being filtered."

However, multi-fields will be filtered now for fields provided to .setField('fields') as this filtering happens on the client. So it's just .setField('fieldsFromSource') where they are not filtered.

Any ideas on how we should address this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^^ Bumping this -- does anybody have any concerns around changing this wording? I don't want this change to slip in unnoticed in case folks have ideas of better ways to handle this.

cc @elastic-jb As this represents a slight change to how the index patterns "source filters" feature is presented to end users

@lukeelmers lukeelmers changed the title [POC] SearchSource using Fields API. [data.search.searchSource] Update SearchSource to use Fields API. Nov 25, 2020
@lukeelmers
Copy link
Member Author

I went the path that @ppisljar suggested above and moved the legacy behavior to setField('fieldsFromSource') instead.

So the final API looks like this:

searchSource.setField('fields', [...stuff to request from fields API]);
searchSource.setField('fieldsFromSource', [...stuff to request from _source]);

Still a few more tweaks that need to happen, but I think we're pretty close.

Copy link
Member

@ppisljar ppisljar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code LGTM

@lukeelmers lukeelmers requested a review from a team as a code owner November 30, 2020 19:50
@@ -32,7 +32,7 @@ export default function ({ getPageObjects, getService }) {

//Source should be correct
expect(mapboxStyle.sources[MB_VECTOR_SOURCE_ID].tiles[0]).to.equal(
"/api/maps/mvt/getGridTile?x={x}&y={y}&z={z}&geometryFieldName=geo.coordinates&index=logstash-*&requestBody=(_source:(excludes:!()),aggs:(gridSplit:(aggs:(gridCentroid:(geo_centroid:(field:geo.coordinates)),max_of_bytes:(max:(field:bytes))),geotile_grid:(bounds:!n,field:geo.coordinates,precision:!n,shard_size:65535,size:65535))),docvalue_fields:!((field:'@timestamp',format:date_time),(field:'relatedContent.article:modified_time',format:date_time),(field:'relatedContent.article:published_time',format:date_time),(field:utc_time,format:date_time)),query:(bool:(filter:!((match_all:()),(range:('@timestamp':(format:strict_date_optional_time,gte:'2015-09-20T00:00:00.000Z',lte:'2015-09-20T01:00:00.000Z')))),must:!(),must_not:!(),should:!())),script_fields:(hour_of_day:(script:(lang:painless,source:'doc[!'@timestamp!'].value.getHour()'))),size:0,stored_fields:!('*'))&requestType=grid&geoFieldType=geo_point"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This functional test was failing because the generated requestBody is slightly different now -- even when using fieldsFromSource to mimic the legacy behavior, docvalue_fields is no longer used by default in search source, unless you are explicitly setting it. Instead everything runs through fields.

I'm not sure if this represents a breaking change for the maps app, please LMK if you see any issues.

Copy link
Contributor

@majagrubic majagrubic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API changes looks good to me, it's in line with what we have discussed previously. But just one thing - why removing the switch from advanced settings? I'm relying on this switch in Discover - in the first iteration we definitely want to keep this, as a fallback mechanism to old behavior. We can keep it out of this PR and I'll add it later.

Copy link
Member

@ppisljar ppisljar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code LGTM

@lukeelmers
Copy link
Member Author

lukeelmers commented Dec 1, 2020

why removing the switch from advanced settings? I'm relying on this switch in Discover - in the first iteration we definitely want to keep this, as a fallback mechanism to old behavior.

@majagrubic The ui setting represented a global config that would affect all apps, and one of the reasons we went the route of going with two different APIs -- fields and the fieldsFromSource legacy behavior -- was so that apps could gradually adopt the changes and it wouldn't be an all-or-nothing proposition.

For example, if we kept the global setting, the maps app would simply break when the new API is toggled on, unless they did the work to refactor the app code, which would require more coordination in order to launch support for fields API.

We can keep it out of this PR and I'll add it later.

++ Yeah my suggestion would be to add a ui setting that's specifically for Discover, which allows for enabling retrieval of fields from _source. Then in the Discover code you could check for that flag and conditionally call fields or fieldsFromSource. This way we avoid having a global search source setting, and instead have an app-specific one. Then when other apps like maps migrate, they can decide whether they need a ui setting or want to make the change all at once.

@lukeelmers
Copy link
Member Author

@elasticmachine merge upstream

Copy link
Contributor

@majagrubic majagrubic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lukeelmers totally makes sense regarding the option switch.
I have tested this PR by building Discover-related functionality on top of it and all works fine there 👍

@lukeelmers
Copy link
Member Author

@elasticmachine merge upstream

@lukeelmers
Copy link
Member Author

Update for those following along -- we are waiting for a regression to be fixed in Maps before merging this, as they both touch the same area of code. Once #84806 lands, I'll rebase this PR so we can get it merged.

@lukeelmers
Copy link
Member Author

@elasticmachine merge upstream

Copy link
Contributor

@thomasneirynck thomasneirynck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maps changes tested and works as expected 🎉 thx for patience on #84806

@lukeelmers
Copy link
Member Author

@elasticmachine merge upstream

@lukeelmers
Copy link
Member Author

As there have been no concerns raised in the thread about renaming index pattern "source filters" to "field filters", I'll go ahead and merge this.

Thanks all! ❤️

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
data 602 601 -1

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
discover 432.2KB 432.2KB +10.0B
indexPatternManagement 604.1KB 604.0KB -75.0B
maps 2.8MB 2.8MB +30.0B
total -35.0B

Distributable file count

id before after diff
default 43460 43459 -1
oss 22744 22743 -1

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
data 983.3KB 984.3KB +1013.0B

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:Search Querying infrastructure in Kibana Project:Fields Access field values from fields response, not source Project:RuntimeFields release_note:plugin_api_changes Contains a Plugin API changes section for the breaking plugin API changes section. v7.11.0 v8.0.0
Projects
None yet
7 participants