Using partial results in Discover #76307

Closed
lizozom opened this issue Aug 31, 2020 · 13 comments
Labels
blocked, Feature:Discover (Discover Application), Feature:elasticsearch, Feature:Search (Querying infrastructure in Kibana), Team:Visualizations (Visualization editors, elastic-charts and infrastructure)

Comments

@lizozom
Contributor

lizozom commented Aug 31, 2020

In x-pack, the data.search service receives partial results returned from Elasticsearch's _async_search endpoint.
However, this capability is not exposed by SearchSource, as we wait until the final result is received before returning it.

While using partial results in most visualizations requires significant work on expressions, we could still use partial results in discover, maps, TSVB and Timelion relatively easily, by consuming the Observable returned from data.search.search and also making the fetch$ method of SearchSource public.
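A minimal sketch of what consuming intermediate responses could look like, assuming responses shaped loosely like Elasticsearch async search responses (which carry `is_partial` and `is_running` flags). The `Subscribe` callback is a hypothetical stand-in for the RxJS Observable returned by data.search.search; `consumePartialResults`, `fromArray` and the response interface are illustrative names, not Kibana APIs. The current SearchSource behaviour corresponds to only ever calling `onFinal`.

```typescript
// Hypothetical, simplified shape loosely modeled on an Elasticsearch
// _async_search response: `is_partial` / `is_running` mark intermediate results.
interface AsyncSearchResponse<T> {
  is_partial: boolean;
  is_running: boolean;
  response: T;
}

// Hypothetical minimal push-based stream standing in for an RxJS Observable.
interface Observer<T> {
  next: (value: T) => void;
  complete: () => void;
}
type Subscribe<T> = (observer: Observer<T>) => void;

// Emits each value synchronously, then completes (illustration helper).
function fromArray<T>(values: readonly T[]): Subscribe<T> {
  return (observer) => {
    values.forEach((v) => observer.next(v));
    observer.complete();
  };
}

// Forwards intermediate results as they arrive instead of waiting for the
// final one; a wait-only consumer would call just `onFinal`.
function consumePartialResults<T>(
  subscribe: Subscribe<AsyncSearchResponse<T>>,
  onPartial: (value: T) => void,
  onFinal: (value: T) => void,
): void {
  let last: AsyncSearchResponse<T> | undefined;
  subscribe({
    next: (res) => {
      last = res;
      // Partial or still-running responses are surfaced immediately.
      if (res.is_partial || res.is_running) {
        onPartial(res.response);
      }
    },
    complete: () => {
      // The last emitted response is treated as the final result.
      if (last !== undefined) {
        onFinal(last.response);
      }
    },
  });
}

// Usage: three snapshots of a growing result.
const partials: number[] = [];
let finalTotal = -1;
consumePartialResults(
  fromArray<AsyncSearchResponse<{ total: number }>>([
    { is_partial: true, is_running: true, response: { total: 100 } },
    { is_partial: true, is_running: true, response: { total: 250 } },
    { is_partial: false, is_running: false, response: { total: 400 } },
  ]),
  (r) => partials.push(r.total),
  (r) => {
    finalTotal = r.total;
  },
);
```

Each partial snapshot reaches the UI as soon as it arrives, while the final response is still delivered once the stream completes.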

We could attempt a POC on Discover, to demonstrate these capabilities, once the Discover search query is split into two (#69134, #55975).

It is important to note that making this change would however mean that msearch would not work for those solutions.

@lizozom lizozom added Feature:Search (Querying infrastructure in Kibana) and Team:AppArch labels Aug 31, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-app-arch (Team:AppArch)

@AlonaNadler

we could still use partial results in discover

Wow, this could be amazing!! Especially in Discover.

@lizozom lizozom changed the title Using partial results in Kibana Using partial results in Discover Oct 26, 2020
@lizozom
Contributor Author

lizozom commented Oct 27, 2020

@timroes had mentioned that due to upcoming changes in Discover this task will become irrelevant.
I'll let him elaborate, but in the meantime I'm closing the issue.

@lizozom lizozom closed this as completed Oct 27, 2020
@timroes
Contributor

timroes commented Oct 27, 2020

After splitting up the query in Discover we'll have 3 queries there:

  • Loading documents (without exact hit count, sorted by discover sort order)
  • Loading exact hit count (only loading exact count nothing else)
  • Loading data to display in the visualization (in case it's a time based index)
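As a rough illustration of the split, the three queries might map onto request bodies like the ones below. The options used (`size`, `sort`, `track_total_hits`, `aggs` with `date_histogram` and `fixed_interval`) are standard Elasticsearch search request options, but the helper names, the 500-document sample size, the 10,000 hit-count cap and the 30s interval are assumptions for the sketch; in Kibana the chart query would go through esaggs rather than a hand-built body.

```typescript
// Hypothetical helpers; the bodies use standard Elasticsearch search options.
type Query = Record<string, unknown>;

interface SearchBody {
  size: number;
  track_total_hits: boolean | number;
  query: Query;
  sort?: Array<Record<string, { order: "asc" | "desc" }>>;
  aggs?: Record<string, unknown>;
}

// 1. Documents: a sorted page of hits; the total is only counted up to a cap,
//    so the UI can show an estimate such as "> 10,000".
function documentsRequest(query: Query, timeField: string, sampleSize: number): SearchBody {
  return {
    size: sampleSize,
    sort: [{ [timeField]: { order: "desc" } }],
    track_total_hits: 10000, // assumed cap, not an exact count
    query,
  };
}

// 2. Exact hit count: no documents, accurate total only.
function hitCountRequest(query: Query): SearchBody {
  return { size: 0, track_total_hits: true, query };
}

// 3. Chart data: a date histogram only (no hits, no exact total).
function histogramRequest(query: Query, timeField: string, interval: string): SearchBody {
  return {
    size: 0,
    track_total_hits: false,
    query,
    aggs: {
      histogram: { date_histogram: { field: timeField, fixed_interval: interval } },
    },
  };
}

const query: Query = { match_all: {} };
const docs = documentsRequest(query, "@timestamp", 500);
const count = hitCountRequest(query);
const chart = histogramRequest(query, "@timestamp", "30s");
```

Only the third body contains aggregations, which is why it is the natural candidate for partial results.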

The last query is the one where it would make most sense to use partial data. During the refactoring to split those queries (and get rid of some cyclic dependencies) we'll use esaggs to load the data for the chart. Thus there is no separate task here beyond loading partial data via expressions in general.

The second query could potentially use it, though since it is simply loading the count of matches, and the first (fast) query already shows the estimated count, it would not create much more than a "counter animation" of the hits: you'd first see "> 50,000", "51,230", "55,000" instead of jumping from "> 50,000" to "55,000" directly. Also, from my current understanding of ES performance, that query wouldn't even take significantly longer, so it's not really worth complicating the architecture for this (if we simply want a counter animation, we could just randomly increase that number in JS :D).

The request to load the documents should (that's the whole purpose of splitting them up) be significantly faster, since it simply loads documents. In the case where partial documents would benefit us (because we know they are loading at the end of the list), like the time-based use case, this query would not depend on the size of the data you have in ES and would always return extremely fast, so there is no benefit in showing partial results here (we'd most likely never run into a case where it is slow enough to work with partial results).

@AlonaNadler

How about partial loading for the documents? That is the use case our users will need the most, and also the one where our competitors come up often. When users search for bad actors, they often search over a long time range and can potentially wait hours for the query to return. In these searches, partial results bring a lot of benefits.

@timroes
Contributor

timroes commented Oct 29, 2020

For the document loading query (1 above) we could potentially enable it, though we need to find a way not to have the user's content jump, i.e. not automatically show new incoming results, since, as far as I understand, we don't get a guarantee that partial results come in the requested sort order (i.e. the 2nd partial result could fill in documents at random positions within the previously loaded ones). Thus we could have a mechanism that informs the user that new data is there, and the user then clicks to refresh. Automatically updating only makes sense as long as we can be sure we're not suddenly removing documents the user is currently interacting with, which would only work if we have a guarantee that partial documents arrive step by step in the requested sort order (which, from my last syncs with ES, is not the case). I'll reopen this so we can discuss further details on how we could implement such a behavior in Discover safely.
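A minimal sketch of the "inform the user, then click to refresh" idea, assuming each partial response delivers a full snapshot of the current top documents in no guaranteed order relative to earlier snapshots. `PendingResultsBuffer`, `receive`, `applyPending` and the notification callback are hypothetical names, not existing Discover code.

```typescript
// Hypothetical sketch: hold back snapshots that arrive after the first render,
// so the rows the user is looking at never change without a click.
class PendingResultsBuffer<Doc> {
  private displayed: readonly Doc[] = [];
  private pending: readonly Doc[] | undefined = undefined;
  private readonly notifyNewData: () => void;

  constructor(notifyNewData: () => void) {
    this.notifyNewData = notifyNewData;
  }

  // Called for every (partial or final) snapshot. Each snapshot replaces the
  // previous one wholesale, because ordering across snapshots is not guaranteed.
  receive(snapshot: readonly Doc[]): void {
    if (this.displayed.length === 0) {
      // Nothing on screen yet, so showing it cannot disrupt the user.
      this.displayed = snapshot;
      return;
    }
    this.pending = snapshot;
    this.notifyNewData(); // e.g. show a "new results available" banner
  }

  hasPending(): boolean {
    return this.pending !== undefined;
  }

  // Called only when the user explicitly clicks to refresh.
  applyPending(): void {
    if (this.pending !== undefined) {
      this.displayed = this.pending;
      this.pending = undefined;
    }
  }

  get rows(): readonly Doc[] {
    return this.displayed;
  }
}

let notifications = 0;
const buffer = new PendingResultsBuffer<string>(() => {
  notifications += 1;
});
buffer.receive(["doc-3", "doc-1"]);
buffer.receive(["doc-3", "doc-2", "doc-1"]); // doc-2 lands in the middle
const beforeClick = buffer.rows.join(",");
buffer.applyPending();
const afterClick = buffer.rows.join(",");
```

The second snapshot inserts a document in the middle of the list, which is exactly the case where updating automatically would shift rows under the user; here the table only changes after `applyPending`.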

@timroes timroes reopened this Oct 29, 2020
@timroes timroes added Feature:Discover (Discover Application) and Team:Visualizations (Visualization editors, elastic-charts and infrastructure) labels Oct 29, 2020
@AlonaNadler

@timroes I agree with you that ideally we want the results in some sort order. However, I think it is worthwhile as a first step even if they don't show up in a specific order. We can then see how much users are bothered by the lack of order and add it in the future.
This capability will be useful for queries that run for a long time (a few minutes to a few hours) and lets users get some of the results as they stream in.

@lizozom
Contributor Author

lizozom commented Nov 3, 2020

@timroes @AlonaNadler
Loading the documents themselves should be very fast, shouldn't it, as it's only a small chunk of documents each time?

Update

Just synced with @jimczi: async_search doesn't support partial results for top hits (the latest documents), only for aggregations. So doing partial results won't be possible. Anyway, once top hits are fetched separately from the aggs, they should return much, much faster.

If you are OK with it, I do think that this issue can be closed.

@AlonaNadler

I think the main advantage in having partial results in Discover is having it for the results of the raw documents (in the red square):
[screenshot: Discover with the raw document results marked by a red square]

Mainly since it will allow users who have long queries to see intermediate results while they wait, instead of waiting for the query to complete.
Partial results for the histogram in Discover are nice to have, but the main feature when Discover is slow is to stream the documents and view the results while the query is still in progress.

@lizozom as far as I can tell (and @timroes knows better), Discover doesn't use Top hits for the raw document results.

@timroes
Contributor

timroes commented Nov 3, 2020

Clarified with Liza that "Top hits" in this case was indeed referring to the "top search results" we're using and not the "top hits" aggregation. Since Elasticsearch does not support partial results on those, and as Jim also confirmed this query will be super fast once we split it up, there is no place in Discover where partial results would still make sense. I am closing this.

If we feel this is a justified use case, please open an issue in the elasticsearch repository for adding partial results to search results (hits). If ES agrees to build them, we can reopen this issue for tracking, but to keep the number of issues manageable I'd close this for now, since there is currently no Kibana work in this.

@AlonaNadler

AlonaNadler commented Nov 3, 2020

@elastic-jb @lizozom can you open an elasticsearch issue on what Kibana needs to support intermediate results in discover results? and link it here

@lizozom
Contributor Author

lizozom commented Nov 4, 2020

Please take a look at the benchmarks I did on different types of queries.

They show that once the query is split, fetching the latest documents is going to be at least 10x faster than it is today, as what limits the performance of Discover today is loading the aggs and the latest documents in the same query.

Therefore, as @jimczi and @giladgal mentioned before, there's no significant performance benefit in adding partial results support to fetching the latest documents, as long as those two things are fetched separately.

@lizozom lizozom closed this as completed Dec 9, 2020
@lizozom
Contributor Author

lizozom commented Dec 9, 2020

Closing as there is significant work on partial results and on splitting out the queries in Discover.
This PR is still irrelevant at the moment. Will reopen if relevant.
