async ESQL and _search: add search_id, is_running and is_complete to response headers #109576

Closed
ppisljar opened this issue Jun 11, 2024 · 14 comments · Fixed by #112431
Assignees
Labels
:Analytics/ES|QL AKA ESQL >enhancement :Search Foundations/Search Catch all for Search Foundations Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch

Comments

@ppisljar
Member

Description

To use the cbor response format in Kibana and avoid decoding the body on the Kibana server, we need search_id, is_running and is_complete to be returned in the response headers (in addition to the body, where they are returned today).

@ppisljar ppisljar added >enhancement needs:triage Requires assignment of a team area label labels Jun 11, 2024
@nik9000
Member

nik9000 commented Jun 11, 2024

@ppisljar and I discussed an alternative, which is always sending wait_for_completion=0 and then using that to get the id. That felt less good to me because it throws out the wait-for-completion behavior. And I'm not entirely sure that it'd work - I think we can sometimes return with the results immediately from wait_for_completion=0. It just feels better to do the headers.

@nik9000 nik9000 added :Search/Search Search-related issues that do not fall into other categories :Analytics/ES|QL AKA ESQL labels Jun 11, 2024
@nik9000
Member

nik9000 commented Jun 11, 2024

I've tagged this as search/search and esql because we probably want to do this in both places.

@elasticsearchmachine elasticsearchmachine added Team:Search Meta label for search team Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) and removed needs:triage Requires assignment of a team area label labels Jun 11, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@javanna javanna added :Search Foundations/Search Catch all for Search Foundations and removed :Search/Search Search-related issues that do not fall into other categories labels Jul 17, 2024
@elasticsearchmachine elasticsearchmachine added Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 17, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

@nik9000
Member

nik9000 commented Jul 29, 2024

Technically I think all you need is is_completed. If it's false you can parse and get the search id. If it's true you can stream. But if we're going to add the rest I think.
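The branching described here can be sketched in a few lines. This is only an illustration under stated assumptions: `handleResponse` and the `Handled` type are hypothetical names, the body is assumed to be JSON text for simplicity, and the completion flag is assumed to arrive from outside the body (for example from a header). Only the `id` field name is taken from the response format discussed in this thread.

```typescript
// Hypothetical sketch: decide whether a response body must be parsed.
// Either we need to keep polling with an id, or we can forward the body as-is.
type Handled =
  | { kind: "poll"; id: string }
  | { kind: "passthrough"; body: string };

function handleResponse(isCompleted: boolean, body: string): Handled {
  if (isCompleted) {
    // Completed: nothing in the body is needed, so it can be streamed untouched.
    return { kind: "passthrough", body };
  }
  // Still running: the body has to be parsed to recover the search id.
  const parsed = JSON.parse(body) as { id?: unknown };
  if (typeof parsed.id !== "string") {
    throw new Error("running response carries no id");
  }
  return { kind: "poll", id: parsed.id };
}
```

With the id also available in a header, the parse in the running branch would no longer be needed either.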

@nik9000
Member

nik9000 commented Jul 29, 2024

All I see in the responses is

            builder.field("id", id);
            builder.field("is_running", isRunning);
            builder.field("is_partial", isPartial);

Is there an is_completed you have that you can reference? Or are you looking for is_running?

@thomasneirynck
Contributor

Adding some more context here around the relevance of this issue for Kibana.

Kibana-server introduces overhead when querying Elasticsearch. This is due to how the data-plugin is implemented. This plugin is used by many applications (notably Discover, Lens, Maps, ...) to fetch data from Elasticsearch. The overhead falls into two buckets.

  1. a batching layer (bsearch) that collates requests from the browser into a single one ([Research] bsearch investigation kibana#166206). This was introduced as a work-around for the http1-connection limit in browsers. It collates the ES-responses into a single response (see (2)). Apart from having to collate the responses, the custom batching also introduces more latency.
  2. unnecessary encoding/decoding and custom compression. ES-responses are decoded and re-encoded on kibana-server before being sent to kibana-browser.

(1) can be addressed by moving to http2. This would mainly be an architecture improvement, with some marginal performance benefits.
(2) when (1) is completed, any unnecessary decoding/encoding can be removed as well. This will result in faster delivery of data to kibana-browser and reduce the memory footprint in kibana-server.

(2) is blocked by this issue (#109576). There are a few use-cases in polling for async-responses where some of the metadata is needed from the ES-response body. Can this be moved to a header (exact properties TBD)?


Running kibana by default in http2 and removing bsearch (1) is an incompatible change as it removes the bsearch component and may potentially degrade on-prem users running http1. To motivate this move, and the additional admin-requirements of using SSL, we would also like to introduce the performance benefit of (2) at the same time.


wrt cbor and streaming: removing any decoding requirements from kibana-server will allow longer-term improvements like using a more efficient data-format (cbor), or even faster delivery (stream results). However, these are longer-term improvements and not the most critical from a short-term perspective.

@nik9000 nik9000 changed the title from "[async search] add search_id, is_running and is_complete to response headers" to "async ESQL and _search: add search_id, is_running and is_complete to response headers" Aug 7, 2024
@ivancea ivancea self-assigned this Aug 13, 2024
@swallez
Member

swallez commented Aug 20, 2024

Regarding "use cbor response format in kibana": starting with 8.15, ES|QL results can be output as Apache Arrow dataframes which are generally of the same size or smaller than cbor encoding and can be used with no decoding/deserialization.

This was added in #109873

nik9000 pushed a commit that referenced this issue Aug 21, 2024
Add headers to async ESQL queries to show the status and query ID without having to parse the body.

ESQL part of #109576
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Sep 4, 2024
…c#111840)

Add headers to async ESQL queries to show the status and query ID without having to parse the body.

ESQL part of elastic#109576
@ivancea ivancea closed this as completed in d59df8a Sep 5, 2024
@ivancea
Contributor

ivancea commented Sep 5, 2024

Just merged the last PR adding headers.
Now both the /_query and /_async_search[/status] also have these headers in the response:

  • X-Elasticsearch-Async-Is-Running: Either ?0 or ?1, depending on whether the query is still running
  • X-Elasticsearch-Async-Id: The ID of the query. If the query ran instantly and the other header is ?1, this header may be absent
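As a rough illustration of how a consumer might use these two headers without touching the body, here is a minimal sketch. The header names and the `?0`/`?1` values come from the list above. Everything else is an assumption: `readAsyncStatus` and `AsyncStatus` are hypothetical names, and the sketch assumes that `?1` means "true" (still running), as in HTTP structured-field booleans.

```typescript
// Header names as listed above, lowercased for case-insensitive lookup.
const ID_HEADER = "x-elasticsearch-async-id";
const IS_RUNNING_HEADER = "x-elasticsearch-async-is-running";

// Hypothetical result type for this sketch.
interface AsyncStatus {
  id: string | undefined; // may be absent, per the note above
  isRunning: boolean;
}

// Reads the async status from response headers only; the body is never decoded.
// Returns undefined when the is-running header is missing or not ?0 / ?1.
function readAsyncStatus(
  headers: Record<string, string | undefined>
): AsyncStatus | undefined {
  const normalized = new Map<string, string>();
  for (const [name, value] of Object.entries(headers)) {
    if (value !== undefined) {
      normalized.set(name.toLowerCase(), value.trim());
    }
  }
  const running = normalized.get(IS_RUNNING_HEADER);
  if (running !== "?0" && running !== "?1") {
    return undefined;
  }
  // Assumption: "?1" encodes true and "?0" encodes false.
  return { id: normalized.get(ID_HEADER), isRunning: running === "?1" };
}
```

A proxy layer could call something like this on each response and forward the body bytes unchanged, polling again with the id only while `isRunning` is true.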

davidkyle pushed a commit to davidkyle/elasticsearch that referenced this issue Sep 5, 2024
…c#111840)

Add headers to async ESQL queries to show the status and query ID without having to parse the body.

ESQL part of elastic#109576
@thomasneirynck
Contributor

thomasneirynck commented Sep 19, 2024

thx @swallez - wrt #109576 (comment)

We have done some investigation into Arrow and noticed that client-side support in JavaScript is quite poor right now: any gains in file size, Kibana loses in parsing/reading the format (elastic/kibana#183909).

So there is no immediate plan to use Arrow as a transfer format for Kibana.

@swallez
Member

swallez commented Sep 20, 2024

Thanks for the feedback @thomasneirynck. I'll look at that PR to understand if/how we can avoid any form of parsing, which is an essential part of the benefits of Arrow beyond network payload size.

@thomasneirynck
Contributor

thomasneirynck commented Sep 20, 2024

I should clarify we mainly looked at it from a dashboard/charting perspective.

if/how we can avoid any form of parsing

Much has to do with how Lens and elastic/charts require their data to be laid out in a specific way before it is drawn as a chart. Different models (e.g. loading arrow-buffers straight onto the GPU) could avoid all this, but these are not actionable in the short term.

More research is needed though, and especially when we're thinking about ES|QL-charts, we may be able to take shortcuts.

@swallez
Member

swallez commented Sep 30, 2024

I did some investigation and posted the results here: elastic/kibana#175695 (comment)


7 participants