[Research] bsearch investigation #166206

thomasneirynck · 2023-09-11T21:09:54Z

The Kibana Server internal bsearch API is foundational to the functioning of dashboards.

Architecture

bsearch collects various Elasticsearch aggregation requests in the browser on a debounce schedule.
Kibana Browser issues these as a single request to Kibana Server.
Kibana Server fans these out to Elasticsearch as individual requests.
Kibana Server then receives the responses, serializes them into base64, concatenates them, and returns them as a single multi-line text file.
Kibana Browser then decodes the response.
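The encode/decode round trip described above can be sketched roughly as follows. This is a minimal illustration, not the actual bsearch code: the `SearchResponse` shape and the `encodeBatch`/`decodeBatch` names are assumptions, and Node's `Buffer` stands in for whatever the browser really uses to decode.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical shape of one search response; real bsearch payloads differ.
interface SearchResponse {
  id: number;
  result: unknown;
}

// Server side: JSON-serialize each response, base64-encode it, and join
// the encoded strings with newlines into one multi-line text body.
function encodeBatch(responses: SearchResponse[]): string {
  return responses
    .map((r) => Buffer.from(JSON.stringify(r), "utf8").toString("base64"))
    .join("\n");
}

// Client side: split into lines, then decode in two steps per line
// (base64 -> JSON text, JSON text -> object).
function decodeBatch(body: string): SearchResponse[] {
  return body
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line) => {
      const json = Buffer.from(line, "base64").toString("utf8");
      return JSON.parse(json) as SearchResponse;
    });
}
```

Every response is fully materialized as a string twice on each side, which is the overhead the rest of this issue is concerned with.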

(Rough schematic: image not included.)

Purpose

Avoid the browser connection limit of HTTP/1.
Depending on the search strategy, long-running calls are issued as _async_search calls. This enables queries from Dashboards to be run as background sessions.

Areas for improvement

bsearch resolves key constraints, but also introduces new ones. Primarily, it increases pressure on Kibana Server.

Batching in bsearch is primarily a workaround for HTTP/1 limitations. HTTP/2 support in Kibana Server/Cloud infra would clear this hurdle (Http2 support for the Kibana server #7104).
The crufty response format puts pressure on both Kibana Server and Browser:
- On the server, Kibana must wait on the ES responses, serialize each one to base64, and concatenate them.
- On the client, Kibana must re-inflate in two steps: decoding the base64 strings, followed by unmarshalling into a JSON object.
- This two-step string encoding/decoding prevents more efficient streaming mechanisms. It also prevents relying on the built-in gzip compression of Elasticsearch. Some Kibana endpoints just stream data straight from Elasticsearch to the Browser (e.g. the maps/mvt endpoints).
It does not leverage optimal querying strategies for Elasticsearch. A more optimal strategy would combine the queries into a single request rather than fanning out into separate requests. (Note that this requires bsearch to be aware of the semantics of the requests, i.e. this would only really work with aggregations.)

Goals

Consider:

Move to http2 and re-evaluate batching/re-encoding requirements of bsearch
Investigate whether bsearch can be "smarter" in its collection of queries. The vast majority of bsearch calls from Dashboards are aggregations and could be more efficiently run with a single msearch or search query that combines the aggs in a single definition.
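To make the "smarter collection" idea concrete, here is a minimal sketch of how aggregation requests that share an index and query might be merged into one request body. Everything here is hypothetical: the `AggRequest` and `CombinedRequest` shapes, the `consolidate` function, and the `__` name prefix are illustrative assumptions, not existing Kibana or Elasticsearch APIs.

```typescript
// Hypothetical, simplified request shape: an index, a query, and named aggs.
interface AggRequest {
  id: string;
  index: string;
  query: Record<string, unknown>;
  aggs: Record<string, unknown>;
}

// One merged search per (index, query) pair, remembering its source requests.
interface CombinedRequest {
  index: string;
  query: Record<string, unknown>;
  aggs: Record<string, unknown>;
  sourceIds: string[];
}

function consolidate(requests: AggRequest[]): CombinedRequest[] {
  const groups = new Map<string, CombinedRequest>();
  for (const req of requests) {
    // Requests can only share a search when index and query are identical.
    // JSON.stringify is a naive equality key: it is sensitive to key order.
    const key = JSON.stringify([req.index, req.query]);
    let group = groups.get(key);
    if (!group) {
      group = { index: req.index, query: req.query, aggs: {}, sourceIds: [] };
      groups.set(key, group);
    }
    for (const [name, agg] of Object.entries(req.aggs)) {
      // Prefix each agg name with the request id to avoid name collisions
      // and to let the caller split the combined response apart again.
      group.aggs[`${req.id}__${name}`] = agg;
    }
    group.sourceIds.push(req.id);
  }
  return Array.from(groups.values());
}
```

Three panels over two distinct (index, query) pairs would yield two searches instead of three. Requests that differ in filters or time range would need more careful handling than this sketch attempts.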


elasticmachine · 2023-09-11T21:10:16Z

Pinging @elastic/kibana-data-discovery (Team:DataDiscovery)

elasticmachine · 2023-09-11T21:10:16Z

Pinging @elastic/kibana-presentation (Team:Presentation)

Dosant · 2024-02-23T09:49:34Z

Linking this relevant investigation. Could be helpful.

thomasneirynck · 2024-03-08T14:28:28Z

Query-consolidation is an unexplored area, and could benefit from the existing architecture.

vadimkibana · 2024-03-26T15:36:31Z

A few notes, in case it might help:

bfetch batches client requests, by default (I think), until 25 requests accumulate or 10 ms elapse, whichever comes first.
The responses are not concatenated back into a text file and sent as one response; they are streamed as NDJSON (newline-delimited JSON) as soon as each becomes available, using the Transfer-Encoding: chunked HTTP header and the ability to listen for new chunks in the browser through some lesser-known XHR request APIs.
The bfetch response stream can also compress each message, in which case it encodes each line as Base64 text (instead of JSON).
- This compression is not good: it uses neither native HTTP compression nor browser decompression mechanisms. Instead, each message is custom-compressed in Node.js and encoded as Base64, and custom decompression code bundled into the browser decodes the Base64 and then decompresses.
An important use case is the ability to stream back the response; it can be configured to stream back an infinite sequence of responses. The Observability solution started using it as a WebSocket of sorts, opening a long-lived connection over which the server pushes status information: [Synthetics] stream results back for project monitors #138069 (EDIT: see [Research] bsearch investigation #166206 (comment))
To make streaming of small messages work, we had to make the patches below. Previously the Cloud Proxy buffered HTTP responses up to 4KB; this was changed to allow the Cloud Proxy to pass through messages of any size immediately.
- https://github.com/elastic/cloud/pull/106440
- [bfetch] set 'X-Accel-Buffering':'no' to streaming response headers #139534
There is a new streaming mechanism used in Kibana, called "response_stream". It used to be a plugin and is now a package. It uses native HTTP compression on the server and native browser APIs for decompression.
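For readers unfamiliar with consuming NDJSON incrementally, the sketch below shows the general technique: buffer incoming text chunks and emit one parsed message per completed line. It is an illustration under assumptions, not bfetch's actual client code; the `NdjsonLineParser` class name and its `push` method are hypothetical.

```typescript
// Accumulates streamed text chunks and emits one parsed message per
// complete line. Chunk boundaries may fall anywhere, including mid-message.
class NdjsonLineParser<T = unknown> {
  private buffer = "";

  // Feed one chunk; returns the messages completed by this chunk.
  push(chunk: string): T[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an incomplete line (or "" if the chunk ended
    // exactly on a newline); keep it for the next call.
    this.buffer = lines.pop() ?? "";
    return lines
      .filter((line) => line.trim().length > 0)
      .map((line) => JSON.parse(line) as T);
  }
}
```

Each response becomes available to the caller as soon as its trailing newline arrives, without waiting for the rest of the batch.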

thomasneirynck · 2024-03-27T18:32:29Z

thx @vadimkibana!

wrt synthetics use-case, @dominiqueclarke just informed this usage was removed in 8.10.

thomasneirynck · 2024-04-29T04:39:59Z

Consider turning bsearch off in just Serverless #181938

thomasneirynck · 2024-05-20T22:22:10Z

With #179663, we have been collecting more telemetry on the overhead of bsearch, specifically the custom encoding part into the line-delimited base64 format.

Metrics:

Long-tail distribution of time spent per single call.

The 75th percentile sits under 50-60 ms per bsearch call.

Given that a single dashboard typically has 5-6 bsearch calls to fetch data for all charts, we can expect 10s to 100s of milliseconds spent on a single dashboard, just re-encoding the data per single time2data cycle.
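As a back-of-the-envelope check of that range, the sketch below multiplies the call count by the per-call time. It assumes the per-call encoding costs simply add up and that every call sits right at the 75th-percentile boundary, so the result is closer to an upper bound than a typical value; the `encodingBudgetMs` helper is hypothetical.

```typescript
// Rough per-dashboard re-encoding estimate, assuming the per-call
// encoding costs simply add up (an assumption, not a measured result).
function encodingBudgetMs(calls: number, msPerCall: number): number {
  return calls * msPerCall;
}

// 5-6 calls at the 50-60 ms boundary quoted above.
const lowEstimateMs = encodingBudgetMs(5, 50); // 250
const highEstimateMs = encodingBudgetMs(6, 60); // 360

console.log(`~${lowEstimateMs}-${highEstimateMs} ms per time2data cycle`);
```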

Long-tail distribution of total message size

time spent scales linearly with message size

Message size and encoding time scale linearly (duh).

Evidence of really large responses.

At the end of the long tail (above the 95th percentile), we find evidence of really large responses (on the order of megabytes).

Takeaway

Removing the time spent re-encoding data in bsearch should be a broad but shallow improvement to overall time. We should expect it to compound positively as well, given the single-threaded nature of Node.js. While we have no metrics on that, given the evidence of large data responses, removing this encoding should also reduce memory pressure on Kibana Server at runtime.

Overall, removal will help work towards "thinning" the Kibana Server footprint, and should yield measurable improvements to time2data (provided Kibana Server supports HTTP/2 parallelization).

kertal · 2024-07-10T09:17:26Z

qq: is the research part done? can this be closed?

thomasneirynck · 2024-07-12T13:54:19Z

yes, let's close.

thomasneirynck added the Meta label Sep 11, 2023

botelastic bot added the needs-team label Sep 11, 2023

thomasneirynck added the Team:Presentation and Team:DataDiscovery labels Sep 11, 2023

botelastic bot removed the needs-team label Sep 11, 2023

thomasneirynck mentioned this issue Sep 11, 2023

[META] Dashboard Performance #166211

Closed

19 tasks

thomasneirynck added the research label Jan 26, 2024

thomasneirynck assigned lukasolson Jan 26, 2024

thomasneirynck changed the title ~~[META] bsearch constraints~~ [META] bsearch investigation Jan 26, 2024

thomasneirynck mentioned this issue Jan 26, 2024

[Research] Query consolidation on Dashboards #175702

Open

thomasneirynck mentioned this issue Feb 14, 2024

[Research] Kibana Expression Language impact to Lens charts performance #175700

Closed

thomasneirynck changed the title ~~[META] bsearch investigation~~ [Research] bsearch investigation Mar 8, 2024

thomasneirynck mentioned this issue Mar 8, 2024

[RESEARCH] Cache data requests in the browser #178323

Closed

thomasneirynck mentioned this issue Mar 15, 2024

Http2 support for the Kibana server #7104

Closed

thomasneirynck mentioned this issue May 20, 2024

[TELEMETRY][BSEARCH] Measure encoding overhead #179663

Merged

1 task

lukasolson mentioned this issue Jun 12, 2024

[data.search] Remove bfetch/bsearch #186139

Closed

thomasneirynck closed this as completed Jul 12, 2024

thomasneirynck mentioned this issue Aug 2, 2024

async ESQL and _search: add search_id, is_running and is_complete to response headers elastic/elasticsearch#109576

Closed
