[Research] bsearch investigation #166206

Closed
Tracked by #166211
thomasneirynck opened this issue Sep 11, 2023 · 10 comments
Labels
Meta research Team:DataDiscovery Discover, search (e.g. data plugin and KQL), data views, saved searches. For ES|QL, use Team:ES|QL. Team:Presentation Presentation Team for Dashboard, Input Controls, and Canvas

Comments

@thomasneirynck
Contributor

thomasneirynck commented Sep 11, 2023

The Kibana Server's internal bsearch API is foundational to how dashboards function.

Architecture

  • bsearch collects various Elasticsearch aggregation requests in the browser on a debounce schedule.
  • Kibana Browser issues these as a single request to Kibana Server.
  • Kibana Server fans these out to Elasticsearch as individual requests.
  • Kibana Server then receives the responses, serializes them into base64, concatenates them, and returns them as a single multi-line text file.
  • Kibana Browser then decodes the response.
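
A minimal sketch of the encode/decode round trip described in the list above, written in Python for brevity rather than Kibana's own TypeScript. The function names `encode_batch` and `decode_batch` and the sample payloads are hypothetical; this only illustrates the line-delimited base64 idea and is not the actual bsearch implementation.

```python
import base64
import json


def encode_batch(responses):
    """Server side: serialize each response to JSON, base64-encode it,
    and join the encoded lines with newlines into one text body."""
    lines = []
    for response in responses:
        raw = json.dumps(response).encode("utf-8")
        lines.append(base64.b64encode(raw).decode("ascii"))
    return "\n".join(lines)


def decode_batch(body):
    """Browser side: two steps per line, base64 decode then JSON parse."""
    results = []
    for line in body.splitlines():
        if not line:
            continue  # skip empty lines
        raw = base64.b64decode(line)
        results.append(json.loads(raw))
    return results


# Hypothetical responses for two dashboard panels.
responses = [
    {"id": 0, "aggregations": {"avg_bytes": {"value": 5120.5}}},
    {"id": 1, "aggregations": {"by_host": {"buckets": []}}},
]
body = encode_batch(responses)
decoded = decode_batch(body)
```

Every response passes through two string transformations in each direction, which is the overhead discussed under "Areas for improvement".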

e.g. rough schematic
image

Purpose

  • Avoid the browser connection limit of http1.
  • Depending on the search strategy, long-running calls are issued as _async_search calls. This enables queries from Dashboards to be run as background sessions.

Areas for improvement

bsearch resolves key constraints, but also introduces new ones. Primarily, it increased pressure on Kibana Server.

  • Batching in bsearch is primarily a workaround for http1 limitations. http2 support in Kibana Server/Cloud infra would clear this hurdle (Http2 support for the Kibana server #7104).
  • The crufty response format puts pressure on both Kibana Server and Browser.
    • On the server, Kibana must wait on the ES responses, serialize each one to base64, and concatenate them.
    • On the client, Kibana must re-inflate the response. It does so in two steps: decoding the base64 strings, followed by unmarshaling into a JSON object.
    • This two-step string encoding/decoding prevents more efficient streaming mechanisms. It also prevents relying on the built-in gzip compression of Elasticsearch. For example, some Kibana endpoints just stream data straight from Elasticsearch to the Browser (e.g. the maps/mvt endpoints).
  • It does not leverage optimal querying strategies for Elasticsearch. A more optimal strategy would combine the queries into a single request rather than fanning out into separate requests. (Note that this requires bsearch to be aware of the semantics of the requests, i.e. this would only really work with aggregations.)

Goals

Consider:

  • Move to http2 and re-evaluate batching/re-encoding requirements of bsearch
  • Investigate whether bsearch can be "smarter" in its collection of queries. The vast majority of bsearch calls from Dashboards are aggregations and could be more efficiently run with a single msearch or search query that combines the aggs in a single definition.
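
A rough sketch of what "combining the aggs in a single definition" could look like, in Python for illustration only. Everything here is an assumption: the request shape (`id`, `index`, `query`, `aggs`), the `__` separator used to namespace aggregation names, and the rule that only requests with an identical index and query are merged. It also assumes aggregation-only requests (`size: 0`) and request ids that do not contain `__`.

```python
import json
from collections import defaultdict

SEPARATOR = "__"  # hypothetical; request ids must not contain it


def consolidate(requests):
    """Group requests by (index, query) and merge their aggs into one body.

    Each agg name is prefixed with the request id so that the combined
    response can be split back per request.
    """
    groups = defaultdict(dict)
    for req in requests:
        key = (req["index"], json.dumps(req["query"], sort_keys=True))
        for name, agg in req["aggs"].items():
            groups[key][f'{req["id"]}{SEPARATOR}{name}'] = agg
    return [
        {"index": index, "body": {"size": 0, "query": json.loads(query), "aggs": aggs}}
        for (index, query), aggs in groups.items()
    ]


def split_response(aggregations):
    """Route each prefixed agg result back to the request that asked for it."""
    per_request = defaultdict(dict)
    for prefixed, result in aggregations.items():
        req_id, name = prefixed.split(SEPARATOR, 1)
        per_request[req_id][name] = result
    return dict(per_request)


requests = [
    {"id": "panel_a", "index": "logs-*", "query": {"match_all": {}},
     "aggs": {"avg_bytes": {"avg": {"field": "bytes"}}}},
    {"id": "panel_b", "index": "logs-*", "query": {"match_all": {}},
     "aggs": {"by_host": {"terms": {"field": "host.name"}}}},
    {"id": "panel_c", "index": "metrics-*", "query": {"match_all": {}},
     "aggs": {"max_cpu": {"max": {"field": "cpu"}}}},
]
merged = consolidate(requests)
```

In this example the two `logs-*` panels collapse into one search body with two aggregations, while `panel_c` stays separate because its index differs. Panels with different filters or time ranges would not merge under this simple rule.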
@botelastic botelastic bot added the needs-team Issues missing a team label label Sep 11, 2023
@thomasneirynck thomasneirynck added Team:Presentation Presentation Team for Dashboard, Input Controls, and Canvas Team:DataDiscovery Discover, search (e.g. data plugin and KQL), data views, saved searches. For ES|QL, use Team:ES|QL. labels Sep 11, 2023
@elasticmachine
Contributor

Pinging @elastic/kibana-data-discovery (Team:DataDiscovery)

@elasticmachine
Contributor

Pinging @elastic/kibana-presentation (Team:Presentation)

@Dosant
Contributor

Dosant commented Feb 23, 2024

Linking this relevant investigation. Could be helpful.

@thomasneirynck thomasneirynck changed the title [META] bsearch investigation [Research] bsearch investigation Mar 8, 2024
@thomasneirynck
Contributor Author

Query-consolidation is an unexplored area, and could benefit from the existing architecture.

@vadimkibana
Contributor

vadimkibana commented Mar 26, 2024

A few notes, in case it might help:

  • bfetch batches client requests, I think, by default, up to 25 requests or until 10ms elapse, whichever comes first.
  • The responses are not concatenated back into a text file and sent as one response; each is streamed as soon as it becomes available, as NDJSON (newline-delimited JSON). This uses the Transfer-Encoding: chunked HTTP header and the ability to listen for new chunks in the browser through some lesser-known XHR APIs.
  • The bfetch response stream can also compress each message, in which case it encodes each line as Base64 text (instead of JSON).
    • This compression is not good: it uses neither native HTTP compression nor the browser's decompression mechanisms. Instead, each message is custom-compressed in Node.js and encoded as Base64, and custom decompression code bundled to the browser decodes the Base64 and then decompresses.
  • An important use case is the ability to stream back the response; it can be configured to stream back an infinite sequence of responses. The Observability solution started using it as a WebSocket of sorts, opening a long-lived connection over which the server pushes status information: [Synthetics] stream results back for project monitors #138069 (EDIT: see [Research] bsearch investigation #166206 (comment))
  • To make streaming of small messages work, we had to do the below patches. Previously, the Cloud Proxy buffered HTTP responses up to 4KB; this was changed so that the Cloud Proxy passes through messages of any size immediately.
  • There is a new streaming mechanism used in Kibana, called "response_stream". It used to be a plugin, now a package. It uses native HTTP compression on the server and native browser APIs for decompression.
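
To illustrate the NDJSON streaming point above, here is a minimal sketch of a client-side parser that handles chunks arriving at arbitrary byte boundaries. It is written in Python for illustration; the class name `NdjsonStreamParser` and the message shape are hypothetical and do not reflect bfetch's actual browser code.

```python
import json


class NdjsonStreamParser:
    """Buffers partial lines and emits each message once its newline arrives."""

    def __init__(self):
        self._buffer = ""

    def feed(self, chunk):
        """Accept a text chunk and return the messages completed by it."""
        self._buffer += chunk
        # Everything before the last newline is complete; the remainder
        # (possibly empty) stays buffered until the next chunk.
        *complete, self._buffer = self._buffer.split("\n")
        return [json.loads(line) for line in complete if line.strip()]


parser = NdjsonStreamParser()
# Chunk boundaries deliberately fall in the middle of messages.
chunks = [
    '{"id": 0, "result": {"hits"',
    ': 3}}\n{"id": 1, ',
    '"result": {"hits": 7}}\n',
]
received = []
for chunk in chunks:
    received.extend(parser.feed(chunk))
```

Because each message is delivered as soon as its terminating newline arrives, a fast response does not have to wait for a slow one in the same batch.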

@thomasneirynck
Contributor Author

thx @vadimkibana!

wrt the synthetics use-case, @dominiqueclarke just informed me that this usage was removed in 8.10.

@thomasneirynck
Contributor Author

Consider turning bsearch off in just Serverless #181938

@thomasneirynck
Contributor Author

With #179663, we have been collecting more telemetry on the overhead of bsearch, specifically the custom encoding part into the line-delimited base64 format.

Metrics:

Long-tail distribution of time spent per single call.

The 75th percentile sits under 50-60 ms per bsearch call.

image

Given that a single dashboard will typically have 5-6 bsearch calls to fetch data for all charts, we can expect 10s to 100s of milliseconds spent on a single dashboard, just re-encoding the data, per single time2data cycle.

Long-tail distribution of total message size

image

Time spent scales linearly with message size

Message size and encoding time scale linearly (duh).
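
A small sketch for reproducing this kind of relationship locally. It times Python's standard-library base64 encoder on random payloads of increasing size, so the absolute numbers say nothing about Kibana's Node.js encoder; the helper name `measure` and the payload sizes are arbitrary choices for illustration.

```python
import base64
import os
import time


def measure(size_bytes, repeats=5):
    """Return (encoded length, best-of-N encode time in seconds)."""
    payload = os.urandom(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        encoded = base64.b64encode(payload)
        best = min(best, time.perf_counter() - start)
    return len(encoded), best


for size in (64 * 1024, 1024 * 1024, 4 * 1024 * 1024):
    encoded_len, seconds = measure(size)
    print(f"{size:>8} bytes -> {encoded_len:>8} base64 chars, {seconds * 1000:.3f} ms")
```

Base64 emits 4 output characters for every 3 input bytes, so the encoded size grows by roughly a third regardless of content.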

Evidence of really large responses.

At the end of the long tail (95th percentile and above), we find evidence of really large responses (on the order of megabytes).

image

Takeaway

Removing time spent re-encoding data in bsearch should be a broad but shallow improvement to overall time. We should expect it to compound positively as well, given the single-threaded nature of Node.js. While we have no metrics on that, given the evidence of large data responses, removing this encoding should also reduce memory pressure on the Kibana server at runtime.

Overall, removal will help work towards "thinning" the Kibana server footprint, and should yield measurable improvements to time2data (provided the Kibana server supports http2 parallelization).

@kertal
Member

kertal commented Jul 10, 2024

qq: is the research part done? can this be closed?

@thomasneirynck
Contributor Author

yes, let's close.
