[Core] Kibana discovery service #188177

pgayvallet · 2024-07-12T07:55:50Z

Supersedes/replaces #93029

In #187696, response ops is planning to implement a discovery service for task manager, for perf / workload optimizations. That implementation is planned to remain internal to task manager.

However, we agreed that having such a feature implemented as a Core service could make sense, as taskManager workload balancing is not the only feature that could benefit from access to an approximate view of the state of the "Kibana cluster".
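For illustration only: one common way to get such an approximate view is heartbeat-based. Each node periodically records a "last seen" timestamp in a shared store, and any node seen within a look-back window counts as active. The minimal TypeScript sketch below assumes an in-memory `Map` as a stand-in for the shared store; `InMemoryDiscovery`, `heartbeat`, `getActiveNodes` and the 30-second window are hypothetical names and values, not taken from #187696.

```typescript
// Hypothetical record each Kibana node periodically writes to a shared store
// (e.g. a saved object or Elasticsearch document; a Map stands in for it here).
interface NodeHeartbeat {
  nodeId: string;
  lastSeen: number; // epoch millis of the node's most recent heartbeat
}

class InMemoryDiscovery {
  private readonly heartbeats = new Map<string, NodeHeartbeat>();
  private readonly activeWindowMs: number;

  constructor(activeWindowMs: number) {
    this.activeWindowMs = activeWindowMs;
  }

  // Called by each node on a fixed interval to signal it is alive.
  heartbeat(nodeId: string, now: number): void {
    this.heartbeats.set(nodeId, { nodeId, lastSeen: now });
  }

  // Approximate cluster view: a node is active if its last heartbeat falls
  // inside the look-back window. Crashed nodes simply age out.
  getActiveNodes(now: number): string[] {
    return Array.from(this.heartbeats.values())
      .filter((hb) => now - hb.lastSeen <= this.activeWindowMs)
      .map((hb) => hb.nodeId)
      .sort();
  }
}

const discovery = new InMemoryDiscovery(30_000);
discovery.heartbeat("kibana-a", 0);
discovery.heartbeat("kibana-b", 20_000);
console.log(discovery.getActiveNodes(40_000)); // [ 'kibana-b' ]
```

The view is only as fresh as the heartbeat interval and window allow, which matches the "approximate" state described above: a node that just died stays visible until its entry ages out.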

I'm opening this issue to discuss what this "discovery system" could look like as a Core service, and to confirm that other features would benefit from it:

1. List the features/consumers that could benefit from it, to evaluate the concrete need for such a service at Core's level
2. Given 1., see how much we would need to adapt / diverge from the implementation used by response-ops
3. Start thinking about what the API surface could look like for such a service
    1. how to consume it
    2. whether API consumers could "enhance" it (i.e. add more info to the "node status")
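To make the two API questions a bit more concrete, here is a hypothetical TypeScript shape following Core's setup/start contract split: a setup contract through which plugins register providers that "enhance" the node info, and a start contract to read the cluster view. Every name here (`DiscoveryServiceSetup`, `registerNodeInfoProvider`, `publish`, etc.) is an assumption for discussion, not an existing Core API; persistence to a shared store and staleness filtering are left out.

```typescript
// Hypothetical contracts; none of these names exist in Kibana Core today.
interface NodeInfo {
  id: string;
  lastSeen: number; // epoch millis of the last published entry
  extensions: Record<string, unknown>; // consumer-provided data, keyed by provider name
}

type NodeInfoProvider = () => unknown;

// Setup contract: lets consumers "enhance" the node info.
interface DiscoveryServiceSetup {
  registerNodeInfoProvider(key: string, provider: NodeInfoProvider): void;
}

// Start contract: how consumers read the approximate cluster view.
interface DiscoveryServiceStart {
  getCurrentNodeId(): string;
  getActiveNodes(): NodeInfo[];
}

class DiscoveryService implements DiscoveryServiceSetup, DiscoveryServiceStart {
  private readonly providers = new Map<string, NodeInfoProvider>();
  // Stand-in for the shared store holding every node's entry.
  private readonly nodes = new Map<string, NodeInfo>();
  private readonly nodeId: string;

  constructor(nodeId: string) {
    this.nodeId = nodeId;
  }

  registerNodeInfoProvider(key: string, provider: NodeInfoProvider): void {
    if (this.providers.has(key)) {
      throw new Error(`Node info provider "${key}" is already registered`);
    }
    this.providers.set(key, provider);
  }

  // Collects every provider's current value and publishes this node's entry.
  publish(now: number): void {
    const extensions: Record<string, unknown> = {};
    this.providers.forEach((provider, key) => {
      extensions[key] = provider();
    });
    this.nodes.set(this.nodeId, { id: this.nodeId, lastSeen: now, extensions });
  }

  getCurrentNodeId(): string {
    return this.nodeId;
  }

  // Staleness filtering is omitted in this sketch.
  getActiveNodes(): NodeInfo[] {
    return Array.from(this.nodes.values());
  }
}

const discovery = new DiscoveryService("kibana-a");
discovery.registerNodeInfoProvider("status", () => ({ level: "available" }));
discovery.publish(Date.now());
console.log(discovery.getActiveNodes()[0].extensions); // { status: { level: 'available' } }
```

Evaluating providers on each publish keeps the stored entry current without consumers pushing updates themselves; rejecting duplicate keys keeps one plugin from silently overwriting another's data.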


elasticmachine · 2024-07-12T07:55:52Z

Pinging @elastic/kibana-core (Team:Core)

pgayvallet · 2024-07-12T08:16:25Z

Starting the discussions by sharing my (unstructured) thoughts:

The first use case that comes to mind as benefiting from this feature is around Kibana status and/or diagnostics:

The status API
The Kibana diagnostic tools

Especially for the status API: if we were to store the status of each node in its "cluster state" data (name TBD), we could finally have a status API that returns the overall status of our Kibana cluster, and not only the status of the node being queried. I'm not sure it would add much value in orchestrated environments, where status aggregation is done at a higher level, but it would probably make sense at least for on-prem deployments (and, as said, it could potentially bring value to all environments).
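A cluster-wide status could then be derived by reading every live node's stored status and aggregating it. The sketch below assumes a "worst level wins" policy over the levels `available`, `degraded`, `unavailable` and `critical`; both the policy and the `clusterStatus` helper are assumptions, and other policies (e.g. majority-based) are equally possible.

```typescript
type StatusLevel = "available" | "degraded" | "unavailable" | "critical";

// Severity order from healthiest to worst; assumed for this sketch.
const SEVERITY: StatusLevel[] = ["available", "degraded", "unavailable", "critical"];

interface NodeStatus {
  nodeId: string;
  level: StatusLevel;
}

interface ClusterStatus {
  level: StatusLevel; // worst level reported by any live node
  unhealthyNodes: string[]; // ids of live nodes that are not "available"
}

// Assumed policy: the cluster is only as healthy as its worst live node.
// An empty list yields "available"; in practice the list would always contain
// at least the node serving the status request.
function clusterStatus(nodes: NodeStatus[]): ClusterStatus {
  let worst: StatusLevel = "available";
  for (const node of nodes) {
    if (SEVERITY.indexOf(node.level) > SEVERITY.indexOf(worst)) {
      worst = node.level;
    }
  }
  const unhealthyNodes = nodes.filter((n) => n.level !== "available").map((n) => n.nodeId);
  return { level: worst, unhealthyNodes };
}

const result = clusterStatus([
  { nodeId: "kibana-a", level: "available" },
  { nodeId: "kibana-b", level: "degraded" },
]);
console.log(result); // { level: 'degraded', unhealthyNodes: [ 'kibana-b' ] }
```

Returning the unhealthy node ids alongside the level keeps the aggregate actionable: the response says which node to look at, not just that something is wrong.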

Then, the second use case would be multi-stage deployments. I don't have a great example there, but I feel that letting each node know whether all live nodes of the cluster run the same version could be useful for "automated" behavioral changes during a multi-stage deployment.
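Once nodes publish their version, the check itself is cheap. A hypothetical sketch (`LiveNode`, `versionsInUse` and `isVersionConsistent` are illustrative names, and the version strings are example data), assuming each live node's entry carries its Kibana version string:

```typescript
interface LiveNode {
  nodeId: string;
  version: string; // Kibana version reported by the node
}

// Groups live node ids by the version they report.
function versionsInUse(nodes: LiveNode[]): Map<string, string[]> {
  const byVersion = new Map<string, string[]>();
  for (const node of nodes) {
    const ids = byVersion.get(node.version) ?? [];
    ids.push(node.nodeId);
    byVersion.set(node.version, ids);
  }
  return byVersion;
}

// True when all live nodes run the same version (or there are no nodes).
function isVersionConsistent(nodes: LiveNode[]): boolean {
  return versionsInUse(nodes).size <= 1;
}

const liveNodes: LiveNode[] = [
  { nodeId: "kibana-a", version: "8.15.0" },
  { nodeId: "kibana-b", version: "8.14.3" },
];
console.log(isVersionConsistent(liveNodes)); // false
```

Because the node list is only approximate, a node that just stopped may still appear until its entry ages out, so such a check errs on the side of reporting a mixed-version cluster, which seems like the safer direction for gating behavioral changes.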

rudolf · 2024-07-12T11:57:14Z

We've gotten very far in scaling saved object migrations by simply transforming fewer documents, but we haven't fundamentally increased migration throughput. The bottleneck is the time it takes to load one batch, transform its documents and write it back; much of that is dead time spent waiting on reads and writes. We can't parallelise this or increase batch sizes much, because Kibana would then run out of RAM. To improve this, we'd want to shard the migration work across the Kibana nodes. That way each Kibana node would transform only a subset of the documents, increasing transform/CPU throughput, and we'd parallelise the read, transform, write loop, better utilising Elasticsearch and minimising the cost of network latency.
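One simple way to shard that work, sketched under the assumption that every node sees the same list of active nodes for the duration of a migration: each node takes its index in the sorted node list as its shard number and only processes documents whose id hashes to that shard. The FNV-1a hash and the `docsForNode` helper are illustrative choices, not a proposed design.

```typescript
// 32-bit FNV-1a over UTF-16 code units. Any stable hash works; what matters is
// that every node computes the same value for the same document id.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Shard (0..nodeCount-1) that owns a given document.
function shardFor(docId: string, nodeCount: number): number {
  return fnv1a(docId) % nodeCount;
}

// Documents this node should migrate. The shard index is the node's position in
// the sorted active-node list, so nodes agree without coordinating, provided
// they all see the same list.
function docsForNode(nodeId: string, activeNodeIds: string[], docIds: string[]): string[] {
  const sorted = activeNodeIds.slice().sort();
  const myShard = sorted.indexOf(nodeId);
  if (myShard === -1) {
    throw new Error(`${nodeId} is not in the active node list`);
  }
  return docIds.filter((id) => shardFor(id, sorted.length) === myShard);
}

const nodeIds = ["kibana-b", "kibana-a", "kibana-c"];
const docs = Array.from({ length: 9 }, (_, i) => `dashboard:${i}`);
for (const node of nodeIds) {
  console.log(node, docsForNode(node, nodeIds, docs));
}
```

The weak point is the assumption itself: if a node joins or leaves mid-migration, the node count changes and document ownership reshuffles, so a real implementation would likely need to freeze the assignment per migration run. Elasticsearch's sliced scroll / point-in-time `slice` option is another way to split the read side into disjoint parts.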

> we could finally have a status API that returns the overall status of our Kibana cluster,

+100 cluster-wide status could be very useful

pgayvallet added the discuss and Team:Core labels on Jul 12, 2024

pgayvallet mentioned this issue in [Task Manager] Kibana discovery service #187696 (closed) on Jul 12, 2024
