[MD] Onboard batch concurrent search with multiple data source #2174

Open
2 tasks
zhongnansu opened this issue Aug 19, 2022 · 4 comments
Assignees
Labels
enhancement New feature or request multiple datasource multiple datasource project

Comments

@zhongnansu
Member

zhongnansu commented Aug 19, 2022


1. Background:

- Batch concurrent search was labeled as deprecated in the code base, but that label has since been removed.

  • Once the setting is turned on, any search made through the high-level data plugin search API issues _msearch requests to OpenSearch. A single _msearch request can execute multiple searches together, instead of issuing them one by one through multiple _search API calls. This is especially useful for dashboards, where rendering several visualizations issues multiple search requests at the same time.
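
For illustration, here is a minimal sketch of how several searches are packed into one _msearch body. The OpenSearch _msearch API accepts newline-delimited JSON in which each search contributes a header line followed by a body line. The `SearchRequest` type and the `buildMsearchBody` helper are hypothetical names made up for this example; they are not OSD code.

```typescript
// Hypothetical shape of one queued search; not an OSD type.
interface SearchRequest {
  index: string;
  body: Record<string, unknown>;
}

// Pack several searches into one newline-delimited _msearch payload:
// for each search, a header line ({ index }) followed by a body line.
function buildMsearchBody(requests: SearchRequest[]): string {
  return requests
    .map((r) => JSON.stringify({ index: r.index }) + "\n" + JSON.stringify(r.body) + "\n")
    .join("");
}

// Two visualizations on one dashboard become a single request payload.
const payload = buildMsearchBody([
  { index: "logs-*", body: { query: { match_all: {} }, size: 0 } },
  { index: "metrics-*", body: { query: { term: { host: "web-1" } } } },
]);
```

The whole payload goes out through a single client connection, which is the property that matters for the rest of this issue.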

The individual requests are queued up here:

/**
 * This function introduces a slight delay in the request process to allow multiple requests to queue
 * up (e.g. when a dashboard is loading).
 */
export async function fetchSoon(
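
The queueing behaviour described in that comment can be sketched roughly as follows. This is not the actual `fetchSoon` implementation, which is truncated above; `createBatchQueue`, the 50 ms default delay, and the injectable `schedule` parameter are all assumptions made for this example.

```typescript
// Hypothetical sketch of "delay briefly, then flush everything queued so far".
type Flush<T> = (batch: T[]) => void;
type Scheduler = (run: () => void, delayMs: number) => void;

function createBatchQueue<T>(
  flush: Flush<T>,
  delayMs: number = 50, // assumed delay, chosen for illustration only
  schedule: Scheduler = (run, ms) => {
    setTimeout(run, ms);
  }
): (item: T) => void {
  let pending: T[] = [];
  let scheduled = false;

  return function enqueue(item: T): void {
    pending.push(item);
    if (scheduled) return; // a flush is already waiting; just join the batch
    scheduled = true;
    schedule(() => {
      const batch = pending;
      pending = [];
      scheduled = false;
      flush(batch); // everything queued during the delay goes out together
    }, delayMs);
  };
}
```

Every request that arrives within the delay window ends up in the same batch, and that batch is handed to a single flush call, which in OSD's case means one _msearch from one client.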

2. Why doesn't batch concurrent search work with the multiple data source feature?

  • The multiple data source (MD) feature lets OSD users search against other OpenSearch clusters. A data source identifier is passed along with each search request, so the server side can decide which data source client to use to query the data.
  • One _msearch request can only be sent from one client. Batch concurrent search queues up search requests and issues a single _msearch from a single client. But when the MD feature is enabled, the queued search requests may target several different data sources, so they can't all be batched into one request. For example, a dashboard may contain visualization-1 (from data source 1), visualization-2 (from data source 2), and so on.
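
To make the constraint concrete, the sketch below partitions a queued batch by data source. It is only an illustration of why one mixed queue cannot map onto one _msearch; it is not a proposed implementation. `QueuedSearch`, `groupByDataSource`, and the `LOCAL_CLUSTER` key are hypothetical names.

```typescript
// Hypothetical shape of a queued search once MD is enabled.
interface QueuedSearch {
  visualization: string;
  dataSourceId?: string; // assumed: undefined means the default (local) cluster
}

const LOCAL_CLUSTER = "__local__"; // hypothetical grouping key for the default cluster

// Partition a mixed queue so that each group targets exactly one data source.
function groupByDataSource(queue: QueuedSearch[]): Record<string, QueuedSearch[]> {
  const groups: Record<string, QueuedSearch[]> = {};
  for (const search of queue) {
    const key = search.dataSourceId ?? LOCAL_CLUSTER;
    (groups[key] = groups[key] || []).push(search);
  }
  return groups;
}

// A dashboard whose visualizations read from two different data sources.
const dashboardQueue: QueuedSearch[] = [
  { visualization: "visualization-1", dataSourceId: "ds-1" },
  { visualization: "visualization-2", dataSourceId: "ds-2" },
  { visualization: "visualization-3", dataSourceId: "ds-1" },
];
const groupedQueue = groupByDataSource(dashboardQueue);
```

Here the queue splits into two groups, so at minimum two _msearch requests from two different clients would be needed, whereas the existing queue assumes exactly one.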

3. Solution

To fully support batch concurrent search with MD, we would need to refactor every place in OSD that consumes msearch, refactor the queueing mechanism, and add a mechanism to store and pass multiple data source identifiers. Given the effort, I propose the following:

  • Phase 1: Disable batch concurrent search in the OSD Advanced Settings when MD is enabled.
  • Phase 2: Decide the priority of this integration based on community feedback.
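
As a rough illustration of Phase 1, the effective behaviour could be derived from the two flags like this. The `SearchSettings` shape and `shouldUseBatchSearch` are hypothetical; the real values would come from the advanced setting and from wherever OSD exposes whether MD is enabled.

```typescript
// Hypothetical view of the two relevant flags; not an OSD type.
interface SearchSettings {
  batchSearchesEnabled: boolean; // the batch concurrent search advanced setting
  multipleDataSourceEnabled: boolean; // whether the MD feature is turned on
}

// Phase 1 rule: batching only applies while MD is off.
function shouldUseBatchSearch(settings: SearchSettings): boolean {
  return settings.batchSearchesEnabled && !settings.multipleDataSourceEnabled;
}
```

With this rule, a user who has turned on batching keeps it until MD is enabled, at which point searches fall back to individual _search calls.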

Notes:

Code changes to support msearch with MD need to be made here, at least. Pass in the data source client as opensearchClient, depending on whether the request body/params contain a dataSourceId. To get the data source client, we use the standard API context.data_source.opensearch.getClient(<dataSourceId>).

opensearchClient: context.core.opensearch.client,
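
A minimal sketch of that client selection, under stated assumptions: the `Client` and `RequestContext` interfaces below are simplified stand-ins rather than OSD's real types, `resolveClient` is a hypothetical helper, and `getClient` is assumed to return a promise.

```typescript
// Simplified stand-ins for the real client and request context types.
interface Client {
  label: string;
}

interface RequestContext {
  core: { opensearch: { client: Client } };
  data_source: {
    opensearch: { getClient: (dataSourceId: string) => Promise<Client> };
  };
}

// Use the data source client when the request carries a dataSourceId,
// otherwise fall back to the default client used today.
async function resolveClient(context: RequestContext, dataSourceId?: string): Promise<Client> {
  if (dataSourceId) {
    return context.data_source.opensearch.getClient(dataSourceId);
  }
  return context.core.opensearch.client;
}
```

The resolved client would then be passed as `opensearchClient` in place of the hard-coded `context.core.opensearch.client`. Note that this still resolves one client per request, so it does not by itself handle a batch that mixes data sources.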

@zhongnansu zhongnansu added multiple datasource multiple datasource project bug Something isn't working labels Aug 19, 2022
@kavilla
Member

kavilla commented Sep 6, 2022

I wouldn't disable batch requests as it appears to be a highly used setting.

@zhongnansu zhongnansu self-assigned this Sep 6, 2022
@zhongnansu
Member Author

> I wouldn't disable batch requests as it appears to be a highly used setting.

Got it. I saw the commit that removes the deprecated label from the batch search setting: #735

@zhongnansu zhongnansu changed the title [MD] _msearch not likely to work with multiple data source [MD] batch concurrent search is not likely to work with multiple data source Oct 3, 2022
@zhongnansu zhongnansu added enhancement New feature or request v2.5.0 'Issues and PRs related to version v2.5.0' and removed research bug Something isn't working labels Oct 3, 2022
@zhongnansu zhongnansu changed the title [MD] batch concurrent search is not likely to work with multiple data source [MD] Onboard batch concurrent search with multiple data source Nov 9, 2022
@joshuarrrr
Member

@zhongnansu Is this still targeting 2.5.0? I don't see a linked PR.

@zhongnansu
Member Author

> @zhongnansu Is this still targeting 2.5.0? I don't see a linked PR.

This will be for a later release. Removed the 2.5 label.

@zhongnansu zhongnansu removed the v2.5.0 'Issues and PRs related to version v2.5.0' label Jan 5, 2023
Projects
None yet
Development

No branches or pull requests

3 participants