[Feature Request] Support a verbose/debugging param in search pipelines #14745

Open · ohltyler opened this issue Jul 12, 2024 · 4 comments · May be fixed by #16843
Labels: enhancement, Other, Priority-High, Search, v2.19.0

@ohltyler (Member) commented Jul 12, 2024

Is your feature request related to a problem? Please describe

As the library of search pipeline processors grows and becomes more complex, it is increasingly difficult to see how data is passed between and transformed by the processors. One example is the ML inference processors, which can contain logic to transform arbitrary, complex model inputs and outputs.

Describe the solution you'd like

Having a way to debug and view the state of search processors (on both the request and response side) would help in discovering issues related to data transformation, or any other intermediate failure. It would also be generally useful for viewing the end-to-end pipeline execution: how the request is transformed, how it is executed, and how any response data is transformed. Additionally, this output could be consumed and visualized by the flow framework frontend plugin, which is initially focused on the configuration and testing of ingest and search pipelines as users build out their complex use cases.

A few different implementation ideas:

  1. Add a verbose parameter to a search request that contains a search pipeline (or a standalone API) for returning the end-to-end breakdown of each processor's output. Note that ingest pipelines already support this today via the verbose param on the _simulate API.
  2. Add some return_request parameter to a search request that contains a search pipeline, and return the finalized/transformed search request that was used to execute against the index. While this wouldn't offer processor-level granularity, it could be a simple way to get some intermediate information.

Intuitively, I think option 1 provides the most flexibility and the simplest, most straightforward design. It is also consistent with how ingest pipelines support this idea (see the sketch below).
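For reference, here is the shape of the two APIs side by side. The ingest pipeline _simulate call with verbose=true is the real, existing API; the search-side verbose query parameter is only a hypothetical sketch of option 1, and its name and placement are assumptions:

```
# Existing ingest API: per-processor output via verbose=true
POST _ingest/pipeline/_simulate?verbose=true
{
  "pipeline": {
    "processors": [
      { "lowercase": { "field": "title" } },
      { "set": { "field": "reviewed", "value": false } }
    ]
  },
  "docs": [
    { "_source": { "title": "Search Pipelines" } }
  ]
}

# Hypothetical search-side analog (parameter name is illustrative only)
GET /my-index/_search?search_pipeline=my_pipeline&verbose=true
{
  "query": { "match_all": {} }
}
```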

Related component

Other

Describe alternatives you've considered

No response

Additional context

No response

@ohltyler added the enhancement and untriaged labels Jul 12, 2024
@github-actions bot added the Other label Jul 12, 2024
@ohltyler changed the title from "[Feature Request] Support debugging / pipeline simulation in search pipelines" to "[Feature Request] Support a verbose/debugging param in search pipelines" Jul 12, 2024
@ohltyler (Member, Author) commented Oct 30, 2024

We can further scope down the desired data output from this change. Listing them out below in order of priority:

  1. Transformed search requests - given a pipeline with search request processors, expose the transformed search request as it passes through the processors (the individual search request processor output)
  2. Interim outputs of each search request and search response processor - given a pipeline with search request and/or search response processors, output each interim transformation of the search request and search response (all processor outputs).
  3. Success/failure of each processor
  4. Completion timestamps of each processor, and/or some way to determine the time spent in a given processor

Item 1: Will allow us to enable "Preview" on the UI when chaining search request processors.
Item 2: Currently, to view the "interim" output of a given search response processor on the UI, we build a temporary pipeline up to and including the selected processor, execute it, and display the result. Item 2 would greatly simplify this: we could remove all of that custom logic, execute the entire pipeline once, and simply parse out the selected processor's output. It would also make it much easier to let users click on any processor and view its output, and/or view all interim outputs at once.
Items 3/4: Will open up further UX opportunities to provide fine-grained details and debugging output for users building complex pipelines (e.g., identifying which processor is causing issues or taking a long time to complete, such as a lagging LLM response). A hypothetical sketch of such a response follows.
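To make items 1-4 concrete, here is a hypothetical sketch of a per-processor breakdown in the search response. Every field name here (processor_results, input_data, output_data, status, took_millis) is an assumption for illustration, not a committed schema:

```json
{
  "took": 42,
  "hits": { "total": { "value": 2, "relation": "eq" }, "hits": [] },
  "processor_results": [
    {
      "processor_name": "filter_query",
      "status": "success",
      "took_millis": 1,
      "input_data": { "query": { "match_all": {} } },
      "output_data": {
        "query": {
          "bool": {
            "must": { "match_all": {} },
            "filter": { "term": { "visible": true } }
          }
        }
      }
    },
    {
      "processor_name": "rename_field",
      "status": "success",
      "took_millis": 2
    }
  ]
}
```

This shape would cover item 1 (the transformed request in output_data), item 2 (one entry per processor), item 3 (status), and item 4 (took_millis).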

@reta (Collaborator) commented Oct 31, 2024

For search requests we already support profiling (profile=true); I think including the data for each search processor (as new profiler sections) would be a natural way to expose the additional stats?
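For context, the existing profile flag is set in the search request body (this request shape is the real, existing API; whether per-processor sections would hang off the same profile output is exactly the open question here):

```
GET /my-index/_search?search_pipeline=my_pipeline
{
  "profile": true,
  "query": { "match": { "title": "pipeline" } }
}
```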

@dbwiddis (Member) commented Nov 1, 2024

> Interim outputs of each search request and search response processor

This could include a very large number of hits, often paginated or limited to "top". If we're using this for debugging purposes, we probably don't need all the hits; a representative sample should do, right?

@owaiskazi19 added the Search label Nov 13, 2024
@owaiskazi19 (Member) commented

> This could include a very large number of hits, often paginated or limited to "top". If we're using this for debugging purposes, we probably don't need all the hits; a representative sample should do, right?

I agree with @dbwiddis on this point. Since our primary goal is to understand how each processor transforms the data, we can limit the size of the search response to just one or two hits. This approach is similar to the ingest pipeline's _simulate API, which demonstrates how data will be transformed upon ingestion.

We don't have a comparable API for search pipelines because we can't simulate the response; we must make an actual search request to get even a sample search response. Given that the requirement for this issue is the verbose flag, it's prudent to limit the response size to avoid dealing with a large number of hits.

The key reasons for this approach are:

  1. It provides sufficient information to demonstrate the transformation process.
  2. It reduces the computational load and response size.
  3. It aligns with the primary goal of illustrating the pipeline's effect on the data.

A sketch of a size-limited verbose request follows.
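As a sketch of this suggestion: the size parameter is the real, existing way to cap the number of hits returned, while the verbose query parameter remains the hypothetical flag proposed in this issue:

```
GET /my-index/_search?search_pipeline=my_pipeline&verbose=true
{
  "size": 2,
  "query": { "match": { "title": "pipeline" } }
}
```

With size capped at one or two hits, each processor's interim output stays small enough to return and render without the overhead of a full result set.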

@joshpalis added the v2.19.0 label Nov 26, 2024