[Feature Request] Support a verbose/debugging param in search pipelines #14745

ohltyler · 2024-07-12T20:15:29Z

Is your feature request related to a problem? Please describe

As the library of search pipeline processors continues to grow and become more complex, it can become increasingly difficult to know how data is passed around and transformed through the processors. An example is the introduction of ML inference processors which can have logic to transform arbitrary and complex model inputs/outputs.

Describe the solution you'd like

Having a way to debug and view the state of search processors (on request side and response side) would be helpful in discovering issues related to data transformation or any other intermediate failure. It could also be generally useful in being able to view the end-to-end pipeline execution and easily see how the request is transformed, executed, and how any response data is transformed. Additionally, this could be consumed and viewed on the flow framework frontend plugin, which is initially focused on the configuration and testing of ingest and search pipelines as users build out their complex use cases.

A few different implementation ideas:

1. Add a verbose parameter to a search request that contains a search pipeline, or standalone API, for returning the end-to-end breakdown of each processor's output - note this is already done today in ingest pipelines (see verbose param here)
2. Add some return_request parameter to a search request that contains a search pipeline, and return the finalized/transformed search request that was used to execute against an index. While this wouldn't be processor-level granularity, it could be a simple way to get some intermediate information.

Intuitively, I think Option 1 provides the most flexibility and the simplest, most straightforward design. It is also consistent with how ingest pipelines support this idea.
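To make the comparison in Option 1 concrete, here is a minimal sketch of the two request paths side by side. The ingest `_simulate` endpoint with `verbose=true` and the `search_pipeline` query parameter exist today; the `verbose_pipeline` flag name is only an assumption used for illustration, since this issue does not settle on a name.

```python
from urllib.parse import urlencode


def ingest_simulate_url(pipeline_id: str, verbose: bool = False) -> str:
    """Path for the existing ingest simulate API, optionally with verbose output."""
    path = f"/_ingest/pipeline/{pipeline_id}/_simulate"
    return f"{path}?{urlencode({'verbose': 'true'})}" if verbose else path


def search_url(index: str, search_pipeline: str, verbose: bool = False) -> str:
    """Path for a search request that runs through a search pipeline.

    `search_pipeline` is the existing query parameter. `verbose_pipeline` is a
    HYPOTHETICAL name for the flag proposed in Option 1.
    """
    params = {"search_pipeline": search_pipeline}
    if verbose:
        params["verbose_pipeline"] = "true"  # assumed flag name, not confirmed
    return f"/{index}/_search?{urlencode(params)}"


# Placeholder index and pipeline names, for illustration only.
print(ingest_simulate_url("my-pipeline", verbose=True))
print(search_url("my-index", "my-pipeline", verbose=True))
```

Keeping the flag as a query parameter on the normal search path, rather than a standalone API, would mean the same request works with and without debugging output.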

Related component

Other

Describe alternatives you've considered

No response

Additional context

No response


ohltyler · 2024-10-30T22:20:37Z

We can further scope down the desired data output from this change. Listing them out below in order of priority:

1. Transformed search requests - given a pipeline with search request processors, expose the transformed search request as it passes through the processors (the individual search request processor output)
2. Interim outputs of each search request and search response processor - given a pipeline with search request and/or search response processors, output each interim transformation of the search request and search response (all processor outputs).
3. Success/failure of each processor
4. Completion timestamps of each processor, and/or some way to determine the time spent in a given processor

Item 1: Will allow us to enable "Preview" on the UI when chaining search request processors.
Item 2: Currently, on the UI, in order to view the "interim" outputs given some search response processor, we build out a temporary pipeline up to and including the selected processor, execute it, and display the result. Item 2 would greatly simplify this, as we could remove all of that custom logic, and instead execute the entire pipeline and just parse out the selected processor's output. It would also make it easier for the UI to let users click on any processor and view its output, and/or view all interim outputs at once.
Items 3/4: Will open up further UX opportunities to provide fine-grained details and debugging outputs for users building complex pipelines (e.g., debugging which processor is causing issues or taking a long time to complete, such as a lagging LLM response).
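As a rough illustration of how a client such as the UI could consume the four items above, the sketch below models one entry per processor and shows the three lookups described: parsing out a selected processor's output, finding the first failure, and finding the slowest processor. The `ProcessorResult` shape, its field names, and the sample processor names and values are all assumptions made for this example, not a proposed response format.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class ProcessorResult:
    """HYPOTHETICAL per-processor entry in a verbose pipeline response."""
    processor_name: str
    status: str                  # "success" or "fail" (Item 3)
    duration_millis: int         # time spent in the processor (Item 4)
    output: Any = None           # interim request/response (Items 1 and 2)
    error: Optional[str] = None


def output_of(results: List[ProcessorResult], processor_name: str) -> Any:
    """Return the interim output of the selected processor (Item 2)."""
    for result in results:
        if result.processor_name == processor_name:
            return result.output
    raise KeyError(processor_name)


def first_failure(results: List[ProcessorResult]) -> Optional[ProcessorResult]:
    """Return the first processor that did not succeed, if any (Item 3)."""
    return next((r for r in results if r.status != "success"), None)


def slowest(results: List[ProcessorResult]) -> ProcessorResult:
    """Return the processor with the largest recorded duration (Item 4)."""
    return max(results, key=lambda r: r.duration_millis)


# Made-up sample data for illustration only.
results = [
    ProcessorResult("filter_query", "success", 2, output={"query": {"match_all": {}}}),
    ProcessorResult("ml_inference", "success", 850, output={"hits": ["doc-1"]}),
    ProcessorResult("rename_field", "fail", 1, error="field not found"),
]

print(output_of(results, "ml_inference"))
print(first_failure(results).processor_name)
print(slowest(results).processor_name)
```

With a shape like this, the UI would run the full pipeline once and index into the results, instead of building a temporary pipeline per selected processor.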

reta · 2024-10-31T15:22:07Z

For search requests, we do support profiling (profile=true). I think including the data for each search processor (new profiler sections) would be a natural way to expose additional stats?
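For reference, profiling is enabled today by setting `profile` to true in the search request body. The sketch below only builds such a body; the query is a placeholder, and no per-processor profiler section is shown because its shape is not defined in this thread.

```python
import json


def profiled_search_body(query: dict) -> dict:
    """Build a search request body with the existing profile flag enabled."""
    return {"profile": True, "query": query}


# Placeholder query for illustration only.
body = profiled_search_body({"match": {"title": "wind"}})
print(json.dumps(body))
```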

dbwiddis · 2024-11-01T05:36:45Z

> Interim outputs of each search request and search response processor

This could include a very large number of hits, often paginated or limited to "top". If we're using this for debugging purposes we probably don't need all the hits; a representative sample should do, right?

owaiskazi19 · 2024-11-13T22:42:28Z

> This could include a very large number of hits, often paginated or limited to "top". If we're using this for debugging purposes we probably don't need all the hits; a representative sample should do, right?

I agree with @dbwiddis on this point. Since our primary goal is to understand how each processor transforms the data, we can limit the size of the search response to just one or two hits. This approach is similar to the ingest pipeline's _simulate API, which demonstrates how data will be transformed upon ingestion.

We don't have a comparable API for search pipelines because we can't simulate the response; we must make an actual search request to get even a sample search response. Given that the requirement for this issue is the verbose flag, it's prudent to limit the response size to avoid dealing with a large number of hits.

The key reasons for this approach are:
- It provides sufficient information to demonstrate the transformation process.
- It reduces the computational load and response size.
- It aligns with the primary goal of illustrating the pipeline's effect on the data.
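One way to picture the size limit is a helper that trims the hits in each interim response before it is included in the verbose output. This is a sketch under assumptions: it expects the usual search response layout with a `hits.hits` array, the helper name `sample_hits` is made up, and the default of two hits simply follows the "one or two hits" suggestion above.

```python
import copy


def sample_hits(response: dict, max_hits: int = 2) -> dict:
    """Return a copy of a search response keeping at most `max_hits` hits.

    The original response is left untouched so the real result set can still
    be returned to the caller; only the debugging copy is trimmed.
    """
    trimmed = copy.deepcopy(response)
    hits = trimmed.get("hits", {})
    if isinstance(hits.get("hits"), list):
        hits["hits"] = hits["hits"][:max_hits]
    return trimmed


# Made-up response for illustration only.
full_response = {
    "hits": {
        "total": {"value": 5, "relation": "eq"},
        "hits": [{"_id": str(i), "_source": {"n": i}} for i in range(5)],
    }
}

sampled = sample_hits(full_response)
print(len(sampled["hits"]["hits"]), len(full_response["hits"]["hits"]))
```

Trimming a copy rather than lowering the request's `size` keeps the debugging output small without changing what the search itself returns; which of the two the feature should do is left open here.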

ohltyler added the enhancement and untriaged labels Jul 12, 2024

github-actions bot added the Other label Jul 12, 2024

ohltyler changed the title ~~[Feature Request] Support debugging / pipeline simulation in search pipelines~~ [Feature Request] Support a verbose/debugging param in search pipelines Jul 12, 2024

minalsha added Priority-High and removed untriaged labels Jul 15, 2024

ohltyler mentioned this issue Aug 2, 2024

Onboard search req / search resp ML processors opensearch-project/dashboards-flow-framework#256

Merged

1 task

minalsha assigned junweid62 Oct 28, 2024

ohltyler mentioned this issue Oct 31, 2024

[Enhancement] Display cluster-level information in the ReactFlow workspace opensearch-project/dashboards-flow-framework#377

Open

owaiskazi19 added the Search label Nov 13, 2024

github-project-automation bot added this to Search Project Board Nov 13, 2024

github-project-automation bot moved this to 🆕 New in Search Project Board Nov 13, 2024

junweid62 mentioned this issue Nov 22, 2024

[RFC] Tracking Search Pipeline Execution #16705

Open

joshpalis added the v2.19.0 label Nov 26, 2024

junweid62 linked a pull request Dec 13, 2024 that will close this issue

Add verbose pipeline parameter to output each processor's execution details #16843

Open

3 tasks
