[RFC] Tracking Search Pipeline Execution #16705
Comments
Hi, thanks for the detailed information! I'm pretty new to the OpenSearch project. I'm trying to understand when the …
Thanks for your question! When the verbose_pipeline parameter is passed, the information is not persisted anywhere; it is generated dynamically during request execution. The source of the information is the actual processing flow of the search pipeline in memory. Each processor in the pipeline logs its input, output, status, and execution time as the request flows through. This information is collected directly from the execution of the processors and returned as part of the response. It is not stored in a system index, map, or logs, which helps keep the feature lightweight and avoids adding unnecessary overhead to the system. Hope this clarifies! Let me know if you have any follow-up questions!
Since this param is part of the search request, search backpressure would already be integrated.
I think it's a good idea. As search pipelines get more complicated, getting step-by-step logging from each processor will be useful (though the response can get quite large, kind of like profiler output). I discussed doing something like this with @mingshl. As a minor correction to the section Search Phase Result Processor Fields, the phase results are just doc IDs and scores (if I recall correctly). Still, being able to see how scores were processed, e.g. by the hybrid query score normalizer, would be pretty nice. I was discussing ways of getting "explain" output from the normalizer with @martin-gaievski, and I think this could solve that problem. If you need somewhere to store the verbose output, the PipelinedRequest object might be a good fit.
To handle this, do you think we could add a size parameter, or limit the response to a small number, say 5? We just have to see how the processor does its processing; we probably don't need an entire SearchResponse.
I don't think we have to store it. We could just read directly from the ProcessorResultMap and return it in the SearchResponse.
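A minimal sketch of that in-memory approach, in Python rather than the actual Java implementation. Names like `ProcessorResult` and `run_pipeline_verbose` are hypothetical, not OpenSearch classes, but the flow mirrors what this thread describes: each processor's input, output, status, and timing are recorded as the request flows through, held only for the life of the request, and returned alongside the response rather than persisted anywhere.

```python
from dataclasses import dataclass
import time
from typing import Any, Callable


# Hypothetical stand-in for one entry of the in-memory ProcessorResultMap.
@dataclass
class ProcessorResult:
    processor_name: str
    input_data: Any
    output_data: Any = None
    status: str = "success"
    duration_millis: float = 0.0


def run_pipeline_verbose(
    request: Any,
    processors: list[tuple[str, Callable[[Any], Any]]],
) -> tuple[Any, list[ProcessorResult]]:
    """Run each processor, recording input/output/status/time per step.

    Nothing is persisted: the result list lives only for this request and
    is returned with the final response, mirroring how verbose output would
    be read straight from an in-memory map.
    """
    results: list[ProcessorResult] = []
    current = request
    for name, fn in processors:
        detail = ProcessorResult(processor_name=name, input_data=current)
        start = time.perf_counter()
        try:
            current = fn(current)
            detail.output_data = current
        except Exception:
            detail.status = "failure"
        detail.duration_millis = (time.perf_counter() - start) * 1000
        results.append(detail)
    return current, results


# Example: two toy "request processors" transforming a search request.
response, trace = run_pipeline_verbose(
    {"query": "shoes"},
    [
        ("filter_query", lambda r: {**r, "filter": "in_stock"}),
        ("collapse", lambda r: {**r, "collapse": "brand"}),
    ],
)
```

Capping the size of what is recorded (as suggested above) would then be a matter of truncating `input_data`/`output_data` before appending each entry.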
Glad to see this RFC is cut! It would be helpful for debugging and tracking search processors. I hope this can also be added to ingest pipelines. Could ingest pipelines take a similar design? That way, if we decide to do the same for ingest pipelines, we can keep the two consistent. Wondering if we can reuse …
We already have verbose for Ingest Pipelines https://opensearch.org/docs/latest/ingest-pipelines/simulate-ingest/#query-parameters |
Thanks for the suggestion! I just checked the code base, and …
Thanks for the feedback! I'll update the section on Search Phase Result Processor Fields to reflect that the phase results are just doc IDs and scores; thanks for pointing that out.
I'll also take a closer look at using the PipelinedRequest object for storing verbose output.
Thanks @junweid62, certainly +1 to the feature. The only concern I have is that we are probably introducing too many knobs on the search side:
From my perspective, …
Thanks for the feedback! profile is fundamentally designed to provide timing-related insights, as it focuses on performance debugging. However, verbose_pipeline serves a different purpose that complements profile rather than overlapping with it. The profile API focuses on timing information and payload metrics (e.g., size/quantity), which are excellent for debugging performance bottlenecks. As highlighted in the OpenSearch documentation: "The Profile API provides timing information about the execution of individual components of a search request. Using the Profile API, you can debug slow requests and understand how to improve their performance." It doesn't track logical transformations or interim values between processors. This makes verbose_pipeline the right tool for cases where users need to understand how data evolves across the pipeline, especially with increasingly complex processors like those involving ML inference.
Thanks @junweid62, got it. I think I misunderstood the scope a bit (and from the comments above, there seems to be a mix of the intermediate data and time-related insights like …
What do you think? Thanks!
Thanks for the proposal! I see where you're coming from, but I feel like keeping everything together in verbose_pipeline might be a better approach. Here's why:
I think putting it all under verbose_pipeline would give users a full picture of each processor, what it's doing and how long it's taking, all in one spot. What do you think? Happy to chat more if needed!
See your point, I think …
Is your feature request related to a problem? Please describe
With the expansion of search pipeline processors, tracking data transformations and understanding data flow through complex processors is becoming challenging. The introduction of ML inference processors, which can manipulate model inputs and outputs, increases the need for a tool to visualize and debug the flow of data across these processors. Such functionality would aid in troubleshooting, optimizing pipeline configurations, and providing transparency into end-to-end transformations of search requests and responses.
As search pipeline processors grow in complexity, there is an increasing need to:

Related Issue
This capability would also be valuable for frontend plugins like the Flow Framework, helping users configure and test complex ingest and search pipelines.
Describe the solution you'd like
Adding `verbose` Parameter to Search Request [Preferred]

Overview

In this approach, the `verbose_pipeline` parameter is introduced as a query parameter in the search request URL. When used in conjunction with the `search_pipeline` parameter, it activates a debugging mode, allowing detailed tracking of search pipeline processor execution without requiring a new API or changes to the Explain API.

Pros

- Minimal Changes to Existing Workflow:
- Backward Compatibility: the `verbose` parameter is optional and defaults to `false`. Existing search requests remain unaffected unless explicitly updated to include `verbose=true`.
- Alignment with OpenSearch Design: consistent with existing debugging options such as the `profile` query parameter.

Cons

- Performance Impact:
Example Request
GET /my_index/_search?search_pipeline=my_debug_pipeline&verbose_pipeline=true
Example Response
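The original response example did not survive extraction. As a purely illustrative sketch (field names such as `processor_results` and `duration_millis` are assumptions here, not a final schema), the verbose output might appear alongside the normal response body like this:

```json
{
  "took": 18,
  "hits": { "total": { "value": 2, "relation": "eq" }, "hits": [] },
  "processor_results": [
    {
      "processor_name": "filter_query",
      "status": "success",
      "duration_millis": 2,
      "input_data": { "query": { "match_all": {} } },
      "output_data": {
        "query": {
          "bool": {
            "must": [{ "match_all": {} }],
            "filter": [{ "term": { "visible": true } }]
          }
        }
      }
    }
  ]
}
```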
Common Fields for All Processors
Each processor, regardless of type, will include the following common fields:

- The processor name (e.g., `filter_query`, `collapse`).
- The execution status: whether the processor completed successfully (`success`) or encountered an error (`failure`).
).Request Processor Fields
For processors that handle the incoming search request:
Example:
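The original example did not survive extraction. As a hypothetical illustration (the exact field names are not final), a request processor entry could show the search request before and after the transformation:

```json
{
  "processor_name": "filter_query",
  "status": "success",
  "input_data": { "query": { "match": { "title": "shoes" } } },
  "output_data": {
    "query": {
      "bool": {
        "must": [{ "match": { "title": "shoes" } }],
        "filter": [{ "term": { "in_stock": true } }]
      }
    }
  }
}
```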
Search Phase Result Processor Fields
For processors that handle intermediate results during the search phase:
Response Processor Fields
For processors that handle the final search response:
Verbose Mode Support Across Search Pipeline Configurations
The verbose mode is designed to seamlessly integrate with all ways of using a search pipeline, ensuring consistent debugging capabilities regardless of the method chosen. Below is an overview of how verbose mode supports different search pipeline configurations:
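As an illustrative sketch of those configurations (pipeline and index names are hypothetical; the `search_pipeline` query parameter, the `index.search.default_pipeline` index setting, and request-body ad hoc pipelines are existing OpenSearch mechanisms), verbose mode would be requested the same way in each case:

```
# 1. Named pipeline via query parameter
GET /my_index/_search?search_pipeline=my_pipeline&verbose_pipeline=true

# 2. Default pipeline set on the index (only the verbose flag is needed per request)
PUT /my_index/_settings
{ "index.search.default_pipeline": "my_pipeline" }

GET /my_index/_search?verbose_pipeline=true

# 3. Temporary (ad hoc) pipeline defined in the request body
GET /my_index/_search?verbose_pipeline=true
{
  "query": { "match_all": {} },
  "search_pipeline": {
    "request_processors": [
      { "filter_query": { "query": { "term": { "visible": true } } } }
    ]
  }
}
```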
Related component
Search
Describe alternatives you've considered
No response
Additional context
No response