Add new use cases to ML Inference Search Response Processor #8639

Merged
96 changes: 47 additions & 49 deletions _search-plugins/search-pipelines/ml-inference-search-response.md

### Example: Summarizing search results using an externally hosted LLM

This example demonstrates configuring an `ml_inference` search response processor to work with an externally hosted large language model (LLM) and map the model's response to the search extension object. Using the `ml_inference` processor, you can enable an LLM to summarize search results directly within the response. The summary is included in the `ext` field of the search response, providing seamless access to AI-generated insights alongside the original search results.

**Prerequisite**

You must configure an externally hosted LLM for this use case. For more information about externally hosted models, see [Connecting to externally hosted models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/index/). Once you register the LLM, you can use the following request to test it. This request requires providing the `prompt` and `context` fields:

```json
POST /_plugins/_ml/models/KKne6JIBAs32TwoK-FFR/_predict
{
  "parameters": {
    "prompt": "<your prompt>",
    "context": "<text for the model to use as context>"
  }
}
```
{% include copy-curl.html %}

The response contains the model output in the `inference_results` field:

```json
{
"inference_results": [
    {
      "output": [
        {
          "name": "response",
          "dataAsMap": {
            "response": "<model-generated answer>"
          }
        }
      ],
      "status_code": 200
    }
]
}
```
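
If the test call fails, the model might not be deployed. As an optional check (using the same model ID as in the preceding request), you can retrieve the model and inspect its `model_state`:

```json
GET /_plugins/_ml/models/KKne6JIBAs32TwoK-FFR
```
{% include copy-curl.html %}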

**Step 1: Create a pipeline**

Create a search pipeline for the registered model. The model requires a `context` field as input. The model summarizes the text in the `review` field, and the processor stores the summary in the `ext.ml_inference.llm_response` field of the search response:

```json
PUT /_search/pipeline/my_pipeline_request_review_llm
{
  "response_processors": [
    {
      "ml_inference": {
        "model_id": "KKne6JIBAs32TwoK-FFR",
        "function_name": "REMOTE",
        "input_map": [
          {
            "context": "review"
          }
        ],
        "output_map": [
          {
            "ext.ml_inference.llm_response": "response"
          }
        ],
        "model_config": {
          "prompt": "\n\nHuman: You are a professional data analysist. You will always answer question: Which month had the lowest customer acquisition cost per new customer? based on the given context first. If the answer is not directly shown in the context, you will analyze the data and find the answer. If you don't know the answer, just say I don't know. Context: ${parameters.context.toString()}. \n\n Assistant:"
        },
        "ignore_missing": false,
        "ignore_failure": false
      }
    }
  ]
}
```
{% include copy-curl.html %}

In this configuration, you're providing the following parameters:

- The `model_id` specifies the ID of the generative AI model.
- The `function_name` is set to `REMOTE`, indicating an externally hosted model.
- The `input_map` maps the `review` field from the document to the `context` field expected by the model.
- The `output_map` specifies that the model's response should be stored in `ext.ml_inference.llm_response` in the search response.
- The `model_config` includes a prompt that instructs the model how to process the input and generate a summary.
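
To confirm that the pipeline was created with this configuration, you can optionally retrieve it:

```json
GET /_search/pipeline/my_pipeline_request_review_llm
```
{% include copy-curl.html %}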

**Step 2: Index sample documents**

Index some sample documents to test the pipeline:

```json
POST /_bulk
{"index":{"_index":"review_string_index","_id":"1"}}
{"review": "Customer acquisition cost in January: $50 with 100 new customers. February: $45 with 120 new customers. March: $40 with 150 new customers."}
```
{% include copy-curl.html %}

**Step 3: Run the pipeline**

Run a search query using the pipeline:

```json
GET /review_string_index/_search?search_pipeline=my_pipeline_request_review_llm
{
  "query": {
    "match_all": {}
  }
}
```
{% include copy-curl.html %}

The response includes the original documents and the generated summary in the `ext.ml_inference.llm_response` field:

```json
{
  ...
  "hits": {
    "hits": [
      ...
    ]
  },
  "ext": {
    "ml_inference": {
      "llm_response": "<LLM-generated summary of the review field>"
    }
  }
}
```


### Example: Reranking search results using a text similarity model

The following example shows you how to configure an `ml_inference` search response processor with a text similarity model.

**Prerequisite**

You must configure an externally hosted text similarity model for this use case. For more information about externally hosted models, see [Connecting to externally hosted models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/index/). Once you register the model, you can use the following request to test it. This request requires providing the `text` and `text_pair` fields within the `inputs` field:

```json
POST /_plugins/_ml/models/Ialx65IBAs32TwoK1lXf/_predict
{
  "parameters": {
    "inputs": {
      "text": "<first input text>",
      "text_pair": "<second input text>"
    }
  }
}
```
{% include copy-curl.html %}

The model returns a similarity score for each input text pair.
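
The exact response format depends on the model and its connector. As a rough sketch, a remote model typically wraps its output in the `inference_results` array; the inner field names (shown here as `score`) are connector specific, and the value below is a placeholder:

```json
{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "dataAsMap": {
            "score": "<similarity score>"
          }
        }
      ],
      "status_code": 200
    }
  ]
}
```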

**Step 1: Index sample documents**

Create an index and add some sample documents to it:

```json
POST _bulk
{"index": {"_index": "demo-index-0", "_id": "1"}}
{"diary": "<first document text>"}
{"index": {"_index": "demo-index-0", "_id": "2"}}
{"diary": "<second document text>"}
```
{% include copy-curl.html %}

**Step 2: Create a search pipeline**

For this example, you'll create a search pipeline that uses a text similarity model in a `one-to-one` inference mode, processing each document in the search results individually. This setup allows the model to make one prediction request per document, providing specific relevance insights for each search hit. When using `input_map` to map the search request to query text, the JSON path must start with `$._request` or `_request`:

```json
PUT /_search/pipeline/my_rerank_pipeline
{
  "response_processors": [
    {
      "ml_inference": {
        "model_id": "Ialx65IBAs32TwoK1lXf",
        "function_name": "REMOTE",
        "input_map": [
          {
            "text": "diary",
            "text_pair": "$._request.query.term.diary.value"
          }
        ],
        "output_map": [
          {
            "rank_score": "$.score"
          }
        ],
        "model_input": "{ \"parameters\": { \"inputs\": { \"text\": \"${input_map.text}\", \"text_pair\": \"${input_map.text_pair}\" } } }",
        "one_to_one": true,
        "ignore_missing": false,
        "ignore_failure": false
      }
    },
    {
      "rerank": {
        "by_field": {
          "target_field": "rank_score",
          "remove_target_field": true
        }
      }
    }
  ]
}
```
{% include copy-curl.html %}
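
Note that the `text_pair` input is taken from the incoming search request rather than from the returned documents. Assuming the search request uses a term query on the `diary` field (as in Step 3 below), the request body resembles the following:

```json
{
  "query": {
    "term": {
      "diary": {
        "value": "<query text>"
      }
    }
  }
}
```

In that case, the JSON path `$._request.query.term.diary.value` in `input_map` resolves to `<query text>`, which is sent to the model as `text_pair`.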

In this configuration, you're providing the following parameters:

- The `model_id` specifies the unique identifier of the text similarity model.
- The `function_name` is set to `REMOTE`, indicating that the model is hosted externally.
- The `input_map` maps the `diary` field from each document to the `text` input of the model, and the search query term to the `text_pair` input.
- The `output_map` maps the model's score to a field named `rank_score` in each document.
- The `model_input` formats the input for the model, ensuring it matches the structure expected by the Predict API.
- The `one_to_one` parameter is set to `true`, ensuring that the model processes each document individually, rather than batching multiple documents together.
- The `ignore_missing` parameter is set to `false`, causing the processor to fail if the mapped fields are missing from a document.
- The `ignore_failure` parameter is set to `false`, causing the entire pipeline to fail if the ML inference processor encounters an error.

The `rerank` processor is applied after the ML inference. It reorders the documents based on the `rank_score` field generated by the ML model and then removes this field from the final results.
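
For example, assuming a hit whose model-generated `rank_score` is `0.92`, the corresponding hit in the final response resembles the following sketch: the `rank_score` value becomes the hit's `_score`, and the `rank_score` field itself is removed from `_source`:

```json
{
  "_index": "demo-index-0",
  "_id": "1",
  "_score": 0.92,
  "_source": {
    "diary": "<first document text>"
  }
}
```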

**Step 3: Run the pipeline**

Now, perform a search using the created pipeline:

```json
GET /demo-index-0/_search?search_pipeline=my_rerank_pipeline
{
  "query": {
    "term": {
      "diary": {
        "value": "<query text>"
      }
    }
  }
```
{% include copy-curl.html %}

The response includes the original documents and their reranked scores:

```json
{
  ...
  "hits": {
    "hits": [
      ...
    ]
  },
  "profile": {
    "shards": []
  }
}
```