Commit 76c3ad7

Zhangxunmt, kolchfa-aws, and natebower authored
add the documentations for the new ml_inference processor (#10652)
* add the documentations for the new ml_inference processor
  Signed-off-by: Xun Zhang <xunzh@amazon.com>

* Doc review
  Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Apply suggestions from code review
  Signed-off-by: Nathan Bower <nbower@amazon.com>

---------

Signed-off-by: Xun Zhang <xunzh@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Nathan Bower <nbower@amazon.com>
Co-authored-by: Fanit Kolchina <kolchfa@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
1 parent 7626642 commit 76c3ad7

62 files changed: 526 additions, 517 deletions

_data-prepper/pipelines/configuration/processors/add-entries.md

Lines changed: 3 additions & 3 deletions

@@ -1,12 +1,12 @@
 ---
 layout: default
-title: add_entries
+title: Add entries
 parent: Processors
 grand_parent: Pipelines
-nav_order: 40
+nav_order: 10
 ---
 
-# add_entries
+# Add entries processor
 
 The `add_entries` processor adds entries to an event.
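For context, here is a minimal sketch of how the `add_entries` processor described in this file is typically configured. The `entries`, `key`, and `value` option names are assumptions based on general Data Prepper usage and are not part of this diff; source and sink configuration is omitted.

```yaml
add-entries-pipeline:
  processor:
    - add_entries:
        entries:
          - key: "service"      # new key added to each event
            value: "checkout"   # static value written to that key
```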

_data-prepper/pipelines/configuration/processors/aggregate.md

Lines changed: 3 additions & 3 deletions

@@ -1,12 +1,12 @@
 ---
 layout: default
-title: aggregate
+title: Aggregate
 parent: Processors
 grand_parent: Pipelines
-nav_order: 41
+nav_order: 20
 ---
 
-# aggregate
+# Aggregate processor
 
 The `aggregate` processor groups events based on the values of `identification_keys`. Then, the processor performs an action on each group, helping reduce unnecessary log volume and creating aggregated logs over time. You can use existing actions or create your own custom aggregations using Java code.
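As a rough illustration of the grouping this file describes, the sketch below shows an `aggregate` processor entry. `identification_keys` comes from the prose above; the `count` action and `group_duration` value are assumptions, not part of this commit.

```yaml
aggregate-pipeline:
  processor:
    - aggregate:
        identification_keys: ["sourceIp", "destinationIp"]  # events sharing these values form one group
        action:
          count:               # assumed built-in action that emits one aggregated event per group
        group_duration: "180s" # assumed window after which each group is concluded
```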

_data-prepper/pipelines/configuration/processors/anomaly-detector.md

Lines changed: 10 additions & 10 deletions

@@ -1,18 +1,18 @@
 ---
 layout: default
-title: anomaly_detector
+title: Anomaly detector
 parent: Processors
 grand_parent: Pipelines
-nav_order: 45
+nav_order: 30
 ---
 
-# anomaly_detector
+# Anomaly detector processor
 
-The anomaly detector processor takes structured data and runs anomaly detection algorithms on fields that you can configure in that data. The data must be either an integer or a real number for the anomaly detection algorithm to detect anomalies. Deploying the aggregate processor in a pipeline before the anomaly detector processor can help you achieve the best results, as the aggregate processor automatically aggregates events by key and keeps them on the same host. For example, if you are searching for an anomaly in latencies from a specific IP address and if all the events go to the same host, then the host has more data for these events. This additional data results in better training of the machine learning (ML) algorithm, which results in better anomaly detection.
+The `anomaly_detector` processor takes structured data and runs anomaly detection algorithms on fields that you can configure in that data. The data must be either an integer or a real number for the anomaly detection algorithm to detect anomalies. Deploying the aggregate processor in a pipeline before the `anomaly_detector` processor can help you achieve the best results, as the aggregate processor automatically aggregates events by key and keeps them on the same host. For example, if you are searching for an anomaly in latencies from a specific IP address and if all the events go to the same host, then the host has more data for these events. This additional data results in better training of the machine learning (ML) algorithm, which results in better anomaly detection.
 
 ## Configuration
 
-You can configure the anomaly detector processor by specifying a key and the options for the selected mode. You can use the following options to configure the anomaly detector processor.
+You can configure the `anomaly_detector` processor by specifying a key and the options for the selected mode. You can use the following options to configure the `anomaly_detector` processor.
 
 | Name | Required | Description |
 | :--- | :--- | :--- |

@@ -25,11 +25,11 @@ You can configure the anomaly detector processor by specifying a key and the opt
 
 ### Keys
 
-Keys that are used in the anomaly detector processor are present in the input event. For example, if the input event is `{"key1":value1, "key2":value2, "key3":value3}`, then any of the keys (such as `key1`, `key2`, `key3`) in that input event can be used as anomaly detector keys as long as their value (such as `value1`, `value2`, `value3`) is an integer or real number.
+Keys that are used in the `anomaly_detector` processor are present in the input event. For example, if the input event is `{"key1":value1, "key2":value2, "key3":value3}`, then any of the keys (such as `key1`, `key2`, `key3`) in that input event can be used as anomaly detector keys as long as their value (such as `value1`, `value2`, `value3`) is an integer or real number.
 
 ### random_cut_forest mode
 
-The random cut forest (RCF) ML algorithm is an unsupervised algorithm for detecting anomalous data points within a dataset. To detect anomalies, the anomaly detector processor uses the `random_cut_forest` mode.
+The random cut forest (RCF) ML algorithm is an unsupervised algorithm for detecting anomalous data points within a dataset. To detect anomalies, the `anomaly_detector` processor uses the `random_cut_forest` mode.
 
 | Name | Description |
 | :--- | :--- |

@@ -38,8 +38,8 @@ The random cut forest (RCF) ML algorithm is an unsupervised algorithm for detect
 RCF is an unsupervised ML algorithm for detecting anomalous data points within a dataset. OpenSearch Data Prepper uses RCF to detect anomalies in data by passing the values of the configured key to RCF. For example, when an event with a latency value of 11.5 is sent, the following anomaly event is generated:
 
 
-```json
-{ "latency": 11.5, "deviation_from_expected":[10.469302736820003],"grade":1.0}
+```json
+{ "latency": 11.5, "deviation_from_expected":[10.469302736820003],"grade":1.0}
 ```
 
 In this example, `deviation_from_expected` is a list of deviations for each of the keys from their corresponding expected values, and `grade` is the anomaly grade that indicates the anomaly severity.

@@ -72,6 +72,6 @@ ad-pipeline:
 random_cut_forest:
 ```
 
-When you run the anomaly detector processor, the processor extracts the value for the `latency` key, and then passes the value through the RCF ML algorithm. You can configure any key that comprises integers or real numbers as values. In the following example, you can configure `bytes` or `latency` as the key for an anomaly detector.
+When you run the `anomaly_detector` processor, the processor extracts the value for the `latency` key and then passes the value through the RCF ML algorithm. You can configure any key that comprises integers or real numbers as values. In the following example, you can configure `bytes` or `latency` as the key for an anomaly detector.
 
 `{"ip":"1.2.3.4", "bytes":234234, "latency":0.2}`
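The hunks above show only fragments of the `ad-pipeline` example. As a hedged sketch of the processor entry being discussed, it would look roughly like the following; the `keys` and `mode` option names are assumptions based on the configuration table this file describes and do not appear in this diff.

```yaml
ad-pipeline:
  processor:
    - anomaly_detector:
        keys: ["latency"]      # numeric field whose values are scored by RCF
        mode:
          random_cut_forest:   # the RCF mode discussed in this file
```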

_data-prepper/pipelines/configuration/processors/aws-lambda.md

Lines changed: 6 additions & 7 deletions

@@ -1,14 +1,14 @@
 ---
 layout: default
-title: aws_lambda
+title: AWS Lambda
 parent: Processors
 grand_parent: Pipelines
-nav_order: 10
+nav_order: 40
 ---
 
-# aws_lambda integration for OpenSearch Data Prepper
+# AWS Lambda processor
 
-The [AWS Lambda](https://aws.amazon.com/lambda/) integration allows developers to use serverless computing capabilities within their OpenSearch Data Prepper pipelines for flexible event processing and data routing.
+The [AWS Lambda](https://aws.amazon.com/lambda/) integration allows you to use serverless computing capabilities within your OpenSearch Data Prepper pipelines for flexible event processing and data routing.
 
 ## AWS Lambda processor configuration
 

@@ -64,7 +64,7 @@ processors:
 lambda_when: "event['status'] == 'process'"
 
 ```
-{% include copy-curl.html %}
+{% include copy.html %}
 
 ## Usage
 

@@ -101,5 +101,4 @@ Integration tests for this plugin are executed separately from the main Data Pre
 ```
 ./gradlew :data-prepper-plugins:aws-lambda:integrationTest -Dtests.processor.lambda.region="us-east-1" -Dtests.processor.lambda.functionName="lambda_test_function" -Dtests.processor.lambda.sts_role_arn="arn:aws:iam::123456789012:role/dataprepper-role
 ```
-
-{% include copy-curl.html %}
+{% include copy.html %}
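For context, here is a rough sketch of how an `aws_lambda` processor entry is commonly wired up. Only `lambda_when` appears in the hunk above; the `function_name`, `invocation_type`, and `aws` option names, and all values, are assumptions based on typical usage (the function name and role are reused from the integration test command for illustration only).

```yaml
lambda-pipeline:
  processor:
    - aws_lambda:
        function_name: "lambda_test_function"    # assumed option; Lambda function to invoke
        invocation_type: "request-response"      # assumed synchronous invocation
        aws:
          region: "us-east-1"
          sts_role_arn: "arn:aws:iam::123456789012:role/dataprepper-role"
        lambda_when: "event['status'] == 'process'"  # conditional shown in the diff above
```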

_data-prepper/pipelines/configuration/processors/convert-entry-type.md

Lines changed: 3 additions & 3 deletions

@@ -1,12 +1,12 @@
 ---
 layout: default
-title: convert_entry_type
+title: Convert entry type
 parent: Processors
 grand_parent: Pipelines
-nav_order: 47
+nav_order: 50
 ---
 
-# convert_entry_type
+# Convert entry type processor
 
 The `convert_entry_type` processor converts a value type associated with the specified key in an event to the specified type. It is a casting processor that changes the types of some fields in events. Some data must be converted to a different type, such as an integer to a double, or a string to an integer, so that it will pass the events through condition-based processors or perform conditional routing.
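A minimal sketch of the casting behavior described above is shown below; the `key` and `type` option names are assumed from general Data Prepper usage and are not part of this diff.

```yaml
type-conversion-pipeline:
  processor:
    - convert_entry_type:
        key: "response_status"   # field whose value is cast
        type: "integer"          # target type, for example integer or double
```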

_data-prepper/pipelines/configuration/processors/copy-values.md

Lines changed: 3 additions & 3 deletions

@@ -1,12 +1,12 @@
 ---
 layout: default
-title: copy_values
+title: Copy values
 parent: Processors
 grand_parent: Pipelines
-nav_order: 48
+nav_order: 60
 ---
 
-# copy_values
+# Copy values processor
 
 The `copy_values` processor copies values within an event and is a [mutate event]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/mutate-event/) processor.
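As a hedged sketch of the copy operation described above: the `entries`, `from_key`, and `to_key` option names are assumptions based on common usage and do not appear in this diff.

```yaml
copy-values-pipeline:
  processor:
    - copy_values:
        entries:
          - from_key: "message"      # source field to copy
            to_key: "raw_message"    # destination field that receives the copy
```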

_data-prepper/pipelines/configuration/processors/csv.md

Lines changed: 3 additions & 3 deletions

@@ -1,12 +1,12 @@
 ---
 layout: default
-title: csv
+title: CSV
 parent: Processors
 grand_parent: Pipelines
-nav_order: 49
+nav_order: 70
 ---
 
-# csv
+# CSV processor
 
 The `csv` processor parses comma-separated values (CSVs) from the event into columns.
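For orientation, a minimal sketch of CSV parsing as described above; the `source` and `column_names` option names and values are assumptions, not part of this diff.

```yaml
csv-pipeline:
  processor:
    - csv:
        source: "message"                         # field containing the raw CSV text
        column_names: ["ip", "bytes", "latency"]  # names assigned to the parsed columns
```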

_data-prepper/pipelines/configuration/processors/date.md

Lines changed: 3 additions & 3 deletions

@@ -1,12 +1,12 @@
 ---
 layout: default
-title: date
+title: Date
 parent: Processors
 grand_parent: Pipelines
-nav_order: 50
+nav_order: 80
 ---
 
-# date
+# Date processor
 
 
 The `date` processor adds a default timestamp to an event, parses timestamp fields, and converts timestamp information to the International Organization for Standardization (ISO) 8601 format. This timestamp information can be used as an event timestamp.
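A rough sketch of the timestamp parsing described above is shown below; the `match`, `key`, `patterns`, and `destination` option names and the pattern string are assumptions based on common usage, not taken from this commit.

```yaml
date-pipeline:
  processor:
    - date:
        match:
          - key: "timestamp"                          # field holding the source time string
            patterns: ["yyyy-MM-dd'T'HH:mm:ss.SSS"]   # format(s) to try when parsing
        destination: "@timestamp"                     # where the ISO 8601 result is written
```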

_data-prepper/pipelines/configuration/processors/decompress.md

Lines changed: 3 additions & 3 deletions

@@ -1,12 +1,12 @@
 ---
 layout: default
-title: decompress
+title: Decompress
 parent: Processors
 grand_parent: Pipelines
-nav_order: 40
+nav_order: 90
 ---
 
-# decompress
+# Decompress processor
 
 The `decompress` processor decompresses any Base64-encoded compressed fields inside of an event.
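For context only, a hedged sketch of the decompression described above; the `keys` and `type` option names and the `gzip` value are assumptions and do not appear in this diff.

```yaml
decompress-pipeline:
  processor:
    - decompress:
        keys: ["compressed_payload"]   # Base64-encoded compressed field(s) to expand in place
        type: gzip                     # assumed compression type option
```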

_data-prepper/pipelines/configuration/processors/delay.md

Lines changed: 3 additions & 3 deletions

@@ -1,12 +1,12 @@
 ---
 layout: default
-title: delay
+title: Delay
 parent: Processors
 grand_parent: Pipelines
-nav_order: 41
+nav_order: 100
 ---
 
-# delay
+# Delay processor
 
 This processor will add a delay into the processor chain. Typically, you should use this only for testing, experimenting, and debugging.
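A minimal sketch of the testing delay described above; the `for` option name and its duration value are assumptions, not part of this diff.

```yaml
delay-pipeline:
  processor:
    - delay:
        for: "2s"   # assumed duration option; pauses the chain, useful only for testing and debugging
```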
