
Add deriving metrics from logs use case to Data Prepper #6248

Merged
merged 29 commits into from
Jul 3, 2024
Merged
Changes from 18 commits
Commits
Show all changes
29 commits
39d2161
Add use case to Data Prepper
vagimeli Jan 23, 2024
06fb2a3
Add content
vagimeli Jan 23, 2024
a0fe1f0
Copy edits
vagimeli Jan 23, 2024
a5bcd1b
Merge branch 'main' into metrics-logs
vagimeli Jan 31, 2024
a47d898
Merge branch 'main' into metrics-logs
vagimeli Feb 5, 2024
217cee3
Merge branch 'main' into metrics-logs
vagimeli Feb 22, 2024
803b748
Merge branch 'main' into metrics-logs
vagimeli Feb 26, 2024
364619c
Update metrics-logs.md
vagimeli Mar 6, 2024
6ecd3db
Merge branch 'main' into metrics-logs
vagimeli Apr 3, 2024
e60fdb9
Merge branch 'main' into metrics-logs
vagimeli Apr 4, 2024
223b20c
Merge branch 'main' into metrics-logs
vagimeli Apr 4, 2024
a6b0a6a
Merge branch 'main' into metrics-logs
vagimeli Apr 9, 2024
39a2c4a
Update _data-prepper/common-use-cases/metrics-logs.md
vagimeli Apr 25, 2024
111b669
Update _data-prepper/common-use-cases/metrics-logs.md
vagimeli Apr 25, 2024
889cada
Merge branch 'main' into metrics-logs
vagimeli Apr 25, 2024
8c54196
Update _data-prepper/common-use-cases/metrics-logs.md
vagimeli May 8, 2024
e453c50
Update _data-prepper/common-use-cases/metrics-logs.md
vagimeli May 8, 2024
d38691d
Update _data-prepper/common-use-cases/metrics-logs.md
vagimeli May 8, 2024
c02adb0
Update _data-prepper/common-use-cases/metrics-logs.md
vagimeli May 8, 2024
44500d8
Merge branch 'main' into metrics-logs
vagimeli May 8, 2024
c761ef5
Merge branch 'main' into metrics-logs
vagimeli May 13, 2024
db890e3
Merge branch 'main' into metrics-logs
vagimeli Jun 5, 2024
4b0d81b
Merge branch 'main' into metrics-logs
vagimeli Jun 26, 2024
3afc5da
Update metrics-logs.md
vagimeli Jun 26, 2024
9cb87a5
Merge branch 'main' into metrics-logs
vagimeli Jun 28, 2024
540b97c
Update metrics-logs.md
vagimeli Jun 28, 2024
871bf11
Update _data-prepper/common-use-cases/metrics-logs.md
vagimeli Jun 28, 2024
9b0b40a
Merge branch 'main' into metrics-logs
vagimeli Jul 2, 2024
20a6d8f
Merge branch 'main' into metrics-logs
vagimeli Jul 3, 2024
58 changes: 58 additions & 0 deletions _data-prepper/common-use-cases/metrics-logs.md
@@ -0,0 +1,58 @@
---
layout: default
title: Deriving metrics from logs
parent: Common use cases
nav_order: 15
---

# Deriving metrics from logs

You can use Data Prepper to derive metrics from logs. The following example pipeline receives incoming logs using the [`http` source plugin]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/http-source), parses them using the [`grok` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/grok/), and then uses the [`aggregate` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/aggregate/) to aggregate the `bytes` metric during a 30-second window and derive histograms from the results.
Review comment (Member):

This writes data to two indexes - one with unaggregated events and the other with derived metrics.

Can we clarify somewhere? You mention below how we use two pipelines. But, maybe we can make this end result more explicit.


The overall pipeline contains two pipelines:
Review comment (Collaborator):

"main" or "primary" instead of "overall"?

vagimeli marked this conversation as resolved.

- `apache-log-pipeline-with-metrics` -- Receives logs through an HTTP client like FluentBit, uses `grok` to extract important values from the logs by matching the value in the `log` key against the [Apache Common Log Format](https://httpd.apache.org/docs/2.4/logs.html#accesslog), and then forwards the grokked logs to both the `log-to-metrics-pipeline` pipeline and to an OpenSearch index named `logs`.

- `log-to-metrics-pipeline` -- Receives the grokked logs from the `apache-log-pipeline-with-metrics` pipeline, aggregates the logs, and derives histogram metrics of `bytes` based on the values in the `clientip` and `request` keys. Finally, it sends the histogram metrics to an OpenSearch index named `histogram_metrics`.
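The `grok` match against `%{COMMONAPACHELOG_DATATYPED}` is what produces the structured `clientip`, `request`, and `bytes` fields that the downstream aggregation depends on. As a rough illustration only (this is not Data Prepper's grok engine, and the regular expression is a simplified approximation of the Apache Common Log pattern), the following Python sketch shows the kind of event that extraction yields:

```python
import re

# Simplified approximation of the COMMONAPACHELOG grok pattern; the field
# names mirror what the grok processor adds to each event (assumption).
COMMON_LOG = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] "(?P<verb>\S+) (?P<request>\S+) (?P<httpversion>[^"]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-)'
)

def parse_apache_log(line: str) -> dict:
    """Return grok-style fields for one Apache Common Log line."""
    m = COMMON_LOG.match(line)
    if m is None:
        return {}
    event = m.groupdict()
    # The "_DATATYPED" pattern casts numeric fields; emulate that here.
    event["response"] = int(event["response"])
    event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
    return event

line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
event = parse_apache_log(line)
print(event["clientip"], event["request"], event["bytes"])  # prints: 127.0.0.1 /apache_pb.gif 2326
```

The `clientip` and `request` values become the aggregation grouping keys, and `bytes` becomes the histogram measurement, in the pipeline configuration that follows.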

```yaml
apache-log-pipeline-with-metrics:
  source:
    http:
      # Provide the path for ingestion. ${pipelineName} is replaced with the name configured for this pipeline.
      # In this case the path is "/apache-log-pipeline-with-metrics/logs". This will be the FluentBit output URI value.
      path: "/${pipelineName}/logs"
  processor:
    - grok:
        match:
          log: [ "%{COMMONAPACHELOG_DATATYPED}" ]
  sink:
    - opensearch:
        ...
        index: "logs"
    - pipeline:
        name: "log-to-metrics-pipeline"

log-to-metrics-pipeline:
  source:
    pipeline:
      name: "apache-log-pipeline-with-metrics"
  processor:
    - aggregate:
        # Specify the required identification keys
        identification_keys: ["clientip", "request"]
        action:
          histogram:
            # Specify the appropriate values for each of the following fields
            key: "bytes"
            record_minmax: true
            units: "bytes"
            buckets: [0, 25000000, 50000000, 75000000, 100000000]
        # Pick the required aggregation period
        group_duration: "30s"
  sink:
    - opensearch:
        ...
        index: "histogram_metrics"
```
{% include copy-curl.html %}
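Conceptually, the `histogram` action counts each event's `bytes` value into the configured buckets per `(clientip, request)` group and, because `record_minmax` is `true`, also tracks the minimum and maximum observed values within each 30-second window. The Python sketch below is a simplified stand-in for that aggregation logic, assuming the field names from the pipeline above; it is not Data Prepper's implementation:

```python
import bisect
from collections import defaultdict

# Bucket boundaries from the pipeline configuration above.
BUCKETS = [0, 25_000_000, 50_000_000, 75_000_000, 100_000_000]

def aggregate_window(events):
    """Group one window's events by (clientip, request) and build a bytes histogram per group."""
    groups = defaultdict(lambda: {
        "counts": [0] * (len(BUCKETS) - 1),  # one count per bucket interval
        "min": None,
        "max": None,  # tracked because record_minmax is true
    })
    for e in events:
        g = groups[(e["clientip"], e["request"])]
        b = e["bytes"]
        # Index of the interval [BUCKETS[i], BUCKETS[i+1]) containing the value,
        # clamped so out-of-range values land in the first or last interval.
        i = min(max(bisect.bisect_right(BUCKETS, b) - 1, 0), len(BUCKETS) - 2)
        g["counts"][i] += 1
        g["min"] = b if g["min"] is None else min(g["min"], b)
        g["max"] = b if g["max"] is None else max(g["max"], b)
    return groups

# Hypothetical 30-second window of grokked events.
window = [
    {"clientip": "10.0.0.1", "request": "/index.html", "bytes": 2326},
    {"clientip": "10.0.0.1", "request": "/index.html", "bytes": 30_000_000},
    {"clientip": "10.0.0.2", "request": "/logo.png", "bytes": 512},
]
metrics = aggregate_window(window)
print(metrics[("10.0.0.1", "/index.html")])
```

In the real pipeline, each resulting per-group histogram becomes a document in the `histogram_metrics` index once the 30-second `group_duration` elapses.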