You define processors in the {beatname_lc}.yml file to filter and enhance the data before sending events to the configured output.
To define a processor, you specify the processor name, an optional condition, and a set of parameters:
processors:
- <processor_name>:
    <parameters>
    when:
      <condition>
- <processor_name>:
    <parameters>
    when:
      <condition>
...
Where:

- <processor_name> specifies a processor that performs some kind of action, such as selecting the fields that are exported or adding metadata to the event.
- <when: condition> specifies an optional condition. If the condition is present, then the action is executed only if the condition is fulfilled. If no condition is passed, then the action is always executed.
- <parameters> is the list of parameters to pass to the processor.
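For example, here is a concrete instance of this structure that chains two of the processors described below; the field names are illustrative:

processors:
- include_fields:
    fields: ["cpu"]
- drop_fields:
    fields: ["cpu.user", "cpu.system"]

Here include_fields keeps only the cpu fields, and drop_fields then removes two of its sub-fields.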
The supported processors are:

- add_cloud_metadata
- add_locale
- decode_json_fields
- drop_event
- drop_fields
- include_fields
- add_kubernetes_metadata
- add_docker_metadata

Conditions
Each condition receives a field to compare. You can specify multiple fields under the same condition by using AND between the fields (for example, field1 AND field2).

For each field, you can specify a simple field name or a nested map, for example dns.question.name.
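For instance, here is a sketch of a condition on a nested field, using the equals condition described below (the value shown is illustrative):

equals:
  dns.question.name: "example.com"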
See [exported-fields] for a list of all the fields that are exported by {beatname_uc}.
The supported conditions are:
With the equals condition, you can check whether a field has a certain value. The condition accepts only an integer or a string value.

For example, the following condition checks if the response code of the HTTP transaction is 200:
equals:
  http.response.code: 200
The contains condition checks if a value is part of a field. The field can be a string or an array of strings. The condition accepts only a string value.

For example, the following condition checks if an error is part of the transaction status:
contains:
  status: "Specific error"
The regexp condition checks the field against a regular expression. The condition accepts only strings.

For example, the following condition checks if the process name starts with foo:
regexp:
  system.process.name: "foo.*"
The range condition checks if the field is in a certain range of values. The condition supports lt, lte, gt and gte. The condition accepts only integer or float values.

For example, the following condition checks for failed HTTP transactions by comparing the http.response.code field with 400:
range:
  http.response.code:
    gte: 400
This can also be written as:
range:
  http.response.code.gte: 400
The following condition checks if the CPU usage in percentage has a value between 0.5 and 0.8:

range:
  system.cpu.user.pct.gte: 0.5
  system.cpu.user.pct.lt: 0.8
The or operator receives a list of conditions.

or:
  - <condition1>
  - <condition2>
  - <condition3>
  ...
For example, to configure the condition http.response.code = 304 OR http.response.code = 404:

or:
  - equals:
      http.response.code: 304
  - equals:
      http.response.code: 404
The and operator receives a list of conditions.

and:
  - <condition1>
  - <condition2>
  - <condition3>
  ...
For example, to configure the condition http.response.code = 200 AND status = OK:

and:
  - equals:
      http.response.code: 200
  - equals:
      status: OK
To configure a condition like <condition1> OR <condition2> AND <condition3>:

or:
  - <condition1>
  - and:
    - <condition2>
    - <condition3>
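As a concrete sketch of this nesting, reusing the illustrative fields from the examples above, the following matches events where the response code is 404, or where the response code is at least 500 and the status is Error:

or:
  - equals:
      http.response.code: 404
  - and:
    - range:
        http.response.code.gte: 500
    - equals:
        status: Error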
The not operator receives the condition to negate.

not:
  <condition>
For example, to configure the condition NOT status = OK:

not:
  equals:
    status: OK
The add_cloud_metadata processor enriches each event with instance metadata from the machine's hosting provider. At startup it will detect the hosting provider and cache the instance metadata.

The following cloud providers are supported:
- Amazon Elastic Compute Cloud (EC2)
- Digital Ocean
- Google Compute Engine (GCE)
- Tencent Cloud (QCloud)
- Alibaba Cloud (ECS)
The simple configuration below enables the processor.
processors:
- add_cloud_metadata: ~
The add_cloud_metadata processor has one optional configuration setting named timeout that specifies the maximum amount of time to wait for a successful response when detecting the hosting provider. The default timeout value is 3s.

If a timeout occurs, no instance metadata is added to the events. This makes it possible to enable this processor for all your deployments (in the cloud or on-premises).
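For example, to give a slow metadata endpoint more time to respond, you can raise the timeout; the 10s value here is illustrative:

processors:
- add_cloud_metadata:
    timeout: 10s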
The metadata that is added to events varies by hosting provider. Below are examples for each of the supported providers.
EC2
{
  "meta": {
    "cloud": {
      "availability_zone": "us-east-1c",
      "instance_id": "i-4e123456",
      "machine_type": "t2.medium",
      "provider": "ec2",
      "region": "us-east-1"
    }
  }
}
Digital Ocean
{
  "meta": {
    "cloud": {
      "instance_id": "1234567",
      "provider": "digitalocean",
      "region": "nyc2"
    }
  }
}
GCE
{
  "meta": {
    "cloud": {
      "availability_zone": "projects/1234567890/zones/us-east1-b",
      "instance_id": "1234556778987654321",
      "machine_type": "projects/1234567890/machineTypes/f1-micro",
      "project_id": "my-dev",
      "provider": "gce"
    }
  }
}
Tencent Cloud
{
  "meta": {
    "cloud": {
      "availability_zone": "gz-azone2",
      "instance_id": "ins-qcloudv5",
      "provider": "qcloud",
      "region": "china-south-gz"
    }
  }
}
Alibaba Cloud
This metadata is only available when VPC is selected as the network type of the ECS instance.
{
  "meta": {
    "cloud": {
      "availability_zone": "cn-shenzhen",
      "instance_id": "i-wz9g2hqiikg0aliyun2b",
      "provider": "ecs",
      "region": "cn-shenzhen-a"
    }
  }
}
The add_locale processor enriches each event with the machine's time zone offset from UTC or with the name of the time zone. It supports one configuration option named format that controls whether an offset or time zone abbreviation is added to the event. The default format is offset. The processor adds a beat.timezone value to each event.
The configuration below enables the processor with the default settings.
processors:
- add_locale: ~
This configuration enables the processor and configures it to add the time zone abbreviation to events.
processors:
- add_locale:
    format: abbreviation
Note: The add_locale processor differentiates between daylight saving time (DST) and regular time. For example, CEST indicates DST, and CET is regular time.
The decode_json_fields processor decodes fields containing JSON strings and replaces the strings with valid JSON objects.
processors:
- decode_json_fields:
    fields: ["field1", "field2", ...]
    process_array: false
    max_depth: 1
    target: ""
    overwrite_keys: false
The decode_json_fields processor has the following configuration settings:

- fields: The fields containing JSON strings to decode.
- process_array: (Optional) A boolean that specifies whether to process arrays. The default is false.
- max_depth: (Optional) The maximum parsing depth. The default is 1.
- target: (Optional) The field under which the decoded JSON will be written. By default, the decoded JSON object replaces the string field from which it was read. To merge the decoded JSON fields into the root of the event, specify target with an empty string (target: ""). Note that the null value (target:) is treated as if the field was not set at all.
- overwrite_keys: (Optional) A boolean that specifies whether keys that already exist in the event are overwritten by keys from the decoded JSON object. The default is false.
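For example, assuming events carry a JSON-encoded string in a field named message (an illustrative field name), the following configuration decodes it and merges the resulting keys into the root of the event, overwriting any existing keys:

processors:
- decode_json_fields:
    fields: ["message"]
    target: ""
    overwrite_keys: true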
The drop_event processor drops the entire event if the associated condition is fulfilled. The condition is mandatory, because without one, all the events are dropped.
processors:
- drop_event:
    when:
      condition
See Conditions for a list of supported conditions.
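For example, the following sketch drops all events for successful HTTP transactions, reusing the illustrative field from the equals example above:

processors:
- drop_event:
    when:
      equals:
        http.response.code: 200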
The drop_fields processor specifies which fields to drop if a certain condition is fulfilled. The condition is optional. If it's missing, the specified fields are always dropped. The @timestamp and type fields cannot be dropped, even if they show up in the drop_fields list.
processors:
- drop_fields:
    when:
      condition
    fields: ["field1", "field2", ...]
See Conditions for a list of supported conditions.
Note: If you define an empty list of fields under drop_fields, then no fields are dropped.
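As a sketch of conditional dropping with illustrative field names, the following removes the request headers from successful transactions only:

processors:
- drop_fields:
    when:
      equals:
        http.response.code: 200
    fields: ["http.request.headers"]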
The include_fields processor specifies which fields to export if a certain condition is fulfilled. The condition is optional. If it's missing, the specified fields are always exported. The @timestamp and type fields are always exported, even if they are not defined in the include_fields list.
processors:
- include_fields:
    when:
      condition
    fields: ["field1", "field2", ...]
See Conditions for a list of supported conditions.
You can specify multiple include_fields processors under the processors section.
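For instance, each include_fields processor can be gated by its own condition so that different event types keep different field sets; the types and fields below are illustrative:

processors:
- include_fields:
    when:
      equals:
        type: http
    fields: ["http.response.code", "status"]
- include_fields:
    when:
      equals:
        type: mysql
    fields: ["mysql.num_fields", "status"]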
Note: If you define an empty list of fields under include_fields, then only the required fields, @timestamp and type, are exported.
The add_kubernetes_metadata processor annotates each event with relevant metadata based on which Kubernetes pod the event originated from. Each event is annotated with:

- Pod Name
- Namespace
- Labels
The add_kubernetes_metadata processor has two basic building blocks:

- Indexers
- Matchers
Indexers take in a pod's metadata and build indices based on it. For example, the ip_port indexer can take a Kubernetes pod and index the pod metadata based on all pod_ip:container_port combinations.

Matchers are used to construct lookup keys for querying indices. For example, when the fields matcher takes ["metricset.host"] as a lookup field, it constructs a lookup key with the value of the field metricset.host.
Each Beat can define its own default indexers and matchers, which are enabled by default. For example, Filebeat enables the container indexer, which indexes pod metadata based on all container IDs, and a logs_path matcher, which takes the source field, extracts the container ID, and uses it to retrieve metadata.
The configuration below enables the processor when the Beat is run as a pod in Kubernetes.
processors:
- add_kubernetes_metadata:
    in_cluster: true
The configuration below enables the processor on a Beat running as a process on the Kubernetes node.
processors:
- add_kubernetes_metadata:
    in_cluster: false
    host: <hostname>
    kube_config: ${HOME}/.kube/config
The configuration below disables the default indexers and matchers, and enables only the ones that the user is interested in.
processors:
- add_kubernetes_metadata:
    in_cluster: false
    host: <hostname>
    kube_config: ~/.kube/config
    default_indexers.enabled: false
    default_matchers.enabled: false
    indexers:
      - ip_port:
    matchers:
      - fields:
          lookup_fields: ["metricset.host"]
The add_kubernetes_metadata processor has the following configuration settings:

- in_cluster: (Optional) Use in-cluster settings for the Kubernetes client, true by default.
- host: (Optional) In case in_cluster is false, use this host to connect to the Kubernetes API.
- kube_config: (Optional) Use the given config file as configuration for the Kubernetes client.
- default_indexers.enabled: (Optional) Enable or disable the default pod indexers, in case you want to specify your own.
- default_matchers.enabled: (Optional) Enable or disable the default pod matchers, in case you want to specify your own.
The add_docker_metadata processor annotates each event with relevant metadata from Docker containers:

- Container ID
- Name
- Image
- Labels
processors:
- add_docker_metadata:
    host: "unix:///var/run/docker.sock"
    #match_fields: ["system.process.cgroup.id"]
    #match_source: true
    #match_source_index: 4
    #cleanup_timeout: 60
    # To connect to Docker over TLS you must specify a client and CA certificate.
    #ssl:
    #  certificate_authority: "/etc/pki/root/ca.pem"
    #  certificate: "/etc/pki/client/cert.pem"
    #  key: "/etc/pki/client/cert.key"
It has the following settings:

- host: (Optional) Docker socket (UNIX or TCP socket). It uses unix:///var/run/docker.sock by default.
- match_fields: (Optional) A list of fields to match a container ID. At least one of them should hold a container ID for the event to be enriched.
- match_source: (Optional) Match the container ID from a log path present in the source field. Enabled by default.
- match_source_index: (Optional) Index in the source path, split by /, at which to look for the container ID. It defaults to 4 to match /var/lib/docker/containers/<container_id>/*.log.
- cleanup_timeout: (Optional) Time of inactivity after which the metadata for a container is cleaned up and forgotten, 60s by default.
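As a sketch, to match the container ID from a custom event field instead of the log path, you could disable source matching and point match_fields at that field; the field name below is illustrative:

processors:
- add_docker_metadata:
    host: "unix:///var/run/docker.sock"
    match_source: false
    match_fields: ["docker.container.id"]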