Add rename fields processor
The rename processor renames fields before they are indexed, either to standardise on names or to move fields around. This is useful when building Filebeat modules that read from JSON files: with the rename processor, no ingest pipeline is needed to follow the naming schema, which should make some modules simpler to build. It is also useful in combination with elastic#6024 to rename some fields according to the schema.

```
processors:
- rename:
    fields:
     - from: "a"
       to: "b"
```

Intention of rename
* Adjust fields to mapping
* Prevent conflicts like `a` and `a.b` by renaming `a` to `a.value`
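
The `a` to `a.value` case can be sketched in isolation. This is a minimal illustration, assuming a plain `map[string]interface{}` in place of libbeat's `common.MapStr`; the helpers `put` and `renameToNested` are hypothetical names, not part of the commit. It shows why the source key has to be deleted before the target is written when the source is a prefix of the target.

```go
package main

import (
	"fmt"
	"strings"
)

// put stores value under a dotted key, creating nested maps as needed.
// It returns an error if an intermediate key already holds a non-map value.
// (Hypothetical helper; stands in for the real nested-map implementation.)
func put(m map[string]interface{}, key string, value interface{}) error {
	parts := strings.Split(key, ".")
	for _, p := range parts[:len(parts)-1] {
		next, ok := m[p]
		if !ok {
			child := map[string]interface{}{}
			m[p] = child
			m = child
			continue
		}
		child, isMap := next.(map[string]interface{})
		if !isMap {
			return fmt.Errorf("key %q holds a scalar, cannot nest under it", p)
		}
		m = child
	}
	m[parts[len(parts)-1]] = value
	return nil
}

// renameToNested moves the top-level value at from to the dotted key to.
// The delete happens before the put so that from may be a prefix of to.
func renameToNested(m map[string]interface{}, from, to string) error {
	value, ok := m[from]
	if !ok {
		return fmt.Errorf("key %q not found", from)
	}
	delete(m, from)
	return put(m, to, value)
}

func main() {
	event := map[string]interface{}{"a": 1}
	if err := renameToNested(event, "a", "a.value"); err != nil {
		fmt.Println("rename failed:", err)
		return
	}
	fmt.Println(event) // map[a:map[value:1]]
}
```

Writing `a.value` while the scalar `a` is still present fails in this sketch, which is the same ordering constraint the processor has to respect.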

Limitations
* Will not overwrite keys
ruflin committed Apr 10, 2018
1 parent 80e6f72 commit 0fe3c29
Showing 12 changed files with 582 additions and 3 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.asciidoc
@@ -100,6 +100,7 @@ https://github.com/elastic/beats/compare/v6.0.0-beta2...master[Check the HEAD di
- Add support for spooling to disk to the beats event publishing pipeline. {pull}6581[6581]
- Added logging of system info at Beat startup. {issue}5946[5946]
- Do not log errors if X-Pack Monitoring is enabled but Elasticsearch X-Pack is not. {pull}6627[6627]
- Add rename processor. {pull}6292[6292]

*Auditbeat*

8 changes: 8 additions & 0 deletions auditbeat/auditbeat.reference.yml
@@ -242,6 +242,14 @@ auditbeat.modules:
# equals:
# http.code: 200
#
# The following example renames the field a to b:
#
#processors:
#- rename:
#    fields:
#       - from: "a"
#         to: "b"
#
# The following example enriches each event with metadata from the cloud
# provider about the host machine. It works on EC2, GCE, DigitalOcean,
# Tencent Cloud, and Alibaba Cloud.
8 changes: 8 additions & 0 deletions filebeat/filebeat.reference.yml
@@ -741,6 +741,14 @@ filebeat.inputs:
# equals:
# http.code: 200
#
# The following example renames the field a to b:
#
#processors:
#- rename:
#    fields:
#       - from: "a"
#         to: "b"
#
# The following example enriches each event with metadata from the cloud
# provider about the host machine. It works on EC2, GCE, DigitalOcean,
# Tencent Cloud, and Alibaba Cloud.
8 changes: 8 additions & 0 deletions heartbeat/heartbeat.reference.yml
@@ -351,6 +351,14 @@ heartbeat.scheduler:
# equals:
# http.code: 200
#
# The following example renames the field a to b:
#
#processors:
#- rename:
#    fields:
#       - from: "a"
#         to: "b"
#
# The following example enriches each event with metadata from the cloud
# provider about the host machine. It works on EC2, GCE, DigitalOcean,
# Tencent Cloud, and Alibaba Cloud.
8 changes: 8 additions & 0 deletions libbeat/_meta/config.reference.yml
@@ -137,6 +137,14 @@
# equals:
# http.code: 200
#
# The following example renames the field a to b:
#
#processors:
#- rename:
#    fields:
#       - from: "a"
#         to: "b"
#
# The following example enriches each event with metadata from the cloud
# provider about the host machine. It works on EC2, GCE, DigitalOcean,
# Tencent Cloud, and Alibaba Cloud.
51 changes: 49 additions & 2 deletions libbeat/docs/processors-using.asciidoc
@@ -43,6 +43,7 @@ The supported processors are:
* <<drop-event,`drop_event`>>
* <<drop-fields,`drop_fields`>>
* <<include-fields,`include_fields`>>
* <<rename-fields,`rename`>>
* <<add-kubernetes-metadata,`add_kubernetes_metadata`>>
* <<add-docker-metadata,`add_docker_metadata`>>
* <<add-host-metadata,`add_host_metadata`>>
@@ -54,8 +55,8 @@ Each condition receives a field to compare. You can specify multiple fields
under the same condition by using `AND` between the fields (for example,
`field1 AND field2`).

For each field, you can specify a simple field name or a nested map, for
example `dns.question.name`.
For each field, you can specify a simple field name or a nested map, for example
`dns.question.name`.

See <<exported-fields>> for a list of all the fields that are exported by
{beatname_uc}.
@@ -531,6 +532,52 @@ section.
NOTE: If you define an empty list of fields under `include_fields`, then only
the required fields, `@timestamp` and `type`, are exported.

[[rename-fields]]
=== Rename fields from events

The `rename` processor specifies a list of fields to rename. Under the `fields`
key, each entry contains a `from: old-key` and a `to: new-key` pair, where
`from` is the original field name and `to` is the target field name.

Renaming fields can be useful in cases where field names cause conflicts. For
example, if an event has two fields, `a` and `a.b`, that are both assigned
scalar values (e.g. `{"a": 1, "a.b": 2}`), Elasticsearch returns an error at
ingest time, because the value of `a` cannot simultaneously be a scalar and an
object. To prevent this, use the `rename` processor to rename `a` to `a.value`.

The `rename` processor cannot be used to overwrite fields. To overwrite a field,
either first rename the target field, or use the `drop_fields` processor to drop
the target field, and then rename the source field.

[source,yaml]
-------
processors:
- rename:
    fields:
     - from: "a.g"
       to: "e.d"
    ignore_missing: false
    fail_on_error: true
-------

The `rename` processor has the following configuration settings:

`ignore_missing`:: (Optional) If set to `true`, no error is logged when a key
that should be renamed is missing. Default is `false`.

`fail_on_error`:: (Optional) If set to `true`, renaming stops when an error
occurs and the original event is returned. If set to `false`, renaming continues
with the remaining fields even if an error occurs. Default is `true`.

See <<conditions>> for a list of supported conditions.

You can specify multiple `rename` processors under the `processors`
section.

[[add-kubernetes-metadata]]
=== Add Kubernetes metadata

104 changes: 104 additions & 0 deletions libbeat/processors/actions/rename.go
@@ -0,0 +1,104 @@
package actions

import (
	"fmt"

	"github.com/pkg/errors"

	"github.com/elastic/beats/libbeat/beat"
	"github.com/elastic/beats/libbeat/common"
	"github.com/elastic/beats/libbeat/common/cfgwarn"
	"github.com/elastic/beats/libbeat/logp"
	"github.com/elastic/beats/libbeat/processors"
)

type renameFields struct {
	config renameFieldsConfig
}

type renameFieldsConfig struct {
	Fields        []fromTo `config:"fields"`
	IgnoreMissing bool     `config:"ignore_missing"`
	FailOnError   bool     `config:"fail_on_error"`
}

type fromTo struct {
	From string `config:"from"`
	To   string `config:"to"`
}

func init() {
	processors.RegisterPlugin("rename",
		configChecked(newRenameFields,
			requireFields("fields")))
}

func newRenameFields(c *common.Config) (processors.Processor, error) {

	cfgwarn.Beta("Beta rename processor is used.")
	config := renameFieldsConfig{
		IgnoreMissing: false,
		FailOnError:   true,
	}
	err := c.Unpack(&config)
	if err != nil {
		return nil, fmt.Errorf("failed to unpack the rename configuration: %s", err)
	}

	f := &renameFields{
		config: config,
	}
	return f, nil
}

func (f *renameFields) Run(event *beat.Event) (*beat.Event, error) {
	var backup common.MapStr
	// Creates a copy of the event to revert in case of failure
	if f.config.FailOnError {
		backup = event.Fields.Clone()
	}

	for _, field := range f.config.Fields {
		err := f.renameField(field.From, field.To, event.Fields)
		if err != nil && f.config.FailOnError {
			logp.Debug("rename", "Failed to rename fields, revert to old event: %s", err)
			event.Fields = backup
			return event, err
		}
	}

	return event, nil
}

func (f *renameFields) renameField(from string, to string, fields common.MapStr) error {
	// Fields cannot be overwritten. Either the target field has to be dropped first or renamed first
	exists, _ := fields.HasKey(to)
	if exists {
		return fmt.Errorf("target field %s already exists, drop or rename this field first", to)
	}

	value, err := fields.GetValue(from)
	if err != nil {
		// Ignore ErrKeyNotFound errors
		if f.config.IgnoreMissing && errors.Cause(err) == common.ErrKeyNotFound {
			return nil
		}
		// Report the source key, since that is the lookup that failed
		return fmt.Errorf("could not fetch value for key: %s, Error: %s", from, err)
	}

	// Deletion must happen first to support cases where a becomes a.b
	err = fields.Delete(from)
	if err != nil {
		return fmt.Errorf("could not delete key: %s, %+v", from, err)
	}

	_, err = fields.Put(to, value)
	if err != nil {
		return fmt.Errorf("could not put value: %s: %v, %+v", to, value, err)
	}
	return nil
}

func (f *renameFields) String() string {
	return "rename=" + fmt.Sprintf("%+v", f.config.Fields)
}
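
The revert-on-failure behaviour controlled by `fail_on_error` can be tried out without libbeat. The following is only a sketch under stated assumptions: it uses a flat `map[string]interface{}` and a shallow copy instead of `common.MapStr` and its `Clone` method, and `rename` and `renameAll` are hypothetical stand-ins for `renameField` and `Run`.

```go
package main

import "fmt"

// fromTo mirrors the from/to pair of the processor configuration.
type fromTo struct {
	From string
	To   string
}

// rename moves a value between top-level keys and never overwrites a target.
func rename(fields map[string]interface{}, from, to string) error {
	if _, exists := fields[to]; exists {
		return fmt.Errorf("target field %s already exists", to)
	}
	value, ok := fields[from]
	if !ok {
		return fmt.Errorf("source field %s not found", from)
	}
	delete(fields, from)
	fields[to] = value
	return nil
}

// renameAll applies the renames in order. With failOnError set, the first
// error aborts the loop and the untouched backup is returned; otherwise
// errors are skipped and the remaining renames still run.
func renameAll(fields map[string]interface{}, renames []fromTo, failOnError bool) (map[string]interface{}, error) {
	var backup map[string]interface{}
	if failOnError {
		// Shallow copy; enough here because the sketch only uses scalars.
		backup = make(map[string]interface{}, len(fields))
		for k, v := range fields {
			backup[k] = v
		}
	}
	for _, r := range renames {
		if err := rename(fields, r.From, r.To); err != nil && failOnError {
			return backup, err
		}
	}
	return fields, nil
}

func main() {
	renames := []fromTo{{"a", "b"}, {"missing", "x"}}

	strict, err := renameAll(map[string]interface{}{"a": 1}, renames, true)
	fmt.Println(strict, err) // map[a:1] source field missing not found

	lenient, err := renameAll(map[string]interface{}{"a": 1}, renames, false)
	fmt.Println(lenient, err) // map[b:1] <nil>
}
```

With `failOnError` enabled, the successful first rename is undone because the second one fails; with it disabled, the first rename is kept and the failure is ignored.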