[7.7] Deep merge event fields and metadata maps (#17958) #18231

Merged 5 commits on May 5, 2020
60 changes: 60 additions & 0 deletions CHANGELOG.next.asciidoc
@@ -79,6 +79,10 @@ https://github.com/elastic/beats/compare/v7.0.0-alpha2...master[Check the HEAD d
- Do not rotate log files on startup when interval is configured and rotateonstartup is disabled. {pull}17613[17613]
- Fix `setup.dashboards.index` setting not working. {pull}17749[17749]
- Fix Elasticsearch license endpoint URL referenced in error message. {issue}17880[17880] {pull}18030[18030]
- Fix panic when assigning a key to a `nil` value in an event. {pull}18143[18143]
Member

changelog didn't make it :(

Contributor Author

Thanks for catching. Fixed in d4d8191.

While fixing this I noticed that there are duplicate entries for the same PR in the CHANGELOG, even in the original PR to master. So I've cleaned those up in this backport. I'll make a separate PR to master to clean up the CHANGELOG there.

Contributor Author

@ycombinator May 5, 2020

PR to cleanup CHANGELOG in master: #18233

- Gives monitoring reporter hosts, if configured, total precedence over corresponding output hosts. {issue}17937[17937] {pull}17991[17991]
- Arbitrary fields and metadata maps are now deep merged into event. {pull}17958[17958]
- Change `decode_json_fields` processor, to merge parsed json objects with existing objects in the event instead of fully replacing them. {pull}17958[17958]

*Auditbeat*

@@ -192,6 +196,42 @@ https://github.com/elastic/beats/compare/v7.0.0-alpha2...master[Check the HEAD d
- Add `providers` setting to `add_cloud_metadata` processor. {pull}13812[13812]
- Ensure that init containers are no longer tailed after they stop {pull}14394[14394]
- Fingerprint processor adds a new xxhash hashing algorithm {pull}15418[15418]
- Add document_id setting to decode_json_fields processor. {pull}15859[15859]
Member

Hey @ycombinator ! Is all this normal however?

Contributor Author

Nope, it's definitely not! 😬 Not sure what happened here, fixing.

Contributor Author

Okay, I think I have this finally in the right state. What you see here should be a backport of https://github.com/elastic/beats/pull/17958/files and https://github.com/elastic/beats/pull/18233/files.

- Include network information by default on add_host_metadata and add_observer_metadata. {issue}15347[15347] {pull}16077[16077]
- Add `aws_ec2` provider for autodiscover. {issue}12518[12518] {pull}14823[14823]
- Add monitoring variable `libbeat.config.scans` to distinguish scans of the configuration directory from actual reloads of its contents. {pull}16440[16440]
- Add support for multiple passwords in redis output. {issue}16058[16058] {pull}16206[16206]
- Add support for Histogram type in fields.yml {pull}16570[16570]
- Windows .exe files now have embedded file version info. {issue}15232[15232]
- Remove experimental flag from `setup.template.append_fields` {pull}16576[16576]
- Add `add_cloudfoundry_metadata` processor to annotate events with Cloud Foundry application data. {pull}16621[16621]
- Add Kerberos support to Kafka input and output. {pull}16781[16781]
- Add `add_cloudfoundry_metadata` processor to annotate events with Cloud Foundry application data. {pull}16621[16621]
- Add support for kubernetes provider to recognize namespace level defaults {pull}16321[16321]
- Add `translate_sid` processor on Windows for converting Windows security identifier (SID) values to names. {issue}7451[7451] {pull}16013[16013]
- Add capability to enrich `container.id` with process id in `add_process_metadata` processor {pull}15947[15947]
- Update RPM packages contained in Beat Docker images. {issue}17035[17035]
- Update supported versions of `redis` output. {pull}17198[17198]
- Update documentation for system.process.memory fields to include clarification on Windows os's. {pull}17268[17268]
- Add `replace` processor for replacing string values of fields. {pull}17342[17342]
- Add optional regex based cid extractor to `add_kubernetes_metadata` processor. {pull}17360[17360]
- Add `urldecode` processor for decoding URL-encoded fields. {pull}17505[17505]
- Add support for AWS IAM `role_arn` in credentials config. {pull}17658[17658] {issue}12464[12464]
- Add keystore support for autodiscover static configurations. {pull}16306[16306]
- Add Kerberos support to Elasticsearch output. {pull}17927[17927]
- Add support for fixed length extraction in `dissect` processor. {pull}17191[17191]
- Set `agent.name` to the hostname by default. {issue}16377[16377] {pull}18000[18000]
- Add config example of how to skip the `add_host_metadata` processor when forwarding logs. {issue}13920[13920] {pull}18153[18153]
- When using the `decode_json_fields` processor, decoded fields are now deep-merged into existing event. {pull}17958[17958]

*Auditbeat*

- Reference kubernetes manifests include configuration for auditd and enrichment with kubernetes metadata. {pull}17431[17431]
- Reference kubernetes manifests mount data directory from the host, so data persist between executions in the same node. {pull}17429[17429]
- Log to stderr when running using reference kubernetes manifests. {pull}17443[17443]
- Fix syscall kprobe arguments for 32-bit systems in socket module. {pull}17500[17500]
- Fix memory leak when we miss socket close kprobe events. {pull}17500[17500]
- Add system module process dataset ECS categorization fields. {pull}18032[18032]

*Filebeat*

@@ -250,6 +290,26 @@ https://github.com/elastic/beats/compare/v7.0.0-alpha2...master[Check the HEAD d
- Improve AWS cloudtrail field mappings {issue}16086[16086] {issue}16110[16110] {pull}17155[17155]
- Release Google Cloud module as GA. {pull}17511[17511]
- Update filebeat httpjson input to support pagination via Header and Okta module. {pull}16354[16354]
- Add config option to select a different azure cloud env in the azure-eventhub input and azure module. {issue}17649[17649] {pull}17659[17659]
- Added new Checkpoint Syslog filebeat module. {pull}17682[17682]
- Improve ECS categorization field mappings for nats module. {issue}16173[16173] {pull}17550[17550]
- Enhance `elasticsearch/server` fileset to handle ECS-compatible logs emitted by Elasticsearch. {issue}17715[17715] {pull}17714[17714]
- Add support for Google Application Default Credentials to the Google Pub/Sub input and Google Cloud modules. {pull}15668[15668]
- Enhance `elasticsearch/deprecation` fileset to handle ECS-compatible logs emitted by Elasticsearch. {issue}17715[17715] {pull}17728[17728]
- Enhance `elasticsearch/slowlog` fileset to handle ECS-compatible logs emitted by Elasticsearch. {issue}17715[17715] {pull}17729[17729]
- Improve ECS categorization field mappings in misp module. {issue}16026[16026] {pull}17344[17344]
- Added Unix stream socket support as an input source and a syslog input source. {pull}17492[17492]
- Improve ECS categorization field mappings in postgresql module. {issue}16177[16177] {pull}17914[17914]
- Improve ECS categorization field mappings in rabbitmq module. {issue}16178[16178] {pull}17916[17916]
- Make `decode_cef` processor GA. {pull}17944[17944]
- Improve ECS categorization field mappings in redis module. {issue}16179[16179] {pull}17918[17918]
- Improve ECS categorization field mappings for zeek module. {issue}16029[16029] {pull}17738[17738]
- Improve ECS categorization field mappings for netflow module. {issue}16135[16135] {pull}18108[18108]
- Added an input option `publisher_pipeline.disable_host` to disable `host.name`
from being added to events by default. {pull}18159[18159]
- Improve ECS categorization field mappings in system module. {issue}16031[16031] {pull}18065[18065]
- When using the `json.*` setting available on some inputs, decoded fields are now deep-merged into existing event. {pull}17958[17958]
- Change the `json.*` input settings implementation to merge parsed json objects with existing objects in the event instead of fully replacing them. {pull}17958[17958]

*Heartbeat*

27 changes: 18 additions & 9 deletions libbeat/common/jsontransform/jsonhelper.go
@@ -29,11 +29,12 @@ import (
 // WriteJSONKeys writes the json keys to the given event based on the overwriteKeys option and the addErrKey
 func WriteJSONKeys(event *beat.Event, keys map[string]interface{}, overwriteKeys bool, addErrKey bool) {
 	if !overwriteKeys {
-		for k, v := range keys {
-			if _, exists := event.Fields[k]; !exists && k != "@timestamp" && k != "@metadata" {
-				event.Fields[k] = v
-			}
-		}
+		// @timestamp and @metadata fields are root-level fields. We remove them so they
+		// don't become part of event.Fields.
+		removeKeys(keys, "@timestamp", "@metadata")
+
+		// Then, perform deep update without overwriting
+		event.Fields.DeepUpdateNoOverwrite(keys)
 		return
 	}
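
To make the no-overwrite branch concrete, here is a minimal, self-contained sketch of a deep update that never replaces existing values, written against plain `map[string]interface{}`. The function `deepUpdateNoOverwrite` is a hypothetical stand-in for illustration only; it is not the `common.MapStr.DeepUpdateNoOverwrite` implementation, whose handling of edge cases may differ.

```go
package main

import "fmt"

// deepUpdateNoOverwrite is a hypothetical illustration, not the libbeat method.
// It copies keys from src into dst, recursing into nested maps, and never
// replaces a value that already exists in dst.
func deepUpdateNoOverwrite(dst, src map[string]interface{}) {
	for k, v := range src {
		old, exists := dst[k]
		if !exists {
			// New key: take the incoming value as-is.
			dst[k] = v
			continue
		}
		oldMap, oldIsMap := old.(map[string]interface{})
		newMap, newIsMap := v.(map[string]interface{})
		if oldIsMap && newIsMap {
			// Both sides are maps: merge one level down.
			deepUpdateNoOverwrite(oldMap, newMap)
		}
		// Otherwise the existing value wins and the incoming one is dropped.
	}
}

func main() {
	// Sample data borrowed from the test fixture in this PR.
	fields := map[string]interface{}{
		"top_a": 23,
		"top_b": map[string]interface{}{"inner_c": "see", "inner_d": "dee"},
	}
	decoded := map[string]interface{}{
		"top_b": map[string]interface{}{"inner_d": "NEW_dee", "inner_e": "COMPLETELY_NEW_e"},
		"top_c": "COMPLETELY_NEW_c",
	}
	deepUpdateNoOverwrite(fields, decoded)
	fmt.Println(fields)
}
```

The removed loop only compared top-level keys, so an incoming `top_b` was skipped entirely whenever `top_b` already existed. With a deep merge, `top_b.inner_e` is added while `top_b.inner_d` keeps its original value, which is the shape the `overwrite_false` test case expects.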

@@ -64,7 +65,7 @@ func WriteJSONKeys(event *beat.Event, keys map[string]interface{}, overwriteKeys
 				}

 			case map[string]interface{}:
-				event.Meta.Update(common.MapStr(m))
+				event.Meta.DeepUpdate(common.MapStr(m))

 			default:
 				event.SetErrorWithOption(createJSONError("failed to update @metadata"), addErrKey)
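
The switch from `Update` to `DeepUpdate` matters when the incoming `@metadata` carries only part of a nested object. The sketch below contrasts the two strategies on plain maps. `shallowUpdate`, `deepUpdate`, `newMeta`, and the `keep` key are all hypothetical names introduced for illustration; they are assumed to approximate, not reproduce, the `common.MapStr` methods.

```go
package main

import "fmt"

// shallowUpdate replaces each top-level key of dst with the value from src.
func shallowUpdate(dst, src map[string]interface{}) {
	for k, v := range src {
		dst[k] = v
	}
}

// deepUpdate is a hypothetical illustration: nested maps are merged key by
// key, and any other value in src overwrites the one in dst.
func deepUpdate(dst, src map[string]interface{}) {
	for k, v := range src {
		oldMap, oldIsMap := dst[k].(map[string]interface{})
		newMap, newIsMap := v.(map[string]interface{})
		if oldIsMap && newIsMap {
			deepUpdate(oldMap, newMap)
			continue
		}
		dst[k] = v
	}
}

// newMeta builds fresh sample metadata; "keep" is an invented sibling key.
func newMeta() map[string]interface{} {
	return map[string]interface{}{
		"foo": "bar",
		"baz": map[string]interface{}{"qux": 17, "keep": true},
	}
}

func main() {
	incoming := map[string]interface{}{
		"baz": map[string]interface{}{"qux": "NEW_qux"},
	}
	shallow, deep := newMeta(), newMeta()
	shallowUpdate(shallow, incoming)
	deepUpdate(deep, incoming)
	fmt.Println("shallow:", shallow["baz"]) // nested map replaced wholesale
	fmt.Println("deep:   ", deep["baz"])    // nested map merged
}
```

In the shallow case the whole `baz` object is swapped for the incoming one, so `keep` disappears. In the deep case `qux` is updated and `keep` survives.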
@@ -83,13 +84,21 @@ func WriteJSONKeys(event *beat.Event, keys map[string]interface{}, overwriteKeys
 				continue
 			}
 			event.Fields[k] = vstr
-
-		default:
-			event.Fields[k] = v
 		}
 	}
+
+	// We have accounted for @timestamp, @metadata, type above. So let's remove these keys and
+	// deep update the event with the rest of the keys.
+	removeKeys(keys, "@timestamp", "@metadata", "type")
+	event.Fields.DeepUpdate(keys)
 }

 func createJSONError(message string) common.MapStr {
 	return common.MapStr{"message": message, "type": "json"}
 }
+
+func removeKeys(keys map[string]interface{}, names ...string) {
+	for _, name := range names {
+		delete(keys, name)
+	}
+}
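
One property of `removeKeys` worth noting: Go maps are reference types, so `delete` acts on the map the caller passed in. The sketch below shows that side effect and one way a caller could avoid it if it needed the decoded object intact afterwards. `shallowCopy` is a hypothetical helper, and whether any caller of `WriteJSONKeys` reuses `keys` afterwards is not established by this diff.

```go
package main

import "fmt"

// removeKeys mirrors the helper added in this PR.
func removeKeys(keys map[string]interface{}, names ...string) {
	for _, name := range names {
		delete(keys, name) // deleting a missing key is a no-op
	}
}

// shallowCopy is a hypothetical helper that copies only the top level of m.
func shallowCopy(m map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(m))
	for k, v := range m {
		out[k] = v
	}
	return out
}

func main() {
	decoded := map[string]interface{}{
		"@timestamp": "2020-01-01T01:01:00Z",
		"@metadata":  map[string]interface{}{"foo": "bar"},
		"top_c":      "COMPLETELY_NEW_c",
	}

	// Work on a copy so the caller's map keeps its root-level keys.
	working := shallowCopy(decoded)
	removeKeys(working, "@timestamp", "@metadata", "type")

	fmt.Println("original keys:", len(decoded))
	fmt.Println("working keys: ", len(working))
}
```

A top-level copy is enough here because `removeKeys` only deletes top-level entries; nested maps are still shared between the two.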
136 changes: 136 additions & 0 deletions libbeat/common/jsontransform/jsonhelper_test.go
@@ -0,0 +1,136 @@
// Licensed to Elasticsearch B.V. under one or more contributor
// license agreements. See the NOTICE file distributed with
// this work for additional information regarding copyright
// ownership. Elasticsearch B.V. licenses this file to you under
// the Apache License, Version 2.0 (the "License"); you may
// not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

package jsontransform

import (
	"testing"
	"time"

	"github.com/stretchr/testify/require"

	"github.com/elastic/beats/v7/libbeat/beat"
	"github.com/elastic/beats/v7/libbeat/common"
)

func TestWriteJSONKeys(t *testing.T) {
	now := time.Now()
	now = now.Round(time.Second)

	eventTimestamp := time.Date(2020, 01, 01, 01, 01, 00, 0, time.UTC)
	eventMetadata := common.MapStr{
		"foo": "bar",
		"baz": common.MapStr{
			"qux": 17,
		},
	}
	eventFields := common.MapStr{
		"top_a": 23,
		"top_b": common.MapStr{
			"inner_c": "see",
			"inner_d": "dee",
		},
	}

	tests := map[string]struct {
		keys              map[string]interface{}
		overwriteKeys     bool
		expectedMetadata  common.MapStr
		expectedTimestamp time.Time
		expectedFields    common.MapStr
	}{
		"overwrite_true": {
			overwriteKeys: true,
			keys: map[string]interface{}{
				"@metadata": map[string]interface{}{
					"foo": "NEW_bar",
					"baz": map[string]interface{}{
						"qux": "NEW_qux",
						"durrr": "COMPLETELY_NEW",
					},
				},
				"@timestamp": now.Format(time.RFC3339),
				"top_b": map[string]interface{}{
					"inner_d": "NEW_dee",
					"inner_e": "COMPLETELY_NEW_e",
				},
				"top_c": "COMPLETELY_NEW_c",
			},
			expectedMetadata: common.MapStr{
				"foo": "NEW_bar",
				"baz": common.MapStr{
					"qux": "NEW_qux",
					"durrr": "COMPLETELY_NEW",
				},
			},
			expectedTimestamp: now,
			expectedFields: common.MapStr{
				"top_a": 23,
				"top_b": common.MapStr{
					"inner_c": "see",
					"inner_d": "NEW_dee",
					"inner_e": "COMPLETELY_NEW_e",
				},
				"top_c": "COMPLETELY_NEW_c",
			},
		},
		"overwrite_false": {
			overwriteKeys: false,
			keys: map[string]interface{}{
				"@metadata": map[string]interface{}{
					"foo": "NEW_bar",
					"baz": map[string]interface{}{
						"qux": "NEW_qux",
						"durrr": "COMPLETELY_NEW",
					},
				},
				"@timestamp": now.Format(time.RFC3339),
				"top_b": map[string]interface{}{
					"inner_d": "NEW_dee",
					"inner_e": "COMPLETELY_NEW_e",
				},
				"top_c": "COMPLETELY_NEW_c",
			},
			expectedMetadata: eventMetadata.Clone(),
			expectedTimestamp: eventTimestamp,
			expectedFields: common.MapStr{
				"top_a": 23,
				"top_b": common.MapStr{
					"inner_c": "see",
					"inner_d": "dee",
					"inner_e": "COMPLETELY_NEW_e",
				},
				"top_c": "COMPLETELY_NEW_c",
			},
		},
	}

	for name, test := range tests {
		t.Run(name, func(t *testing.T) {
			event := &beat.Event{
				Timestamp: eventTimestamp,
				Meta:      eventMetadata.Clone(),
				Fields:    eventFields.Clone(),
			}

			WriteJSONKeys(event, test.keys, test.overwriteKeys, false)
			require.Equal(t, test.expectedMetadata, event.Meta)
			require.Equal(t, test.expectedTimestamp.UnixNano(), event.Timestamp.UnixNano())
			require.Equal(t, test.expectedFields, event.Fields)
		})
	}
}