Fingerprint processor #14205

ycombinator · 2019-10-22T23:51:06Z

Resolves #11173.

This PR implements a fingerprint processor, similar to Logstash's fingerprint filter plugin.

The processor will take the following configuration options:

| Name | Required? | Default | Description |
|------|-----------|---------|-------------|
| `fields` | Yes | | List of fields to use as the source of the fingerprint |
| `ignore_missing` | No | `false` | Whether to ignore missing fields |
| `target_field` | No | `fingerprint` | Field in which the computed fingerprint should be stored |
| `method` | No | `sha256` | Algorithm to use for computing the fingerprint. Must be one of: `md5`, `sha1`, `sha256` (default), `sha384`, `sha512` |
| `encoding` | No | `hex` | Encoding to use on the fingerprint value. Must be one of `hex` (default), `base32`, or `base64` |
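
For illustration only, here is a minimal standalone sketch of how the `method` and `encoding` options could map onto Go's standard library. It is not the code in this PR: the function names (`newHash`, `encode`, `fingerprint`) and the `|key|value` byte layout written into the hash are assumptions made for the example. Sorting the field names first means the order in which `fields` is configured does not change the result.

```go
package main

import (
	"crypto/md5"
	"crypto/sha1"
	"crypto/sha256"
	"crypto/sha512"
	"encoding/base32"
	"encoding/base64"
	"encoding/hex"
	"fmt"
	"hash"
	"sort"
)

// newHash maps a `method` option value to a hash constructor.
func newHash(method string) (hash.Hash, error) {
	switch method {
	case "md5":
		return md5.New(), nil
	case "sha1":
		return sha1.New(), nil
	case "sha256":
		return sha256.New(), nil
	case "sha384":
		return sha512.New384(), nil
	case "sha512":
		return sha512.New(), nil
	}
	return nil, fmt.Errorf("unknown method [%v]", method)
}

// encode maps an `encoding` option value to a string encoder.
func encode(encoding string, sum []byte) (string, error) {
	switch encoding {
	case "hex":
		return hex.EncodeToString(sum), nil
	case "base32":
		return base32.StdEncoding.EncodeToString(sum), nil
	case "base64":
		return base64.StdEncoding.EncodeToString(sum), nil
	}
	return "", fmt.Errorf("unknown encoding [%v]", encoding)
}

// fingerprint hashes the named fields of event in sorted key order.
// The "|key|value" layout is an assumption made for this sketch.
func fingerprint(event map[string]interface{}, fields []string, method, encoding string) (string, error) {
	h, err := newHash(method)
	if err != nil {
		return "", err
	}
	sorted := append([]string(nil), fields...) // do not reorder the caller's slice
	sort.Strings(sorted)
	for _, k := range sorted {
		v, ok := event[k]
		if !ok {
			return "", fmt.Errorf("field [%v] not found in event", k)
		}
		fmt.Fprintf(h, "|%v|%v", k, v) // hash.Hash is an io.Writer
	}
	return encode(encoding, h.Sum(nil))
}

func main() {
	event := map[string]interface{}{"host": "web-1", "port": 443}
	fp, err := fingerprint(event, []string{"port", "host"}, "sha256", "hex")
	if err != nil {
		panic(err)
	}
	fmt.Println(fp)
}
```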

houndci-bot · 2019-10-22T23:51:08Z

libbeat/processors/fingerprint/method.go

+type Method uint8
+
+const (
+	MethodSHA1 Method = iota


exported const MethodSHA1 should have comment (or a comment on this block) or be unexported

houndci-bot · 2019-10-22T23:51:09Z

libbeat/processors/fingerprint/method.go

+
+var errMethodUnknown = errors.New("unknown method")
+
+type Method uint8


exported type Method should have comment or be unexported

andrewkroh · 2019-10-23T02:22:49Z

This reminds me of https://github.com/andrewkroh/beats-processor-fingerprint. As well as #1872.

ycombinator · 2019-10-23T05:15:22Z

> This reminds me of https://github.com/andrewkroh/beats-processor-fingerprint. As well as #1872.

Thanks for flagging these, @andrewkroh; I wasn't aware.

@urso Looks like there's prior art (the processor in Andrew's repo) as well as discussions going on in the ES team (see elastic/elasticsearch#34085, which is eventually linked off #1872 and the related PR: elastic/elasticsearch#47047). Does it still make sense for me to continue working on this PR?

urso · 2019-10-23T12:37:47Z

> Does it still make sense for me to continue working on this PR?

I think yes, it makes sense.

The one by Andrew is a private one. It's not to be found in Beats. Maybe we can take some of it.

There is also the Logstash fingerprint processor. I hope the ES one will be similar to the Logstash one, but I haven't checked yet.

ycombinator · 2019-10-25T19:06:05Z

Per the discussion in elastic/elasticsearch#47047 (comment), we are going to resume working on a fingerprint processor in Beats.

@andrewkroh Since you've already built a fingerprint processor in your repo, did you want to put up a PR with that code to the beats repo? If not, I'll continue working on this PR while looking at your work.

andrewkroh · 2019-10-28T03:17:24Z

> did you want to put up a PR with that code to the beats repo? If not, I'll continue working on this PR while looking at your work.

No, but please copy anything you want from my repo. 👍

ycombinator · 2019-10-30T20:03:35Z

jenkins, test this

libbeat/processors/fingerprint/config.go

libbeat/processors/fingerprint/fingerprint.go

urso · 2019-10-30T22:20:17Z

libbeat/processors/fingerprint/fingerprint.go

+		}
+		if err != nil {
+			return "", errors.Wrapf(err, "failed when finding field [%v] in event", k)
+		}


Do we want to have an option to ignore missing fields in case we have at least one field present?

Sorry, but is this the same as your suggestion in #14205 (comment) or something different?

Almost. My original suggestion was to add support for ignoring missing fields. But apparently we can have other error types as well. Would it make sense to treat those other types as 'missing' too?

I see. So if we can't "get" a field for whatever reason, we treat it as missing and then, if the `ignore_missing` option is set, ignore it. Hmm, I think this makes sense, but let me look into what other types of errors (besides `common.ErrKeyNotFound`) might be returned here.

The only other error that can be returned here occurs when we try to get a value for a nested field (e.g. `a.b.c`) but the ancestor path to the field (`a.b`) does not resolve to a map. To me this also feels like a missing field since, even in this case, we still could not find `a.b.c` for the user. So I'm okay with collapsing the two error cases into one and adding `ignore_missing` handling to it.
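
A rough sketch of the behaviour agreed on in this thread, written against plain `map[string]interface{}` values rather than libbeat's types. Everything here is hypothetical and only illustrates the idea: `getValue`, `collect`, and both error variables are made-up names. The point is that "key not found" and "ancestor is not a map" are both treated as a missing field, and the field is skipped when `ignoreMissing` is set.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Hypothetical stand-ins for the two lookup failures discussed above.
var (
	errKeyNotFound = errors.New("key not found")
	errNotMap      = errors.New("ancestor path does not resolve to a map")
)

// getValue resolves a dotted key such as "a.b.c" against nested maps.
func getValue(event map[string]interface{}, key string) (interface{}, error) {
	var cur interface{} = event
	for _, part := range strings.Split(key, ".") {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return nil, errNotMap
		}
		v, ok := m[part]
		if !ok {
			return nil, errKeyNotFound
		}
		cur = v
	}
	return cur, nil
}

// collect gathers the values of fields. Any lookup error counts as a
// missing field: it is skipped when ignoreMissing is true and reported
// as an error otherwise.
func collect(event map[string]interface{}, fields []string, ignoreMissing bool) (map[string]interface{}, error) {
	out := make(map[string]interface{}, len(fields))
	for _, k := range fields {
		v, err := getValue(event, k)
		if err != nil {
			if ignoreMissing {
				continue
			}
			return nil, fmt.Errorf("failed to find field [%v] in event: %w", k, err)
		}
		out[k] = v
	}
	return out, nil
}

func main() {
	event := map[string]interface{}{
		"a": map[string]interface{}{"b": "leaf"},
		"x": 1,
	}
	// "a.b.c" fails because a.b is a string, not a map; "y" is simply absent.
	got, err := collect(event, []string{"x", "a.b.c", "y"}, true)
	fmt.Println(got, err)

	_, err = collect(event, []string{"a.b.c"}, false)
	fmt.Println(err)
}
```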

libbeat/processors/fingerprint/fingerprint_test.go

urso · 2019-10-30T22:29:23Z

libbeat/processors/fingerprint/fingerprint_test.go

+
+	for _, test := range tests {
+		name := fmt.Sprintf("testing %v encoding", test.encoding)
+		t.Run(name, func(t *testing.T) {


Test names can be filtered using regexes. For this I like to give them some structure instead of using full names (assuming the parent test's name is expressive enough), e.g. just pass the encoding as the name to `t.Run`. (In case I have multiple parameters or want a name, I use `fmt.Sprintf("field=%v", value)`.)

In this case the test name could just be `TestEncoding/base64`. I can run the test using `go test -run Encoding/base64` (the `/` acts as a delimiter).

urso

The tests are not well isolated. Many tests use the same test event as input. It is okay to have a similar event, but the processor modifies the original map. A deep copy of the fields is required to guarantee some isolation.
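
One way to get that isolation, sketched with plain maps and a hypothetical `deepCopy` helper (this is an illustration, not the change made in the PR): each test copies the shared fixture before handing it to the processor, so writes to the copy never reach the original.

```go
package main

import "fmt"

// deepCopy recursively copies nested maps and slices so that a test can
// mutate its own event without touching the shared fixture.
func deepCopy(v interface{}) interface{} {
	switch vv := v.(type) {
	case map[string]interface{}:
		out := make(map[string]interface{}, len(vv))
		for k, e := range vv {
			out[k] = deepCopy(e)
		}
		return out
	case []interface{}:
		out := make([]interface{}, len(vv))
		for i, e := range vv {
			out[i] = deepCopy(e)
		}
		return out
	default:
		return v // scalars and any other values are returned as-is
	}
}

func main() {
	fixture := map[string]interface{}{
		"field1": "foo",
		"nested": map[string]interface{}{"n": 1},
	}

	// Simulate a processor writing into the event it was given.
	event := deepCopy(fixture).(map[string]interface{})
	event["fingerprint"] = "abc123"
	event["nested"].(map[string]interface{})["n"] = 2

	fmt.Println(fixture) // prints map[field1:foo nested:map[n:1]]
}
```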

dedemorton

All the deets look good. I just have a few minor comments.

dedemorton · 2019-10-31T00:00:08Z

libbeat/docs/processors-using.asciidoc

+
+.Fingerprint options
+[options="header"]
+|======


You forgot the header row here. :-) The rendered table looks like this:

dedemorton · 2019-10-31T00:05:07Z

libbeat/docs/processors-using.asciidoc

+[options="header"]
+|======
+| `fields`          | yes       |               | List of fields to use as the source for the fingerprint.                                                               |
+| `target_field`    | no        | `fingerprint` | Field in which the generated fingerprint should be stored.                                                             |


Tables with more than 3 columns don't look very good in our HTML output (see screen capture in previous comment). Docbook swallows all table attributes, so we can't do anything about that right now. You could remove the example row and provide the example as part of the description. Or wait for some shift in the universe that will make this right.

I replaced the table with a definition list, like the one used in the Extract Array processor, for instance. Let me know what you think. Thanks!

ycombinator · 2019-10-31T00:31:14Z

@urso @dedemorton Thanks for your reviews. I believe I've addressed all your feedback now. This PR is ready for re-review, when you get a chance. Thanks again!

dedemorton

LGTM! In this case, I think the list is actually easier to scan.

ycombinator · 2019-10-31T11:42:32Z

jenkins, test this

urso · 2019-10-31T15:41:36Z

libbeat/processors/fingerprint/fingerprint.go

+		i := v
+		switch vv := v.(type) {
+		case map[string]interface{}, []interface{}, common.MapStr:
+			return errors.Errorf("cannot compute fingerprint using non-scalar field [%v]", k)


would it make sense to apply 'IgnoreMissing' here as well?

I'm not sure about this one. This case would be reached if the user specified a field that resolved to a non-scalar value. So the field would not actually be missing in this case.

I see this one as more of an error on the user's part with maybe making a mistake in specifying the field so I think we should tell the user about it via an error.

WDYT?

I'm fine with that.
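
For reference, the distinction settled on here can be sketched as a standalone function. The name `normalize` is hypothetical and libbeat's `common.MapStr` case is left out so the example compiles on its own. A non-scalar value is always an error, regardless of `ignore_missing`, because the field exists but cannot be fingerprinted. The UTC conversion mirrors the "Convert time fields to UTC" commit.

```go
package main

import (
	"fmt"
	"time"
)

// normalize rejects non-scalar values and converts timestamps to UTC so
// the same instant hashes identically regardless of time zone.
func normalize(key string, v interface{}) (interface{}, error) {
	switch vv := v.(type) {
	case map[string]interface{}, []interface{}:
		// Not a missing field: the user pointed at a container.
		return nil, fmt.Errorf("cannot compute fingerprint using non-scalar field [%v]", key)
	case time.Time:
		return vv.UTC(), nil
	}
	return v, nil
}

func main() {
	if _, err := normalize("tags", []interface{}{"a", "b"}); err != nil {
		fmt.Println(err)
	}

	local := time.Date(2019, 10, 31, 10, 0, 0, 0, time.FixedZone("UTC+1", 3600))
	v, _ := normalize("@timestamp", local)
	fmt.Println(v)
}
```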

ycombinator · 2019-11-01T16:58:21Z

jenkins, test this

ycombinator · 2019-11-01T19:07:26Z

jenkins, test this

ycombinator · 2019-11-01T20:49:55Z

Travis CI is green. Jenkins CI is red because of x-pack/agent builds being yellow, which is unrelated to this PR. Merging.

* WIP: fingerprint processor
* Implementing SHA256 fingerprinter
* Sort source fields
* Refactoring
* Add TODO
* Convert time fields to UTC
* Removing unnecessary function
* Adding SHA1
* WIP: add encoding
* Cleanup
* Running mage fmt
* More methods + consolidating tests
* Fleshing out tests
* Adding test for target field
* Adding documentation
* Adding CHANGELOG entry
* Fixing test
* Converting tests to map
* Isolating tests
* Use io.Writer to stream in fields
* Implement ignore_missing setting
* Replace table with definition list
* Adding `ignore_missing` to doc
* using io.Fprintf
* Use common.StringSet
* Adding typed errors
* Adding more typed errors
* Adding license header

neu5ron · 2024-06-05T17:49:39Z

I am not sure where to move this issue forward, but it should be noted that the fingerprint ingest processor creates values that are inconsistent with those produced by hashing with md5, sha1, sha256, or sha512 in any other software, including Elastic software like Logstash and Filebeat.
The issue states that even when hashing a single value, the fingerprint processor adds a byte to the value and then creates the hash.
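
To illustrate why a single extra byte matters, here is a small demonstration. The `0x00` byte below is purely hypothetical, since the comment does not say which byte is added or where. Any change to the input bytes, however small, yields an unrelated digest, so two tools only agree if they hash exactly the same bytes.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sha256Hex returns the hex-encoded SHA-256 digest of b.
func sha256Hex(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

func main() {
	value := []byte("hello")

	// Hypothetical framing byte, chosen only for this demonstration.
	framed := append([]byte{0x00}, value...)

	fmt.Println("raw:   ", sha256Hex(value))
	fmt.Println("framed:", sha256Hex(framed))
}
```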

houndci-bot reviewed Oct 22, 2019

ycombinator mentioned this pull request Oct 23, 2019

[WIP] Re-introduce hash processor elastic/elasticsearch#47047

Closed

ycombinator changed the title ~~WIP: fingerprint processor~~ fingerprint processor Oct 30, 2019

ycombinator changed the title ~~fingerprint processor~~ Fingerprint processor Oct 30, 2019

ycombinator marked this pull request as ready for review October 30, 2019 02:55

ycombinator added :Processors enhancement libbeat review v7.6.0 v8.0.0 labels Oct 30, 2019

ycombinator requested review from dedemorton and urso October 30, 2019 02:56

urso reviewed Oct 30, 2019

libbeat/processors/fingerprint/config.go

urso reviewed Oct 30, 2019

libbeat/processors/fingerprint/fingerprint.go

urso reviewed Oct 30, 2019

libbeat/processors/fingerprint/fingerprint_test.go

urso reviewed Oct 30, 2019

urso suggested changes Oct 30, 2019

dedemorton suggested changes Oct 31, 2019

dedemorton approved these changes Oct 31, 2019

urso reviewed Oct 31, 2019

ycombinator added 20 commits November 1, 2019 08:07

WIP: add encoding

85ef943

Cleanup

8cda4be

Running mage fmt

3c75d3b

More methods + consolidating tests

da29e8d

Fleshing out tests

52e5110

Adding test for target field

92af70c

Adding documentation

b1981e8

Adding CHANGELOG entry

3a2825c

Fixing test

9a0be57

Converting tests to map

b2ecab6

Isolating tests

217e318

Use io.Writer to stream in fields

6479e84

Implement ignore_missing setting

661f891

Replace table with definition list

ce27088

Adding ignore_missing to doc

2d17110

using io.Fprintf

ba390ad

Use common.StringSet

de73d0d

Adding typed errors

a6be2ab

Adding more typed errors

c65da9a

Adding license header

5f577f4

ycombinator merged commit 7e06580 into elastic:master Nov 1, 2019

urso added the Team:Beats label Nov 14, 2019

ycombinator deleted the lb-processor-fingerprint branch December 25, 2019 11:09

ycombinator mentioned this pull request Jan 17, 2020

Adds missing imports #15624

Merged

andresrc added the Team:Integrations label Mar 6, 2020
