-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fingerprint processor #14205
Fingerprint processor #14205
Conversation
type Method uint8 | ||
|
||
const ( | ||
MethodSHA1 Method = iota |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exported const MethodSHA1 should have comment (or a comment on this block) or be unexported
|
||
var errMethodUnknown = errors.New("unknown method") | ||
|
||
type Method uint8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exported type Method should have comment or be unexported
This reminds me of https://github.com/andrewkroh/beats-processor-fingerprint. As well as #1872. |
Thanks for flagging these, @andrewkroh; I wasn't aware. @urso Looks like there's prior art (the processor in Andrew's repo) as well as discussions going on in the ES team (see elastic/elasticsearch#34085, which is eventually linked off #1872 and the related PR: elastic/elasticsearch#47047). Does it still make sense for me to continue working on this PR? |
I think yes, it makes sense. The one by Andrew is a private one. Its not to be found in Beats. Maybe we can take some of it. there is also the Logstash fingerprint processor. I hope the ES one will be similar to the Logstash one. But I didn't check yet. |
Per the discussion in elastic/elasticsearch#47047 (comment), we are going to resume working on a @andrewkroh Since you've already built a fingerprint processor in your repo, did you want to put up a PR with that code to the |
No, but please copy anything you want from my repo. 👍 |
bb1f75b
to
7912331
Compare
jenkins, test this |
} | ||
if err != nil { | ||
return "", errors.Wrapf(err, "failed when finding field [%v] in event", k) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to have an option to ignore missing fields in case we have at least one field present?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, but is this the same as your suggestion in #14205 (comment) or something different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
almost. My suggestion originally was to add support to ignore missing fields. But apparently we can have other error types as well. Would it make sense to treat those other types as 'missing' as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. So if we can't "get" a field for whatever reason, we treat it as missing and then, if the missing_fields
option is set, ignore it. Hmm, I think this makes sense but let me just look into what other types of errors (besides common.ErrKeyNotFound
) might be returned here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only other error that can be returned here is if we try to get a value for a nested field (e.g. a.b.c
), but the ancestor path to the field (a.b
) does not resolve to a map
. To me this also feels like a missing field as, even in this case, we still could not find a.b.c
for the user. So I'm okay with collapsing the two error cases into one and adding ignore_missing
handling to it.
|
||
for _, test := range tests { | ||
name := fmt.Sprintf("testing %v encoding", test.encoding) | ||
t.Run(name, func(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test names can be filtered using regexes. For this I like to give it some structure instead of having full names (assuming that the parent test its name is expressive enough). e.g. just pass encoding as name to t.Run
. (In case I have multiple parameters or want a name I use fmt.Sprintf("field=%v" , value)
).
In this case the test name could just be TestEncoding/base64
. I can run the test using go test -run Encoding/base64
(the /
acts as a delimiter).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests are not well isolated. A many tests have the same test event as input. It is okay to have a similar event, but the processor modifies the original map. A deep copy of the fields is required to guarantee some isolation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the deets look good. I just have a few minor comments.
|
||
.Fingerprint options | ||
[options="header"] | ||
|====== |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[options="header"] | ||
|====== | ||
| `fields` | yes | | List of fields to use as the source for the fingerprint. | | ||
| `target_field` | no | `fingerprint` | Field in which the generated fingerprint should be stored. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tables with more than 3 columns don't look very good in our HTML output (see screen capture in previous comment). Docbook swallows all table attributes, so we can't do anything about that right now. You could remove the example row and provide the example as part of the description. Or wait for some shift in the universe that will make this right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I replaced the table with a definition list, like the one used in the Extract Array processor, for instance. Let me know what you think. Thanks!
@urso @dedemorton Thanks for your reviews. I believe I've addressed all your feedback now. This PR is ready for re-review, when you get a chance. Thanks again! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! In this case, I think the list is actually easier to scan.
b7b6890
to
e0a42ee
Compare
26169cf
to
5f577f4
Compare
jenkins, test this |
1 similar comment
jenkins, test this |
Travis CI is green. Jenkins CI is red because of |
* WIP: fingerprint processor * Implementing SHA256 fingerprinter * Sort source fields * Refactoring * Add TODO * Convert time fields to UTC * Removing unnecessary function * Adding SHA1 * WIP: add encoding * Cleanup * Running mage fmt * More methods + consolidating tests * Fleshing out tests * Adding test for target field * Adding documentation * Adding CHANGELOG entry * Fixing test * Converting tests to map * Isolating tests * Use io.Writer to stream in fields * Implement ignore_missing setting * Replace table with definition list * Adding `ignore_missing` to doc * using io.Fprintf * Use common.StringSet * Adding typed errors * Adding more typed errors * Adding license header
I am not sure where to move this issue forward, but it should be noted that the fingerprint processor for ingest processor creates inconsistent values when compared to using a hashing technique like md5, sha1, sha256, sha512 in any other software - includ Elastic software like logstash and filebeat. |
Resolves #11173.
This PR implements a
fingerprint
processor, similar to Logstash'sfingerprint
filter plugin.The processor will take the following configuration options:
fields
ignore_missing
false
target_field
fingerprint
method
sha256
md5
,sha1
,sha256
(default),sha384
,sha512
encoding
hex
hex
(default),base32
, orbase64