-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Filebeat][httpjson] Add split_events_by config setting #19246
[Filebeat][httpjson] Add split_events_by config setting #19246
Conversation
Pinging @elastic/siem (Team:SIEM) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few small comments from my side, rest looks LGTM, will test it once we have a solution that all agrees on.
Though will surely need more than me to review.
@@ -528,3 +537,70 @@ func TestOAuth2(t *testing.T) { | |||
} | |||
}) | |||
} | |||
|
|||
func TestSplitResponseWithKey(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing is not really my field of expertise, but since we have one with key, one without, what happened if a key is empty? And if we only have one object in the array? I am unsure if that needs coverage or not, but just wanted to mention it.
ok := in.outlet.OnEvent(makeEvent(string(d))) | ||
if !ok { | ||
return nil, errors.New("function OnEvent returned false") | ||
for _, e := range in.splitEvent(v) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving a comment about our previous discussion, maybe @andrewkroh can pitch in on this, if we want splitEvent to happen on every single event even when splitEvent is not configured, and handle the config checks in the splitEvent method, or do it here instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe renaming from splitEvent
to eventAsList
or similar makes for a better name here? wanted to avoid having two branches to handle the single event and the list inside the same case, but if it makes for a better understanding I am up for it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think splitEvent makes more sense because it's the term that has been used over time regarding logstash, then it's easier to think "yeah split filters, just like splitEvent" :D
In terms of having to branches, the only reason I comment is that based on my personal taste, I don't like to run functions that is made for a specific usecase if the conditions are not met.
When reading the code now, it seems more that splitEvent should always happen, until you scroll further down and see that it handles another if condition.
Let's wait and see if we can get another comment before making a decision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI the s3 input has "expand_event_list_from_field" which does something very similar, might be worth comparing implementation.
ff76a25
to
cddb0f3
Compare
I used a copy of this PR together with the new ATP module, it seems to be splitting on both arrays correctly. The document had multiple events as an array under "values" key, and in each event there was also an "evidence" array of objects. One API response with 2 objects under value, and each has 2 objects under evidence returned 4 documents, one for each evidence corretly. |
return []map[string]interface{}{event} | ||
} | ||
|
||
mm := m.Clone() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that Clone() is not needed here since the field always gets overridden below. so using m should work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't m storing the JSON object while mm is storing the array inside? mm is creating a shallow copy, to ensure that m is not modified while iterating over it here:
for _, split := range splitOn {
Will it be a good idea to create a split_event Filebeat processor instead of putting this just in the httpjson input, so that this functionality can be supported for other inputs also? |
Do you have any examples on which other inputs might benefit from this? I think in theory it makes sense but maybe at a later date? The reason is that I feel if splitting is currently only working on json structure then the input should enforce the structure as well |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I agree with @P1llus, but in case we decide to do it as a separate processor, I am unsure about how to do it. The processor interface looks like: type Processor interface {
String() string // print full processor description
Run(in *Event) (event *Event, err error)
} And for this use case we would need something like: type Processor interface {
String() string // print full processor description
Run(in *Event) ([]event *Event, err error)
} I am not sure if there is any other way of doing this that is not as a Processor, or to embed the list of resulting events in a single one, but that feels a bit odd to me if it is the case. WDYT? |
cddb0f3
to
fcecdce
Compare
After you mentioned it I do remember now that I have looked at split in processors before, and currently a processor cannot create multiple documents, as it will always run once for each incoming event. |
1e17a3c
to
7110fff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good. One consideration to make is that the split filter in Logstash also allows for strings.
The field being split can either be a string or an array. (https://www.elastic.co/guide/en/logstash/current/plugins-filters-split.html)
But since this isn't a generalized processor it doesn't need to have the same features. Plus we can add the ability to work on a []string if needed in the future.
7110fff
to
c97f6e6
Compare
…ne-beats * upstream/master: (105 commits) ci: enable packaging job (elastic#19536) ci: disable upstream trigger on PRs for the packaging job (elastic#19490) Implement memlog on-disk handling (elastic#19408) fix go.mod for PR elastic#19423 (elastic#19521) [MetricBeat] add param `aws_partition` to support aws-cn, aws-us-gov regions (elastic#19423) Input v2 stateless manager (elastic#19406) Input v2 compatibility layer (elastic#19401) [Elastic Agent] Fix artifact downloading to allow endpoint-security to be downloaded (elastic#19503) fix: ignore target changes on scans (elastic#19510) Add more helpers to pipeline/testing package (elastic#19405) Report dependencies in CSV format (elastic#19506) [Filebeat] Fix reference leak in TCP and Unix socket inputs (elastic#19459) Cursor input skeleton (elastic#19378) Add changelog. (elastic#19495) [DOC] Typo in Kerberos (elastic#19265) Remove accidentally commited unused NOTICE template (elastic#19485) [Elastic Agent] Support the install, control, and uninstall of Endpoint (elastic#19248) [Filebeat][httpjson] Add split_events_by config setting (elastic#19246) ci: disabling packaging job until we fix it (elastic#19481) Fix golang.org/x/tools to release1.13 (elastic#19478) ...
(cherry picked from commit ce3f505)
What does this PR do?
Adds a
split_events_by
setting to thehttpjson
input to allow similar mechanics as thesplit
filter for logstash.Why is it important?
There are many use cases where a list of elements is passed as an API response, and we want to create each of them to a single event.
Checklist
CHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.