Fix splitting array of strings & array of arrays in httpjson input #30368
This pull request does not have a backport label. Could you fix it @legoguy1000? 🙏
NOTE:
Added tests for the Array of strings. Can't seem to get the Array of Array tests to work.
My thought is to make that
temp := make(map[string]interface{})
temp["data"] = t
return common.MapStr(temp), true
- temp := make(map[string]interface{})
- temp["data"] = t
- return common.MapStr(temp), true
+ return common.MapStr{"data": t}, true
This pull request is now in conflicts. Could you fix it? 🙏
@@ -255,7 +255,7 @@ func toMapStr(v interface{}) (common.MapStr, bool) {
 		return t, true
 	case map[string]interface{}:
 		return common.MapStr(t), true
-	case string, []interface{}:
+	case string, []bool, []int, []string, []interface{}:
Apologies for the slow response.
While this works for the existing tests, I'm not convinced it holds for the general case. What other JSON array types might we see ([]map[string]interface{} seems like a possibility)? And what do we do if the []interface{} is an array of unhandleable types (do we care)?
Ya it could be infinite 😬. I think though subarrays of maps/arrays may not need to be handled. If they end up wanting to split those it will just recursively handle that...I think. I would say try to cover the basics and if there are changes needed in the future, look at the situation then??
Yeah, I think that's fair, though it won't handle it recursively; that would require switching on kind with reflect and so on. It might be worth putting a note here that this is where to look when this kind of thing happens in the future.
In the meantime it's worth doing a (rough) survey of what types do end up in arrays. I would not be surprised if float64 might in some contexts, for metrics (maybe).
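As a rough illustration of what "switching on kind with reflect" could look like, here is a minimal sketch. It is not code from this PR: the helper names wrappable and wrappableValue are hypothetical, and the set of accepted kinds (including the float kinds mentioned above) is an assumption about what a split might want to accept.

```go
package main

import (
	"fmt"
	"reflect"
)

// wrappable is a hypothetical helper: it reports whether v is made up only of
// strings, bools, numbers, string-keyed maps, and slices/arrays of those,
// however deeply they are nested.
func wrappable(v interface{}) bool {
	return wrappableValue(reflect.ValueOf(v))
}

func wrappableValue(rv reflect.Value) bool {
	switch rv.Kind() {
	case reflect.Interface:
		// Elements of []interface{} arrive wrapped; unwrap and recurse.
		// A nil element is treated as unhandleable in this sketch.
		return !rv.IsNil() && wrappableValue(rv.Elem())
	case reflect.String, reflect.Bool,
		reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64,
		reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64,
		reflect.Float32, reflect.Float64:
		return true
	case reflect.Map:
		if rv.Type().Key().Kind() != reflect.String {
			return false
		}
		iter := rv.MapRange()
		for iter.Next() {
			if !wrappableValue(iter.Value()) {
				return false
			}
		}
		return true
	case reflect.Slice, reflect.Array:
		for i := 0; i < rv.Len(); i++ {
			if !wrappableValue(rv.Index(i)) {
				return false
			}
		}
		return true
	default:
		// Invalid (nil), chan, func, struct, pointer and so on are rejected.
		return false
	}
}

func main() {
	fmt.Println(wrappable([]interface{}{"a", []float64{1.5}}))
	fmt.Println(wrappable([]interface{}{make(chan int)}))
}
```

A kind switch like this trades the explicit type list for generality; whether that is worth the reflection cost on every split element is a separate question.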
@efd6 The original scenario from ticket #30345 includes an example that shows the columns with the corresponding types from _xpack/sql, since the responses are similar to my use case. Perhaps being able to consume the types that Elastic maps to SQL would cover the most common types, with a fallback to a string? It should still be on the engineer working on a project to map fields, so as long as the response is valid and consumed I think all would be good. I currently reassemble the documents by making key/value pairs out of the columns per row, so consuming a source like this natively would be ideal. I just don't want you to get stuck in the weeds of the specific scenario that inspired this ticket.
BTW, in my original source the columns and rows sections are called something else, so if this is actually what is pursued then having flexibility on those would be important.
Thanks!
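For readers unfamiliar with that response shape, the reassembly described above can be sketched roughly as follows. This is an illustration only: the type and function names (sqlResponse, column, rowsToDocs) and the sample payload are made up, and the "columns"/"rows" field names are assumed from the _xpack/sql example rather than taken from the commenter's actual source.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// column and sqlResponse are hypothetical types modelled on a
// columns/rows style response; real field names may differ.
type column struct {
	Name string `json:"name"`
	Type string `json:"type"`
}

type sqlResponse struct {
	Columns []column        `json:"columns"`
	Rows    [][]interface{} `json:"rows"`
}

// rowsToDocs pairs each row value with the column name at the same index,
// producing one key/value document per row.
func rowsToDocs(resp sqlResponse) ([]map[string]interface{}, error) {
	docs := make([]map[string]interface{}, 0, len(resp.Rows))
	for i, row := range resp.Rows {
		if len(row) != len(resp.Columns) {
			return nil, fmt.Errorf("row %d has %d values, want %d", i, len(row), len(resp.Columns))
		}
		doc := make(map[string]interface{}, len(row))
		for j, col := range resp.Columns {
			doc[col.Name] = row[j]
		}
		docs = append(docs, doc)
	}
	return docs, nil
}

func main() {
	// Made-up sample payload for illustration.
	const body = `{"columns":[{"name":"host","type":"keyword"},{"name":"count","type":"long"}],"rows":[["a",1],["b",2]]}`
	var resp sqlResponse
	if err := json.Unmarshal([]byte(body), &resp); err != nil {
		panic(err)
	}
	docs, err := rowsToDocs(resp)
	if err != nil {
		panic(err)
	}
	fmt.Println(docs)
}
```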
Thanks @wasserman. We should be able to get away with adding []map[string]interface{} and []common.MapStr to this type list. In order to catch cases that we weren't expecting, I'm wondering if this should not return an error explaining the type it's rejecting rather than a bool, fmt.Errorf("unexpected type for split: %T", t) or similar.
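A minimal sketch of that error-returning shape, under stated assumptions: M is a local stand-in for common.MapStr so the example compiles without the Beats module, and toMapStrErr is a hypothetical name, not the function as merged.

```go
package main

import (
	"fmt"
)

// M stands in for common.MapStr (an assumption made to keep this self-contained).
type M map[string]interface{}

// toMapStrErr is a hypothetical variant of toMapStr that reports the rejected
// type in an error instead of returning a bare bool.
func toMapStrErr(v interface{}, key string) (M, error) {
	switch t := v.(type) {
	case nil:
		return nil, fmt.Errorf("unexpected nil value for split")
	case M:
		return t, nil
	case map[string]interface{}:
		return M(t), nil
	case string, []bool, []int, []string, []interface{}, []map[string]interface{}, []M:
		// Non-map values are wrapped under the given key.
		return M{key: t}, nil
	default:
		return nil, fmt.Errorf("unexpected type for split: %T", t)
	}
}

func main() {
	for _, v := range []interface{}{"test1", []int{1, 2}, 3.14} {
		m, err := toMapStrErr(v, "alerts")
		fmt.Println(m, err)
	}
}
```

With this shape the caller can log the offending type directly, which makes unexpected array element types visible when they first appear.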
Pinging @elastic/security-external-integrations (Team:Security-External Integrations)
This pull request is now in conflicts. Could you fix it? 🙏
OK. I have a fix for this. @legoguy1000 do you want to replay it here; it uses your tests. There are algebraic simplifications that look like they should work, but don't for reasons that I don't entirely understand.
Diff against b832486
diff --git a/x-pack/filebeat/input/httpjson/split.go b/x-pack/filebeat/input/httpjson/split.go
index 3585fd9fef..714ba026be 100644
--- a/x-pack/filebeat/input/httpjson/split.go
+++ b/x-pack/filebeat/input/httpjson/split.go
@@ -144,14 +144,15 @@ func (s *split) split(ctx *transformContext, root mapstr.M, ch chan<- maybeMsg)
 		}
 		for _, e := range varr {
-			if err := s.sendMessage(ctx, root, "", e, ch); err != nil {
+			err := s.sendMessage(ctx, root, s.targetInfo.Name, e, ch)
+			if err != nil {
 				s.log.Debug(err)
 			}
 		}
 		return nil
 	case splitTypeMap:
-		vmap, ok := toMapStr(v)
+		vmap, ok := toMapStr(v, s.targetInfo.Name)
 		if !ok {
 			return errExpectedSplitObj
 		}
@@ -211,19 +212,17 @@ func (s *split) split(ctx *transformContext, root mapstr.M, ch chan<- maybeMsg)
 // sendMessage sends an array or map split result value, v, on ch after performing
 // any necessary transformations. If key is "", the value is an element of an array.
 func (s *split) sendMessage(ctx *transformContext, root mapstr.M, key string, v interface{}, ch chan<- maybeMsg) error {
-	obj, ok := toMapStr(v)
+	obj, ok := toMapStr(v, s.targetInfo.Name)
 	if !ok {
 		return errExpectedSplitObj
 	}
-
-	clone := root.Clone()
-
 	if s.keyField != "" && key != "" {
 		_, _ = obj.Put(s.keyField, key)
 	}
+	clone := root.Clone()
 	if s.keepParent {
-		_, _ = clone.Put(s.targetInfo.Name, obj)
+		_, _ = clone.Put(s.targetInfo.Name, v)
 	} else {
 		clone = obj
 	}
@@ -248,7 +247,7 @@ func (s *split) sendMessage(ctx *transformContext, root mapstr.M, key string, v
 	return nil
 }
-func toMapStr(v interface{}) (mapstr.M, bool) {
+func toMapStr(v interface{}, key string) (mapstr.M, bool) {
 	if v == nil {
 		return mapstr.M{}, false
 	}
@@ -257,6 +256,8 @@ func toMapStr(v interface{}) (mapstr.M, bool) {
 		return t, true
 	case map[string]interface{}:
 		return mapstr.M(t), true
+	case string, []bool, []int, []string, []interface{}:
+		return mapstr.M{key: t}, true
 	}
 	return mapstr.M{}, false
 }
diff --git a/x-pack/filebeat/input/httpjson/split_test.go b/x-pack/filebeat/input/httpjson/split_test.go
index b99220ff92..c8c8007697 100644
--- a/x-pack/filebeat/input/httpjson/split_test.go
+++ b/x-pack/filebeat/input/httpjson/split_test.go
@@ -624,6 +624,83 @@ func TestSplit(t *testing.T) {
 				{"@timestamp": "1234567890", "other_items": "Line 3"},
 			},
 		},
+		{
+			name: "Array of Strings with keep_parent",
+			config: &splitConfig{
+				Target: "body.alerts",
+				Type: "array",
+				KeepParent: true,
+			},
+			ctx: emptyTransformContext(),
+			resp: transformable{
+				"body": mapstr.M{
+					"this": "is kept",
+					"alerts": []interface{}{
+						"test1",
+						"test2",
+						"test3",
+					},
+				},
+			},
+			expectedMessages: []mapstr.M{
+				{
+					"this": "is kept",
+					"alerts": "test1",
+				},
+				{
+					"this": "is kept",
+					"alerts": "test2",
+				},
+				{
+					"this": "is kept",
+					"alerts": "test3",
+				},
+			},
+			expectedErr: nil,
+		},
+		{
+			name: "Array of Arrays with keep_parent",
+			config: &splitConfig{
+				Target: "body.alerts",
+				Type: "array",
+				KeepParent: true,
+			},
+			ctx: emptyTransformContext(),
+			resp: transformable{
+				"body": mapstr.M{
+					"this": "is kept",
+					"alerts": []interface{}{
+						[]interface{}{"test1-1", "test1-2"},
+						[]string{"test2-1", "test2-2"},
+						[]int{1, 2},
+					},
+				},
+			},
+			expectedMessages: []mapstr.M{
+				{
+					"this": "is kept",
+					"alerts": []interface{}{
+						"test1-1",
+						"test1-2",
+					},
+				},
+				{
+					"this": "is kept",
+					"alerts": []string{
+						"test2-1",
+						"test2-2",
+					},
+				},
+				{
+					"this": "is kept",
+					"alerts": []int{
+						1,
+						2,
+					},
+				},
+			},
+			expectedErr: nil,
+		},
 	}
 	for _, tc := range cases {
What does this PR do?
Allows the httpjson input to split arrays of strings and arrays of arrays.
Why is it important?
The httpjson input currently cannot split these array types.
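To make the intended behaviour concrete, here is a standalone sketch of splitting with keep_parent over non-object elements, mirroring what the new test cases expect. It does not use the httpjson package: splitKeepParent is a hypothetical helper, plain maps stand in for mapstr.M, and the clone is shallow, which is an assumption made for brevity.

```go
package main

import "fmt"

// splitKeepParent is a hypothetical helper: it emits one copy of parent per
// element of the array found at target, with target replaced by that element.
func splitKeepParent(parent map[string]interface{}, target string) ([]map[string]interface{}, error) {
	arr, ok := parent[target].([]interface{})
	if !ok {
		return nil, fmt.Errorf("target %q is not an array: %T", target, parent[target])
	}
	out := make([]map[string]interface{}, 0, len(arr))
	for _, e := range arr {
		// Shallow copy of the parent; nested values are shared.
		clone := make(map[string]interface{}, len(parent))
		for k, v := range parent {
			clone[k] = v
		}
		clone[target] = e
		out = append(out, clone)
	}
	return out, nil
}

func main() {
	body := map[string]interface{}{
		"this":   "is kept",
		"alerts": []interface{}{"test1", []string{"test2-1", "test2-2"}, []int{1, 2}},
	}
	msgs, err := splitKeepParent(body, "alerts")
	if err != nil {
		panic(err)
	}
	for _, m := range msgs {
		fmt.Println(m)
	}
}
```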
Checklist
CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc
Author's Checklist
How to test this PR locally
Related issues
Use cases
Screenshots
Logs