Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix splitting array of strings & array of arrays in httpjson input #30368

Closed
wants to merge 3 commits into from

Conversation

legoguy1000
Copy link
Contributor

@legoguy1000 legoguy1000 commented Feb 11, 2022

What does this PR do?

Allows HTTPJSON input to split Arrays of Strings and Arrays

Why is it important?

Currently can't

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Feb 11, 2022
@mergify
Copy link
Contributor

mergify bot commented Feb 11, 2022

This pull request does not have a backport label. Could you fix it @legoguy1000? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 7./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip Skip notification from the automated backport with mergify label Feb 11, 2022
@elasticmachine
Copy link
Collaborator

elasticmachine commented Feb 11, 2022

❕ Build Aborted

The PR is not allowed to run in the CI yet

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Start Time: 2022-02-14T13:49:09.042+0000

  • Duration: 9 min 54 sec

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@legoguy1000
Copy link
Contributor Author

Added tests for the Array of strings. Can't seem to get the Array of Array tests to work.

@legoguy1000
Copy link
Contributor Author

My thought is to make that data field dynamic or the other option is to redo the original function as to not need it.

Comment on lines +259 to +261
temp := make(map[string]interface{})
temp["data"] = t
return common.MapStr(temp), true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
temp := make(map[string]interface{})
temp["data"] = t
return common.MapStr(temp), true
return common.MapStr{"data": t}, true

@mergify
Copy link
Contributor

mergify bot commented Feb 17, 2022

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b httpjson-split-test upstream/httpjson-split-test
git merge upstream/main
git push upstream httpjson-split-test

@@ -255,7 +255,7 @@ func toMapStr(v interface{}) (common.MapStr, bool) {
return t, true
case map[string]interface{}:
return common.MapStr(t), true
case string, []interface{}:
case string, []bool, []int, []string, []interface{}:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the slow response.

While this work for the existing tests, I'm not convinced for the general case. What other JSON array types might we see ([]map[string]interface{} seems like a possibility)? and what do we do if the []interface{} is an array of unhandleable types (do we care)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya it could be infinite 😬. I think though subarrays of maps/arrays may not need to be handled. If they end up wanting to split those it will just recursively handle that...I think. I would say try to cover the basics and if there are changes needed in the future, look at the situation then??

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that's fair, though it won't recursively handle it, that would require switching on kind and so on reflect. It might be worth putting a note here that this is where to look when this king of thing happens in the future.

In the mean time it's worth doing a survey (rough) of what types do end up in arrays. I would not be surprised if float64 might in some contexts for metrics (maybe).

Copy link

@wasserman wasserman Feb 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@efd6 The original scenario from ticket #30345 includes an example that shows the columns with the corresponding types from _xpack/sql since the responses are similar to my use case. Perhaps being able to consume the types that Elastic maps to SQL should cover the most common types with a fallback to a string? It should still be on the engineer working a project to map fields so as long as the response is valid and consumed I think all would be good. I currently reassemble the documents by a making key/value pairs out of the columns per row, so consuming a source like this natively would be ideal. I just don't you to get stuck in the weeds of a specific scenario that inspired this ticket.

BTW, in my original source the columns and rows sections are called something else, so if this is actually what is pursued then having flexibility on those would be important.

Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wasserman. We should be able to get away with adding []map[string]interface{} and []common.MapStr to this type list. In order to catch cases that we weren't expecting, I'm wondering if this should not return an error explaining the type it's rejecting rather that a bool, fmt.Errorf("unexpected type for split: %T", t) or similar.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Mar 15, 2022
@andrewkroh andrewkroh requested a review from a team March 28, 2022 20:32
@mergify
Copy link
Contributor

mergify bot commented Sep 26, 2022

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b httpjson-split-test upstream/httpjson-split-test
git merge upstream/main
git push upstream httpjson-split-test

@narph narph assigned efd6 and unassigned legoguy1000 Oct 26, 2022
@efd6
Copy link
Contributor

efd6 commented Oct 31, 2022

OK. I have a fix for this. @legoguy1000 do you want to replay it here; it uses your tests. There are algebraic simplifications that look like they should work, but don't for reasons that I don't entirely understand.

Diff against b832486

diff --git a/x-pack/filebeat/input/httpjson/split.go b/x-pack/filebeat/input/httpjson/split.go
index 3585fd9fef..714ba026be 100644
--- a/x-pack/filebeat/input/httpjson/split.go
+++ b/x-pack/filebeat/input/httpjson/split.go
@@ -144,14 +144,15 @@ func (s *split) split(ctx *transformContext, root mapstr.M, ch chan<- maybeMsg)
                }
 
                for _, e := range varr {
-                       if err := s.sendMessage(ctx, root, "", e, ch); err != nil {
+                       err := s.sendMessage(ctx, root, s.targetInfo.Name, e, ch)
+                       if err != nil {
                                s.log.Debug(err)
                        }
                }
 
                return nil
        case splitTypeMap:
-               vmap, ok := toMapStr(v)
+               vmap, ok := toMapStr(v, s.targetInfo.Name)
                if !ok {
                        return errExpectedSplitObj
                }
@@ -211,19 +212,17 @@ func (s *split) split(ctx *transformContext, root mapstr.M, ch chan<- maybeMsg)
 // sendMessage sends an array or map split result value, v, on ch after performing
 // any necessary transformations. If key is "", the value is an element of an array.
 func (s *split) sendMessage(ctx *transformContext, root mapstr.M, key string, v interface{}, ch chan<- maybeMsg) error {
-       obj, ok := toMapStr(v)
+       obj, ok := toMapStr(v, s.targetInfo.Name)
        if !ok {
                return errExpectedSplitObj
        }
-
-       clone := root.Clone()
-
        if s.keyField != "" && key != "" {
                _, _ = obj.Put(s.keyField, key)
        }
 
+       clone := root.Clone()
        if s.keepParent {
-               _, _ = clone.Put(s.targetInfo.Name, obj)
+               _, _ = clone.Put(s.targetInfo.Name, v)
        } else {
                clone = obj
        }
@@ -248,7 +247,7 @@ func (s *split) sendMessage(ctx *transformContext, root mapstr.M, key string, v
        return nil
 }
 
-func toMapStr(v interface{}) (mapstr.M, bool) {
+func toMapStr(v interface{}, key string) (mapstr.M, bool) {
        if v == nil {
                return mapstr.M{}, false
        }
@@ -257,6 +256,8 @@ func toMapStr(v interface{}) (mapstr.M, bool) {
                return t, true
        case map[string]interface{}:
                return mapstr.M(t), true
+       case string, []bool, []int, []string, []interface{}:
+               return mapstr.M{key: t}, true
        }
        return mapstr.M{}, false
 }
diff --git a/x-pack/filebeat/input/httpjson/split_test.go b/x-pack/filebeat/input/httpjson/split_test.go
index b99220ff92..c8c8007697 100644
--- a/x-pack/filebeat/input/httpjson/split_test.go
+++ b/x-pack/filebeat/input/httpjson/split_test.go
@@ -624,6 +624,83 @@ func TestSplit(t *testing.T) {
                                {"@timestamp": "1234567890", "other_items": "Line 3"},
                        },
                },
+               {
+                       name: "Array of Strings with keep_parent",
+                       config: &splitConfig{
+                               Target:     "body.alerts",
+                               Type:       "array",
+                               KeepParent: true,
+                       },
+                       ctx: emptyTransformContext(),
+                       resp: transformable{
+                               "body": mapstr.M{
+                                       "this": "is kept",
+                                       "alerts": []interface{}{
+                                               "test1",
+                                               "test2",
+                                               "test3",
+                                       },
+                               },
+                       },
+                       expectedMessages: []mapstr.M{
+                               {
+                                       "this":   "is kept",
+                                       "alerts": "test1",
+                               },
+                               {
+                                       "this":   "is kept",
+                                       "alerts": "test2",
+                               },
+                               {
+                                       "this":   "is kept",
+                                       "alerts": "test3",
+                               },
+                       },
+                       expectedErr: nil,
+               },
+               {
+                       name: "Array of Arrays with keep_parent",
+                       config: &splitConfig{
+                               Target:     "body.alerts",
+                               Type:       "array",
+                               KeepParent: true,
+                       },
+                       ctx: emptyTransformContext(),
+                       resp: transformable{
+                               "body": mapstr.M{
+                                       "this": "is kept",
+                                       "alerts": []interface{}{
+                                               []interface{}{"test1-1", "test1-2"},
+                                               []string{"test2-1", "test2-2"},
+                                               []int{1, 2},
+                                       },
+                               },
+                       },
+                       expectedMessages: []mapstr.M{
+                               {
+                                       "this": "is kept",
+                                       "alerts": []interface{}{
+                                               "test1-1",
+                                               "test1-2",
+                                       },
+                               },
+                               {
+                                       "this": "is kept",
+                                       "alerts": []string{
+                                               "test2-1",
+                                               "test2-2",
+                                       },
+                               },
+                               {
+                                       "this": "is kept",
+                                       "alerts": []int{
+                                               1,
+                                               2,
+                                       },
+                               },
+                       },
+                       expectedErr: nil,
+               },
        }
 
        for _, tc := range cases {

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-skip Skip notification from the automated backport with mergify enhancement Filebeat Filebeat
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Filebeat httpjson input response.split fails on an array of arrays
6 participants