
http_endpoint: Allow receiving multiple documents on a single request #25764

Merged: 6 commits, merged Jun 7, 2021


@adriansr (Contributor) commented May 18, 2021

What does this PR do?

Updates Filebeat's http_endpoint to support receiving multiple documents from a single POST request.

Until now it only accepted a single document (JSON object) per request.

With this PR:

  • If the body is an array of objects, the input emits each object as a separate event.
  • If the body is in NDJSON format, the input emits each object in the stream as a separate event.
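Both body shapes can carry the same set of events. As an illustration, here is a minimal client-side sketch in Go (standard library only) that builds each form. The `event` type and the `arrayBody`/`ndjsonBody` helpers are hypothetical names, not part of Filebeat.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// event is a hypothetical document shape used only for this illustration.
type event map[string]interface{}

// arrayBody encodes all events as a single JSON array: [{...},{...}].
func arrayBody(events []event) ([]byte, error) {
	return json.Marshal(events)
}

// ndjsonBody encodes each event as one JSON object per line.
// json.Encoder.Encode writes a trailing newline after every value.
func ndjsonBody(events []event) ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	for _, e := range events {
		if err := enc.Encode(e); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

func main() {
	events := []event{{"a": 1}, {"b": 2}}
	arr, _ := arrayBody(events)
	nd, _ := ndjsonBody(events)
	fmt.Printf("%s\n", arr)
	fmt.Printf("%s", nd)
}
```

The array form is a single JSON value, while the NDJSON form is a sequence of independent values, so a receiver must either unpack the array or read values until the stream ends.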

Why is it important?

Reduces the number of HTTP requests needed for high-volume ingestion, since a client can send many documents in a single request.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • [ ] I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

@elasticmachine (Collaborator)

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@botelastic bot added and removed the needs_team label (indicates that the issue/PR needs a Team:* label) May 18, 2021
@elasticmachine (Collaborator) commented May 18, 2021

💚 Build Succeeded


Build stats

  • Build Cause: Pull request #25764 updated

  • Start Time: 2021-06-03T10:03:06.062+0000

  • Duration: 102 min 33 sec

  • Commit: 6e3e4ae

Test stats 🧪

Test results: 0 failed, 7311 passed, 1193 skipped (8504 total)


💚 Flaky test report

Tests succeeded.


	obj := bytes.TrimLeft(b, " \t\r\n")
	if len(obj) > 0 && obj[0] == '{' {
		return true
	...
func httpReadNDJSON(body io.Reader) (objs []common.MapStr, status int, err error) {
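The diff excerpt this review comment is attached to appears to detect whether a body begins with a JSON object by skipping leading whitespace and checking the first byte. Since the enclosing function is not shown, here is a standalone sketch of that check under the hypothetical name `startsWithObject`:

```go
package main

import (
	"bytes"
	"fmt"
)

// startsWithObject reports whether b, ignoring leading JSON whitespace
// (space, tab, CR, LF), begins with '{'. It inspects only the first
// significant byte and does not validate the rest of the document.
func startsWithObject(b []byte) bool {
	obj := bytes.TrimLeft(b, " \t\r\n")
	return len(obj) > 0 && obj[0] == '{'
}

func main() {
	fmt.Println(startsWithObject([]byte("  \n{\"a\":1}")))
	fmt.Println(startsWithObject([]byte(`[{"a":1}]`)))
	fmt.Println(startsWithObject([]byte("   ")))
}
```

Because it looks at one byte only, this is a routing heuristic rather than validation: a truncated body such as `{"a":` still passes and must be rejected later by the decoder.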
Contributor

I might be misreading this, but could we just have one function? I think with json.NewDecoder we could handle:

{"a": 1}
or
{"a": 1}{"b":2}
or
{"a":1}
{"b":2}
or
[{"a":1},{"b":2}]

and then json or ndjson would both work.
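The suggestion relies on `json.Decoder` reading successive top-level values from a stream without requiring any separator between them. A small sketch that counts those values for the four inputs listed in the comment (`countValues` is a hypothetical helper):

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// countValues decodes every top-level JSON value in s and returns how many
// there were. Concatenated and newline-delimited objects are both read as
// separate values; an array counts as one value until its elements are unpacked.
func countValues(s string) (int, error) {
	dec := json.NewDecoder(strings.NewReader(s))
	n := 0
	for {
		var v interface{}
		err := dec.Decode(&v)
		if err == io.EOF {
			// Clean end of input.
			return n, nil
		}
		if err != nil {
			// Malformed or truncated value (e.g. io.ErrUnexpectedEOF).
			return n, err
		}
		n++
	}
}

func main() {
	inputs := []string{
		`{"a": 1}`,
		`{"a": 1}{"b":2}`,
		"{\"a\":1}\n{\"b\":2}",
		`[{"a":1},{"b":2}]`,
	}
	for _, in := range inputs {
		n, err := countValues(in)
		fmt.Println(n, err)
	}
}
```

The array input decodes as a single value, which is why a single-parser version still needs a type switch to split arrays into individual objects.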

Contributor

Something like this:

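import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"

	"github.com/pkg/errors"

	"github.com/elastic/beats/v7/libbeat/common"
)

// httpReadJSON decodes a request body holding a single JSON object, a JSON
// array of objects, or a stream of concatenated or newline-delimited objects
// (NDJSON). errBodyEmpty and errUnsupportedType are assumed to be
// package-level errors defined elsewhere in the input.
func httpReadJSON(body io.Reader) (objs []common.MapStr, status int, err error) {
	if body == http.NoBody {
		return nil, http.StatusNotAcceptable, errBodyEmpty
	}

	// json.Decoder reads successive top-level values, so one loop covers
	// both a single document and an NDJSON stream.
	decoder := json.NewDecoder(body)
	for idx := 0; ; idx++ {
		var obj interface{}
		if err := decoder.Decode(&obj); err != nil {
			if err == io.EOF {
				// Clean end of stream: every value has been consumed.
				break
			}
			return nil, http.StatusBadRequest, errors.Wrapf(err, "malformed JSON object at stream position %d", idx)
		}
		switch v := obj.(type) {
		case map[string]interface{}:
			// A top-level object becomes one document.
			objs = append(objs, v)
		case []interface{}:
			// An array is unpacked; every element must itself be an object.
			for listIdx, listObj := range v {
				asMap, ok := listObj.(map[string]interface{})
				if !ok {
					return nil, http.StatusBadRequest, fmt.Errorf("%v at obj index %d, list index %d", errUnsupportedType, idx, listIdx)
				}
				objs = append(objs, asMap)
			}
		default:
			// Scalars and null are rejected.
			return nil, http.StatusBadRequest, errUnsupportedType
		}
	}
	// Status 0: no error status to report.
	return objs, 0, nil
}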

I can push it to the PR if you want.

mergify bot commented May 19, 2021

This pull request now has conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b http_input_multi_doc upstream/http_input_multi_doc
git merge upstream/master
git push upstream http_input_multi_doc

@leehinman (Contributor) left a comment

In case the two-function approach is the best fit.

x-pack/filebeat/input/http_endpoint/handler_test.go (outdated, resolved)
@adriansr adriansr added the backport-v7.14.0 Automated backport with mergify label May 25, 2021
@leehinman (Contributor) left a comment

LGTM

Updates Filebeat's http_endpoint to produce multiple documents from a
single POST request. This extends the application/json format handling
to accept arrays of objects, and adds support for the NDJSON format
(application/x-ndjson).
This uses a single parser that accepts both JSON and NDJSON
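The commit message names two media types, application/json and application/x-ndjson. If a handler wanted to route both to the same parser, a Content-Type check could be sketched with the standard library's `mime.ParseMediaType`. Whether the http_endpoint input validates Content-Type at all, and how, is an assumption here, and `acceptsBody` is a hypothetical name.

```go
package main

import (
	"fmt"
	"mime"
)

// acceptsBody reports whether a Content-Type header value names one of the
// two media types from the commit message. mime.ParseMediaType lowercases
// the media type and separates parameters such as "; charset=utf-8".
func acceptsBody(contentType string) bool {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		// Empty or malformed header value.
		return false
	}
	switch mediaType {
	case "application/json", "application/x-ndjson":
		return true
	}
	return false
}

func main() {
	for _, ct := range []string{"application/json", "application/x-ndjson; charset=utf-8", "text/plain"} {
		fmt.Println(ct, acceptsBody(ct))
	}
}
```

Because one decoder handles both formats, the media type would only gate acceptance here; it would not need to select a different parser.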
@adriansr (Contributor, Author)

/test

mergify bot commented Jun 2, 2021

This pull request now has conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b http_input_multi_doc upstream/http_input_multi_doc
git merge upstream/master
git push upstream http_input_multi_doc

@adriansr adriansr merged commit 8bbb26f into elastic:master Jun 7, 2021
mergify bot pushed a commit that referenced this pull request Jun 7, 2021
…#25764)

Updates Filebeat's http_endpoint to produce multiple documents from a
single POST request. This extends the application/json format handling
to accept arrays of objects, and adds support for the NDJSON format
(application/x-ndjson).

(cherry picked from commit 8bbb26f)
adriansr added a commit that referenced this pull request Jun 9, 2021
…#25764) (#26175)

Updates Filebeat's http_endpoint to produce multiple documents from a
single POST request. This extends the application/json format handling
to accept arrays of objects, and adds support for the NDJSON format
(application/x-ndjson).

(cherry picked from commit 8bbb26f)

Co-authored-by: Adrian Serrano <adrisr83@gmail.com>
Labels
backport-v7.14.0 (Automated backport with mergify), enhancement, review

4 participants