Make event acknowledgment asynchronous in shipper output #32785
Conversation
So we can keep publishing batches without blocking on a single batch to be acknowledged.
This pull request does not have a backport label.
To fixup this pull request, you need to add the backport labels for the needed branches.
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
s.pendingMutex.Lock()
lastProcessed := 0
for _, p := range s.pending {
	if p.serverID != indexReply.Uuid {
Going along with the initialization comment above: the cleanup from a mismatched uuid should happen in Publish and/or Close -- the uuid will never change during an active connection, so after the first iteration this would just be caught by the err != nil check above and none of the outstanding batches would be cancelled. (It might also be nice for readability to move this into a standalone cancelAllBatches helper.)
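For illustration only, a minimal Go sketch of what such a helper could look like, assuming a pending slice guarded by pendingMutex; the package name, the batch interface, and the struct fields below are stand-ins invented for this sketch, not the actual shipper output code.

package shipperout

import "sync"

// Minimal stand-ins for the real types; these are assumptions for this sketch.
type batch interface {
	ACK()
	Cancelled()
}

type pendingBatch struct {
	batch    batch
	serverID string
}

type shipper struct {
	pendingMutex sync.Mutex
	pending      []pendingBatch
	serverID     string
}

// cancelAllBatches drains the pending queue and cancels every outstanding
// batch, for example when the connection is lost or the server UUID changes.
func (s *shipper) cancelAllBatches() {
	s.pendingMutex.Lock()
	defer s.pendingMutex.Unlock()
	for _, p := range s.pending {
		p.batch.Cancelled()
	}
	s.pending = nil
}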
See my previous comment #32785 (comment)
That makes sense, but in that case can we move the cleanup so it happens in Connect when we set s.serverID? It will still never change over the course of a connection, so this check could be skipped -- keeping one-time initialization separate makes the logic of ackLoop clearer.
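A rough sketch of that alternative, building on the stand-in types from the sketch above; connectAndGetServerID is a hypothetical placeholder for establishing the gRPC connection and reading the server info, not a real method of the shipper output.

// Cancel outstanding batches once, at connect time, when the server UUID is
// (re)established, so the ack loop never has to compare UUIDs per reply.
func (s *shipper) Connect() error {
	uuid, err := s.connectAndGetServerID() // hypothetical placeholder
	if err != nil {
		return err
	}
	if uuid != s.serverID {
		// Anything still pending was published to a different server instance
		// and can never be acknowledged, so cancel it here.
		s.cancelAllBatches()
		s.serverID = uuid
	}
	return nil
}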
+1 to trying to keep the cleanup in Connect to simplify the rest of the logic.
It's not going to simplify the rest of the logic; in fact, it's going to add one more lock for the pending queue in a different place, which could possibly cause a deadlock with the client lock. Also, I would have to copy items in two places. It's more robust to keep it the way it is, and I don't see any reason to move it. Unless there is a good argument why the current state of the code does not work as intended, I'm going to keep it.
TL;DR: I think having only one place where we lock and mutate this pending slice is cleaner and safer, and moving this code gains no obvious benefit.
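To make the trade-off concrete, a hedged sketch of the shape described here, reusing the stand-in types above: the reply handler is the single place that locks pendingMutex and mutates pending, both acknowledging accepted batches and cancelling stale ones. The ackReply type and its fields are simplifying assumptions, not the real shipper protocol.

// ackReply is a simplified stand-in for the server's acknowledgment message.
type ackReply struct {
	uuid     string // UUID of the server instance that persisted the events
	accepted int    // number of oldest pending batches confirmed by the server
}

func (s *shipper) processReply(r ackReply) {
	s.pendingMutex.Lock()
	defer s.pendingMutex.Unlock()

	processed := 0 // entries to drop from the front of the queue
	acked := 0
	for _, p := range s.pending {
		if p.serverID != r.uuid {
			// Stale batch from a previous connection: cancel it for retry.
			p.batch.Cancelled()
			processed++
			continue
		}
		if acked >= r.accepted {
			break // the server has not confirmed this batch yet
		}
		p.batch.ACK()
		acked++
		processed++
	}
	s.pending = s.pending[processed:]
}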
CI failures (metricbeat) are unrelated to changes in the PR.
LGTM
I spotted one small issue (I think), otherwise LGTM.
…ackaging

* upstream/main: (109 commits)
  Add cap_net_raw requirements to heartbeat docs (elastic#32816)
  apply a quick hotfix for having main working properly (elastic#32934)
  action: checks for x-pack/libbeat and libbeat (elastic#32754)
  Update to Go 1.18 in go.mod. (elastic#32940)
  [heartbeat] disable browser code on windows via build tags (elastic#32939)
  action: checks for heartbeat and x-pack/heartbeat (elastic#32749)
  Make event acknowledgment asynchronous in shipper output (elastic#32785)
  [Automation] Update elastic stack version to 8.5.0-fedc3e60 for testing (elastic#32930)
  Preallocate memory to reduce GC load (elastic#32905)
  [Automation] Update elastic stack version to 8.5.0-440e0896 for testing (elastic#32919)
  Skip broken ceph tests. (elastic#32912)
  Use non-deprecated docker image for testing jolokia (elastic#32885)
  update ironbank image product name (elastic#32867)
  ci: pre-commit stage within Jenkins (elastic#32839)
  Fix a couple of bugs in the logic for how AWS metric periods are calculated (elastic#32724)
  [Filebeat] [httpjson] Add support for single string containing multiple relation-types in getRFC5988Link (elastic#32811)
  [Heartbeat] Update HB k8s template to use <Mi> metric (elastic#32801)
  action: checks for metricbeat and x-pack/metricbeat (elastic#32748)
  action: checks for filebeat and x-pack/filebeat (elastic#32746)
  allow for json/ndjson content type with charset (elastic#32767)
  ...
So we can keep publishing batches without blocking on a single batch to be acknowledged. Also updated the config documentation.
What does this PR do?
Implements an asynchronous approach for acknowledgment of event batches, replacing the previous blocking approach.
Why is it important?
So the event pipeline is not blocked by a single batch and keeps publishing events to the shipper.
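As a rough illustration of the non-blocking flow, reusing the stand-in types from the sketches in the review threads above and assuming the standard library context package: Publish only sends the batch and records it as pending, and the asynchronous reply handler acknowledges or cancels it later. sendEvents is a hypothetical placeholder, not a real method of the shipper output.

// Publish hands the batch to the shipper and returns without waiting for the
// server to acknowledge it, so the pipeline can keep publishing new batches.
func (s *shipper) Publish(ctx context.Context, b batch) error {
	if err := s.sendEvents(ctx, b); err != nil { // hypothetical placeholder
		return err
	}
	s.pendingMutex.Lock()
	s.pending = append(s.pending, pendingBatch{batch: b, serverID: s.serverID})
	s.pendingMutex.Unlock()
	return nil
}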
Checklist
- [ ] I have made corresponding changes to the default configuration files
- [ ] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Related issues