-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restore memory queue's internal event cleanup after a batch is vended #41356
Conversation
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dummy question - Any reason why the disk queue freeentries method does not do anything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All changes make sense to me.
@@ -127,6 +133,10 @@ func (b *mockQueueBatch) Entry(i int) queue.Entry { | |||
return fmt.Sprintf("event %v", i) | |||
} | |||
|
|||
func (b *mockQueueBatch) FreeEntries() { | |||
b.freeEntriesCalled++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can cause problems if the test code runs in multiple goroutines somehow, might not be a problem rn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, though right now this struct is only used in a single synchronous call to newBatch
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Since the disk queue's main buffer isn't in memory, it isn't hit by this bug. Originally, using FreeEntries this way was a memory / GC optimization and the benchmarking focus was on the memory queue. We could implement FreeEntries in the disk queue also and probably get a moderate memory reduction, but it would require some refactoring in how the disk queue tracks acknowledgments. (I'd be happy to spend some time cleaning up the disk queue's ack handling but it's never been top priority :-) ) |
…#41356) Fix #41355, where event data in the memory queue was not being freed when event batches were acknowledged, but only gradually as the queue buffer was overwritten by later events. This gave the same effect as if all beat instances, even low-volume ones, were running with a full / saturated event queue. The root cause, found by @swiatekm, is [this PR](#39584), an unrelated cleanup of old code that accidentally included one live call along with the deprecated ones. (There was an old `FreeEntries` hook in pipeline batches that was only used for deprecated shipper configs, but the cleanup also removed the `FreeEntries` call _inside_ the queue which was essential for releasing event memory.) (cherry picked from commit fdb912a)
…#41356) Fix #41355, where event data in the memory queue was not being freed when event batches were acknowledged, but only gradually as the queue buffer was overwritten by later events. This gave the same effect as if all beat instances, even low-volume ones, were running with a full / saturated event queue. The root cause, found by @swiatekm, is [this PR](#39584), an unrelated cleanup of old code that accidentally included one live call along with the deprecated ones. (There was an old `FreeEntries` hook in pipeline batches that was only used for deprecated shipper configs, but the cleanup also removed the `FreeEntries` call _inside_ the queue which was essential for releasing event memory.) (cherry picked from commit fdb912a)
…#41356) Fix #41355, where event data in the memory queue was not being freed when event batches were acknowledged, but only gradually as the queue buffer was overwritten by later events. This gave the same effect as if all beat instances, even low-volume ones, were running with a full / saturated event queue. The root cause, found by @swiatekm, is [this PR](#39584), an unrelated cleanup of old code that accidentally included one live call along with the deprecated ones. (There was an old `FreeEntries` hook in pipeline batches that was only used for deprecated shipper configs, but the cleanup also removed the `FreeEntries` call _inside_ the queue which was essential for releasing event memory.) (cherry picked from commit fdb912a)
…#41356) (#41364) Fix #41355, where event data in the memory queue was not being freed when event batches were acknowledged, but only gradually as the queue buffer was overwritten by later events. This gave the same effect as if all beat instances, even low-volume ones, were running with a full / saturated event queue. The root cause, found by @swiatekm, is [this PR](#39584), an unrelated cleanup of old code that accidentally included one live call along with the deprecated ones. (There was an old `FreeEntries` hook in pipeline batches that was only used for deprecated shipper configs, but the cleanup also removed the `FreeEntries` call _inside_ the queue which was essential for releasing event memory.) (cherry picked from commit fdb912a) Co-authored-by: Fae Charlton <fae.charlton@elastic.co>
…#41356) (#41363) Fix #41355, where event data in the memory queue was not being freed when event batches were acknowledged, but only gradually as the queue buffer was overwritten by later events. This gave the same effect as if all beat instances, even low-volume ones, were running with a full / saturated event queue. The root cause, found by @swiatekm, is [this PR](#39584), an unrelated cleanup of old code that accidentally included one live call along with the deprecated ones. (There was an old `FreeEntries` hook in pipeline batches that was only used for deprecated shipper configs, but the cleanup also removed the `FreeEntries` call _inside_ the queue which was essential for releasing event memory.) (cherry picked from commit fdb912a) Co-authored-by: Fae Charlton <fae.charlton@elastic.co>
…#41356) (#41362) Fix #41355, where event data in the memory queue was not being freed when event batches were acknowledged, but only gradually as the queue buffer was overwritten by later events. This gave the same effect as if all beat instances, even low-volume ones, were running with a full / saturated event queue. The root cause, found by @swiatekm, is [this PR](#39584), an unrelated cleanup of old code that accidentally included one live call along with the deprecated ones. (There was an old `FreeEntries` hook in pipeline batches that was only used for deprecated shipper configs, but the cleanup also removed the `FreeEntries` call _inside_ the queue which was essential for releasing event memory.) (cherry picked from commit fdb912a) Co-authored-by: Fae Charlton <fae.charlton@elastic.co>
@faec can you create an issue describing the benchmark we would have needed to have caught this before release? |
Fix #41355, where event data in the memory queue was not being freed when event batches were acknowledged, but only gradually as the queue buffer was overwritten by later events. This gave the same effect as if all beat instances, even low-volume ones, were running with a full / saturated event queue.
The root cause, found by @swiatekm, is this PR, an unrelated cleanup of old code that accidentally included one live call along with the deprecated ones. (There was an old
FreeEntries
hook in pipeline batches that was only used for deprecated shipper configs, but the cleanup also removed theFreeEntries
call inside the queue which was essential for releasing event memory.)Checklist
I have made corresponding changes to the documentationI have made corresponding change to the default configuration filesCHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.Related issues