-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Event Hubs] Fix memory leak in EHBufferedProducerClient #26748
Conversation
API change check APIView has identified API level changes in this PR and created following API reviews. |
… harshan/issue/25426
… harshan/issue/25426
In each loop iteration we only get the next one when there's an event; otherwise azure-sdk-for-js/sdk/eventhub/event-hubs/src/batchingPartitionChannel.ts Lines 186 to 188 in c1c67fa
|
/azp run js - event-hubs - tests |
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks for the awesome fix!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One concern with the new core-util
code, but otherwise good
); | ||
} finally { | ||
aborter.abort(); | ||
options?.abortSignal?.removeEventListener("abort", () => aborter.abort()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this will work since the anonymous function () => aborter.abort()
will have a different identity than the one registered above. I believe you need to pull it out into a local like
function abortHandler() {
aborter.abort();
}
options?.abortSignal?.addEventListener("abort", abortHandler);
try {
// etc
} finally {
options?.abortSignal?.removeEventListener("abort", abortHandler);
}
in the future we can use addEventListener with the 'once' option to avoid this dance, but the above will work everywhere.
Maybe this event handler not being unregistered is the source of your small leak?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, a great catch, will check that, thanks.
}); | ||
|
||
it("should resolve with the first promise that resolves, abort the rest", async function () { | ||
const startTime = Date.now(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little wary of doing math with clocktime in tests unless we're mocking the clock to begin with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the delay function for an in-place function with my trackers in it :)
and got rid of the clocks and math, let me know if that looks good
Co-authored-by: Deyaaeldeen Almahallawi <dealmaha@microsoft.com>
… harshan/issue/25426
…u/azure-sdk-for-js into harshan/issue/25426
… harshan/issue/25426
/azp run js - event-hubs - tests |
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Co-authored-by: Deyaaeldeen Almahallawi <dealmaha@microsoft.com>
Packages have not been released yet. Followup of #26748
Issue
First reported at #25426, thanks to @pkraj0 for reporting the issue.
EventHubBufferedProducerClient
leaks memory when there are no packets to send to event-hubs.EventHubBufferedProducerClient
maxWaitTimeInMs
has elapsed with no new events enqueued.Stress Testing
The test sends about 5000 events and stops for a long duration, revealing the memory blowing up.
Investigation
Studying the heap snapshots, and comparing them at various timestamps during the test run hinted at the issue.
The problem is at the
BatchingPartitionChannel#_startPublishLoop
and theAwaitableQueue#shift
implementations.BatchingPartitionChannel#_startPublishLoop
abortSignal
is invokedAwaitableQueue#shift
Problem
BatchingPartitionChannel#_startPublishLoop
relies on a while loop to run forever.#_startPublishLoop
is a race between two promises(AwaitableQueue#shift()
anddelay()
)AwaitableQueue#shift()
never settles because there is no activity.Promise.race
resolves as soon as one of the promises is fulfilled.Promise.race
though resolves, keeps a reference for the pending promise fromAwaitableQueue#shift()
, references get accumulated because of the while loop causing the leak.Solution
The fix is to abort the pending promise from
AwaitableQueue#shift()
once the race has been won by the otherdelay()
promise.What's in the PR?
@azure/core-util
racePromisesAndAbortLosers
, an abstraction that leverages"Promise.race()"
and aborts the losers of the race as soon as the first promise fulfills.@azure/event-hubs
BatchingPartitionChannel#_startPublishLoop
to useracePromisesAndAbortLosers
instead ofPromise.race
.AwaitableQueue#shift()
to return a promise that is cancellable so that the pending promise cancels once the first promise fromPromise.race
resolves .Moved the stress testing to #27184