high number of allocations in kgo.recordToRecord function #823
Comments
Alternatively, perhaps the …
In #827 I've proposed a possible solution to the problem.
I see you took your proposal a good bit further in grafana#3. The PR there I think is sound but is tricky to follow. I think it's trying to solve a few goals: …
I think the implementation does the job, but the code is pretty sketchy to analyze. As well, I think it's not working 100% as intended in the case where you are consuming uncompressed data. Currently, if consuming uncompressed data, nothing is being acquired from the ….

I think some of the buffers could be put back into the pool a bit more aggressively? e.g., I don't know why ….

I'm open to a caching API, but I can't think of a great one yet. I don't think it's something that should be added always for everybody, especially not via globals.
ref: twmb#823 (comment) Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
This slab allocates the records we will be creating, reducing the number of allocs (and thus gc pressure), while keeping alloc size the sameish (or reducing it a bit). This should address the main problem for 823, but we can still add caching in a bit. For #823.
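As a rough illustration of the slab idea described above (this is a minimal sketch, not the code from the PR — names like recordSlab and take are made up for the example), the point is to allocate one backing slice per batch and hand out pointers into it, so a batch of N records costs one allocation instead of N:

```go
package main

import "fmt"

// Record stands in for kgo.Record in this sketch.
type Record struct {
	Key   []byte
	Value []byte
}

// recordSlab pre-allocates all records for one batch in a single slice;
// callers receive pointers into that backing array.
type recordSlab struct {
	recs []Record
	next int
}

func newRecordSlab(n int) *recordSlab {
	return &recordSlab{recs: make([]Record, n)} // one allocation for the whole batch
}

// take returns the next pre-allocated record in the slab.
func (s *recordSlab) take() *Record {
	r := &s.recs[s.next]
	s.next++
	return r
}

func main() {
	slab := newRecordSlab(3)
	for i := 0; i < 3; i++ {
		r := slab.take()
		r.Value = []byte(fmt.Sprintf("record %d", i))
	}
	fmt.Println(len(slab.recs), "records share one backing allocation")
}
```

The total bytes allocated stay roughly the same; what drops is the number of distinct allocations the GC has to track.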
I've read through the PRs in the forked repo, and I have a proposal that is a bit different. There are three areas that the fork introduces caching to: …
As well, there already internally exists one per-client cache of ….

The fork supports caching by adding a few private fields to the ….

The first problem this issue raises -- many small allocs -- can be fixed by switching to two slab allocations (allocate a slice once, and then each kmsg.Record or kgo.Record is an index in that slice). That's easy.

For caching, I have a bit of a different idea. If you have time, please take a look at my most recent commit in this PR: #904. I know I put this off for a very long time, so I imagine you may not take a look at 904 too promptly. I'm going to test it a bit and work through other issues on my 1.19 release checklist; if y'all have no opinion by the time I make it through, I'll play with the PR a bit and eventually merge it anyway.

So -- if you have time -- please take a look and let me know if the APIs introduced are usable. It leaves the implementation of caching entirely to the end user and allows the end user to opt into what caching they'd like. This also addresses #803 at the same time and exports the compressor & decompressor.
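To make "leaving caching entirely to the end user" concrete, here is a hypothetical sketch of an opt-in record pool backed by sync.Pool. It is not the API proposed in #904 — RecordPool, GetRecord, and PutRecord are invented names for illustration; the idea is only that the client would acquire from a user-supplied pool instead of allocating directly, and the user decides how aggressively to cache:

```go
package main

import (
	"fmt"
	"sync"
)

// Record stands in for kgo.Record in this sketch.
type Record struct {
	Key, Value []byte
}

// RecordPool is a hypothetical interface a client could acquire records
// from instead of allocating them itself.
type RecordPool interface {
	GetRecord() *Record
	PutRecord(*Record)
}

// syncRecordPool is one possible user-side implementation backed by sync.Pool.
type syncRecordPool struct{ p sync.Pool }

func newSyncRecordPool() *syncRecordPool {
	return &syncRecordPool{p: sync.Pool{New: func() any { return new(Record) }}}
}

func (s *syncRecordPool) GetRecord() *Record { return s.p.Get().(*Record) }

func (s *syncRecordPool) PutRecord(r *Record) {
	*r = Record{} // zero before reuse so no stale data leaks between uses
	s.p.Put(r)
}

func main() {
	var pool RecordPool = newSyncRecordPool()

	r := pool.GetRecord() // the client would acquire records here...
	r.Value = []byte("hello")
	fmt.Printf("%s\n", r.Value)
	pool.PutRecord(r) // ...and the application returns them when it is done
}
```

The trade-off of this shape is that correctness now depends on the application returning records only after it has fully finished with them, which is why it should stay opt-in rather than a global default.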
As a result of the large volume of records generated, we've observed in one of our consumers a high number of allocations originating in the kgo.recordToRecord function (~65% of total allocations according to the attached screenshot, concretely in this line). This results in performance degradation caused by GC overhead.

Has the possibility of reusing the generated records through a pool ever been considered, in order to minimize this effect? Perhaps it could be offered as an optional parameter? I could propose a PR for your review if you consider it appropriate.