Reduce allocations during readNativeFrames, leading to ~15-20% performance improvement #157
@suyashkumar could you please take a look at this fork? We're seeing 5-10x speed improvements due to reduced hot path allocations.
Hi @kaxap, thanks for reaching out, looks like we are both pulling at the same thread here! Note that there will be a larger apparent percent latency reduction for files with more frames or larger frames (don't have such examples in

Similar to your implementation, this change removes an unnecessary slice allocation in the loop. In this implementation, two slices are allocated per frame in advance of parsing the frame. In your implementation, I see you read the data into a single-dimension slice until the user wants to unpack it (where the rest of the work is done). This is something I considered myself; however, I was able to achieve similar results in this change without changing the API surface (in the struct), so I prefer that until the next minor release with API-breaking changes. This works because slicing a Go slice doesn't copy the underlying data (though there is still some minor overhead).

One of the things I see in your implementation is replacing the calls to

Would you be able to run the benchmarks on your fork so we can see how they look? And/or, would you be willing to open a PR here with the uint read changes to move away from

I will probably go ahead and merge this change to separate it from the others (there are some others I'd like to make too).
@suyashkumar Thanks for the feedback. I was thinking about creating a PR, but I was not sure about the struct change I made. Glad we cleared that up.
This change reduces the number of allocations in readNativeFrames. Instead of allocating a slice for each pixel's samples, a single flat buffer slice of size pixelsPerFrame*samplesPerPixel is allocated upfront. Later, ranges in that slice are referred to in the larger 2D slice. As a result there are only two calls to make, which leads to significant performance gains.

On my machine running make bench-diff:

We see similar results in the GitHub action benchmark.
Presumably the percentage performance gains would be even higher for DICOMs with more Native PixelData (e.g. more frames, more pixels per frame, or more samples per pixel).
This helps address #161.
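
The flat-buffer approach described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in readNativeFrames: the function names allocPerPixel and allocFlat, and the use of [][]int for a frame, are assumptions made for the example.

```go
package main

import "fmt"

// allocPerPixel mirrors the older pattern (hypothetical helper):
// one make call for the outer slice plus one make call per pixel.
func allocPerPixel(pixelsPerFrame, samplesPerPixel int) [][]int {
	frame := make([][]int, pixelsPerFrame)
	for i := range frame {
		frame[i] = make([]int, samplesPerPixel)
	}
	return frame
}

// allocFlat (hypothetical helper) makes exactly two make calls per frame:
// one flat buffer holding every sample, and one outer slice whose rows
// are ranges of that buffer. Slicing does not copy the sample data.
func allocFlat(pixelsPerFrame, samplesPerPixel int) [][]int {
	buf := make([]int, pixelsPerFrame*samplesPerPixel)
	frame := make([][]int, pixelsPerFrame)
	for i := range frame {
		start := i * samplesPerPixel
		end := start + samplesPerPixel
		// The three-index slice caps each row's capacity, so an append
		// on one pixel cannot overwrite the next pixel's samples.
		frame[i] = buf[start:end:end]
	}
	return frame
}

func main() {
	frame := allocFlat(4, 3) // 4 pixels, 3 samples each
	frame[1][2] = 255        // writes into the shared flat buffer
	fmt.Println(len(frame), len(frame[0]), frame[1][2]) // prints "4 3 255"
}
```

Both helpers return the same shape, so callers that index frame[pixel][sample] would not need to change. Only the number of allocations differs, and that gap grows with the pixel count.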