
Reduce allocations during readNativeFrames, leading to ~15-20% performance improvement #157

Merged
merged 1 commit into main on Dec 19, 2020

Conversation

suyashkumar
Owner

@suyashkumar suyashkumar commented Dec 18, 2020

This change reduces the number of allocations in readNativeFrames. Instead of allocating a slice for each pixel's samples, a single flat buffer slice of size pixelsPerFrame*samplesPerPixel is allocated up front, and the per-pixel slices in the larger 2D slice refer to ranges within that buffer. As a result there are only two calls to make per frame, which leads to significant performance gains.
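Roughly, the allocation pattern looks like the following (a simplified sketch; the allocateFrame helper, the int element type, and the small sizes in main are just for illustration, not the exact code in this change):

```go
package main

import "fmt"

// allocateFrame sketches the flat-buffer pattern described above: one backing
// slice holds every sample in the frame, and each pixel's slice is a subrange
// of it, so only two calls to make are needed instead of one per pixel.
func allocateFrame(pixelsPerFrame, samplesPerPixel int) [][]int {
	buf := make([]int, pixelsPerFrame*samplesPerPixel) // single flat allocation
	pixels := make([][]int, pixelsPerFrame)            // slice of per-pixel views
	for i := range pixels {
		// Re-slicing buf shares its backing array; no data is copied.
		pixels[i] = buf[i*samplesPerPixel : (i+1)*samplesPerPixel]
	}
	return pixels
}

func main() {
	frame := allocateFrame(4, 3)
	frame[2][1] = 42
	fmt.Println(frame) // [[0 0 0] [0 0 0] [0 42 0] [0 0 0]]
}
```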

On my machine running make bench-diff:

name           old time/op  new time/op  delta
Parse/1.dcm-8  2.29ms ± 1%  1.95ms ± 3%  -15.00%  (p=0.008 n=5+5)
Parse/2.dcm-8  2.27ms ± 1%  1.94ms ± 0%  -14.41%  (p=0.008 n=5+5)
Parse/3.dcm-8  2.10ms ± 0%  1.81ms ± 2%  -13.77%  (p=0.008 n=5+5)
Parse/4.dcm-8   240ms ± 1%   196ms ± 4%  -18.27%  (p=0.008 n=5+5)
Parse/5.dcm-8  33.6ms ± 1%  27.9ms ± 0%  -17.00%  (p=0.008 n=5+5)

We see similar results in the GitHub action benchmark.

Presumably the percentage performance gains would be even higher for DICOMs with more Native PixelData (e.g. more frames, pixels per frame, samples per pixel, etc).

This helps address #161.

@suyashkumar suyashkumar self-assigned this Dec 18, 2020
@suyashkumar suyashkumar changed the title [Experimental] Reduce allocations during readNativeFrames Reduce allocations during readNativeFrames, leading to ~15% latency improvement Dec 18, 2020
@suyashkumar suyashkumar changed the title Reduce allocations during readNativeFrames, leading to ~15% latency improvement Reduce allocations during readNativeFrames, leading to ~15% performance improvement Dec 18, 2020
@kaxap
Contributor

kaxap commented Dec 19, 2020

@suyashkumar could you please take a look at this fork? We're seeing 5-10x speed improvements due to reduced hot-path allocations.

@suyashkumar
Owner Author

suyashkumar commented Dec 19, 2020

Hi @kaxap, thanks for reaching out! It looks like we are both pulling at the same thread here.

Note that the percentage latency reduction will appear larger for files with more frames or larger frames (we don't have such examples in testdata/ yet, so it's not as apparent in the benchmarks above).

Similar to your implementation, this change removes an unnecessary slice allocation in the loop. Here, two slices are allocated per frame in advance of parsing the frame. In your implementation, I see you read the data into a one-dimensional slice and defer the rest of the work until the user wants to unpack it. I considered that approach myself, but I was able to achieve similar results in this change without altering the API surface (the struct), so I prefer to keep it as-is until the next minor release with API-breaking changes. This works because slicing a Go slice doesn't copy the underlying data (though there is still some minor overhead).
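To illustrate that point (this is just Go slice semantics, not code from either implementation): re-slicing creates a new slice header over the same backing array, so no element data is copied.

```go
package main

import "fmt"

func main() {
	buf := make([]uint16, 8)
	sub := buf[2:5] // shares buf's backing array; no copy

	sub[0] = 99
	fmt.Println(buf[2])             // 99: the write is visible through buf
	fmt.Println(&sub[0] == &buf[2]) // true: same underlying element
	fmt.Println(len(sub), cap(sub)) // 3 6: only the header bookkeeping differs
}
```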

One of the things I see in your implementation is replacing the calls to binary.Read, which seems very interesting and useful. I didn't know it allocated a slice under the hood!
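For reference, something along these lines is what I have in mind (a rough sketch with made-up names like readSamples, not the actual code from your fork): read the raw bytes once with io.ReadFull and decode them with the encoding/binary ByteOrder helpers, instead of letting binary.Read allocate an intermediate buffer per call.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// readSamples reads n little-endian uint16 samples. It performs one bulk read
// into a byte buffer and decodes values with binary.LittleEndian.Uint16, which
// does not allocate, unlike repeated binary.Read calls.
func readSamples(r io.Reader, n int) ([]uint16, error) {
	raw := make([]byte, 2*n)
	if _, err := io.ReadFull(r, raw); err != nil {
		return nil, err
	}
	out := make([]uint16, n)
	for i := range out {
		out[i] = binary.LittleEndian.Uint16(raw[2*i:]) // no per-value allocation
	}
	return out, nil
}

func main() {
	src := bytes.NewReader([]byte{0x01, 0x00, 0xff, 0x00, 0x10, 0x27})
	samples, err := readSamples(src, 3)
	fmt.Println(samples, err) // [1 255 10000] <nil>
}
```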

Would you be able to run the benchmarks on your fork so we can see how they look? And/or, would you be willing to open a PR here with the uint read changes to move away from binary.Read (but we can keep the slice structure the same for now)? If you open a PR, it will automatically run the benchmarks as a GitHub action and compare to baseline. Alternatively, I can make a change to that effect and add you as a co-author.

I will probably go ahead and merge this change to separate it from the others (there are some others I'd like to make too).

@suyashkumar suyashkumar changed the title Reduce allocations during readNativeFrames, leading to ~15% performance improvement Reduce allocations during readNativeFrames, leading to 15-20% performance improvement Dec 19, 2020
@suyashkumar suyashkumar changed the title Reduce allocations during readNativeFrames, leading to 15-20% performance improvement Reduce allocations during readNativeFrames, leading to ~15-20% performance improvement Dec 19, 2020
@suyashkumar suyashkumar linked an issue Dec 19, 2020 that may be closed by this pull request
@suyashkumar suyashkumar merged commit 33d864f into main Dec 19, 2020
@kaxap
Contributor

kaxap commented Dec 20, 2020

@suyashkumar Thanks for the feedback. I was thinking about creating a PR, but I was not sure about the struct change I made; glad we cleared that up.
I'll create a PR with the binary.<order>.Uintxx changes ASAP.
