-
Notifications
You must be signed in to change notification settings - Fork 276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exchange: bench, do not preallocate buffers #416
Conversation
268bdff
to
a6c142e
Compare
Benchmark results temporarily at https://dev.antiguru.de/criterion/report/ |
@antiguru , I took a close look at the PR. Unfortunately, trying it out is not as easy as just switching to a different timely branch; I'd have to modify our code that constructs the dataflow graph to apply the new pacts to every operator. I apologize, but unfortunately I won't have the cycles to work on this now :( |
4d9839b
to
9f52d13
Compare
9f52d13
to
5823de2
Compare
5823de2
to
3477d4f
Compare
e04a831
to
59b11f5
Compare
The LazyExchange PACT avoids allocations ahead of time while maintaining the performance of the eagerly-allocating Exchange PACT. Extend the Exchange operator with an `exchange_pact` function taking an explicit PACT. Signed-off-by: Moritz Hoffmann <antiguru@gmail.com>
Signed-off-by: Moritz Hoffmann <antiguru@gmail.com>
Signed-off-by: Moritz Hoffmann <antiguru@gmail.com>
Signed-off-by: Moritz Hoffmann <antiguru@gmail.com>
Signed-off-by: Moritz Hoffmann <antiguru@gmail.com>
Signed-off-by: Moritz Hoffmann <antiguru@gmail.com>
Instead of defining distinct exchange pushers, make the default generic and provide necessary hooks for specialized exchange pushers. Signed-off-by: Moritz Hoffmann <antiguru@gmail.com>
Signed-off-by: Moritz Hoffmann <antiguru@gmail.com>
I did some measurements comparing the * current default, * a non-preallocating exchange * id. plus eagerly releasing exchange. The benchmark in the chain of commits does not report a significant change in performance for all variants. Erring on the side of caution, I'd propse to make the second variant, i.e. not pre-allocating, but retaining if push returned a buffer, the default exchange pact. Currently, all non-zerocopy allocators will not return an allocation on push, so in this case, the second and third variant are equivalent. # Why is there no change in performance? For single-recrod exchanges, this can be reasoned as follows. In the old approach, we'd have an allocated buffer, where we write a single element. On flush, the buffer would be sent to the pusher, which might or might not return a buffer. If it doesn't, we'd allocate a new one. In the new approach, we don't have a buffer to append to, so we'd first allocate one. On flush, we send the buffer downstream and maybe get one back. We don't eagerly allocate a new one if there is no buffer available. Both approaches allocate a new buffer if the pusher doesn't/didn't return one earlier. Only the order of allocations is changed. While in the old approach we send data and allocate, we allocate and send data in the new approach. On top of this, the new approach needs to pay a cost of checking the buffer's capacity for each record on the input, but I'd hope for the optimizer to sort this out. The same argument seems to hold for larger batch sizes. Signed-off-by: Moritz Hoffmann <antiguru@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! If you don't mind, adding comments around the two new allocation blocks would make sure we don't forget why we are doing this, and prevents future me from just copying the push()
fragment without the allocation test.
Signed-off-by: Moritz Hoffmann <antiguru@gmail.com>
59b11f5
to
090e5e0
Compare
Signed-off-by: Moritz Hoffmann <antiguru@gmail.com>
This PR:
I compared the current implementation versus the changes in this PR, and the benchmark doesn't report significant changes in performance. Obviously, this PR encodes a policy change from eagerly allocated buffers to allocating them on-demand, which could have an impact on memory utilization and processing latency.
As explained on the commit message, the cost of allocating buffers exists today, only that it is now during initial construction and with the default process allocator after push operations. We shift those to allocations before the fist buffer push, which means that the absolute number of allocations stays the same. Regarding latency, we need to check before pushing to a buffer whether it has the right capacity, which might add to latency. However, I'd hope that the compiler can fold this into the subsequent test whether the buffer is full, so I wouldn't expect much of a latency impact either.