[benchmark] Add two benchmarks that show performance of flattening an… #20116
Conversation
… array. The first is a naive imperative approach using appends in a loop. The second uses flatMap. We would like both of these to have equivalent performance.
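A minimal sketch of the two approaches being compared (illustrative only; the names, element type, and workload here are assumptions, not the exact benchmark code):

```swift
typealias Tuple4 = (Int, Int, Int, Int)

// Naive imperative flattening: append each tuple element in a for-in loop.
func flattenLoop(_ input: [Tuple4]) -> [Int] {
  var result: [Int] = []
  result.reserveCapacity(4 * input.count)
  for (a, b, c, d) in input {
    result.append(a)
    result.append(b)
    result.append(c)
    result.append(d)
  }
  return result
}

// Functional flattening: flatMap each tuple into a small array.
func flattenFlatMap(_ input: [Tuple4]) -> [Int] {
  return input.flatMap { [$0.0, $0.1, $0.2, $0.3] }
}
```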
@swift-ci smoke benchmark
@swift-ci smoke test
@swift-ci smoke test OS X platform
@gottesmm I would like to lower the base workload of these two tests by a factor of 20:

The …

I don't see how … Would you be OK with these modifications?
Do what you need. I think you misunderstood me. When I say "We would like both of these to have equivalent performance", what I mean is that we would like FlattenListFlatMap to be as close to FlattenListLoop in performance as possible. In a perfect world they would be the same. If the wording is confusing, feel free to change it.

The reason the reserve capacity is in there is as an attempt to measure the speed of light, i.e. how fast this could ideally be done imperatively, for comparison purposes. That being said, I do understand the larger point. My suggestion would be to add a separate version of the loop without the reserve capacity. Then we can see all 3.
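A sketch of what such a third variant might look like (the function name is hypothetical, not from the PR):

```swift
// Hypothetical third variant: the same append loop, but without reserveCapacity,
// so reallocation costs are included in the measurement.
func flattenLoopNoReserve(_ input: [(Int, Int, Int, Int)]) -> [Int] {
  var result: [Int] = []
  for (a, b, c, d) in input {
    result.append(a); result.append(b); result.append(c); result.append(d)
  }
  return result
}
```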
One more question… you called the for-in loop version a naive imperative approach. Which compiler/stdlib optimization opportunities did you have in mind when adding these benchmarks?
It is a naive imperative approach since it is the obvious way to do it. I think you are thinking too much about this.
Sorry, let me be a bit clearer. It is a naive implementation to me since it is what I would write quickly. In terms of what compiler/stdlib opportunities are available, I am not sure. But I think this (and the additional test that I hope will get added) may provide interesting guidance around Swift's performance when executing functional-style code. I think this is a class of benchmarks that we could use more of, so I saw this as an opportunity to add coverage.
So, when I really abuse the knowledge of the internal structure of the `[(Int, Int, Int, Int)]`, this is (I guess) the true speed of light (16x faster than FlattenLoop with `reserveCapacity`):

```swift
func flattenUnsafe(_ input: [(Int, Int, Int, Int)]) -> [Int] {
    return UnsafeBufferPointer(start: input, count: input.count)
        .withMemoryRebound(to: Int.self, Array.init)
}
```

Would you mind if I also added this one, or is this a bridge too far?
I think that is not necessarily defined behavior (or at least it makes me a bit nervous). @atrick your thoughts? One thing that I think we could use no matter what is a copy of the benchmark without reserve capacity.
Sure. I’m on it. Just wanted to do all the changes together, so I’m asking about this extreme performance case, too. The idea is to serve as an inspiration for potential future optimization.
I'm just going to be honest. I don't know what that code does, and it makes me sad that you can write it. I guess it relies on an implicit conversion from the array to an unsafe pointer to its elements.

Then it initializes an Array from a buffer pointer using type-based dispatch on an inferred type, which always makes it impossible for me to decipher the code. Which Array initializer is being called? 🤷♂️

There is an Array API designed for this: https://forums.swift.org/t/se-0223-accessing-an-arrays-uninitialized-buffer/15194/41
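For reference, a sketch using that API (assuming `Array.init(unsafeUninitializedCapacity:initializingWith:)` from SE-0223 is available in the toolchain; this is an illustration, not code from the PR):

```swift
// Writes the four Ints of each tuple directly into the Array's uninitialized storage.
func flattenUninitialized(_ input: [(Int, Int, Int, Int)]) -> [Int] {
  return Array(unsafeUninitializedCapacity: 4 * input.count) { buffer, initializedCount in
    var i = 0
    for (a, b, c, d) in input {
      buffer[i] = a
      buffer[i + 1] = b
      buffer[i + 2] = c
      buffer[i + 3] = d
      i += 4
    }
    initializedCount = i
  }
}
```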
@palimondo Was thinking about this a bit. I am unsure if the unsafe version is interesting now that I am thinking about it. No /real/ work is being done (unless I am missing something). That being said, I think adding a benchmark without the reserve capacity and an additional lazy flat map would be interesting.
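A lazy flatMap variant could be sketched like this (illustrative; the function name is hypothetical):

```swift
// Lazily produces the flattened elements; the work happens when the Array is materialized.
func flattenLazyFlatMap(_ input: [(Int, Int, Int, Int)]) -> [Int] {
  return Array(input.lazy.flatMap { [$0.0, $0.1, $0.2, $0.3] })
}
```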
@atrick Heh, you give me too much credit. This is my first time using the Unsafe APIs, and I wasn’t able to write that until I googled “swift array unsafe pointer copy” and this gist came up. Then I had to play with it in the REPL until it “worked”. The idea comes from a vague recollection that the fastest Swift version of the Mandelbrot benchmark in the language shootout used tuples as SIMD types. So I assumed the tuple in …

@gottesmm This isn’t O(1), as I thought initially; this is O(n), since it does depend on the size of the input.
All in all, I think if I rewrote it with …
@palimondo The gist has a bit less type inference, so it's easier for me to see what's happening. I didn't realize earlier that, if I'm guessing correctly, you're actually using Array's generic `Sequence` initializer.

That said, I expect this code to crash, at least in debug builds, because you can't rebind a buffer's memory to a different element type unless they have the same stride. There's been some debate about whether to allow that. I think it creates more problems than it's worth. The right way to do it is:
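A minimal sketch along those lines, assuming `Array.withUnsafeBytes` and `UnsafeRawBufferPointer.bindMemory(to:)` (an illustration, not necessarily the exact snippet):

```swift
func flattenRawBytes(_ input: [(Int, Int, Int, Int)]) -> [Int] {
  // View the tuple array as raw bytes and bind them to Int; the element count is
  // derived from the byte count, which avoids the count mismatch of the typed rebind.
  return input.withUnsafeBytes { rawBuffer in
    Array(rawBuffer.bindMemory(to: Int.self))
  }
}
```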
Note: Swift does not provide any formal layout rule that says an array of homogeneous tuples is laid out the same as an array of those tuple elements. But with builtin integer types I feel that you're safe making this assumption.
@atrick Thanks for showing me the right way. I'm still waiting for the debug build to finish and verify that my Unsafe benchmark was illegal code (I'll file a bug if that's the case), but I also had to fix it (…). That's the danger of using just …

So, the correct speedup vis-à-vis FlattenLoop with …
Uhm... the debug build failed to compile after 6 hours, so I couldn't verify... Was I doing it right?
The extended `Flatten` test family is based on the recently added benchmarks `FlattenListLoop` and `FlattenListFlatMap`. They had unnecessarily large base workloads, which prevented more precise measurement. Their base workload was lowered by a factor of 20. See discussion on swiftlang#20116 (comment)

Since these are recent additions to the Swift Benchmark Suite, I’m removing the originals and reintroducing them under new names, `Flatten.Array.Tuple4.flatMap` and `Flatten.Array.Tuple4.for-in.Reserve`, without going through the `legacyFactor`.

Based on these two templates, this commit introduces thorough performance test coverage of the related space, including:

* the method chain `map.joined`
* a naive for-in implementation without `reserveCapacity`
* a few Unsafe variants that should serve as aspirational targets for ideally optimized code
* lazy variants
* variants for different underlying types: 4-element Array, struct, and class, in addition to the original 4-element tuple
* variants that flatten a Sequence instead of an Array

The tests follow the naming convention proposed in swiftlang#20334