question on benchmarks for array_of_struct #414
Hi @aktau, Honestly, I don't know what you "do wrong", as I don't know your code. Can you reproduce the original benchmarks? Maybe we should turn the question around: what do you think I did wrong?

Regarding the cache locality of AoS, I think it is not that clear-cut. The benchmarks are actually intended to show that the cache friendliness of AoS decreases with increasing entity size while always accessing the same small amount of data. With AoS, the entire entity data needs to be loaded into the cache, no matter how much of it is actually used. With large entities, fewer fit into the cache and more frequent access to slower cache levels is required. So I think the entity-size dimension of the problem is quite clear. What is less clear to me is why there is no saturation effect with an increasing number of entities. Well, maybe there is, but only for even larger numbers.

I also did the same benchmarks for a Rust ECS (rs-ecs), and qualitatively the results are the same. Could you share your benchmarking code for a comparison?
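For illustration, here is a minimal, hypothetical Go sketch of that AoS access pattern (this is not the repository's actual benchmark code; the `LargeEntity` type and its field sizes are made up): only `Pos` is read, yet every iteration drags the whole entity, payload included, through the cache.

```go
package aosbench

import "testing"

// LargeEntity stands in for an entity with a small "hot" part (Pos)
// and a larger "cold" payload that the loop never reads.
type LargeEntity struct {
	Pos     [2]float64
	Payload [256]byte // the benchmarked entity size grows with this part
}

func BenchmarkAoSPositionOnly(b *testing.B) {
	entities := make([]LargeEntity, 100_000)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		for j := range entities {
			// Only Pos is used, but it shares its cache lines with the
			// unused Payload, so larger entities mean less useful data
			// per cache line and more traffic to slower cache levels.
			entities[j].Pos[0]++
		}
	}
}
```

Growing `Payload` while keeping the loop unchanged corresponds to the entity-size sweep described above.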
I'm just running your code, unedited:
I don't think you did anything wrong. I was just surprised by the result.
So if I understand it right, arche uses a SoA pattern and in the loops only a small part of the struct is accessed, which is why it can pull ahead. Is that correct?
Can you verify that the original output already shows what the benchstat table does?
Yes, kind of. Most ECS implementations (archetype-based like Arche, but also sparse-set-based) use a memory layout more similar to SoA. However, the arrays (or buffers, or whatever) do not necessarily contain only primitive types, but components. A component can contain multiple variables/primitives, but for good performance they should be closely related and mostly accessed together. You may want to take a look at the architecture section of the user manual for more details.
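As a rough illustration of that component-wise, SoA-like layout, here is a minimal sketch with made-up type names (this is not Arche's actual API or storage code): each component lives in its own array, so iterating positions and velocities never pulls the bulky payload component through the cache.

```go
package soabench

import "testing"

// Position and Velocity are small, closely related components that are
// mostly accessed together; Payload is an unrelated, bulky component.
type Position struct{ X, Y float64 }
type Velocity struct{ X, Y float64 }
type Payload struct{ Data [256]byte }

// archetype is a toy stand-in for archetype storage: one slice
// ("column") per component type, indexed by entity row.
type archetype struct {
	positions  []Position
	velocities []Velocity
	payloads   []Payload
}

func BenchmarkSoAPositionVelocity(b *testing.B) {
	a := archetype{
		positions:  make([]Position, 100_000),
		velocities: make([]Velocity, 100_000),
		payloads:   make([]Payload, 100_000),
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		for j := range a.positions {
			// Only the position and velocity columns are streamed through
			// the cache; the payload column is never touched.
			a.positions[j].X += a.velocities[j].X
			a.positions[j].Y += a.velocities[j].Y
		}
	}
}
```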
It does. I did not change the numbers. I only edited the output format so that benchstat can render/compare them better (see the
I think that can explain the difference in performance. For very big structs, the SoA-like approach of arche would be superior, even taking the extra abstraction into account.
I would still like to find out why your results differ that much. Could you share the raw results obtained on your machine?
@aktau Looks like there are huge differences between machines. On my local Windows machine, I can still reproduce the results shown in the README. In the CI, on the other hand, we get something closer to your results. An excerpt is shown at the bottom. I guess the primary difference is the 256MB cache of the CI machine, compared to 8MB locally.

Locally:
CI:
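As a rough back-of-the-envelope illustration (the 320 B entity size is only an assumption for the arithmetic, not the actual benchmark configuration): an 8 MB last-level cache holds about 8 * 1024 * 1024 / 320 ≈ 26,000 such entities, while a 256 MB cache holds about 840,000. The same entity count can therefore be fully cache-resident on the CI machine but spill to slower cache levels or main memory locally, which would plausibly change how the AoS and SoA variants rank against each other.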
The graph in the README, with arche outperforming a bog-standard AoS setup, really surprised me. The memory locality in the AoS case should be really good. To verify, I ran the benchmarks and found something more in line with my expectations:
Of note, I manually changed the benchmark names with a text editor so I could use benchstat's column projection.
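For reference, here is a minimal sketch of a naming scheme that avoids the manual editing, using hypothetical benchmark names (this is not how the repository's benchmarks are actually organized): `b.Run` sub-benchmarks named with `key=value` parts, which recent versions of benchstat can group into table rows and columns.

```go
package bench

import (
	"fmt"
	"testing"
)

// BenchmarkIterate emits names like BenchmarkIterate/impl=aos/n=1000.
// benchstat treats the key=value sub-name parts as keys that can be
// projected into rows or columns when comparing results.
func BenchmarkIterate(b *testing.B) {
	for _, impl := range []string{"aos", "arche"} {
		for _, n := range []int{1_000, 100_000} {
			b.Run(fmt.Sprintf("impl=%s/n=%d", impl, n), func(b *testing.B) {
				for i := 0; i < b.N; i++ {
					// ... iterate n entities with the chosen implementation ...
				}
			})
		}
	}
}
```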
What did I do wrong?