Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[3.x] Support for Dynamic BVH as 2D Physics broadphase #48314

Merged
merged 1 commit into from
May 4, 2021

Conversation

pouleyKetchoupp
Copy link
Contributor

@pouleyKetchoupp pouleyKetchoupp commented Apr 30, 2021

Adding support to use @lawnjelly's dynamic BVH implementation in 2D physics.

I'm starting with a PR on 3.x as an exception, since the dynamic BVH has been implemented only on this branch for 3D. Once everything is validated, it can be all ported to master.

List of changes:

  • Modified bvh class to handle 2D and 3D as a template
  • Fixed unnecessary bvh tree updates when calling set_pairable with no parameter change
  • Changes in Rect2, Vector2, Vector3 interface to uniformize template calls
  • New option in Project Settings to enable BVH for 2D Physics (enabled by default as in 3D)

Note 1: I've tested #45824 with the bvh enabled, and the results are similar to the hash grid with #47979 applied, so the dynamic bvh seems to handle collision layers/masks properly and doesn't need extra optimization in 2D and 3D.

Note 2: I've tested with different values for the expansion margin in the BVH to see if 2D needs different parameters, but it didn't seem to make a difference in test results:

// these magic numbers are determined by experiment
if (_auto_node_expansion) {
_node_expansion = size * 0.025;
}
if (_auto_pairing_expansion) {
_pairing_expansion = size * 0.009;
}

Note 3: Performance tests have been done with a 3.x port of #48221, after which the 2D hash grid broadphase is the main bottleneck for the 2D physics step. Results would be different without multithreading as more frame time would be spent in the narrowphase and contact solver.


Test configuration:
CPU: AMD Ryzen 9 4900HS with Radeon Graphics, 3000 Mhz, 8 Core(s), 16 Logical Processor(s)
RAM: 16.0 GB
Display: GeForce RTX 2060 with Max-Q Design/PCIe/SSE2

Test scenes

2D - Simple contacts (1 island - 500 rigid bodies)
https://github.com/godotengine/godot-demo-projects/tree/master/2d/physics_tests/tests/performance/test_perf_contacts.tscn
~35% improvement (physics ticks)

Hashgrid - 500 bodies

* Spawning: RigidBodyRectangle
  Physics Tick: 364.486 ms (total = 364.486 ms)
  Physics Tick: 283.779 ms (total = 648.265 ms)
  Physics Tick: 78.198 ms (total = 726.463 ms)
  Physics Tick: 101.629 ms (total = 828.092 ms)
  Physics Tick: 77.658 ms (total = 905.750 ms)
* Activating
  Physics Tick: 8.358 ms (total = 8.358 ms)
  Physics Tick: 16.257 ms (total = 24.615 ms)
  Physics Tick: 16.890 ms (total = 41.505 ms)
  Physics Tick: 16.671 ms (total = 58.176 ms)
  Physics Tick: 16.369 ms (total = 74.545 ms)

BVH - 500 bodies

* Spawning: RigidBodyRectangle
  Physics Tick: 183.585 ms (total = 183.585 ms)
  Physics Tick: 228.199 ms (total = 411.784 ms)
  Physics Tick: 57.760 ms (total = 469.544 ms)
  Physics Tick: 63.035 ms (total = 532.579 ms)
  Physics Tick: 45.373 ms (total = 577.952 ms)
* Activating
  Physics Tick: 9.722 ms (total = 9.722 ms)
  Physics Tick: 16.194 ms (total = 25.916 ms)
  Physics Tick: 17.067 ms (total = 42.983 ms)
  Physics Tick: 16.848 ms (total = 59.831 ms)
  Physics Tick: 16.321 ms (total = 76.152 ms)

2D - Advanced contacts (multiple islands - 9 x 300 rigid bodies)
https://github.com/godotengine/godot-demo-projects/tree/master/2d/physics_tests/tests/performance/test_perf_contact_islands.tscn
~30% improvement (physics ticks)

Hashgrid - 9 x 300 bodies

* Spawning: RigidBodyRectangle
  Physics Tick: 1017.206 ms (total = 1017.206 ms)
  Physics Tick: 595.436 ms (total = 1612.642 ms)
  Physics Tick: 196.212 ms (total = 1808.854 ms)
  Physics Tick: 209.588 ms (total = 2018.442 ms)
  Physics Tick: 123.362 ms (total = 2141.804 ms)
* Activating
  Physics Tick: 8.835 ms (total = 8.835 ms)
  Physics Tick: 56.338 ms (total = 65.173 ms)
  Physics Tick: 2.514 ms (total = 67.687 ms)
  Physics Tick: 2.227 ms (total = 69.914 ms)
  Physics Tick: 4.986 ms (total = 74.900 ms)

BVH - 9 x 300 bodies

* Spawning: RigidBodyRectangle
  Physics Tick: 742.801 ms (total = 742.801 ms)
  Physics Tick: 460.125 ms (total = 1202.926 ms)
  Physics Tick: 134.630 ms (total = 1337.556 ms)
  Physics Tick: 134.842 ms (total = 1472.398 ms)
  Physics Tick: 77.242 ms (total = 1549.640 ms)
* Activating
  Physics Tick: 8.474 ms (total = 8.474 ms)
  Physics Tick: 23.160 ms (total = 31.634 ms)
  Physics Tick: 10.292 ms (total = 41.926 ms)
  Physics Tick: 16.423 ms (total = 58.349 ms)
  Physics Tick: 16.740 ms (total = 75.089 ms)

2D - Broadphase with no contact (125 000 rigid bodies)
https://github.com/godotengine/godot-demo-projects/tree/master/2d/physics_tests/tests/performance/test_perf_broadphase.tscn
~30% improvement on adding objects (physics ticks)

Hashgrid - 125 000 bodies

* Creating objects...
  Create Time: 1776.327 ms
  Physics Tick: 1806.515 ms (total = 1806.515 ms)
  Physics Tick: 9.409 ms (total = 1815.924 ms)
  Physics Tick: 2.070 ms (total = 1817.994 ms)
  Physics Tick: 2.232 ms (total = 1820.226 ms)
  Physics Tick: 1.852 ms (total = 1822.078 ms)
* Adding objects...
  Add Time: 1183.750 ms
  Physics Tick: 1443.885 ms (total = 1443.885 ms)
  Physics Tick: 421.520 ms (total = 1865.405 ms)
  Physics Tick: 218.179 ms (total = 2083.584 ms)
  Physics Tick: 226.198 ms (total = 2309.782 ms)
  Physics Tick: 217.243 ms (total = 2527.025 ms)
* Moving objects...
  Move Time: 61.472 ms
  Physics Tick: 88.662 ms (total = 88.662 ms)
  Physics Tick: 2.184 ms (total = 90.846 ms)
  Physics Tick: 1.635 ms (total = 92.481 ms)
  Physics Tick: 1.721 ms (total = 94.202 ms)
  Physics Tick: 1.258 ms (total = 95.460 ms)
* Removing objects...
  Remove Time: 3118.363 ms
  Physics Tick: 3136.320 ms (total = 3136.320 ms)
  Physics Tick: 2.342 ms (total = 3138.662 ms)
  Physics Tick: 1.708 ms (total = 3140.370 ms)
  Physics Tick: 2.159 ms (total = 3142.529 ms)
  Physics Tick: 1.463 ms (total = 3143.992 ms)
* Done.

BVH - 125 000 bodies

* Creating objects...
  Create Time: 1794.598 ms
  Physics Tick: 1797.726 ms (total = 1797.726 ms)
  Physics Tick: 8.751 ms (total = 1806.477 ms)
  Physics Tick: 11.474 ms (total = 1817.951 ms)
  Physics Tick: 1.343 ms (total = 1819.294 ms)
  Physics Tick: 1.812 ms (total = 1821.106 ms)
* Adding objects...
  Add Time: 899.626 ms
  Physics Tick: 1160.962 ms (total = 1160.962 ms)
  Physics Tick: 260.328 ms (total = 1421.290 ms)
  Physics Tick: 137.944 ms (total = 1559.234 ms)
  Physics Tick: 138.860 ms (total = 1698.094 ms)
  Physics Tick: 140.343 ms (total = 1838.437 ms)
* Moving objects...
  Move Time: 60.094 ms
  Physics Tick: 86.488 ms (total = 86.488 ms)
  Physics Tick: 1.900 ms (total = 88.388 ms)
  Physics Tick: 1.554 ms (total = 89.942 ms)
  Physics Tick: 1.284 ms (total = 91.226 ms)
  Physics Tick: 1.906 ms (total = 93.132 ms)
* Removing objects...
  Remove Time: 2848.977 ms
  Physics Tick: 2861.830 ms (total = 2861.830 ms)
  Physics Tick: 6.684 ms (total = 2868.514 ms)
  Physics Tick: 2.426 ms (total = 2870.940 ms)
  Physics Tick: 1.390 ms (total = 2872.330 ms)
  Physics Tick: 1.543 ms (total = 2873.873 ms)
* Done.

core/math/vector2.cpp Outdated Show resolved Hide resolved
@lawnjelly
Copy link
Member

On the whole getting it to work in 2d looks pretty straightforward and this looks great. 👍

A couple of points

  • Fixed unnecessary bvh tree updates when calling set_pairable with no parameter change - imo this should be done as a separate PR. This is probably fine but it is good practice to separate PRs as much as possible, and if there is an unforeseen problem we can revert one without the other. (I could easily have made some kind of misleading logic situation where that update was necessary despite appearing to be a no op)
  • The use of BVH_ABB<Bounds, Point> throughout is correct, but consider an alternative: You could use e.g.:
    #define BVH_ABB BVH_AABB_template<Bounds, Point>
    to be less verbose throughout, and you can easily change the template when required. This can often be a good idea with templates imo to reduce verbosity. But is not necessary just a possible.

@lawnjelly
Copy link
Member

lawnjelly commented Apr 30, 2021

Also note that the expansion parameters are actually set here (bvh_structs.inc, 169) rather than the section you quoted. That section is #ifdeffed out, the auto expansion margin is something that I trialled during the betas but is disabled currently, it should be using a fixed value I think:

// node expansion is important in the rendering tree
// larger values give less re-insertion as items move...
// but on the other hand over estimates the bounding box of nodes.
// we can either use auto mode, where the expansion is based on the root node size, or specify manually
real_t _node_expansion = 0.5;
bool _auto_node_expansion = true;

// pairing expansion important for physics pairing
// larger values gives more 'sticky' pairing, and is less likely to exhibit tunneling
// we can either use auto mode, where the expansion is based on the root node size, or specify manually
real_t _pairing_expansion = 0.1;
bool _auto_pairing_expansion = true;

If in 2d the unit is pixels then you may find that boosting these works a bit better. Incidentally, feel free to have a play with redefining the auto expansion and seeing if you can come up with a good algorithm, I think I just ran out of time on that so I fixed it for the RCs for safety. I originally had all the expansion margins / auto expansion figures editable in the project settings, but was figuring it was overkill for most users.

@pouleyKetchoupp
Copy link
Contributor Author

@lawnjelly Thanks for your feedback!

Fixed unnecessary bvh tree updates when calling set_pairable with no parameter change

It makes sense, I will make a separate PR for this since it's an independent fix.

The use of BVH_ABB<Bounds, Point> throughout is correct, but consider an alternative: You could use e.g.:
#define BVH_ABB BVH_AABB_template<Bounds, Point>

It's a good idea, I can change it to make the code more readable and easier to maintain.

Also note that the expansion parameters are actually set here (bvh_structs.inc, 169) rather than the section you quoted. That section is #ifdeffed out, the auto expansion margin is something that I trialled during the betas but is disabled currently

No wonder I wasn't seeing any difference :D
I will do a bit more testing with fixed margins, but maybe it's actually something we can expose again at some point since it can be influenced by the scale of colliding objects? Especially for 2D, users sometimes scale things down for pixelated games so common sizes can vary from 1 px to 100 px.

@lawnjelly
Copy link
Member

I will do a bit more testing with fixed margins, but maybe it's actually something we can expose again at some point since it can be influenced by the scale of colliding objects? Especially for 2D, users sometimes scale things down for pixelated games so common sizes can vary from 1 px to 100 px.

Yup absolutely. 👍 My thinking was that an AI system to choose the best values might be easiest for beginners, having them available would be good for advanced guys. I just wanted to keep things simple in the first rollout (and also iron out any bugs).

List of changes:
- Modified bvh class to handle 2D and 3D as a template
- Changes in Rect2, Vector2, Vector3 interface to uniformize template
calls
- New option in Project Settings to enable BVH for 2D Physics (enabled
by default like in 3D)
@pouleyKetchoupp
Copy link
Contributor Author

I've just pushed all the changes from previous comments and opened #48337 for the bvh tree optimization on set_pairing.

Concerning expansion margins:
I've made more tests with larger values for fixed margins (using params_set_pairing_expansion and params_set_node_expansion this time) and I still don't see much change in performance in my tests, so I'm just keeping the same default values for now.
It's something we can probably revisit later, possibly with different scenarios that come up in the future.

@lawnjelly
Copy link
Member

lawnjelly commented May 1, 2021

Assuming you've done the testing this looks good to me.

A couple of points to note:

  • Something long term that might be possible to improve is that the BVH uses elements that start from zero. I wrote the BVH before interfacing with the client code, the client code assumes that 0 indicates invalid.

So at the moment we have a bit of error prone +1 / -1 indexing hopscotch that may not be necessary. This can in some cases be resolved by adding a reserved dummy element at the beginning of arrays and taking account for this. But as there are quite a few arrays I never got around to changing this / evaluating whether it should be changed.

On the other hand using +1 indexing is quite widely used, so there's no great problem. I think quake uses this, which is where I originally picked up this technique many years ago (presumably they also did it because checking zero can be faster on processors than comparing to a number).

  • _FORCE_INLINE_ - this is my usual rant. Although reduz likes to use this all over the place, notice I never use it, even in bottleneck code with SIMD etc. If you put a small math function in the header, it is almost guaranteed to be inlined where suitable.

In the old days the inline keyword was intended to hint you wanted something inlined but afaik that is almost universally ignored these days.

The problem imo with _FORCE_INLINE_ is the only thing it achieves in practice (except in very specialist cases, such as maybe if you were compiling with reduced size compiler switch and you wanted to override this, or you wanted to inline a large function, were hand optimizing a section etc) is that it prevents the compiler from not inlining in the rare cases where it really should not inline. This could potentially bite you in terms of recursive functions and loop unrolling and executable size, and prevents the compiler from using certain optimizations. You could paradoxically end up with slower code by using this directive. The compiler is usually in a much better position to make these decisions than the programmer.

Also in extreme cases it can cause compilation to fail (we have had that in godot, see #41231). There are in 99% of cases no positive outcomes from using _FORCE_INLINE_, only negative.

Actually there is a possible exception, if you are using it for a function defined in the cpp it could possibly be useful with whole program optimization - I don't tend to rely on that myself, and haven't tested the effect. I'm primarily referring to the use of _FORCE_INLINE_ in a blanket manner with header only functions here.

@akien-mga akien-mga merged commit 32cc022 into godotengine:3.x May 4, 2021
@akien-mga
Copy link
Member

Thanks!

@akien-mga
Copy link
Member

akien-mga commented May 4, 2021

_FORCE_INLINE_ - this is my usual rant. Although reduz likes to use this all over the place, notice I never use it, even in bottleneck code with SIMD etc. If you put a small math function in the header, it is almost guaranteed to be inlined where suitable.

Might be worth doing some benchmarks after running s/_FORCE_INLINE_ //g on the whole codebase to see if it actually makes a difference with GCC, Clang and MSVC.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants