Move math types over to nalgebra with Generic dimensions > 2 and f32/f64 support #96

Merged 22 commits into svenstaro:master on May 4, 2023

Collaborator

@marstaik marstaik commented Apr 22, 2023

BVH should now work (in theory) with 2D, 3D, and 4D+, and with f32 and f64, using the base nalgebra types.
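As a rough illustration of what a dimension- and scalar-generic bounding box looks like, here is a minimal sketch. It uses plain arrays instead of nalgebra's types so that it has no dependencies; the `Aabb` struct, its fields, and the `join`/`contains` methods are hypothetical stand-ins for illustration, not the crate's actual definitions.

```rust
/// Hypothetical stand-in for a generic AABB: `T` is the scalar (f32 or f64)
/// and `D` is the dimension. Plain arrays replace nalgebra's point type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Aabb<T, const D: usize> {
    min: [T; D],
    max: [T; D],
}

impl<T: Copy + PartialOrd, const D: usize> Aabb<T, D> {
    /// Returns the smallest box that contains both `self` and `other`.
    fn join(&self, other: &Self) -> Self {
        let mut out = *self;
        for i in 0..D {
            if other.min[i] < out.min[i] {
                out.min[i] = other.min[i];
            }
            if other.max[i] > out.max[i] {
                out.max[i] = other.max[i];
            }
        }
        out
    }

    /// Returns true if `p` lies inside the box (bounds inclusive).
    fn contains(&self, p: &[T; D]) -> bool {
        (0..D).all(|i| self.min[i] <= p[i] && p[i] <= self.max[i])
    }
}

fn main() {
    // The same code serves a 2D f32 box and a 4D f64 box.
    let a: Aabb<f32, 2> = Aabb { min: [0.0, 0.0], max: [1.0, 1.0] };
    let b: Aabb<f32, 2> = Aabb { min: [0.5, -1.0], max: [2.0, 0.5] };
    println!("{:?}", a.join(&b));

    let c: Aabb<f64, 4> = Aabb { min: [0.0; 4], max: [1.0; 4] };
    println!("{}", c.contains(&[0.5, 0.5, 0.5, 0.5]));
}
```

Const generics keep the dimension in the type, so a 2D box cannot be joined with a 3D one by accident.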

Performance is consistent, and nearly everything gets SIMD. On my older first-gen Threadripper, I see up to 10% performance improvements across the board. Results that match or are slightly higher are usually within error.

Axis and some re-exports were simply deleted as they aren't needed when using the nalgebra types.

The only outliers are:

  • bench_intersects_aabb, which runs a bit slower than the fastest pre-nalgebra variant
    • I wrote this to use pure SIMD, but it doesn't optimize as well as it should. I should be able to make this much faster with some manual SIMD later.
  • bench_optimize_bvh_*: the optimization benchmarks run over 200% faster and blow the old optimize code out of the water. It is actually faster to optimize now than to rebuild.
| test | glam | nalgebra |
| --- | --- | --- |
| test bvh::bvh_impl::bench::bench_build_1200_triangles_bvh | 821,735 ns/iter (+/- 67,897) | 771,730 ns/iter (+/- 21,811) |
| test bvh::bvh_impl::bench::bench_build_120k_triangles_bvh | 103,104,360 ns/iter (+/- 4,907,146) | 101,412,600 ns/iter (+/- 4,414,230) |
| test bvh::bvh_impl::bench::bench_build_12k_triangles_bvh | 8,944,690 ns/iter (+/- 267,942) | 8,685,235 ns/iter (+/- 131,969) |
| test bvh::bvh_impl::bench::bench_build_sponza_bvh | 79,784,630 ns/iter (+/- 2,508,824) | 78,247,340 ns/iter (+/- 1,407,163) |
| test bvh::bvh_impl::bench::bench_intersect_1200_triangles_bvh | 130 ns/iter (+/- 2) | 149 ns/iter (+/- 6) |
| test bvh::bvh_impl::bench::bench_intersect_120k_triangles_bvh | 765 ns/iter (+/- 14) | 863 ns/iter (+/- 12) |
| test bvh::bvh_impl::bench::bench_intersect_12k_triangles_bvh | 321 ns/iter (+/- 6) | 371 ns/iter (+/- 5) |
| test bvh::bvh_impl::bench::bench_intersect_sponza_bvh | 2,289 ns/iter (+/- 64) | 1,905 ns/iter (+/- 36) |
| test bvh::iter::bench::bench_intersect_128rays_sponza_iter | 159,109 ns/iter (+/- 4,056) | 167,965 ns/iter (+/- 3,013) |
| test bvh::iter::bench::bench_intersect_128rays_sponza_vec | 293,305 ns/iter (+/- 7,316) | 244,404 ns/iter (+/- 4,788) |
| test bvh::optimization::bench::bench_intersect_120k_after_optimize_00p | 764 ns/iter (+/- 12) | 864 ns/iter (+/- 10) |
| test bvh::optimization::bench::bench_intersect_120k_after_optimize_01p | 128,246 ns/iter (+/- 10,879) | 145,156 ns/iter (+/- 11,400) |
| test bvh::optimization::bench::bench_intersect_120k_after_optimize_10p | 1,634,400 ns/iter (+/- 387,476) | 1,761,105 ns/iter (+/- 425,368) |
| test bvh::optimization::bench::bench_intersect_120k_after_optimize_50p | 2,234,812 ns/iter (+/- 440,313) | 2,395,035 ns/iter (+/- 498,051) |
| test bvh::optimization::bench::bench_intersect_120k_with_rebuild_00p | 764 ns/iter (+/- 19) | 862 ns/iter (+/- 7) |
| test bvh::optimization::bench::bench_intersect_120k_with_rebuild_01p | 824 ns/iter (+/- 29) | 928 ns/iter (+/- 21) |
| test bvh::optimization::bench::bench_intersect_120k_with_rebuild_10p | 1,843 ns/iter (+/- 62) | 2,000 ns/iter (+/- 39) |
| test bvh::optimization::bench::bench_intersect_120k_with_rebuild_50p | 2,128 ns/iter (+/- 46) | 2,282 ns/iter (+/- 77) |
| test bvh::optimization::bench::bench_intersect_sponza_after_optimize_00p | 1,613 ns/iter (+/- 51) | 1,392 ns/iter (+/- 25) |
| test bvh::optimization::bench::bench_intersect_sponza_after_optimize_01p | 2,495 ns/iter (+/- 113) | 3,051 ns/iter (+/- 58) |
| test bvh::optimization::bench::bench_intersect_sponza_after_optimize_10p | 3,781 ns/iter (+/- 153) | 4,421 ns/iter (+/- 180) |
| test bvh::optimization::bench::bench_intersect_sponza_after_optimize_50p | 5,534 ns/iter (+/- 394) | 6,352 ns/iter (+/- 220) |
| test bvh::optimization::bench::bench_intersect_sponza_with_rebuild_00p | 1,609 ns/iter (+/- 36) | 1,390 ns/iter (+/- 26) |
| test bvh::optimization::bench::bench_intersect_sponza_with_rebuild_01p | 1,729 ns/iter (+/- 36) | 1,527 ns/iter (+/- 26) |
| test bvh::optimization::bench::bench_intersect_sponza_with_rebuild_10p | 2,074 ns/iter (+/- 144) | 1,948 ns/iter (+/- 44) |
| test bvh::optimization::bench::bench_intersect_sponza_with_rebuild_50p | 2,665 ns/iter (+/- 71) | 2,578 ns/iter (+/- 51) |
| test bvh::optimization::bench::bench_optimize_bvh_120k_00p | 1,090,720 ns/iter (+/- 23,355) | 1,164,270 ns/iter (+/- 79,435) |
| test bvh::optimization::bench::bench_optimize_bvh_120k_01p | 7,373,650 ns/iter (+/- 1,720,855) | 2,237,075 ns/iter (+/- 218,183) |
| test bvh::optimization::bench::bench_optimize_bvh_120k_10p | 40,201,330 ns/iter (+/- 14,485,239) | 12,243,750 ns/iter (+/- 1,361,338) |
| test bvh::optimization::bench::bench_optimize_bvh_120k_50p | 222,987,100 ns/iter (+/- 68,341,041) | 55,583,600 ns/iter (+/- 13,194,002) |
| test bvh::optimization::bench::bench_randomize_120k_50p | 6,262,345 ns/iter (+/- 360,555) | 5,717,540 ns/iter (+/- 229,427) |
| test flat_bvh::bench::bench_build_1200_triangles_flat_bvh | 676,387 ns/iter (+/- 16,147) | 658,178 ns/iter (+/- 16,926) |
| test flat_bvh::bench::bench_build_120k_triangles_flat_bvh | 97,173,840 ns/iter (+/- 6,132,352) | 93,301,800 ns/iter (+/- 3,916,466) |
| test flat_bvh::bench::bench_build_12k_triangles_flat_bvh | 9,569,305 ns/iter (+/- 447,199) | 9,162,365 ns/iter (+/- 242,592) |
| test flat_bvh::bench::bench_flatten_120k_triangles_bvh | 7,332,465 ns/iter (+/- 984,515) | 7,517,620 ns/iter (+/- 537,332) |
| test flat_bvh::bench::bench_intersect_1200_triangles_flat_bvh | 147 ns/iter (+/- 2) | 179 ns/iter (+/- 2) |
| test flat_bvh::bench::bench_intersect_120k_triangles_flat_bvh | 890 ns/iter (+/- 31) | 1,121 ns/iter (+/- 21) |
| test flat_bvh::bench::bench_intersect_12k_triangles_flat_bvh | 364 ns/iter (+/- 3) | 473 ns/iter (+/- 6) |
| test ray::bench::bench_intersects_aabb | 2,897 ns/iter (+/- 46) | 4,475 ns/iter (+/- 23) |
| test ray::bench::bench_intersects_aabb_branchless | 3,469 ns/iter (+/- 40) | - |
| test ray::bench::bench_intersects_aabb_naive | 6,037 ns/iter (+/- 84) | - |
| test testbase::bench_intersect_120k_triangles_list | 7 ns/iter (+/- 0) | 7 ns/iter (+/- 0) |
| test testbase::bench_intersect_120k_triangles_list_aabb | 7 ns/iter (+/- 0) | 7 ns/iter (+/- 0) |
| test testbase::bench_intersect_sponza_list | 7 ns/iter (+/- 0) | 7 ns/iter (+/- 0) |
| test testbase::bench_intersect_sponza_list_aabb | 7 ns/iter (+/- 0) | 7 ns/iter (+/- 0) |
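
For context on what bench_intersects_aabb exercises, a ray/AABB check is commonly written as a slab test. The scalar sketch below works for any dimension. It is not the SIMD code from this PR, and the `Ray` struct, its fields, and its methods are hypothetical names chosen for the example.

```rust
/// Hypothetical ray type; `inv_dir` caches 1 / direction for each axis.
struct Ray<const D: usize> {
    origin: [f32; D],
    inv_dir: [f32; D],
}

impl<const D: usize> Ray<D> {
    fn new(origin: [f32; D], dir: [f32; D]) -> Self {
        let mut inv_dir = [0.0; D];
        for i in 0..D {
            // A zero component yields an infinite reciprocal, which the
            // min/max logic below tolerates in the common cases.
            inv_dir[i] = 1.0 / dir[i];
        }
        Ray { origin, inv_dir }
    }

    /// Slab test: intersect the ray's parameter interval with the interval
    /// spanned by each pair of axis-aligned planes. The edge case where the
    /// origin lies exactly on a slab boundary with a zero direction
    /// component (0 * inf = NaN) is not handled specially here.
    fn intersects_aabb(&self, min: &[f32; D], max: &[f32; D]) -> bool {
        let mut t_enter = 0.0_f32; // the ray starts at t = 0
        let mut t_exit = f32::INFINITY;
        for i in 0..D {
            let t1 = (min[i] - self.origin[i]) * self.inv_dir[i];
            let t2 = (max[i] - self.origin[i]) * self.inv_dir[i];
            t_enter = t_enter.max(t1.min(t2));
            t_exit = t_exit.min(t1.max(t2));
        }
        t_enter <= t_exit
    }
}

fn main() {
    let ray = Ray::new([0.0, 0.0, 0.0], [1.0, 0.0, 0.0]);
    // The box sits straight ahead on the +x axis.
    println!("{}", ray.intersects_aabb(&[1.0, -1.0, -1.0], &[2.0, 1.0, 1.0])); // prints true
}
```

The loop body contains no early exits, which is what makes this formulation a natural fit for SIMD lanes.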

@marstaik
Collaborator Author

Added the f32x3 optimization. It can quite easily be extended to also handle f32x2, f32x3, f32x4 and f64x2, f64x3, f64x4, which I will do later.

Some new bench numbers:

test nalgebra
test bvh::bvh_impl::bench::bench_build_1200_triangles_bvh 777,386 ns/iter (+/- 22,640)
test bvh::bvh_impl::bench::bench_build_120k_triangles_bvh 99,323,460 ns/iter (+/- 1,955,579)
test bvh::bvh_impl::bench::bench_build_12k_triangles_bvh 8,601,280 ns/iter (+/- 160,429)
test bvh::bvh_impl::bench::bench_build_sponza_bvh 79,983,790 ns/iter (+/- 1,201,449)
test bvh::bvh_impl::bench::bench_intersect_1200_triangles_bvh 133 ns/iter (+/- 1)
test bvh::bvh_impl::bench::bench_intersect_120k_triangles_bvh 802 ns/iter (+/- 7)
test bvh::bvh_impl::bench::bench_intersect_12k_triangles_bvh 335 ns/iter (+/- 5)
test bvh::bvh_impl::bench::bench_intersect_sponza_bvh 1,403 ns/iter (+/- 24)
test bvh::iter::bench::bench_intersect_128rays_sponza_iter 156,681 ns/iter (+/- 3,499)
test bvh::iter::bench::bench_intersect_128rays_sponza_vec 179,903 ns/iter (+/- 2,719)
test bvh::optimization::bench::bench_intersect_120k_after_optimize_00p 802 ns/iter (+/- 8)
test bvh::optimization::bench::bench_intersect_120k_after_optimize_01p 134,032 ns/iter (+/- 10,309)
test bvh::optimization::bench::bench_intersect_120k_after_optimize_10p 1,606,707 ns/iter (+/- 389,896)
test bvh::optimization::bench::bench_intersect_120k_after_optimize_50p 2,200,720 ns/iter (+/- 453,999)
test bvh::optimization::bench::bench_intersect_120k_with_rebuild_00p 801 ns/iter (+/- 8)
test bvh::optimization::bench::bench_intersect_120k_with_rebuild_01p 865 ns/iter (+/- 8)
test bvh::optimization::bench::bench_intersect_120k_with_rebuild_10p 1,887 ns/iter (+/- 26)
test bvh::optimization::bench::bench_intersect_120k_with_rebuild_50p 2,157 ns/iter (+/- 29)
test bvh::optimization::bench::bench_intersect_sponza_after_optimize_00p 1,297 ns/iter (+/- 21)
test bvh::optimization::bench::bench_intersect_sponza_after_optimize_01p 2,655 ns/iter (+/- 56)
test bvh::optimization::bench::bench_intersect_sponza_after_optimize_10p 3,847 ns/iter (+/- 114)
test bvh::optimization::bench::bench_intersect_sponza_after_optimize_50p 5,992 ns/iter (+/- 426)
test bvh::optimization::bench::bench_intersect_sponza_with_rebuild_00p 1,300 ns/iter (+/- 20)
test bvh::optimization::bench::bench_intersect_sponza_with_rebuild_01p 1,437 ns/iter (+/- 33)
test bvh::optimization::bench::bench_intersect_sponza_with_rebuild_10p 1,814 ns/iter (+/- 29)
test bvh::optimization::bench::bench_intersect_sponza_with_rebuild_50p 2,370 ns/iter (+/- 48)
test bvh::optimization::bench::bench_optimize_bvh_120k_00p 1,158,720 ns/iter (+/- 7,658)
test bvh::optimization::bench::bench_optimize_bvh_120k_01p 2,232,760 ns/iter (+/- 89,304)
test bvh::optimization::bench::bench_optimize_bvh_120k_10p 12,362,530 ns/iter (+/- 1,015,947)
test bvh::optimization::bench::bench_optimize_bvh_120k_50p 58,228,630 ns/iter (+/- 11,172,616)
test bvh::optimization::bench::bench_randomize_120k_50p 5,814,270 ns/iter (+/- 232,303)
test flat_bvh::bench::bench_build_1200_triangles_flat_bvh 790,160 ns/iter (+/- 13,313)
test flat_bvh::bench::bench_build_120k_triangles_flat_bvh 104,565,130 ns/iter (+/- 4,193,432)
test flat_bvh::bench::bench_build_12k_triangles_flat_bvh 10,120,240 ns/iter (+/- 224,187)
test flat_bvh::bench::bench_flatten_120k_triangles_bvh 6,722,320 ns/iter (+/- 1,746,270)
test flat_bvh::bench::bench_intersect_1200_triangles_flat_bvh 131 ns/iter (+/- 1)
test flat_bvh::bench::bench_intersect_120k_triangles_flat_bvh 889 ns/iter (+/- 15)
test flat_bvh::bench::bench_intersect_12k_triangles_flat_bvh 354 ns/iter (+/- 4)
test ray::bench::bench_intersects_aabb 2,560 ns/iter (+/- 9)
test testbase::bench_intersect_120k_triangles_list 7 ns/iter (+/- 0)
test testbase::bench_intersect_120k_triangles_list_aabb 7 ns/iter (+/- 0)
test testbase::bench_intersect_sponza_list 7 ns/iter (+/- 0)
test testbase::bench_intersect_sponza_list_aabb 7 ns/iter (+/- 0)

Owner

@svenstaro svenstaro left a comment

Hey, great stuff. There appears to be some dead code that should be handled.

src/ray.rs (outdated, resolved)
src/ray.rs (outdated, resolved)
use crate::bounding_hierarchy::{BHShape, BoundingHierarchy};
use crate::ray::Ray;

// TODO These all need to be realtyped and bounded
Owner

Is this TODO current?

Collaborator Author

Yeah, I plan to expand the unit tests later to handle 2d and 4d cases

Owner

Would you like to do that in this PR?

Owner

@marstaik ping

src/ray.rs (outdated, resolved)
src/ray.rs Outdated
#[inline(always)]
#[cfg(target_arch = "x86_64")]
fn vec_to_mm(vec: &SVector<f32, 3>) -> __m128 {
unsafe { _mm_set_ps(vec.z, vec.z, vec.y, vec.x) }
Owner

Really digging all this custom SIMD but I'm wondering whether perhaps packed_simd might be the way forward?

Collaborator Author

The packed_simd README says that std::simd is the way forward.

Owner

Hm true. I didn't know there's still no good stabilized SIMD stuff in Rust. What do you think instead about wide?

Collaborator Author

@marstaik marstaik Apr 23, 2023

I'll look into it, and maybe try Simba (what nalgebra uses under the hood) again. But with Simba I got performance orders of magnitude worse than with the pure SIMD, and I'm unsure why. I was having difficulty looking into it with cargo asm due to Windows linking issues.

Owner

There's also simdeez which might be worth checking out.
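
To make the lane layout in the `vec_to_mm` snippet under review concrete, the sketch below packs a 3-component vector with the same `_mm_set_ps` call from `core::arch::x86_64` and reads the lanes back. `pack_xyz` is a hypothetical helper written for this illustration, and the non-x86_64 fallback exists only so the example compiles on other targets.

```rust
/// Packs (x, y, z) into four SIMD lanes as (x, y, z, z) and stores the
/// lanes back into an array so they can be inspected.
#[cfg(target_arch = "x86_64")]
fn pack_xyz(v: [f32; 3]) -> [f32; 4] {
    use core::arch::x86_64::{_mm_set_ps, _mm_storeu_ps};
    let mut lanes = [0.0_f32; 4];
    // SAFETY: SSE is part of the x86_64 baseline, and `lanes` has room for
    // the four f32 values written by the unaligned store.
    unsafe {
        // _mm_set_ps takes its arguments from the highest lane to the lowest,
        // so the last argument ends up in lane 0.
        let packed = _mm_set_ps(v[2], v[2], v[1], v[0]);
        _mm_storeu_ps(lanes.as_mut_ptr(), packed);
    }
    lanes
}

/// Portable fallback with the same lane layout (assumption: only needed so
/// the sketch builds on non-x86_64 targets).
#[cfg(not(target_arch = "x86_64"))]
fn pack_xyz(v: [f32; 3]) -> [f32; 4] {
    [v[0], v[1], v[2], v[2]]
}

fn main() {
    println!("{:?}", pack_xyz([1.0, 2.0, 3.0])); // prints [1.0, 2.0, 3.0, 3.0]
}
```

Duplicating z into the unused fourth lane presumably keeps lane-wise min/max results from being affected by an arbitrary padding value, though the PR does not state the reason.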

src/lib.rs (outdated, resolved)
src/aabb.rs Outdated
/// use bvh::aabb::AABB;
///
/// # fn main() {
/// let aabb :AABB<f32,3> = AABB::infinite();
Owner

Some dead code here.

@svenstaro
Owner

@marstaik Are you likely to make more large contributions? I could give you contributor access if you're interested in helping with development and maintenance, but I couldn't find your email or Matrix for further communication. If you're interested, it might be good to talk first.

@marstaik
Collaborator Author

@svenstaro I can use SIMD with pure core::arch::x86_64, as nothing else really works (simba, wide, and most other libraries don't give me all of the intrinsics needed for good performance, such as shuffle).

The issue I am having is that I still need the specialization feature from nightly to be safely able to handle all cases and dimensions. I am currently trying to find a way to do this without, but it seems very convoluted.

I have a generic ray_intersects_bvh for any Vector size, but I need to manually specialize for the lower dimension cases, as well as for when x86_64 is available.

@svenstaro
Owner

Well realistically people will mostly be using this on x86_64. However, even if we make good performance nightly only for the time being, that should be put behind a feature flag so people can still choose to use this on stable.

@svenstaro
Owner

I'm not sure it was clear earlier: If we have to use nightly for the speedy stuff anyway, then might as well use the most convenient solution to achieve that so you don't necessarily have to switch to core.

@marstaik
Collaborator Author

I ended up sticking with core::arch, as it was just way easier to use the native intrinsics than every library's attempt at a custom wrapper for SIMD...

I added a feature flag called "full_simd" that only works with nightly enabled, allowing for specialization of the fast vectors.
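
A feature-gated fast path of this kind is typically wired up with `cfg` attributes. The sketch below shows only the gating pattern; `sum_components` is a hypothetical placeholder, and the specialized code from the PR is not reproduced here.

```rust
// Assumes a Cargo.toml entry like the one in this PR's diff:
//   [features]
//   full_simd = []

/// Generic fallback that builds on stable Rust for any dimension.
#[cfg(not(feature = "full_simd"))]
fn sum_components<const D: usize>(v: &[f32; D]) -> f32 {
    v.iter().sum()
}

/// Stand-in for the nightly-only fast path. A real implementation would
/// dispatch to specialized SIMD code here; this placeholder simply computes
/// the same result so both configurations agree.
#[cfg(feature = "full_simd")]
fn sum_components<const D: usize>(v: &[f32; D]) -> f32 {
    v.iter().sum()
}

fn main() {
    // Exactly one of the two definitions above is compiled in.
    println!("{}", sum_components(&[1.0, 2.0, 3.0]));
}
```

Without the feature, the first definition is compiled; building with `--features full_simd` selects the second, so callers never need to know which one they got.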

I would like to add some benches and tests for 2d and 4d as well.

@svenstaro
Owner

Please make sure to change the CI so it runs twice: Once with and once without the full_simd feature. Also, why not just call it simd?

Cargo.toml Outdated
@@ -30,7 +30,7 @@ doc-comment = "0.3"

[features]
bench = []
serde = ["dep:serde", "glam/serde"]
full_simd = []
Owner

Make sure to document this in the README.

@marstaik
Collaborator Author

marstaik commented Apr 27, 2023

Hey, I fixed the CI builds up a bit and added the SIMD builds via nightly only. The maintainer of the script for the rust environment disappeared and CI complained node was out of date, so I just overhauled it a bit.

I fixed all of the clippy errors, and all that was left were errors about BVH and AABB naming conventions so I also renamed BVH, AABB, BVHNode to Bvh, Aabb, BvhNode, and made the "Testing" versions start with a "T" to see how it would look. I kind of like it, but if you don't want to do a breaking change name wise I can revert those changes, and we can ignore on clippy again.

I fixed a couple of other issues clippy complained about as well.

@marstaik
Collaborator Author

Also, the README probably needs an update to reflect new data about rebuilding times versus building the tree from scratch again, because the rebuild numbers have drastically changed. That and the generics/testing of f32/f64 N dimensions would be nice to get done in a separate commit.

Comment on lines 42 to 46
- name: cargo build
uses: actions-rs/cargo@v1
with:
command: build
args: --workspace
run: cargo build --workspace

- name: cargo test
uses: actions-rs/cargo@v1
with:
command: test

- name: cargo fmt
uses: actions-rs/cargo@v1
with:
command: fmt
args: --all -- --check

- name: cargo clippy
uses: actions-rs/cargo@v1
with:
command: clippy
args: --workspace --all-targets --all-features -- -D warnings
if: matrix.rust == 'nightly'

# - name: Run cargo-tarpaulin
# uses: actions-rs/tarpaulin@v0.1
# if: matrix.os == 'ubuntu-latest' && matrix.rust == 'stable'
run: cargo test
Owner

Let's make this just

- run: cargo build
- run: cargo test

Since we're not actually using a workspace, we don't need the extra flag.

Comment on lines 63 to 67
- name: cargo build
run: cargo build --workspace --features simd

- name: cargo test
run: cargo test --features simd
Owner

Same here.

- name: Run clippy
run: cargo clippy --workspace --all-targets --all-features -- -D warnings

build-and-test-no-features:
Owner

Suggested change
build-and-test-no-features:
build-and-test-no-simd:

run: cargo clippy --workspace --all-targets --all-features -- -D warnings

build-and-test-no-features:
name: CI with ${{ matrix.rust }} on ${{ matrix.os }} [no-features]
Owner

Suggested change
name: CI with ${{ matrix.rust }} on ${{ matrix.os }} [no-features]
name: CI with ${{ matrix.rust }} on ${{ matrix.os }} [no SIMD]


- name: Upload coverage report to codecov.io
uses: codecov/codecov-action@v1
if: matrix.os == 'ubuntu-latest' && matrix.rust == 'stable'

build-and-test-simd:
name: CI with nightly on ${{ matrix.os }} [simd]
Owner

Suggested change
name: CI with nightly on ${{ matrix.os }} [simd]
name: CI with nightly on ${{ matrix.os }} [SIMD]

README.md Outdated
@@ -12,42 +12,42 @@ volume hierarchies.**
## About

This crate can be used for applications which contain intersection computations of rays
with primitives. For this purpose a binary tree BVH (Bounding Volume Hierarchy) is of great
use if the scene which the ray traverses contains a huge number of primitives. With a BVH the
with primitives. For this purpose a binary tree Bvh (Bounding Volume Hierarchy) is of great
Owner

Please revert the capitalization changes in docs and other prose where the docs do not reference a type. The code change is fine and in line with Rust best practices, though. I think this was probably renamed by mistake by find and replace.

Also check other files.

Ideally this rename would have been done in another PR since this one is gargantuan anyway but oh well.

README.md (resolved)
src/utils.rs (resolved)
src/aabb.rs Outdated
Comment on lines 6 to 17
use nalgebra::ClosedAdd;
use nalgebra::ClosedMul;
use nalgebra::ClosedSub;
use nalgebra::Point;
use nalgebra::SVector;
use nalgebra::Scalar;
use nalgebra::SimdPartialOrd;
use num::Float;
use num::FromPrimitive;
use num::One;
use num::Signed;
use num::Zero;
Owner

Let's merge these.

src/ray/intersect_default.rs (resolved)
@svenstaro
Owner

I've also added a dummy CHANGELOG.md on master, and I'd like to ask you to write a notice there and, if you feel like it, a short migration guide as well. We haven't had a changelog before, and I feel this is as good an opportunity as any to start one at last.

@marstaik
Collaborator Author

I think that should be everything :)

README.md Outdated
@@ -83,7 +95,7 @@ it is faster to update the tree, instead of rebuilding it from scratch.
First of all, optimizing is not helpful if more than half of the scene is not static.
This is due to how optimizing takes place:
Given a set of indices of all shapes which have changed, the optimize procedure tries to rotate fixed constellations
in search for a better surface area heuristic (SAH) value. This is done recursively from bottom to top while fixing the AABBs
in search for a better surface area heuristic (SAH) value. This is done recursively from bottom to top while fixing the Aabbs
Owner

There are still a few miscapitalized occurrences like this in prose and docs. Could you do another pass?

Collaborator Author

Woops, fixed! I didn't actually find any in the docs though.
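
The README passage under review in this thread mentions searching for a better surface area heuristic (SAH) value. As background, here is a minimal sketch of the textbook SAH estimate for a node with two children. The function names and the two cost constants are assumptions made for illustration, and the crate's optimizer may compute its value differently.

```rust
/// Surface area of a 3D axis-aligned box given its two corners.
fn surface_area(min: [f32; 3], max: [f32; 3]) -> f32 {
    let d = [max[0] - min[0], max[1] - min[1], max[2] - min[2]];
    2.0 * (d[0] * d[1] + d[1] * d[2] + d[2] * d[0])
}

/// Textbook SAH estimate: each child's primitive count is weighted by the
/// chance that a ray hitting the parent also hits the child, approximated
/// by the ratio of their surface areas.
fn sah_cost(
    parent_area: f32,
    left_area: f32,
    left_count: usize,
    right_area: f32,
    right_count: usize,
) -> f32 {
    const TRAVERSAL_COST: f32 = 1.0; // assumed constant
    const INTERSECT_COST: f32 = 1.0; // assumed constant
    TRAVERSAL_COST
        + INTERSECT_COST
            * (left_area / parent_area * left_count as f32
                + right_area / parent_area * right_count as f32)
}

fn main() {
    let parent = surface_area([0.0, 0.0, 0.0], [2.0, 1.0, 1.0]);
    // Candidate A: two tight, disjoint children.
    let tight = surface_area([0.0, 0.0, 0.0], [1.0, 1.0, 1.0]);
    // Candidate B: both children as large as the parent (heavy overlap).
    let loose = parent;
    println!("tight split: {}", sah_cost(parent, tight, 4, tight, 4));
    println!("loose split: {}", sah_cost(parent, loose, 4, loose, 4));
}
```

A rotation is worth keeping when it lowers this kind of estimate, which is why tighter child boxes are preferred.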

@marstaik
Collaborator Author

@svenstaro is this ready for merge?

@svenstaro
Owner

Will take another look in a bit.

Cargo.toml (resolved)
@svenstaro
Owner

I'm doing one final check now. One thing I was wondering about: Should we kick out optimization? It never performed super well, and it did degrade the tree quickly. Should we perhaps just focus on super quick tree building using multithreading and SIMD?

Owner

@svenstaro svenstaro left a comment

Alright, very close now. I'm just missing some docs on the SIMD functions and some capitalization fixes.

//! By passing the indices of shapes that have changed, the function determines possible
//! tree rotations and optimizes the BVH using a SAH.
//! tree rotations and optimizes the Bvh using a SAH.
Owner

Suggested change
//! tree rotations and optimizes the Bvh using a SAH.
//! tree rotations and optimizes the BVH using a SAH.

use rand::{thread_rng, Rng};
use std::collections::HashSet;

// TODO Consider: Instead of getting the scene's shapes passed, let leaf nodes store an AABB
// TODO Consider: Instead of getting the scene's shapes passed, let leaf nodes store an Aabb
Owner

Suggested change
// TODO Consider: Instead of getting the scene's shapes passed, let leaf nodes store an Aabb
// TODO Consider: Instead of getting the scene's shapes passed, let leaf nodes store an AABB

// that is updated from the outside, perhaps by passing not only the indices of the changed
// shapes, but also their new AABBs into optimize().
// TODO Consider: Stop updating AABBs upwards the tree once an AABB didn't get changed.
// shapes, but also their new Aabbs into optimize().
Owner

Suggested change
// shapes, but also their new Aabbs into optimize().
// shapes, but also their new AABBs into optimize().

// shapes, but also their new AABBs into optimize().
// TODO Consider: Stop updating AABBs upwards the tree once an AABB didn't get changed.
// shapes, but also their new Aabbs into optimize().
// TODO Consider: Stop updating Aabbs upwards the tree once an Aabb didn't get changed.
Owner

Suggested change
// TODO Consider: Stop updating Aabbs upwards the tree once an Aabb didn't get changed.
// TODO Consider: Stop updating AABBs upwards the tree once an AABB didn't get changed.

/// Based on
/// [`https://github.com/jeske/SimpleScene/blob/master/SimpleScene/Util/ssBVH/ssBVH.cs`]
/// [`https://github.com/jeske/SimpleScene/blob/master/SimpleScene/Util/ssBvh/ssBvh.cs`]
Owner

Suggested change
/// [`https://github.com/jeske/SimpleScene/blob/master/SimpleScene/Util/ssBvh/ssBvh.cs`]
/// [`https://github.com/jeske/SimpleScene/blob/master/SimpleScene/Util/ssBVH/ssBVH.cs`]

src/flat_bvh.rs Outdated
};

#[bench]
/// Benchmark the flattening of a BVH with 120,000 triangles.
/// Benchmark the flattening of a Bvh with 120,000 triangles.
Owner

Suggested change
/// Benchmark the flattening of a Bvh with 120,000 triangles.
/// Benchmark the flattening of a BVH with 120,000 triangles.

src/ray/intersect_x86_64.rs (resolved)
use crate::bounding_hierarchy::{BHShape, BoundingHierarchy};
use crate::ray::Ray;

// TODO These all need to be realtyped and bounded
Owner

@marstaik ping

src/testbase.rs Outdated
@@ -515,17 +522,17 @@ fn bench_intersect_sponza_list(b: &mut ::test::Bencher) {
intersect_list(&triangles, &bounds, b);
}

/// Benchmark intersecting the `triangles` list with `AABB` checks, but without acceleration
/// Benchmark intersecting the `triangles` list with `Aabb` checks, but without acceleration
Owner

Suggested change
/// Benchmark intersecting the `triangles` list with `Aabb` checks, but without acceleration
/// Benchmark intersecting the `triangles` list with `AABB` checks, but without acceleration

src/testbase.rs Outdated
let mut seed = 0;
b.iter(|| {
let ray = create_ray(&mut seed, bounds);

// Iterate over the list of triangles.
for triangle in triangles {
// First test whether the ray intersects the AABB of the triangle.
// First test whether the ray intersects the Aabb of the triangle.
Owner

Suggested change
// First test whether the ray intersects the Aabb of the triangle.
// First test whether the ray intersects the AABB of the triangle.

@marstaik
Collaborator Author

marstaik commented May 3, 2023

@svenstaro I'll make the requested changes and look into things later today, but before I do that, I was wondering whether the Aabb to AABB capitalization changes in the docs make sense. Those comments generally refer to the types by their class names rather than to the AABB as a concept.

It doesn't bother me either way but I wanted to note it.

@svenstaro
Owner

svenstaro commented May 3, 2023

That's a good point. Let's make them clickable where they refer to the specific classes by doing

[`Aabb`]
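
As a small illustration of that convention, the sketch below uses a rustdoc intra-doc link in a doc comment. The `Aabb` struct and `center` function are hypothetical stand-ins, not the crate's definitions.

```rust
/// Axis-aligned bounding box (hypothetical stand-in for the crate's type).
pub struct Aabb {
    pub min: [f32; 3],
    pub max: [f32; 3],
}

/// Computes the center of an [`Aabb`].
///
/// Writing the name as [`Aabb`] lets rustdoc turn it into a link to the
/// type, whereas "AABB" in plain prose names the general concept.
pub fn center(aabb: &Aabb) -> [f32; 3] {
    [
        (aabb.min[0] + aabb.max[0]) * 0.5,
        (aabb.min[1] + aabb.max[1]) * 0.5,
        (aabb.min[2] + aabb.max[2]) * 0.5,
    ]
}

fn main() {
    let b = Aabb { min: [0.0; 3], max: [2.0; 3] };
    println!("{:?}", center(&b));
}
```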

@dbenson24
Contributor

dbenson24 commented May 3, 2023

Once this PR lands I can take a look at adapting my work in #80. I think there's probably 3 PRs worth of stuff in that. The multithreaded builds, the replacement of optimize with the ability to add/remove individual nodes, and the additional shapes you can query with. The return of nalgebra to this crate makes the stuff I had been doing to have 64bit bvhs obsolete which makes me very happy.

@marstaik
Collaborator Author

marstaik commented May 4, 2023

> I'm doing one final check now. One thing I was wondering about: Should we kick out optimization? It never performed super well, and it did degrade the tree quickly. Should we perhaps just focus on super quick tree building using multithreading and SIMD?

I think we need to take another look at this. The new optimize times are really fast, roughly 4x faster: for 50% modified, bench_optimize_bvh_120k_50p goes from about 223 ms to about 56 ms per iteration. It might actually be worth it now. Maybe we can look into a better self-optimizing BVH on removals and additions as well.
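
The size of that speedup can be checked against the glam and nalgebra columns of the first benchmark table in this thread. A small sketch of the arithmetic (the `speedup` helper is only for illustration):

```rust
/// Speedup factor of `new_ns` relative to `old_ns` (both in ns/iter).
fn speedup(old_ns: f64, new_ns: f64) -> f64 {
    old_ns / new_ns
}

fn main() {
    // (benchmark, glam ns/iter, nalgebra ns/iter), copied from the table.
    let rows = [
        ("bench_optimize_bvh_120k_01p", 7_373_650.0, 2_237_075.0),
        ("bench_optimize_bvh_120k_10p", 40_201_330.0, 12_243_750.0),
        ("bench_optimize_bvh_120k_50p", 222_987_100.0, 55_583_600.0),
    ];
    for &(name, old, new) in rows.iter() {
        println!(
            "{}: {:.1} ms -> {:.1} ms ({:.2}x)",
            name,
            old / 1e6,
            new / 1e6,
            speedup(old, new)
        );
    }
}
```

The 1% and 10% cases come out at a little over 3x and the 50% case at about 4x, with the caveat that the reported error bars on the optimize benchmarks are large.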

We should definitely look at both and compare, especially after multithreaded building is done. I wonder if optimizing can be improved via multithreading too, though. Maybe @dbenson24 can look into it, as my next focus will probably be better unit tests and some more docs.

@dbenson24
Contributor

If I recall correctly, the change to make optimize utilize the add/remove node functions was because that method produced trees with much faster traversal times compared to optimize

@marstaik
Collaborator Author

marstaik commented May 4, 2023

I may have gone overboard on the document cleaning :)

@svenstaro
Owner

Merging as-is. Amazing work!

@svenstaro svenstaro merged commit 7482caf into svenstaro:master May 4, 2023