Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve ORANGE memory usage and "background" performance #1334

Open
sethrj opened this issue Jul 23, 2024 · 7 comments
Open

Improve ORANGE memory usage and "background" performance #1334

sethrj opened this issue Jul 23, 2024 · 7 comments
Assignees
Labels
orange Work on ORANGE geometry engine performance Changes for performance optimization
Milestone

Comments

@sethrj
Copy link
Member

sethrj commented Jul 23, 2024

On frontier, something catastrophic has happened to the performance of ORANGE in a couple of problems. Here's the relative throughput of 0.5.0-dev.202+141cd4928 compared to v0.4.4:
throughput

This is not due to a change in the number of steps (so the amount of work is the same):
work

As a reminder, the problem complexity and abbreviations:
problem-complexity
names
and 'M̃' means MSC is disabled.

@sethrj sethrj added bug Something isn't working orange Work on ORANGE geometry engine performance Changes for performance optimization labels Jul 23, 2024
@sethrj
Copy link
Member Author

sethrj commented Jul 24, 2024

@esseivaju 's runs on perlmutter confirm this is an ORANGE issue:
throughput

@sethrj
Copy link
Member Author

sethrj commented Jul 24, 2024

OK, the difference was from changing from the pregenerated .org.json to the built-in GDML generator... not clear why

@sethrj
Copy link
Member Author

sethrj commented Jul 24, 2024

@esseivaju has been doing some profiling of the along-step kernels. Long story short: the T- cases ("EM flat") are doing the same number of operations but the memory access are much more expensive.

It seems the main difference is the empty "calorimeter" volume, which is a box with all the inner boxes subtracted out: the manually constructed version doesn't have this. Even though the volume is never actually entered, it results in a logic expression for the volume that's huge, leading to a major change in "max faces" and "max intersections" which are used by the ORANGE state data:

 "orange": {
 "scalars": {
 "max_depth": 1,
-"max_faces": 12,
-"max_intersections": 12,
-"max_logic_depth": 3,
+"max_faces": 111,
+"max_intersections": 111,
+"max_logic_depth": 2,
 "tol": {
-"abs": 1e-08,
-"rel": 1e-08
+"abs": 1.5e-09,
+"rel": 1.5e-08
 }
 },
 "sizes": {
 "bih": {
 "bboxes": 102,
-"inner_nodes": 0,
-"leaf_nodes": 1,
+"inner_nodes": 99,
+"leaf_nodes": 100,
 "local_volume_ids": 102
 },
 "connectivity_records": 111,
 "daughters": 0,
-"local_surface_ids": 618,
-"local_volume_ids": 305,
-"logic_ints": 87,
+"local_surface_ids": 717,
+"local_volume_ids": 301,
+"logic_ints": 45,
 "real_ids": 111,
-"reals": 103,
+"reals": 104,
 "rect_arrays": 0,
 "simple_units": 1,
 "surface_types": 111,

I'm not sure whether our priority should be (1) patch this up somehow, (2) implementing different striding patterns and padding for GPU/CPU so that we can coalesce memory accesses in SimpleUnitTracker::simple_intersect/SimpleUnitTracker::intersect_impl; or (3) speed up our plans for short circuiting and evaluate the boundary distances on the fly rather than storing them to effectively "local" memory.

@sethrj
Copy link
Member Author

sethrj commented Jul 24, 2024

BTW I found this using git bisect: very glad (@pcanal ) that we use a squash merge strategy 😉

$ git bisect log
# bad: [21940769b1bfba623289ad1ebb3fe30b81d1b5e4] Update frontier toolchain and backward compatibility (#1330)
# good: [b1e52f2001c73fd887ab26dfa100480b5b146167] Add Cerenkov distribution and generator (#1080)
git bisect start 'HEAD' 'v0.5.0-dev'
# bad: [89336e59b66fad695833cba4c65fac976918e60e] Add Windows/Linux no-dependency builds (#1196)
git bisect bad 89336e59b66fad695833cba4c65fac976918e60e
# good: [18f4ae60832c06e70d24b915f0781df38474e038] Fix missing Werror in build-fast workflow (#1141)
git bisect good 18f4ae60832c06e70d24b915f0781df38474e038
# good: [4dd382ff2bcd4f1a5c4ca9a0605c8e28474b0f27] Add sense evaluator for testing (#1168)
git bisect good 4dd382ff2bcd4f1a5c4ca9a0605c8e28474b0f27
# good: [e1c7aedce274dece374ffeb8e093c1793d5b2e77] Pin sphinx at 7.2 to fix user doc build (#1188)
git bisect good e1c7aedce274dece374ffeb8e093c1793d5b2e77
# bad: [57ef806c1424e11f03299f0443fe60396e28e77d] Update esseivaj user presets (#1195)
git bisect bad 57ef806c1424e11f03299f0443fe60396e28e77d
# bad: [2be4ebb86ff1b764a04d7e931f25b78d56c963ae] Switch ORANGE unit tests to use GDML files (#1181)
git bisect bad 2be4ebb86ff1b764a04d7e931f25b78d56c963ae
# skip: [4e9676e42f064a40e9fdc4b0e949e128c2b99d1a] Define geometry traits (#1190)
git bisect skip 4e9676e42f064a40e9fdc4b0e949e128c2b99d1a
# skip: [fe2611587b4b4a1e4f6a71e33d3ebcb6e4332041] Complete GDML-to-ORANGE geometry converter (#1180)
git bisect skip fe2611587b4b4a1e4f6a71e33d3ebcb6e4332041
# only skipped commits left to test
# possible first bad commit: [2be4ebb86ff1b764a04d7e931f25b78d56c963ae] Switch ORANGE unit tests to use GDML files (#1181)
# possible first bad commit: [4e9676e42f064a40e9fdc4b0e949e128c2b99d1a] Define geometry traits (#1190)
# possible first bad commit: [fe2611587b4b4a1e4f6a71e33d3ebcb6e4332041] Complete GDML-to-ORANGE geometry converter (#1180)

@sethrj
Copy link
Member Author

sethrj commented Jul 25, 2024

Yeah the ugly length-111 cell is the "background" volume (world).

@sethrj
Copy link
Member Author

sethrj commented Jul 29, 2024

It seems that the only real slowdown is because of the less compact data layout for the temporary states: I don't think the background cell is even being reached in practice. So we really need a more optimal way (BIH/BVH traversal) for tracking across these.

@sethrj sethrj added this to the v0.5.0 milestone Jul 29, 2024
@sethrj sethrj changed the title Fix performance regression in ORANGE on v0.5.0 branch Fix performance regression in ORANGE testem3 Jul 30, 2024
@sethrj
Copy link
Member Author

sethrj commented Sep 25, 2024

The plan discussed with @elliottbiondo:

  1. Refactor the background intersect from "all surfaces then volumes" to "all volumes then surfaces", using the BIH bounding boxes as a first pass for testing intersection. If it intersects the AABB, then we have to test all external surfaces for the volume, sort, bump past each surface (except for the last maybe?), test whether we end up inside the volume.
  2. Instead of using a bbox test, use an OBB test.
  3. Instead of looping over all volumes, track through a bounding tree.

@sethrj sethrj changed the title Fix performance regression in ORANGE testem3 Improve ORANGE memory usage and "background" performance Oct 7, 2024
@sethrj sethrj modified the milestones: v0.5.0, v1.0.0 Oct 7, 2024
@sethrj sethrj removed the bug Something isn't working label Oct 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
orange Work on ORANGE geometry engine performance Changes for performance optimization
Projects
None yet
Development

No branches or pull requests

3 participants