Improve ORANGE memory usage and "background" performance #1334
Comments
@esseivaju's runs on Perlmutter confirm this is an ORANGE issue:
OK, the difference was from changing from the pregenerated |
@esseivaju has been doing some profiling of the along-step kernels. Long story short: the T- cases ("EM flat") are doing the same number of operations, but the memory accesses are much more expensive. The main difference seems to be the empty "calorimeter" volume, which is a box with all the inner boxes subtracted out; the manually constructed version doesn't have this. Even though the volume is never actually entered, it results in a huge logic expression for the volume, leading to a major change in "max faces" and "max intersections", which are used by the ORANGE state data:
"orange": {
"scalars": {
"max_depth": 1,
-"max_faces": 12,
-"max_intersections": 12,
-"max_logic_depth": 3,
+"max_faces": 111,
+"max_intersections": 111,
+"max_logic_depth": 2,
"tol": {
-"abs": 1e-08,
-"rel": 1e-08
+"abs": 1.5e-09,
+"rel": 1.5e-08
}
},
"sizes": {
"bih": {
"bboxes": 102,
-"inner_nodes": 0,
-"leaf_nodes": 1,
+"inner_nodes": 99,
+"leaf_nodes": 100,
"local_volume_ids": 102
},
"connectivity_records": 111,
"daughters": 0,
-"local_surface_ids": 618,
-"local_volume_ids": 305,
-"logic_ints": 87,
+"local_surface_ids": 717,
+"local_volume_ids": 301,
+"logic_ints": 45,
"real_ids": 111,
-"reals": 103,
+"reals": 104,
"rect_arrays": 0,
"simple_units": 1,
"surface_types": 111,
I'm not sure whether our priority should be (1) patching this up somehow, (2) implementing different striding patterns and padding for GPU/CPU so that we can coalesce memory accesses in |
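To make the striding option concrete, here is a minimal sketch of two ways to index per-track scratch storage whose length is set by `max_faces`. It is only an illustration: the function names and the flat-array layout are assumptions, not the actual ORANGE state classes. It shows how far apart the same slot of two adjacent tracks lands in memory, and how the total scratch size scales when `max_faces` goes from 12 to 111.

```python
# Hypothetical layout helpers: names and layouts are illustrative assumptions,
# not the Celeritas/ORANGE API.


def track_major_index(track: int, slot: int, num_tracks: int, max_faces: int) -> int:
    """Each track owns a contiguous run of max_faces slots."""
    return track * max_faces + slot


def slot_major_index(track: int, slot: int, num_tracks: int, max_faces: int) -> int:
    """Slot k of every track is stored contiguously (interleaved layout)."""
    return slot * num_tracks + track


def neighbor_stride(index_fn, num_tracks: int, max_faces: int, slot: int = 0) -> int:
    """Element distance between the same slot of two adjacent tracks."""
    first = index_fn(0, slot, num_tracks, max_faces)
    second = index_fn(1, slot, num_tracks, max_faces)
    return second - first


def scratch_elements(num_tracks: int, max_faces: int) -> int:
    """Total scratch elements for one per-face temporary array."""
    return num_tracks * max_faces


if __name__ == "__main__":
    num_tracks = 1024  # arbitrary track count chosen for the example
    for max_faces in (12, 111):
        print(
            max_faces,
            neighbor_stride(track_major_index, num_tracks, max_faces),
            neighbor_stride(slot_major_index, num_tracks, max_faces),
            scratch_elements(num_tracks, max_faces),
        )
```

In the track-major layout the stride between neighboring tracks equals `max_faces`, so it grows from 12 to 111 elements, while the interleaved layout keeps neighboring tracks one element apart regardless of `max_faces`. Either way the scratch footprint itself grows by 111/12 = 9.25x.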
BTW I found this using git bisect:
$ git bisect log
# bad: [21940769b1bfba623289ad1ebb3fe30b81d1b5e4] Update frontier toolchain and backward compatibility (#1330)
# good: [b1e52f2001c73fd887ab26dfa100480b5b146167] Add Cerenkov distribution and generator (#1080)
git bisect start 'HEAD' 'v0.5.0-dev'
# bad: [89336e59b66fad695833cba4c65fac976918e60e] Add Windows/Linux no-dependency builds (#1196)
git bisect bad 89336e59b66fad695833cba4c65fac976918e60e
# good: [18f4ae60832c06e70d24b915f0781df38474e038] Fix missing Werror in build-fast workflow (#1141)
git bisect good 18f4ae60832c06e70d24b915f0781df38474e038
# good: [4dd382ff2bcd4f1a5c4ca9a0605c8e28474b0f27] Add sense evaluator for testing (#1168)
git bisect good 4dd382ff2bcd4f1a5c4ca9a0605c8e28474b0f27
# good: [e1c7aedce274dece374ffeb8e093c1793d5b2e77] Pin sphinx at 7.2 to fix user doc build (#1188)
git bisect good e1c7aedce274dece374ffeb8e093c1793d5b2e77
# bad: [57ef806c1424e11f03299f0443fe60396e28e77d] Update esseivaj user presets (#1195)
git bisect bad 57ef806c1424e11f03299f0443fe60396e28e77d
# bad: [2be4ebb86ff1b764a04d7e931f25b78d56c963ae] Switch ORANGE unit tests to use GDML files (#1181)
git bisect bad 2be4ebb86ff1b764a04d7e931f25b78d56c963ae
# skip: [4e9676e42f064a40e9fdc4b0e949e128c2b99d1a] Define geometry traits (#1190)
git bisect skip 4e9676e42f064a40e9fdc4b0e949e128c2b99d1a
# skip: [fe2611587b4b4a1e4f6a71e33d3ebcb6e4332041] Complete GDML-to-ORANGE geometry converter (#1180)
git bisect skip fe2611587b4b4a1e4f6a71e33d3ebcb6e4332041
# only skipped commits left to test
# possible first bad commit: [2be4ebb86ff1b764a04d7e931f25b78d56c963ae] Switch ORANGE unit tests to use GDML files (#1181)
# possible first bad commit: [4e9676e42f064a40e9fdc4b0e949e128c2b99d1a] Define geometry traits (#1190)
# possible first bad commit: [fe2611587b4b4a1e4f6a71e33d3ebcb6e4332041] Complete GDML-to-ORANGE geometry converter (#1180)
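For anyone wanting to repeat this on another range, a small sketch of how the `git bisect log` output above could be summarized. The helper name and the embedded excerpt are illustrative; the regular expressions only assume the comment-line format visible in the log itself.

```python
import re

# Matches comment lines such as "# bad: [<40-hex sha>] subject"
MARK_RE = re.compile(r"^# (good|bad|skip): \[([0-9a-f]{40})\] (.*)$")
# Matches "# possible first bad commit: [<sha>] subject"
CANDIDATE_RE = re.compile(r"^# possible first bad commit: \[([0-9a-f]{40})\] (.*)$")

# Excerpt copied from the log above (last few steps only)
EXCERPT = """\
# bad: [2be4ebb86ff1b764a04d7e931f25b78d56c963ae] Switch ORANGE unit tests to use GDML files (#1181)
git bisect bad 2be4ebb86ff1b764a04d7e931f25b78d56c963ae
# skip: [4e9676e42f064a40e9fdc4b0e949e128c2b99d1a] Define geometry traits (#1190)
git bisect skip 4e9676e42f064a40e9fdc4b0e949e128c2b99d1a
# skip: [fe2611587b4b4a1e4f6a71e33d3ebcb6e4332041] Complete GDML-to-ORANGE geometry converter (#1180)
git bisect skip fe2611587b4b4a1e4f6a71e33d3ebcb6e4332041
# only skipped commits left to test
# possible first bad commit: [2be4ebb86ff1b764a04d7e931f25b78d56c963ae] Switch ORANGE unit tests to use GDML files (#1181)
# possible first bad commit: [4e9676e42f064a40e9fdc4b0e949e128c2b99d1a] Define geometry traits (#1190)
# possible first bad commit: [fe2611587b4b4a1e4f6a71e33d3ebcb6e4332041] Complete GDML-to-ORANGE geometry converter (#1180)
"""


def summarize_bisect_log(text: str):
    """Return ({sha: (mark, subject)}, [(sha, subject), ...]) from a bisect log."""
    marks = {}
    candidates = []
    for raw in text.splitlines():
        line = raw.strip()
        match = MARK_RE.match(line)
        if match:
            marks[match.group(2)] = (match.group(1), match.group(3))
            continue
        match = CANDIDATE_RE.match(line)
        if match:
            candidates.append((match.group(1), match.group(2)))
    return marks, candidates


if __name__ == "__main__":
    marks, candidates = summarize_bisect_log(EXCERPT)
    for sha, subject in candidates:
        mark = marks.get(sha, ("untested", ""))[0]
        print(f"{sha[:8]} {mark:8s} {subject}")
```

The summary makes the ambiguity explicit: of the three candidates, one was tested bad and two were skipped, so the bisect cannot narrow the regression further than those three commits.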
Yeah, the ugly length-111 cell is the "background" volume (world).
It seems that the only real slowdown comes from the less compact data layout for the temporary states: I don't think the background cell is even being reached in practice. So we really need a more efficient way (BIH/BVH traversal) of tracking across these.
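As a rough illustration of why a hierarchy helps here, the sketch below does a generic best-first traversal of a bounding volume hierarchy to find the nearest bounding box a ray enters. It is not the ORANGE BIH implementation: the `Node` class, the longest-axis median split, and the function names are all assumptions made for the example. The point is only that the number of box tests scales with the tree depth rather than with the number of volumes.

```python
import heapq
import itertools
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec = Tuple[float, float, float]
Box = Tuple[int, Vec, Vec]  # (volume_id, lower corner, upper corner)


@dataclass
class Node:
    lower: Vec
    upper: Vec
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    volume_id: Optional[int] = None  # set on leaves only


def ray_box_entry(pos: Vec, direction: Vec, lower: Vec, upper: Vec) -> Optional[float]:
    """Slab test: distance at which the ray enters the box, or None on a miss."""
    t_near, t_far = 0.0, math.inf
    for p, d, lo, hi in zip(pos, direction, lower, upper):
        if d == 0.0:
            if p < lo or p > hi:
                return None  # parallel to this slab and outside it
            continue
        t0, t1 = (lo - p) / d, (hi - p) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return None
    return t_near


def build(boxes: List[Box]) -> Node:
    """Median split along the longest axis of the enclosing box."""
    lower = tuple(min(b[1][i] for b in boxes) for i in range(3))
    upper = tuple(max(b[2][i] for b in boxes) for i in range(3))
    if len(boxes) == 1:
        return Node(lower, upper, volume_id=boxes[0][0])
    axis = max(range(3), key=lambda i: upper[i] - lower[i])
    ordered = sorted(boxes, key=lambda b: b[1][axis] + b[2][axis])
    mid = len(ordered) // 2
    return Node(lower, upper, build(ordered[:mid]), build(ordered[mid:]))


def nearest_candidate(root: Node, pos: Vec, direction: Vec):
    """Return (entry distance, volume_id, number of box tests performed)."""
    tested = 0
    tie = itertools.count()  # tie-breaker so heap entries never compare Nodes
    heap = []

    def push(node: Node) -> None:
        nonlocal tested
        tested += 1
        t = ray_box_entry(pos, direction, node.lower, node.upper)
        if t is not None:
            heapq.heappush(heap, (t, next(tie), node))

    push(root)
    while heap:
        t, _, node = heapq.heappop(heap)
        if node.volume_id is not None:
            # A child box lies inside its parent, so entry distances never
            # decrease down the tree: the first leaf popped is the nearest.
            return t, node.volume_id, tested
        push(node.left)
        push(node.right)
    return math.inf, None, tested


if __name__ == "__main__":
    # 100 unit slabs stacked along z, loosely mimicking a layered "flat" geometry
    slabs = [(i, (0.0, 0.0, float(i)), (1.0, 1.0, float(i + 1))) for i in range(100)]
    tree = build(slabs)
    print(nearest_candidate(tree, (0.5, 0.5, -1.0), (0.0, 0.0, 1.0)))
```

For the 100-slab example the traversal finds the first slab after a small number of box tests instead of checking every volume, which is the kind of saving the BIH/BVH approach is meant to deliver.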
The plan discussed with @elliottbiondo:
On Frontier, something catastrophic has happened to the performance of ORANGE in a couple of problems. Here's the relative throughput of 0.5.0-dev.202+141cd4928 compared to v0.4.4:
This is not due to a change in the number of steps (so the amount of work is the same):
As a reminder, the problem complexity and abbreviations:
and 'M̃' means MSC is disabled.