Improve transform performance (by caching affine transforms resulting from transform components) #8691
How is the ingestion performance affected?

It can't be better, that's for sure, but I haven't tested it yet, as mentioned in the description. The expectation is that we take the big hit on each frame after lots of transform data comes in. To properly test this I'd have to set up a script that continuously feeds lots of transforms. Not hard to do, but not something I've gotten around to yet, too many other pressing things... I can set an optimistic reminder for next week.

yes! :)
What

Introduces a new store subscriber, `TransformCacheStoreSubscriber`, that keeps track of when the tree/pose/pinhole transform of an entity changes and stores the resulting affine transforms (taking into account all transform components).
Effectively this splits out one of the responsibilities of the `TransformContext` into a separate construct, namely the calculation of the transforms that need to be propagated in the tree. Since the actual tree propagation is still in what was previously the `TransformContext`, it got renamed to `TransformTreeContext`.
(There's more performance work to be done in that area, see comment notes for details.)
For simplicity of implementation, `TransformCacheStoreSubscriber` doesn't calculate transforms when receiving store events, but rather upon request later on, in `TransformCacheStoreSubscriber::apply_all_updates`. This avoids having to query the store while it is still being populated.
In a similar vein, we absolutely do not want to reimplement latest-at semantics more than we need to, which means that during `apply_all_updates` we calculate the transforms exactly as before, via queries to the store, ignoring any prior knowledge we might have from previous queries.
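The deferral can be sketched like this (hypothetical names; `PendingUpdates` and the `resolve` callback are stand-ins for the actual subscriber, not its real API): the store-event callback only records *which* entities changed, and the expensive latest-at queries run later, in one batch:

```rust
use std::collections::HashSet;

/// Sketch of deferred cache updates: store events only mark entities dirty;
/// the actual queries and matrix math happen later in a single batched pass.
#[derive(Default)]
struct PendingUpdates {
    dirty: HashSet<String>,
}

impl PendingUpdates {
    /// Called from the store-event callback. Must stay cheap, because the
    /// store may still be mid-write at this point.
    fn on_store_event(&mut self, entity: &str) {
        self.dirty.insert(entity.to_owned());
    }

    /// Called later, once the store is safe to query. `resolve` stands in
    /// for the usual latest-at query plus affine resolution per entity.
    fn apply_all_updates(&mut self, mut resolve: impl FnMut(&str)) {
        for entity in self.dirty.drain() {
            resolve(&entity);
        }
    }
}
```

A side benefit of the dirty set is that repeated changes to the same entity between two `apply_all_updates` calls collapse into a single resolution.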
Testing

There are lots of new tests added (which caught plenty of issues!), but I also inspected some of the snippets.
The lack of unit tests on the transform tree itself is a bit unnerving; we'll have to correct for that in the future.

For performance comparisons I tried various large scenes, typically with `--threads 1` to account for other things becoming the bottleneck (I'm looking at you, annotation context 🙄), but even without that there's a clear overall performance improvement. How much depends on various factors, but the scene attached to #7604 gives an extreme case for the possible gains.

Numbers with time cursor at `+297 682.003s`
, time panel minimized. Gathered on my Windows machine.

before (main @ b2b7f91):

- `--threads 2` (`--threads 1` deadlocks on load for some scenes, see #8695): [screenshot of frame timings]
- `--threads 2`, profiler: [screenshot of profiler trace]

after:

- `--threads 2` (`--threads 1` deadlocks on load for some scenes, see #8695): [screenshot of frame timings]
- `--threads 2`, profiler: [screenshot of profiler trace]

(As seen from the numbers, we're unusually bad at parallelizing in this scene, which actually makes sense from the way it's set up [...]; profiler traces are a lot more readable with fewer threads though, since otherwise all the large blocks are far down in worker threads.)
We still lose an unreasonable amount of time in the `TransformTreeContext` in scenes with many entities, which we'll need to address (even when there are few transforms in said scenes; the test scene above doesn't show that, but others, like the `alien_cake_addict` scene from revy, do!). Furthermore, we have to expect that ingestion slows down a bit because of the added work during subscription and (more so) in `apply_all_updates` later on; these impacts have not been tested so far.