Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve transform performance (by caching affine transforms resulting from transform components) #8691

Merged
merged 31 commits into from
Jan 15, 2025

Conversation

Wumpf
Copy link
Member

@Wumpf Wumpf commented Jan 14, 2025

Related

What

Introduces a new store subscriber, TransformCacheStoreSubscriber, that keeps track of when the tree/pose/pinhole transform of an entity changes and stores the resulting affine transforms (taking into account all transform components).

Effectively this splits out one of the responsibilities of the TransformContext into a separate construct, namely the calculation of transforms that need to be propagated in the tree. Since the actual tree propagation is still in what was previously the TransformContext, it got renamed to TransformTreeContext.
(there's more performance work to be done in that area, see comment notes for details)

For simplicity of implementation, TransformCacheStoreSubscriber doesn't calculate transforms when receiving store events, but rather upon request later on in TransformCacheStoreSubscriber::apply_all_updates. This avoids having to query the store while it is being populated.
In a similar vein, we absolutely do not want to reimplement latest-at semantics more than we need to which means that during apply_all_updates we calculate the transforms exactly as before via queries to the store, ignoring any prior knowledge about previous queries we might have.

Testing

There's lots of new tests added (which caught plenty of issues!), but I also inspected some of the snippets.
The lack of unit tests on the transform tree itself is a bit unnerving, we'll have to correct for that in the future

For performance comparisons I tried various large scenes, typically with --threads 1 to account for other things becoming the bottleneck (I'm look at you annotation context-context 🙄 ), but even without that overall there's a clear performance progression. How much depends on various factors, but the scene attached to #7604 gives an extreme case for the possible gains:

Numbers with time cursor at +297 682.003s, time panel minimized. Gathered on my Windows machine.

before (main @ b2b7f91)

after:

(as seen from the numbers we're unusually bad at parallelizing in this scene (which actually makes sense from the way it's set up [...]), profiler traces are a lot more readable with less threads though since all the large blocks are far down in worker therads otherwise)

We still loose an unreasonable amount of time in the TransformTreeContext in scenes with many entities which we'll need to address (even when there's few transforms in said scenes - above test scene doesn't show that, but others like the alien_cake_addict scene from revy do!). Furthermore, we have to expect that ingestion has a bit of a slow-down because of the added work due to substription and (more so) apply_all_updates later on - these impacts have not been tested so far.

@Wumpf Wumpf added 🚀 performance Optimization, memory use, etc include in changelog 📺 re_viewer affects re_viewer itself labels Jan 14, 2025
Copy link

github-actions bot commented Jan 14, 2025

Web viewer built successfully. If applicable, you should also test it:

  • I have tested the web viewer
Result Commit Link Manifest
3e21fdd https://rerun.io/viewer/pr/8691 +nightly +main

Note: This comment is updated whenever you push a commit.

Copy link
Member

@emilk emilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Is this whole chunk now effectively gone??

image

crates/viewer/re_view_spatial/src/transform_cache.rs Outdated Show resolved Hide resolved
crates/viewer/re_view_spatial/src/transform_cache.rs Outdated Show resolved Hide resolved
crates/viewer/re_view_spatial/src/transform_cache.rs Outdated Show resolved Hide resolved
crates/viewer/re_view_spatial/src/transform_cache.rs Outdated Show resolved Hide resolved
@emilk
Copy link
Member

emilk commented Jan 15, 2025

How is the ingestion performance affected?

@Wumpf
Copy link
Member Author

Wumpf commented Jan 15, 2025

How is the ingestion performance affected?

It can't be better that's for sure, but I haven't tested it yet as mentioned in the description. Expectation is that we take the big hit on each frame after lots of transform data comes in. I think to properly test this I'd have to set up a script that continously feeds lots of transforms. Not hard to do, but not something I got around doing so far, too many other pressing things... I can set an optimistic reminder for next week

@Wumpf
Copy link
Member Author

Wumpf commented Jan 15, 2025

Very nice! Is this whole chunk now effectively gone??

yes! :)
In extreme cases like the demonstrated (user provided!) scene. In some others we unfortunately still spend a whole milisecond for each 2d/3d view for the hierarchical propagation (often of entities that aren't even on screen -.-) even when there's no transforms, but even there it's better since we don't have to query anymore

@Wumpf Wumpf force-pushed the andreas/transform-mat4-subscriber branch from d7d3942 to c517a85 Compare January 15, 2025 10:53
@Wumpf Wumpf merged commit 77d4f58 into main Jan 15, 2025
31 of 32 checks passed
@Wumpf Wumpf deleted the andreas/transform-mat4-subscriber branch January 15, 2025 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
include in changelog 🚀 performance Optimization, memory use, etc 📺 re_viewer affects re_viewer itself
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rerun is extremely slow when updating a root pose
3 participants