[WIP] Arcanization #2272

Closed
wants to merge 2 commits into from

Conversation

kvark
Member

@kvark kvark commented Dec 9, 2021

This is what's left from @pythonesque's work.

@John-Nagle

John-Nagle commented Apr 13, 2022

Any progress on this? This is blocking Rend3 #350, which is really slowing down my Second Life / Open Simulator renderer. That, being a modern multiverse system where content comes from the network, is constantly loading content in other threads while displaying the current scene. I need minimal slowdown in the refresh thread as this takes place. I'm getting frame rates in the 20 FPS range during heavy content loading.

@kvark
Member Author

kvark commented Apr 16, 2022

This is stale, in need of a champion.

@John-Nagle

A rename might help. Like "Fix concurrency performance bug".

@John-Nagle

Still a problem. As I said above, 20 FPS. Updating the scene from another thread kills rendering performance.
Is there going to be action on this in the near future? I know it's hard, but the whole point of all this is to do high-performance rendering in Rust. I'm one of the few people doing really complex scenes, and I need all the performance I can get.

Watch this video. This is the kind of thing I need to render fast while new content is coming in from the server.

https://video.hardlimit.com/w/peBesyAgtzfRWS5FnDQQtn

@jimblandy
Member

@John-Nagle This is absolutely the kind of thing that wgpu should be aiming to do well - no disagreement there. It's why we exist.

One thing you might be in a good position to do that would help enormously to move this forward is to pull together some benchmarks that exhibit the slowdowns you're concerned about. For performance work, it is almost always futile to stare at the code and guess how to make your users' code run faster. Having a realistic load to benchmark and instrument makes it possible to direct one's work effectively. These benchmarks would probably need to be freely redistributable, so that wgpu contributors could download them and work on them.

Is that something you might be able to help with?

@jimblandy
Member

For example - we know that the hub RwLocks are contended - that's the point of this bug. But which ones are worth fixing first? I'll bet cleaning up Adapter ownership won't make any difference at all. Are there others that just don't matter?

And what if it's not just the Hub RwLocks? I'll bet my hat there are other bottlenecks. Having realistic loads will help us find all of them.
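To make the contention concrete, here is a minimal sketch of the two ownership styles under discussion. It is not wgpu's code: `Texture`, `Hub`, `ArcHub`, and their methods are hypothetical names invented for illustration, and the real hub is considerably more involved. The first style takes the shared lock for every access; the second holds it only long enough to clone an `Arc`, which is roughly the idea behind "arcanization".

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for a GPU resource; not a wgpu type.
struct Texture {
    width: u32,
}

/// "Hub" style: resources live inside one lock-guarded table.
struct Hub {
    textures: RwLock<HashMap<u32, Texture>>,
}

impl Hub {
    /// Every access holds the read lock for as long as the resource is used,
    /// so a writer (for example, a thread creating a texture) blocks readers.
    fn width_of(&self, id: u32) -> Option<u32> {
        let guard = self.textures.read().unwrap();
        guard.get(&id).map(|t| t.width)
    }
}

/// "Arc" style: the table stores reference-counted handles.
struct ArcHub {
    textures: RwLock<HashMap<u32, Arc<Texture>>>,
}

impl ArcHub {
    /// The lock is held only while cloning the Arc; the caller then uses
    /// the resource without touching the table again.
    fn get(&self, id: u32) -> Option<Arc<Texture>> {
        self.textures.read().unwrap().get(&id).cloned()
    }
}
```

With the `Arc` style, a handle obtained from `get` stays valid even if another thread removes the entry from the table afterwards, so long-running work such as command recording does not need to keep the table locked.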

@John-Nagle

John-Nagle commented Apr 26, 2022

I have a big system that does a lot in parallel, but not a micro-benchmark. What I have obtains its content from servers, and there are IP issues around how that can be used. We need 1) something self-contained refreshing a scene with a lot of different objects, and 2) other threads busily adding and deleting objects, textures, and materials from the scene. I think Connor Fitzgerald might have a test. Something to mod. It's not a standard Rend3 example, though.

Does anyone have something I can work from? Something that generates test meshes, materials, and textures? Thanks.

@kvark
Member Author

kvark commented Apr 26, 2022

I suppose a potential benchmark can just be that: how many draw calls can be recorded in fixed time. Supposing 2 pipelines, 2 bind groups, and each draw call is just alternating between them, in order to avoid the state change being optimized out. We could write this down as a function of N threads and see if the number of draw calls scales up accordingly. Literally, as a criterion benchmark. We don't care about execution, even - just recording. The hypothesis is that it will not scale, because of the render pass end locks.
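A rough shape for such a scaling measurement, using only the standard library, might look like the following. This is a sketch under stated assumptions, not a wgpu or criterion benchmark: `record_draws` and `run` are hypothetical names, the "draw call" is simulated by reading one of two alternating slots behind a shared `RwLock`, and it times a fixed number of draws per thread rather than counting draws in a fixed time.

```rust
use std::sync::{Arc, RwLock};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for recording draw calls: each "draw" takes the
/// shared read lock (as a hub lookup would) and alternates between two
/// slots that play the role of the two pipelines / bind groups.
fn record_draws(hub: &RwLock<[u64; 2]>, draws: u64) -> u64 {
    let mut checksum = 0;
    for i in 0..draws {
        let state = hub.read().unwrap();
        checksum += state[(i % 2) as usize];
    }
    checksum
}

/// Runs `threads` recorders in parallel and returns the combined checksum
/// together with the wall-clock time the whole run took.
fn run(threads: usize, draws_per_thread: u64) -> (u64, Duration) {
    let hub = Arc::new(RwLock::new([1u64, 2u64]));
    let start = Instant::now();
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let hub = Arc::clone(&hub);
            thread::spawn(move || record_draws(&hub, draws_per_thread))
        })
        .collect();
    let total: u64 = handles.into_iter().map(|h| h.join().unwrap()).sum();
    (total, start.elapsed())
}
```

Calling `run(n, draws)` for n = 1, 2, 4, 8 and comparing the elapsed times gives the scaling curve: if recording scaled perfectly, the elapsed time would stay roughly flat as n grows (up to the core count), and any growth points at the shared lock.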

@jimblandy
Member

> We could write this down as a function of N threads and see if the number of draw calls scales up accordingly.

It makes sense to check this specific scaling property. That's a good litmus test for "hub rwlocks are fixed".

But we also need to know how that one aspect of performance fits into everything else that goes on in a realistic load. It seems unlikely to me that the hub rwlocks are the only thing we need to know about, if we really want to support cases like the one shown in John-Nagle's video. This is why I want more realistic loads to work with - my guess is that full arcanization isn't necessary, and that arcanization alone isn't sufficient, to bring wgpu to the performance we want.

@cwfitzgerald
Member

cwfitzgerald commented Apr 26, 2022

There are some tracy logs in this rend3 issue of @John-Nagle's program specifically. I don't believe resource contention is the direct cause of the slowdown in his case (it is caused by tracking performance with massive bindless bind groups) but once that problem is solved it will start to show its face again.

It's a bit hard to identify a single culprit for the locking from the trace, but it's mainly things fighting against create_texture or write_textures on other threads.

As for arcanization itself, we definitely should write a benchmark of command lists being encoded against each other, as that should be able to operate fully in parallel. This is a pretty common problem people hit, and it has been the direct cause of us losing users.

I was planning on writing a minimal wgpu benchmark for tracking performance (maybe reviving kvark's https://github.com/kvark/wgpu-bench) so doing another one for threading would be easy.

@John-Nagle

I agree a test case is needed. So, I'm starting to write one. (https://github.com/John-Nagle/render-bench) Nothing there yet; it's just an empty Rust project right now.

What I intend to do is generate a city of random blocky objects, all different, and constantly replace objects with other objects from multiple threads while another thread runs the renderer. That should replicate the kind of load I'm putting on the system in my real program.
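The load pattern described here can be sketched with the standard library alone. This is only an illustration of the structure, not the render-bench code: `churn` is a hypothetical function, the "scene" is a `Mutex<Vec<u64>>` of object ids, and a "frame" is just a pass over that vector. One thread keeps replacing objects while the calling thread "renders" until the updates are finished.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Returns (frames rendered while updates ran, final object count).
/// `objects` must be greater than zero.
fn churn(objects: usize, replacements: usize) -> (u64, usize) {
    let scene = Arc::new(Mutex::new((0..objects as u64).collect::<Vec<u64>>()));
    let done = Arc::new(AtomicBool::new(false));

    // Updater thread: replaces objects one at a time with fresh ids.
    let updater = {
        let (scene, done) = (Arc::clone(&scene), Arc::clone(&done));
        thread::spawn(move || {
            for i in 0..replacements {
                let mut s = scene.lock().unwrap();
                let slot = i % s.len();
                s[slot] = (objects + i) as u64;
            }
            done.store(true, Ordering::Release);
        })
    };

    // "Render" loop: each frame walks the whole scene under the same lock,
    // so it competes with the updater for access.
    let mut frames = 0u64;
    loop {
        let visible: u64 = scene.lock().unwrap().iter().sum();
        std::hint::black_box(visible);
        frames += 1;
        if done.load(Ordering::Acquire) {
            break;
        }
    }
    updater.join().unwrap();
    let len = scene.lock().unwrap().len();
    (frames, len)
}
```

Timing the frames with and without the updater running would show how much the shared lock costs the render loop; a real benchmark would replace the vector with actual meshes, materials, and textures.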

It's not draw that's the problem. It's updating meshes, materials, and textures on the fly while drawing. See this Rend3 bug report, which has Tracy output. The updating threads are making Rend3 calls which queue up work to be done by the rendering thread, and that's slowing down the rendering thread. A lot.

(Why is this important? Because I'm rendering a virtual world in which you can move around, and which is far too big to be in memory all at once. In the video, you see what looks like a static world, but inside the program, content is being frantically loaded and unloaded at various levels of detail as the camera moves.)

I'm not clear on how much of this blocking is Rend3 and how much is WGPU, but since Connor Fitzgerald said that this PR (#2272) was blocking Rend3 #350, I'm in here talking about this.

@jimblandy
Member

Dying to see both of these benchmarks (and dying for time to work on perf)

@Imberflur
Contributor

I may be able to find some time to contribute to this, although a good portion of that will be reviewing the previous discussions and the current diff to get an understanding of all the details.

@John-Nagle

Plugging away on my render-bench. Right now, one big brick cube appears. Soon, something more reasonable with content stats similar to the real scenes I've shown. This should make the complex scene case more testable.

@John-Nagle

John-Nagle commented Apr 29, 2022

As requested, I have constructed a benchmark/test for this situation. See

https://github.com/John-Nagle/render-bench

This creates and removes a large number of non-shared meshes and materials from one thread, while another thread does screen redraws and nothing else. Currently, the textures do not change. This is a first cut at simulating metaverse-type content, where, due to a large number of independent creators, meshes are rarely shared and there is little instancing.

The frame rate here is at 60FPS on the static scene, but drops to 13 FPS during mesh loading.

[Image: slowupdate]

This is built on Rend3, and uses code from the Rend3 examples.

@cwfitzgerald
Member

@John-Nagle Thank you so much for this! The issues reproduce very clearly on my machine and give me a really useful tracy trace. I will put this all together into a larger-scale tracking issue detailing the various bottlenecks we have right now.

@John-Nagle

John-Nagle commented Apr 29, 2022

Thanks. It's good to hear that. Now this is an easily reproducible problem.

Cleaned up the test case (warnings, Clippy, format), but the only functional change is that it now prints the number of meshes added and deleted.

@cwfitzgerald
Member

cwfitzgerald commented Jun 2, 2022

Considering our change of plans, and the formalization into an issue in the form of #2710, I'm going to close this.

cwfitzgerald pushed a commit that referenced this pull request Oct 25, 2023
Co-authored-by: Jim Blandy <jimb@red-bean.com>
@Wumpf Wumpf mentioned this pull request Nov 18, 2023
6 participants