[WIP] Arcanization #2272

kvark · 2021-12-09T19:02:35Z

This is what's left from @pythonesque's work.

John-Nagle · 2022-04-13T21:58:37Z

Any progress on this? This is blocking Rend3 #350, which is really slowing down my Second Life / Open Simulator renderer. As a modern multiverse system where content comes from the network, it is constantly loading content in other threads while displaying the current scene. I need minimal slowdown in the refresh thread while this takes place. I'm getting frame rates in the 20 FPS range during heavy content loading.

kvark · 2022-04-16T05:35:46Z

This is stale, in need of a champion.

John-Nagle · 2022-04-16T05:38:29Z

A rename might help. Like "Fix concurrency performance bug".

John-Nagle · 2022-04-26T04:05:14Z

Still a problem. As I said above, 20 FPS. Updating the scene from another thread kills rendering performance.
Is there going to be action on this in the near future? I know it's hard, but the whole point of all this is to do high-performance rendering in Rust. I'm one of the few people doing really complex scenes, and I need all the performance I can get.

Watch this video. This is the kind of thing I need to render fast while new content is coming in from the server.

https://video.hardlimit.com/w/peBesyAgtzfRWS5FnDQQtn

jimblandy · 2022-04-26T06:31:51Z

@John-Nagle This is absolutely the kind of thing that wgpu should be aiming to do well - no disagreement there. It's why we exist.

One thing you might be in a good position to do that would help enormously to move this forward is to pull together some benchmarks that exhibit the slowdowns you're concerned about. For performance work, it is almost always futile to stare at the code and guess how to make your users' code run faster. Having a realistic load to benchmark and instrument makes it possible to direct one's work effectively. These benchmarks would probably need to be freely redistributable, so that wgpu contributors could download them and work on them.

Is that something you might be able to help with?

jimblandy · 2022-04-26T06:35:16Z

For example - we know that the hub RwLocks are contended - that's the point of this bug. But which ones are worth fixing first? I'll bet cleaning up Adapter ownership won't make any difference at all. Are there others that just don't matter?

And what if it's not just the Hub RwLocks? I'll bet my hat there are other bottlenecks. Having realistic loads will help us find all of them.
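To make the contention concrete, here is a minimal std-only sketch contrasting the two designs as we understand them. In the ID-based hub, every access goes through one registry-wide `RwLock`. With "arcanized" access, the caller holds an `Arc` to the resource and stops going back to the hub. `Hub`, `Texture`, `Id`, and `area_via_arc` are hypothetical names; this is a model, not wgpu-core's actual hub or storage types.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical resource standing in for a texture, buffer, bind group, etc.
#[derive(Debug)]
pub struct Texture {
    pub width: u32,
    pub height: u32,
}

/// Hypothetical ID handed back to callers of the ID-based design.
pub type Id = usize;

/// Hypothetical hub: one registry-wide `RwLock` guards every texture.
pub struct Hub {
    textures: RwLock<Vec<Arc<Texture>>>,
}

impl Hub {
    pub fn new() -> Self {
        Hub { textures: RwLock::new(Vec::new()) }
    }

    /// Creation takes the *write* lock, so while it runs, every other thread
    /// that needs any texture from this hub has to wait.
    pub fn create_texture(&self, width: u32, height: u32) -> Id {
        let mut textures = self.textures.write().unwrap();
        textures.push(Arc::new(Texture { width, height }));
        textures.len() - 1
    }

    /// ID-based use: every access goes back through the shared lock.
    pub fn texture_area(&self, id: Id) -> Option<u64> {
        let textures = self.textures.read().unwrap();
        textures.get(id).map(|t| u64::from(t.width) * u64::from(t.height))
    }

    /// Arc-based ("arcanized") use: take the lock once to clone the handle,
    /// then use the resource with no further trips through the hub.
    pub fn texture_arc(&self, id: Id) -> Option<Arc<Texture>> {
        self.textures.read().unwrap().get(id).cloned()
    }
}

/// Uses an `Arc` handle without touching the hub's lock at all.
pub fn area_via_arc(texture: &Arc<Texture>) -> u64 {
    u64::from(texture.width) * u64::from(texture.height)
}
```

In this model, a `create_texture` on a loader thread stalls every `texture_area` call on the render thread. Code that already holds an `Arc` is unaffected.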

John-Nagle · 2022-04-26T06:41:56Z

I have a big system that does a lot in parallel, but not a micro-benchmark. What I have obtains its content from servers, and there are IP issues around how that can be used. We need 1) something self-contained refreshing a scene with a lot of different objects, and 2) other threads busily adding and deleting objects, textures, and materials from the scene. I think Connor Fitzgerald might have a test. Something to mod. It's not a standard Rend3 example, though.

Does anyone have something I can work from? Something that generates test meshes, materials, and textures? Thanks.

kvark · 2022-04-26T15:16:57Z

I suppose a potential benchmark can be just that: how many draw calls can be recorded in a fixed time. Suppose 2 pipelines and 2 bind groups, with each draw call alternating between them so that the state changes can't be optimized out. We could write this down as a function of N threads and see if the number of draw calls scales up accordingly. Literally, as a criterion benchmark. We don't even care about execution - just recording. The hypothesis is that it will not scale, because of the render pass end locks.
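Here is a std-only sketch of the shape of this benchmark. It uses mock commands in place of a real wgpu `RenderPass`, which would need a device and the `criterion` crate. `Command`, `record_until`, `draws_recorded`, and `scaling_efficiency` are hypothetical. Each thread records alternating pipeline/bind-group pairs until a shared deadline. The optional `Mutex` stands in for a lock taken on the recording path.

```rust
use std::sync::Mutex;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical recorded command, standing in for what a render pass encodes.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Command {
    SetPipeline(u32),
    SetBindGroup(u32),
    Draw { vertex_count: u32 },
}

/// Records draws on one thread until `deadline` (always at least one),
/// alternating between pipeline/bind group 0 and 1 so that no state change is
/// redundant. If `global_lock` is `Some`, each draw also takes that lock,
/// modeling a shared lock on the recording path.
fn record_until(deadline: Instant, global_lock: Option<&Mutex<()>>) -> Vec<Command> {
    let mut commands = Vec::new();
    let mut i: u32 = 0;
    loop {
        let which = i % 2;
        {
            let _guard = global_lock.map(|m| m.lock().unwrap());
            commands.push(Command::SetPipeline(which));
            commands.push(Command::SetBindGroup(which));
            commands.push(Command::Draw { vertex_count: 3 });
        }
        i = i.wrapping_add(1);
        if Instant::now() >= deadline {
            return commands;
        }
    }
}

/// Runs `threads` recorders in parallel for `duration` and returns the total
/// number of draw calls recorded across all threads.
pub fn draws_recorded(threads: usize, duration: Duration, contended: bool) -> usize {
    let lock = Mutex::new(());
    let deadline = Instant::now() + duration;
    thread::scope(|s| {
        let handles: Vec<_> = (0..threads)
            .map(|_| {
                let lock_ref = if contended { Some(&lock) } else { None };
                s.spawn(move || record_until(deadline, lock_ref))
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .map(|cmds| cmds.iter().filter(|c| matches!(c, Command::Draw { .. })).count())
            .sum::<usize>()
    })
}

/// 1.0 means N threads recorded N times as many draws as one thread did.
pub fn scaling_efficiency(draws_one_thread: usize, draws_n_threads: usize, n: usize) -> f64 {
    draws_n_threads as f64 / (n as f64 * draws_one_thread as f64)
}
```

Comparing `draws_recorded(n, d, false)` with `draws_recorded(n, d, true)` for increasing `n`, and feeding the totals into `scaling_efficiency`, gives the "does it scale with N threads" check. Absolute numbers from a mock like this say nothing about wgpu itself. Only the real benchmark can test the render-pass lock hypothesis.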

jimblandy · 2022-04-26T17:05:42Z

"We could write this down as a function of N threads and see if the number of draw calls scales up accordingly."

It makes sense to check this specific scaling property. That's a good litmus test for "hub rwlocks are fixed".

But we also need to know how that one aspect of performance fits into everything else that goes on in a realistic load. It seems unlikely to me that the hub rwlocks are the only thing we need to know about, if we really want to support cases like the one shown in John-Nagle's video. This is why I want more realistic loads to work with - my guess is that full arcanization isn't necessary, and that arcanization alone isn't sufficient, to bring wgpu to the performance we want.

cwfitzgerald · 2022-04-26T17:25:16Z

There are some Tracy logs of @John-Nagle's program specifically in this rend3 issue. I don't believe resource contention is the direct cause of the slowdown in his case (it is caused by the performance of resource tracking with massive bindless bind groups), but once that problem is solved, contention will start to show its face again.
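Here is a simplified model of why large bindless bind groups can make tracking expensive. It assumes, as a simplification, that binding a group merges the usage of every resource it contains into the pass's usage scope. `Usage`, `BindGroup`, `UsageScope`, and `bindless_group` are hypothetical and are not wgpu-core's tracker types. The sketch only illustrates that the per-bind cost grows with the size of the group rather than with what a draw actually reads.

```rust
use std::collections::hash_map::{Entry, HashMap};

/// Hypothetical usage a bind group entry declares for a resource.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Usage {
    Sampled,
    StorageWrite,
}

/// Hypothetical bind group: (resource id, usage) pairs. A "bindless" group
/// may hold thousands of entries.
pub struct BindGroup {
    pub entries: Vec<(u32, Usage)>,
}

/// Simplified per-pass usage scope; `merge_steps` counts tracker work done.
#[derive(Default)]
pub struct UsageScope {
    usages: HashMap<u32, Usage>,
    pub merge_steps: usize,
}

impl UsageScope {
    /// Merges every entry of `group` into the scope. The work is proportional
    /// to the size of the group, however few resources a draw uses.
    /// Returns the id of the first resource with a conflicting usage.
    pub fn set_bind_group(&mut self, group: &BindGroup) -> Result<(), u32> {
        for &(id, usage) in &group.entries {
            self.merge_steps += 1;
            match self.usages.entry(id) {
                Entry::Occupied(e) if *e.get() != usage => return Err(id),
                Entry::Occupied(_) => {}
                Entry::Vacant(v) => {
                    v.insert(usage);
                }
            }
        }
        Ok(())
    }
}

/// Builds a bindless-style group of `n` sampled textures.
pub fn bindless_group(n: u32) -> BindGroup {
    BindGroup { entries: (0..n).map(|id| (id, Usage::Sampled)).collect() }
}
```

Binding a 10,000-entry group before each of 100 draws costs a million merge steps in this model, even if each draw samples only a handful of textures.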

It's a bit hard to pin the locking on a single culprit from the trace, but it's mainly things fighting against create_texture or write_texture calls on other threads.

As for arcanization itself, we definitely should write a benchmark of command lists being encoded against each other, since they should be able to operate fully in parallel. This is a pretty common problem people hit, and it has been the direct cause of us losing users.

I was planning on writing a minimal wgpu benchmark for tracking performance (maybe reviving kvark's https://github.com/kvark/wgpu-bench) so doing another one for threading would be easy.

John-Nagle · 2022-04-26T17:25:56Z

I agree a test case is needed. So, I'm starting to write one. (https://github.com/John-Nagle/render-bench) Nothing there yet; it's just an empty Rust project right now.

What I intend to do is generate a city of random blocky objects, all different, and constantly replace objects with other objects from multiple threads while another thread runs the renderer. That should replicate the kind of load I'm putting on the system in my real program.
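The load described here can be modeled without any GPU at all. The following is a std-only sketch, not render-bench's actual code. A `Mutex`-protected `HashMap` stands in for the scene, and "rendering" just walks every object under the lock. `BlockMesh`, `XorShift`, `RunStats`, and `run_city` are hypothetical names. The sketch shows the shape of the workload (unique, unshared objects; several replacement threads; one frame loop) and where a shared lock puts the render thread in competition with the loaders.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical "blocky object": a box with its own, unshared dimensions.
#[derive(Debug, Clone)]
pub struct BlockMesh {
    pub size: [f32; 3],
    pub position: [f32; 3],
}

/// Tiny xorshift PRNG so the sketch needs no external crates (seed must be nonzero).
struct XorShift(u64);

impl XorShift {
    fn unit(&mut self) -> f32 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        (x % 10_000) as f32 / 10_000.0
    }
}

fn random_block(rng: &mut XorShift) -> BlockMesh {
    BlockMesh {
        size: [1.0 + 9.0 * rng.unit(), 1.0 + 30.0 * rng.unit(), 1.0 + 9.0 * rng.unit()],
        position: [1000.0 * rng.unit(), 0.0, 1000.0 * rng.unit()],
    }
}

pub struct RunStats {
    pub frames: usize,
    pub objects_drawn: usize,
    pub replacements: u64,
    pub final_object_count: usize,
}

/// Runs `frames` frames on the calling thread while `updaters` threads keep
/// replacing objects in a city of `objects` unique blocks.
pub fn run_city(objects: usize, updaters: usize, frames: usize) -> RunStats {
    let scene: Arc<Mutex<HashMap<u64, BlockMesh>>> = Arc::new(Mutex::new(HashMap::new()));
    let next_id = Arc::new(AtomicU64::new(0));
    let mut rng = XorShift(0x9E37_79B9_7F4A_7C15);
    for _ in 0..objects {
        let id = next_id.fetch_add(1, Ordering::Relaxed);
        scene.lock().unwrap().insert(id, random_block(&mut rng));
    }

    let stop = Arc::new(AtomicBool::new(false));
    let replacements = Arc::new(AtomicU64::new(0));
    let workers: Vec<_> = (0..updaters)
        .map(|t| {
            let (scene, next_id) = (Arc::clone(&scene), Arc::clone(&next_id));
            let (stop, replacements) = (Arc::clone(&stop), Arc::clone(&replacements));
            thread::spawn(move || {
                let mut rng = XorShift(t as u64 * 7919 + 1);
                while !stop.load(Ordering::Relaxed) {
                    // Build the new object outside the lock, as a loader would.
                    let replacement = random_block(&mut rng);
                    let mut s = scene.lock().unwrap();
                    // Delete one existing object and add a brand-new, unshared one.
                    let victim = s.keys().next().copied();
                    if let Some(victim) = victim {
                        s.remove(&victim);
                    }
                    s.insert(next_id.fetch_add(1, Ordering::Relaxed), replacement);
                    drop(s);
                    replacements.fetch_add(1, Ordering::Relaxed);
                    // A real loader would be fetching/decoding here; yield instead.
                    thread::yield_now();
                }
            })
        })
        .collect();

    let mut objects_drawn = 0;
    for _ in 0..frames {
        // "Draw" a frame by visiting every object under the scene lock, so the
        // frame loop competes with the updater threads for that lock.
        let s = scene.lock().unwrap();
        objects_drawn += s.values().count();
    }

    stop.store(true, Ordering::Relaxed);
    for w in workers {
        w.join().unwrap();
    }
    let final_object_count = scene.lock().unwrap().len();
    RunStats {
        frames,
        objects_drawn,
        replacements: replacements.load(Ordering::Relaxed),
        final_object_count,
    }
}
```

Each replacement removes one object and inserts a new one under the same lock, so the render loop always sees exactly `objects` objects. Timing each iteration of the frame loop is where a slowdown like the one reported would show up.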

It's not drawing that's the problem. It's updating meshes, materials, and textures on the fly while drawing. See this Rend3 bug report, which has Tracy output. The updating threads are making Rend3 calls that queue up work to be done by the rendering thread, and that's slowing down the rendering thread. A lot.
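One way to structure what is described here is a channel from loader threads to the render thread, which drains it at the start of each frame. The sketch below is a simplified model of that pattern using `std::sync::mpsc`. It is an assumption for illustration, not a description of Rend3's internals. `SceneUpdate`, `RenderState`, and `one_loaded_frame` are hypothetical. The sketch shows that the loader's call returns quickly, but the cost of applying each update is paid inside a frame on the render thread.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver};
use std::thread;

/// Hypothetical message a loader thread sends to the render thread.
pub enum SceneUpdate {
    AddMesh { id: u64, vertex_count: usize },
    RemoveMesh { id: u64 },
}

/// Render-thread state; only the render thread touches it.
#[derive(Default)]
pub struct RenderState {
    pub meshes: HashMap<u64, usize>,
}

impl RenderState {
    /// Called at the top of each frame. Every update queued since the previous
    /// frame is applied here, so its cost is added to this frame's time.
    pub fn apply_pending(&mut self, rx: &Receiver<SceneUpdate>) -> usize {
        let mut applied = 0;
        for update in rx.try_iter() {
            match update {
                SceneUpdate::AddMesh { id, vertex_count } => {
                    self.meshes.insert(id, vertex_count);
                }
                SceneUpdate::RemoveMesh { id } => {
                    self.meshes.remove(&id);
                }
            }
            applied += 1;
        }
        applied
    }
}

/// A loader thread queues `adds` meshes and then removes the first `removes`;
/// the render thread then runs one frame.
/// Returns (updates applied in that frame, meshes left).
pub fn one_loaded_frame(adds: u64, removes: u64) -> (usize, usize) {
    let (tx, rx) = channel();
    let loader = thread::spawn(move || {
        for id in 0..adds {
            tx.send(SceneUpdate::AddMesh { id, vertex_count: 24 }).unwrap();
        }
        for id in 0..removes {
            tx.send(SceneUpdate::RemoveMesh { id }).unwrap();
        }
    });
    loader.join().unwrap();
    let mut state = RenderState::default();
    let applied = state.apply_pending(&rx);
    (applied, state.meshes.len())
}
```

In this model, a burst of 140 updates from the loader all land in a single frame. That is the kind of spike that would pull the frame rate down during heavy loading.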

(Why is this important? Because I'm rendering a virtual world in which you can move around, and which is far too big to be in memory all at once. In the video, you see what looks like a static world, but inside the program, content is being frantically loaded and unloaded at various levels of detail as the camera moves.)

I'm not clear on how much of this blocking is in Rend3 and how much is in WGPU, but since Connor Fitzgerald said that this PR (#2272) was blocking Rend3 #350, I'm raising it here.

jimblandy · 2022-04-26T17:46:30Z

Dying to see both of these benchmarks (and dying for time to work on perf)

Imberflur · 2022-04-26T18:17:08Z

I may be able to find some time to contribute to this, although a good portion of that will be reviewing the previous discussions and the current diff to get an understanding of all the details.

John-Nagle · 2022-04-27T21:22:57Z

Plugging away on my render-bench. Right now, one big brick cube appears. Soon, something more reasonable with content stats similar to the real scenes I've shown. This should make the complex scene case more testable.

John-Nagle · 2022-04-29T06:36:08Z

As requested, I have constructed a benchmark/test for this situation. See

https://github.com/John-Nagle/render-bench

This creates and removes a large number of non-shared meshes and materials from one thread, while another thread does screen redraws and nothing else. Currently, the textures do not change. This is a first cut at simulating metaverse-type content, where, due to a large number of independent creators, meshes are rarely shared and there is little instancing.

The frame rate here is 60 FPS on the static scene, but drops to 13 FPS during mesh loading.
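Frame rates are easier to reason about as per-frame time budgets. This small helper, with hypothetical names, converts the figures reported above. It is plain arithmetic, not a measurement.

```rust
/// Converts a frame rate into the time available per frame, in milliseconds.
pub fn frame_time_ms(fps: f64) -> f64 {
    1000.0 / fps
}

/// Extra time each frame takes when the rate drops from `baseline_fps`
/// to `loaded_fps`.
pub fn added_ms_per_frame(baseline_fps: f64, loaded_fps: f64) -> f64 {
    frame_time_ms(loaded_fps) - frame_time_ms(baseline_fps)
}

// With the render-bench figures: frame_time_ms(60.0) is about 16.7 ms and
// frame_time_ms(13.0) is about 76.9 ms, so added_ms_per_frame(60.0, 13.0)
// is about 60.3 ms of extra work landing on every frame during loading.
```

For comparison, the 20 FPS seen in the real viewer corresponds to a 50 ms frame.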

This is built on Rend3, and uses code from the Rend3 examples.

cwfitzgerald · 2022-04-29T16:40:36Z

@John-Nagle Thank you so much for this! The issues reproduce very clearly on my machine and give me a really useful Tracy trace. I will put this all together into a larger-scale tracking issue detailing the various bottlenecks we have right now.

John-Nagle · 2022-04-29T17:26:22Z

Thanks. It's good to hear that. Now this is an easily reproducible problem.

Cleaned up the test case (warnings, Clippy, formatting); the only functional change is that it now prints the number of meshes added and deleted.

cwfitzgerald · 2022-06-02T18:22:32Z

Considering our change of plans, and the formalization into an issue in the form of #2710, I'm going to close this.


pythonesque added 2 commits August 16, 2021 17:02

WIP

f731b13

WIP 2

193aa0a

kvark force-pushed the master branch from a8cf45d to 873e83c December 11, 2021 22:27

This was referenced Jan 16, 2022

RenderPass Drop seem to lock on each other in multithread #2395 (closed)

write_buffer on another thread can block get_current_texture for an extra frame #2394 (closed)

cwfitzgerald mentioned this pull request Feb 5, 2022

Declining performance with unreleased 0.3.x vs 0.2.2 BVE-Reborn/rend3#350 (closed)

cwfitzgerald mentioned this pull request Jun 2, 2022

Remove Locking From Hot Paths #2710 (closed)

cwfitzgerald closed this Jun 2, 2022

cwfitzgerald pushed a commit that referenced this pull request Oct 25, 2023

Skip gl_PerVertex unused builtins in the SPIR-V frontend (#2272)

63e91fa

Co-authored-by: Jim Blandy <jimb@red-bean.com>

Wumpf mentioned this pull request Nov 18, 2023

Deadlock on AMD/Mesa/vk #4686 (closed)
