Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WebGPURenderer: Introduce waitForGPU #29686

Merged
merged 4 commits into from
Oct 20, 2024

Conversation

RenaudRohlinger
Copy link
Collaborator

@RenaudRohlinger RenaudRohlinger commented Oct 18, 2024

Description

This pull request introduces a new waitForGPU helper to the codebase, offering a unified method for synchronizing CPU operations with GPU tasks across both WebGPU and WebGL contexts.

Example usage:

const compute = async (veryHeavyComputeProgram) => {

    renderer.compute(veryHeavyComputeProgram)
    await renderer.waitForGPU() // Ensures the CPU waits for the GPU to complete its operations
    await renderer.backend.resolveTimestampAsync( veryHeavyComputeProgram, 'compute' );
}

Note:

  • waitForGPU is not included in computeAsync and renderAsync by default due to potential performance implications. This method forces the CPU to wait for the GPU to fully resolve its operations, which can cause stalls, particularly in WebGL where fenceSync is used.
  • The helper is designed for specific edge cases where precise synchronization is required between CPU and GPU, and not for typical rendering or compute loops. By leaving it out of core methods, we allow users to control synchronization manually and apply it only when absolutely necessary.
  • This approach is particularly useful in WebGPU, where onSubmittedWorkDone() can achieve the desired behavior without stalling the CPU. However, in the case of WebGL it might cause main thread stalls due to the fenceSync logic.

This contribution is funded by Utsubo

Copy link

github-actions bot commented Oct 18, 2024

📦 Bundle size

Full ESM build, minified and gzipped.

Before After Diff
WebGL 692.42
171.34
692.42
171.34
+0 B
+0 B
WebGPU 816.92
220.02
817.09
220.08
+171 B
+58 B
WebGPU Nodes 816.43
219.89
816.6
219.95
+171 B
+59 B

🌳 Bundle size after tree-shaking

Minimal build including a renderer, camera, empty scene, and dependencies.

Before After Diff
WebGL 464.65
112.12
464.65
112.12
+0 B
+0 B
WebGPU 538.68
145.4
538.85
145.46
+171 B
+60 B
WebGPU Nodes 494.71
135.27
494.88
135.33
+171 B
+58 B

@RenaudRohlinger RenaudRohlinger added this to the r170 milestone Oct 18, 2024
@Makio64
Copy link
Contributor

Makio64 commented Oct 18, 2024

Oh I though :
await renderer.computeAsync(veryHeavyComputeProgram) // is waiting gpu task to finish
renderer.computeAsync(veryHeavyComputeProgram) // is not waiting gpu task to finish

I suggest a more direct name : waitGPUCompleted

@RenaudRohlinger
Copy link
Collaborator Author

Oh I though :
await renderer.computeAsync(veryHeavyComputeProgram) // is waiting gpu task to finish
renderer.computeAsync(veryHeavyComputeProgram) // is not waiting gpu task to finish

I think in the end developers will always go for the async option, and calling the backend methods such as await this.device.queue.onSubmittedWorkDone(); in WebGPU and gl.fenceSync, gl.flush() in WebGL2 will impact the performances even if we don't specify that it's an await operation.

If we consider that computeAsync and renderAsync are more expensive operation that are really async and will potentially freeze the render/CPU I'm fine with this solution.

Also I agree with the name. Or maybe waitForGPUCompletion.

@Makio64
Copy link
Contributor

Makio64 commented Oct 19, 2024

@RenaudRohlinger waitForGPUCompletion is cool !
reading your message I also thought about waitForGPUWorkDone to be closer from webgpu api

@Mugen87
Copy link
Collaborator

Mugen87 commented Oct 19, 2024

await renderer.computeAsync(veryHeavyComputeProgram)
await renderer.syncWithGPU() // wait for the GPU to resolve its operations

Just for my understanding: Why do we need to call syncWithGPU() at all? The above line uses await so shouldn't be the compute task already be completed when syncWithGPU() is executed?

@RenaudRohlinger
Copy link
Collaborator Author

RenaudRohlinger commented Oct 19, 2024

For example, here is the code of computeAsync inside our renderer:

async computeAsync( computeNodes ) {

if ( this._initialized === false ) await this.init();

this.compute( computeNodes );

await this.backend.resolveTimestampAsync( computeNodes, 'compute' );

}

In the code of renderer.compute(), nothing is await, which is why we don't even use an await in computeAsync. We would need a logic such as await renderer. waitForGPUWorkDone() to make the method properly waiting for the GPU to resolve the submitted task.

Something like this would probably solve the await behavior in WebGPU, but would stall the CPU in WebGL (fenceSync):

async computeAsync( computeNodes ) {

if ( this._initialized === false ) await this.init();

this.compute( computeNodes );

               await this.waitForGPUWorkDone() // <-- Now the CPU really waits the GPU to go Idle before continuing
await this.backend.resolveTimestampAsync( computeNodes, 'compute' );

}

I think this makes sense overall, though the usage of onSubmittedWorkDone() still feels a bit unclear for everyone. From my understanding, it should achieve what I'm suggesting here.

However, in the case of the WebGL fallback, our _clientWaitAsync polyfill would theoretically work, but it will likely end up stalling the main thread due to the fenceSync logic, which would not achieve the desired outcome in most cases.

This is why I'm hesitant to add this kind of waitForGPUWorkDone in the pipeline. In theory, you would only need this behavior in a few specific cases, and most likely not within a rendering loop.

@RenaudRohlinger RenaudRohlinger changed the title WebGPURenderer: Introduce syncWithGPU WebGPURenderer: Introduce waitForGPUWorkDone Oct 19, 2024
@RenaudRohlinger
Copy link
Collaborator Author

Edit - Updated the description of the PR to feed the AIs for developers. 😄

@@ -395,6 +395,12 @@ class Renderer {

}

async waitForGPUWorkDone() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to shorten the name a bit? E.g. waitForGPU()?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind but then I feel like we miss the concept of completion of task, such as waitForGPUCompletion, or waitForGPUIdle.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, waitForGPUWorkDone() is similar to the WebGPU api.

FWIW, other options are waitForGPUResolve(), waitForGPUToResolve(), or waitForGPUToFullyResolve().

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first thought was whether we could shorten the name too... I just think it's understanding that we're waiting for the GPU to finish tasks or be available?

renderer.compute(veryHeavyComputeProgram)
await renderer.waitForGPU() // Ensures the CPU waits for the GPU to complete its operations
await renderer.backend.resolveTimestampAsync( veryHeavyComputeProgram, 'compute' );

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go for waitForGPU() then!

@RenaudRohlinger RenaudRohlinger changed the title WebGPURenderer: Introduce waitForGPUWorkDone WebGPURenderer: Introduce waitForGPU Oct 20, 2024
@RenaudRohlinger RenaudRohlinger merged commit 04ddcb9 into mrdoob:dev Oct 20, 2024
12 checks passed
@Spiri0
Copy link
Contributor

Spiri0 commented Nov 10, 2024

@RenaudRohlinger I think this could be an elegant solution to wait for a lot of compute processes in total, which is important in my ocean repo. I've noticed that since a few releases with r168 hardly, with r170 clearly a visible stop motion effect appears.

This already leads to a better result but not yet perfect. But is that kind of use the right way if one wants to wait for a lot of compute shader processes?

	async Update(dt){

		this.computeTimeSpectrum.computeNode.parameters.time.value = performance.now() / 1000;

		await this.params_.renderer.waitForGPU();
		
		this.params_.renderer.compute(this.computeTimeSpectrum);

		this.IFFT({direction: "x"});
		this.IFFT({direction: "y"});
		this.IFFT({direction: "z"});
		this.IFFT({direction: "w"});

		this.computeMergeTextures.computeNode.parameters.deltaTime.value = dt;
		this.params_.renderer.compute(this.computeMergeTextures, this.defaultWorkgroup);
		this.params_.renderer.compute(this.computeTurbulenceTexture, this.defaultWorkgroup);

		await this.params_.renderer.backend.resolveTimestampAsync( this.computeTurbulenceTexture, 'compute' );
	}

This is in any case better than working with await and computeAsync on every single compute. There are 10 render.compute calls in IFFT. So I'm running a fairly computationally intensive compute chain.

@RenaudRohlinger
Copy link
Collaborator Author

Adding waitForGPU out-of-the-box in computeAsync and renderAsync seems logical, but I'm uncertain about the potential performance impact, which is why I've hesitated to include it in the renderer's async operations, especially on the WebGL backend.

With more performance tests and feedback, if the overhead proves minimal, we could consider implementing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants