Timeout on `Device::maintain` with `Maintain::WaitForSubmissionIndex` is ignored #3601

Wumpf · 2023-03-17T15:07:19Z

Whenever `Device::maintain` with `Maintain::WaitForSubmissionIndex` times out, wgpu assumes that the specified submission index is done and initiates cleanup as if the frame was done. This can cause crashes down the line, as buffers may be freed too early (haven't confirmed this part, but seems shaky!).
There is a `StuckGpu` variant (on `WaitIdleError` and `QueueSubmitError`), but it is currently unused.

wgpu needs to react to timeouts by not assuming the passed submission index is done, and instead querying which submission has actually completed.
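A minimal sketch of that behavior, not wgpu's actual code: `Fence`, `wait`, `last_completed`, `maintain_wait`, and `StuckFence` are all hypothetical names made up for illustration. The point is only that a timed-out wait falls back to whatever index the fence reports as finished.

```rust
use std::time::Duration;

/// Hypothetical stand-in for a backend fence; not a wgpu type.
trait Fence {
    /// Waits for `index`; returns `true` if it completed, `false` on timeout.
    fn wait(&self, index: u64, timeout: Duration) -> bool;
    /// Highest submission index the GPU has actually finished.
    fn last_completed(&self) -> u64;
}

/// Returns the submission index up to which cleanup is safe.
fn maintain_wait(fence: &dyn Fence, wait_for: u64, timeout: Duration) -> u64 {
    if fence.wait(wait_for, timeout) {
        wait_for
    } else {
        // On timeout, do not assume `wait_for` is done; ask the fence instead.
        fence.last_completed()
    }
}

/// Test double (assumption): a GPU that never gets past `completed`.
struct StuckFence {
    completed: u64,
}

impl Fence for StuckFence {
    fn wait(&self, index: u64, _timeout: Duration) -> bool {
        index <= self.completed
    }

    fn last_completed(&self) -> u64 {
        self.completed
    }
}
```

With a fence stuck at index 3, waiting for index 5 yields 3 as the safe cleanup index, so resources used by submissions 4 and 5 stay alive.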

The situation is particularly weird on WebGL, where the maximum timeout is browser-defined (`MAX_CLIENT_WAIT_TIMEOUT_WEBGL`) and usually much lower than our hardcoded 5 seconds; see https://developer.mozilla.org/en-US/docs/Web/API/WebGL2RenderingContext/clientWaitSync. Instead, we pass a timeout of zero, meaning that any `WaitForSubmissionIndex` call will pretend the frame is done. However, since freeing GL buffers that are still in use has no practical repercussions, this does not cause any crashes.
To make matters worse, on WebGL we can't actually block on this by calling it repeatedly, since `clientWaitSync` will not finish unless control is given back to the browser. Effectively, this means we can't block on any particular event.
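One way to picture a non-blocking alternative is sketched below, under loud assumptions: `SyncStatus` is a simplified, hypothetical two-state view of what a zero-timeout sync check reports, and `poll_once` is an invented helper, not part of wgpu or WebGL. Each call checks every pending submission once; anything not yet signaled stays in the list to be checked again after control has returned to the browser.

```rust
/// Hypothetical, simplified result of a zero-timeout sync check.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SyncStatus {
    Signaled,
    TimeoutExpired,
}

/// Checks each pending submission once without blocking.
/// Returns the indices that are done; the rest remain in `pending`.
fn poll_once(pending: &mut Vec<u64>, check: impl Fn(u64) -> SyncStatus) -> Vec<u64> {
    let mut done = Vec::new();
    pending.retain(|&index| {
        if check(index) == SyncStatus::Signaled {
            done.push(index);
            false // finished: drop it from the pending list
        } else {
            true // not finished: keep it for the next poll
        }
    });
    done
}
```

The difference from the current behavior is that an expired timeout is reported as "not done yet" rather than being treated as completion.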

Worth noting here that blocking in this manner isn't possible on WebGPU either, where `poll` does not exist.

Given that `poll` is a pure Rust-space method and not covered by the WebGPU spec, I propose returning an enum for all these different failure modes and avoiding any panic on `StuckGpu`, i.e. as far as possible treating a timeout not as a crash but as an expected error state.
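To make the proposal concrete, here is a rough sketch of what such an enum could look like. Every name in it (`PollOutcome`, its variants, `safe_cleanup_index`) is hypothetical and chosen for illustration; it is not an existing or planned wgpu API.

```rust
/// Hypothetical result of a polling call; illustrative names only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PollOutcome {
    /// The requested submission finished.
    Completed,
    /// The timeout elapsed first; `last_completed` is what actually finished.
    TimedOut { last_completed: u64 },
    /// The platform cannot block at all (e.g. on the web).
    WaitUnsupported,
}

/// Maps an outcome to the index up to which cleanup is safe, if any is known.
fn safe_cleanup_index(outcome: PollOutcome, requested: u64) -> Option<u64> {
    match outcome {
        PollOutcome::Completed => Some(requested),
        PollOutcome::TimedOut { last_completed } => Some(last_completed),
        PollOutcome::WaitUnsupported => None,
    }
}
```

Callers can then decide for themselves whether a timeout is fatal, instead of the library panicking or silently pretending the work is done.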

cwfitzgerald · 2024-01-05T19:32:09Z

Confirmed in #5000 that this can cause early buffer frees.

Wumpf mentioned this issue Mar 17, 2023

Rerun Issues rerun-io/opensource#1

Open

4 tasks

Wumpf added a commit to rerun-io/rerun that referenced this issue Mar 20, 2023

Don't call wgpu::Device::poll on the web.

574c433

Reasons detailed in the comment. Polling is broken on timeout (see gfx-rs/wgpu#3601) and timeout on WebGL is hardcoded to zero because of other issues.

Wumpf mentioned this issue Mar 20, 2023

Don't call wgpu::Device::poll on the web rerun-io/rerun#1626

Merged

1 task

Wumpf added a commit to rerun-io/rerun that referenced this issue Mar 20, 2023

Don't call wgpu::Device::poll on the web. (#1626)

86d0f10

Reasons detailed in the comment. Polling is broken on timeout (see gfx-rs/wgpu#3601) and timeout on WebGL is hardcoded to zero because of other issues.

teoxoy added the `type: bug` (Something isn't working) and `area: wsi` (Issues with swapchain management or windowing) labels Apr 24, 2023

teoxoy added this to the WebGPU Specification V1 milestone Jun 28, 2023

Wumpf removed this from the WebGPU Specification V1 milestone Dec 9, 2023

cwfitzgerald mentioned this issue Jan 5, 2024

Result buffer getting destroyed while required to be alive by the command buffer in long running compute shader #5000

Closed

cwfitzgerald linked a pull request Jan 8, 2024 that will close this issue

Properly Deal with Timeouts Inside device::maintain #5012

Draft

6 tasks

cwfitzgerald mentioned this issue Jul 17, 2024

[test] remove molten vk failure #5964

Closed

teoxoy added this to WebGPU for Firefox Jul 17, 2024

github-project-automation bot moved this to Todo in WebGPU for Firefox Jul 17, 2024

teoxoy mentioned this issue Jul 26, 2024

Boids Fails Validation on MoltenVK #2774

Closed

teoxoy mentioned this issue Oct 15, 2024

Wait for submissions to complete on Queue drop #6413

Merged

jimblandy assigned cwfitzgerald Dec 16, 2024
