Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timeout on Device::maintain with Maintain::WaitForSubmissionIndex is ignored #3601

Open
Wumpf opened this issue Mar 17, 2023 · 1 comment · May be fixed by #5012
Open

Timeout on Device::maintain with Maintain::WaitForSubmissionIndex is ignored #3601

Wumpf opened this issue Mar 17, 2023 · 1 comment · May be fixed by #5012
Assignees
Labels
area: wsi Issues with swapchain management or windowing type: bug Something isn't working

Comments

@Wumpf
Copy link
Member

Wumpf commented Mar 17, 2023

Whenever Device::maintain with Maintain::WaitForSubmissionIndex times out, wgpu assumes that the specified submission index is done and initiates cleanup as-if the frame was done. This can cause crashes down the line as buffers may be freed too early (haven't confirmed this part, but seems shaky!).
There is a StuckGpu (on WaitIdleError and QueueSubmitError) but it is unused currently.

Wgpu needs to react to timeouts by not assuming the passed submission index is done and instead query the actual done frame.


The situation is particularly weird on WebGL where the max timeout is browser defined (and usually much lower than our hardcoded 5 seconds), see https://developer.mozilla.org/en-US/docs/Web/API/WebGL2RenderingContext/clientWaitSync, MAX_CLIENT_WAIT_TIMEOUT_WEBGL. Instead, we pass a timeout of zero, meaning that any WaitForSubmissionIndex call will pretend the frame is done. However, since freeing gl buffers that are still in use has no practical repercussions, this does not cause any crashes etc..
To make matters worse, on WebGL we can't actually block on this by calling it repeatedly, since clientWaitSync will not finish unless control is given back to the browser. Effectively meaning that we can't actually block on any particular event.

Worth noting here that blocking in this manner isn't possible at all either on WebGPU either where poll does not exist.


Given that poll is a pure Rust space method and not covered by the WebGPU spec I propose to return an enum for all these different failure modes and avoid any panic on StuckGpu, i.e. as far as possible not treating hitting the timeout as a crash but rather as an expected error state.

Wumpf added a commit to rerun-io/rerun that referenced this issue Mar 20, 2023
Reasons detailed in the comment. Polling is broken on timeout (see  gfx-rs/wgpu#3601) and timeout on WebGL is hardcoded to zero because of other issues.
Wumpf added a commit to rerun-io/rerun that referenced this issue Mar 20, 2023
Reasons detailed in the comment. Polling is broken on timeout (see  gfx-rs/wgpu#3601) and timeout on WebGL is hardcoded to zero because of other issues.
@teoxoy teoxoy added type: bug Something isn't working area: wsi Issues with swapchain management or windowing labels Apr 24, 2023
@teoxoy teoxoy added this to the WebGPU Specification V1 milestone Jun 28, 2023
@Wumpf Wumpf removed this from the WebGPU Specification V1 milestone Dec 9, 2023
@cwfitzgerald
Copy link
Member

Confirmed this can cause early buffer frees in #5000

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: wsi Issues with swapchain management or windowing type: bug Something isn't working
Projects
Status: Todo
Development

Successfully merging a pull request may close this issue.

3 participants