Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle Multi-threaded EGL Context Access #1729

Merged
merged 1 commit into from
Jul 27, 2021

Conversation

zicklag
Copy link
Contributor

@zicklag zicklag commented Jul 27, 2021

Connections
#1630, bevyengine/bevy#841

Description
Implements the synchronization necessary to use the GL backend from multiple threads. Accomplishes this by using a mutex around the GL context with extra wrapping to bind and unbind the EGL context when locking and unlocking.

Testing
Tested on Ubunty 20.04 with a fork of the Bevy game engine and the WGPU examples ( not that the examples test the multi-threading ).

Remaining Issues

There is only one Bevy example I cannot get to run yet and it's the load_gltf example. It fails with a shader translation error:

Jul 26 20:36:50.949 ERROR naga::back::glsl: Conflicting samplers for _group_3_binding_10    
Jul 26 20:36:50.950  WARN wgpu::backend::direct: Shader translation error for stage FRAGMENT: A image was used with multiple samplers    
Jul 26 20:36:50.950  WARN wgpu::backend::direct: Please report it to https://github.com/gfx-rs/naga    
Jul 26 20:36:50.950 ERROR wgpu::backend::direct: wgpu error: Validation Error

Caused by:
    In Device::create_render_pipeline
    Internal error in FRAGMENT shader: A image was used with multiple samplers

Interestingly, I think the shader in question doesn't have a group(3), binding(10) anywhere that I know of so I'm going to have to drill down a bit more and find out exactly which shader translation is failing more.

This could potentially be fixed in a separate PR. I think the rest of this PR is rather straight-forward and the fix for the error above is probably mostly unrelated to the primary changes made in this PR.

Copy link
Member

@cwfitzgerald cwfitzgerald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, I hate it, but seems mostly reasonable. I want to test this on stuff like rpi and android to see if it works.

@zicklag zicklag force-pushed the multi-threaded-gl branch from 5a45663 to 5a21494 Compare July 27, 2021 02:14
@zicklag
Copy link
Contributor Author

zicklag commented Jul 27, 2021

I mean, I hate it, but seems mostly reasonable.

Yeah, according to the EGL spec we can't access the context in multiple threads at a time so essentially we're stuck. 🤷‍♂️


Oh, something I forgot. I don't know how to test it, but according to the EGL spec I think the dummy pbuffer that we use as a workaround when surfaceless rendering isn't available won't work in a multi-threaded environment. You're not allowed to have multiple threads that bind the same current pbuffer I don't think, and that's what would happen if we didn't have surfaceless support and you tried to access the context from multiple threads.

We may need to try to detect that and maybe add a downlevel flag or an error message or something.


Edit: We may also want to create a multi-threaded example sometime. Currently my only test-case for this is a fork of Bevy updated for WGPU on master.

@cwfitzgerald
Copy link
Member

We already do have a multithreaded example, it's cargo test --example hello-compute test_multithreaded_compute. Additionally, the tests are multi-threaded unless --test-threads=1 is passed.

Would having thread-local pbuffers work?

I tested this PR on my intel/haswell machine using the GL tests, and got tons of errors:

[2021-07-27T06:36:21Z ERROR wgpu_hal::gles::egl] EGL 'eglMakeCurrent' code 0x3009: Got an EGLSurface but no EGLContext

@jakobhellermann
Copy link
Contributor

The shader translation error is probably due to gfx-rs/naga#1137.

Copy link
Member

@kvark kvark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach quite a bit.
Previously in gfx, we had a wrapper around GL content that would just make it current without any locks. It was quite painful.
Here, we are locking the context and guaranteeing safe access.

wgpu-hal/src/gles/device.rs Show resolved Hide resolved
wgpu-hal/src/gles/device.rs Show resolved Hide resolved
wgpu-hal/src/gles/egl.rs Outdated Show resolved Hide resolved
wgpu-hal/src/gles/egl.rs Outdated Show resolved Hide resolved
wgpu-hal/src/gles/egl.rs Outdated Show resolved Hide resolved
wgpu-hal/src/gles/egl.rs Outdated Show resolved Hide resolved
@zicklag zicklag force-pushed the multi-threaded-gl branch from 5a21494 to e47c43a Compare July 27, 2021 16:46
@zicklag
Copy link
Contributor Author

zicklag commented Jul 27, 2021

Would having thread-local pbuffers work?

I think so.

I tested this PR on my intel/haswell machine using the GL tests, and got tons of errors:

Does that machine support surfaceless EGL? If it doesn't, then it's probably the pbfuffer problem. I'll try out thread local pbuffers and then you could test it to see if that fixes it.

cargo test --example hello-compute test_multithreaded_compute

That one fails on master with the expected "DeviceLost" that you get when creating a buffer on a different thread. It still fails with this PR, but with a different message. Not sure why:

---- tests::test_multithreaded_compute stdout ----
thread 'tests::test_multithreaded_compute' panicked at 'UNEXPECTED TEST PASS: BACKEND', wgpu/examples/hello-compute/../../tests/common/mod.rs:299:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@cwfitzgerald
Copy link
Member

Does that machine support surfaceless EGL?

It definitely should considering it's running on mesa. Is there a particular extension I should be looking out for?

That one fails on master with the expected "DeviceLost" that you get when creating a buffer on a different thread. It still fails with this PR, but with a different message. Not sure why:

We had a test that used to fail on vulkan but now succeeds but we never marked the test to pass. #1731 fixed it.

@zicklag
Copy link
Contributor Author

zicklag commented Jul 27, 2021

Is there a particular extension I should be looking out for?

Yeah, in the info log you should see EGL context: +surfaceless:

image

@zicklag zicklag force-pushed the multi-threaded-gl branch from e47c43a to 00d1c17 Compare July 27, 2021 17:02
@zicklag
Copy link
Contributor Author

zicklag commented Jul 27, 2021

We had a test that used to fail on vulkan but now succeeds but we never marked the test to pass. #1731 fixed it.

For some reason the test still fails after rebasing.

$ WGPU_BACKEND=gl cargo test --example hello-compute test_multithreaded_compute
running 1 test
test tests::test_multithreaded_compute ... FAILED

failures:

---- tests::test_multithreaded_compute stdout ----
thread 'tests::test_multithreaded_compute' panicked at 'UNEXPECTED TEST PASS: BACKEND', wgpu/examples/hello-compute/../../tests/common/mod.rs:299:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    tests::test_multithreaded_compute

@cwfitzgerald
Copy link
Member

Oh I misread what was causing it. This is because it now passes on your machine but is still marked to fail on OpenGL. You should got to the test definition wgpu/example/hello-compute/tests.rs and un-mark it as failing.

EGL context: +surfaceless

Yeah this doesn't support that, it only supports EGL 1.4. In fact all my GL devices are only EGL 1.4. That being said, it does support EGL_KHR_surfaceless_context, so we might need to use that.

@zicklag
Copy link
Contributor Author

zicklag commented Jul 27, 2021

Oh I misread what was causing it. This is because it now passes on your machine but is still marked to fail on OpenGL. You should got to the test definition wgpu/example/hello-compute/tests.rs and un-mark it as failing.

Ah, I get it now. 👍

Yeah this doesn't support that, it only supports EGL 1.4. In fact all my GL devices are only EGL 1.4. That being said, it does support EGL_KHR_surfaceless_context, so we might need to use that.

Well if it supports the extension then we shouldn't have a problem. Are you not seeing the +surfaceless in the logs? Maybe there's something wrong with our check for the extension?

let pbuffer =
if version < (1, 5) || !display_extensions.contains("EGL_KHR_surfaceless_context") {
let attributes = [egl::WIDTH, 1, egl::HEIGHT, 1, egl::NONE];
egl.create_pbuffer_surface(display, config, &attributes)
.map(Some)
.map_err(|e| {
log::warn!("Error in create_pbuffer_surface: {:?}", e);
crate::InstanceError
})?
} else {
log::info!("\tEGL context: +surfaceless");
None
};


The shader translation error is probably due to gfx-rs/naga#1137.

That fixed it. Now all the Bevy examples work!

@zicklag zicklag force-pushed the multi-threaded-gl branch from 00d1c17 to 675592f Compare July 27, 2021 17:30
@cwfitzgerald
Copy link
Member

Yeah the check is wrong. that should be && not ||.

@zicklag
Copy link
Contributor Author

zicklag commented Jul 27, 2021

Oh, duh. 🤦‍♂️ Well, it looks like we found a way to test the pbuffer workaround. :)

I'll fix that check and push an update, then it should work for you on your EGL 1.4 devices.

Then I'll also experiment to see if I can get the pbuffer workaround working by adding || true to that check so I can pretend I don't have the required version/extensions on my machine.

@zicklag zicklag force-pushed the multi-threaded-gl branch 2 times, most recently from f4db755 to af23509 Compare July 27, 2021 18:36
@zicklag
Copy link
Contributor Author

zicklag commented Jul 27, 2021

OK I fixed the surfaceless check and I also fixed the error you were getting earlier with "Got an EGLSurface but no EGLContext". When I made the context not current I was not supposed to bind the pbuffer as the current surface.

Also the pbuffer workaround works totally fine with multi-threading so we won't need thread-local pbuffers or anything. I was misinterpreting when we would need to bind the pbuffer. Since we only bind the pbuffer when we bind the context, everything is fine and it is only ever bound by one thread.

If this works on your hardware @cwfitzgerald, I think this should be ready ( pending re-review ).

@zicklag zicklag requested review from cwfitzgerald and kvark July 27, 2021 18:47
wgpu-hal/src/gles/egl.rs Outdated Show resolved Hide resolved
wgpu-hal/src/gles/egl.rs Outdated Show resolved Hide resolved
wgpu-hal/src/gles/egl.rs Outdated Show resolved Hide resolved
wgpu-hal/src/gles/queue.rs Outdated Show resolved Hide resolved
Implements the synchronization necessary to use the GL backend from
multiple threads.
@zicklag zicklag force-pushed the multi-threaded-gl branch from af23509 to 671e393 Compare July 27, 2021 19:02
@zicklag
Copy link
Contributor Author

zicklag commented Jul 27, 2021

Pushed an update!

Copy link
Member

@kvark kvark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just pending on @cwfitzgerald

@cwfitzgerald
Copy link
Member

Alright, so all tests pass as long as I pass in --test-threads=1. I think the problem may be stemming from having multiple adapters open at once. I need to test my raspberry pi, but I think that is an issue we can deal with in a follow up PR.

@zicklag
Copy link
Contributor Author

zicklag commented Jul 27, 2021

Alright, so all tests pass as long as I pass in --test-threads=1.

Interesting. All the tests pass for me, even without the --test-threads=1. I wonder what the difference is.

I need to test my raspberry pi, but I think that is an issue we can deal with in a follow up PR.

👍

@kvark
Copy link
Member

kvark commented Jul 27, 2021

uh, isn't this a bit worrying? The whole point of the PR is to support more than one thread.
Would you have energy to investigate? My time is getting low now since the move is nigh...

@cwfitzgerald
Copy link
Member

The whole point of the PR is to support more than one thread.

It does, the multi-threaded example works perfectly fine, it's juggling multiple adapters which is the issue (which is what the test harness is doing without --test-threads=1). So I'm comfortable trying to deal with this in a follow up.

Would you have energy to investigate?

Yeah ofc!

@kvark
Copy link
Member

kvark commented Jul 27, 2021

ok, sounds reasonable
bors r=cwfitzgerald,kvark

@bors
Copy link
Contributor

bors bot commented Jul 27, 2021

@bors bors bot merged commit 451cd21 into gfx-rs:master Jul 27, 2021
@zicklag zicklag deleted the multi-threaded-gl branch July 27, 2021 19:52
@zicklag
Copy link
Contributor Author

zicklag commented Jul 27, 2021

Awesome! Thanks guys! I'm super excited to actually have gotten Bevy running on OpenGL. 😃

@cwfitzgerald
Copy link
Member

Thank you for all the help getting it working!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants