[SYCL] Possible scheduler 2.0 deadlock with write buffer sync #143
Comments
@agozillon We are aware of a hang that occurs when a host accessor is created while another host accessor is still alive; the hang happens during construction of the second host accessor. It seems you have hit exactly the same issue. It should hang on Intel's GPU device but work fine on Intel's CPU device.
Ah, that's great that you're working on a fix, thank you very much! I originally thought it was a problem on our end (it deadlocks with our runtime as well), but noticed it was happening on an unmodified build too. However, it appears to occur on CPU devices for me, unless I am misunderstanding something (which is quite possible). When I query the device, the SYCL runtime categorizes it as a CPU. I also do not use the Intel Compute runtime, just the experimental CPU runtime (I've tried both the SYCL-modified variation described in GetStartedWithSYCLCompiler.md and the released version on the Intel website). Thanks for the quick answer, as always.
Ohh, sorry, there is another problem in your example: we block all operations on the buffer while the host accessor is alive. From the Spec (3.6.5.1):
Ah, thank you very much for the information! So the deadlock makes sense and is correct from the specification's standpoint (I'll try to be a little more thorough with the specification in the future, my apologies). It worked in triSYCL and with the previous scheduler, so I incorrectly assumed it was legal. @keryell has also recently informed me that it's a bug in the existing triSYCL implementation. I am happy for this issue to be closed, if you wish to do so.
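To make the rule discussed above concrete (operations on a buffer are blocked while a host accessor to it is alive), here is a minimal sketch in plain C++. It is not SYCL code and not how the runtime is implemented: ModelBuffer, ModelHostAccessor, try_submit_increment and second_submit_runs are hypothetical names made up for illustration, and where a real runtime would block, the model simply returns false. It also blocks every submission, which is a simplification.

```cpp
#include <cassert>

// Hypothetical stand-in for a SYCL buffer; it counts live host accessors.
struct ModelBuffer {
    int value = 0;
    int live_host_accessors = 0;
};

// Hypothetical stand-in for a host accessor: it "holds" the buffer
// from construction until destruction (RAII).
class ModelHostAccessor {
public:
    explicit ModelHostAccessor(ModelBuffer &b) : buf_(b) { ++buf_.live_host_accessors; }
    ~ModelHostAccessor() {
        assert(buf_.live_host_accessors > 0);
        --buf_.live_host_accessors;
    }
    ModelHostAccessor(const ModelHostAccessor &) = delete;
    ModelHostAccessor &operator=(const ModelHostAccessor &) = delete;
    int read() const { return buf_.value; }

private:
    ModelBuffer &buf_;
};

// Models a kernel submission that increments the buffer.
// Returns false in the situation where a real runtime would block.
bool try_submit_increment(ModelBuffer &b) {
    if (b.live_host_accessors > 0) return false;
    ++b.value;
    return true;
}

// Runs "kernel, host read, kernel" and reports whether the second kernel could run.
bool second_submit_runs(bool scope_host_accessor) {
    ModelBuffer buf;
    try_submit_increment(buf);
    if (scope_host_accessor) {
        {
            ModelHostAccessor host(buf);  // destroyed at the closing brace
            (void)host.read();
        }
        return try_submit_increment(buf);
    }
    ModelHostAccessor host(buf);  // still alive during the second submission
    (void)host.read();
    return try_submit_increment(buf);
}
```

In this model, second_submit_runs(true) succeeds because the host accessor is destroyed at the end of its scope, while second_submit_runs(false) reports the blocked case.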
Yes, I think that the Intel implementation is more correct than triSYCL, but still too conservative for me. In triSYCL: triSYCL/triSYCL#190. Some (private...) discussions inside the committee: https://gitlab.khronos.org/sycl/Specification/issues/154 and https://gitlab.khronos.org/sycl/Specification/issues/174, which we need to clarify for SYCL 2019. Probably the answers should come from @jeffhammond, @tgmattso, @jcownie-intel... Spoiler alert: I cannot see any reason why we should deviate from the usual RAR, RAW, WAR & WAW dependencies used for at least 60 years of parallel programming, especially since we spent a lot of time in SYCL on expressing all of these with the accessors... :-)
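For readers less familiar with the RAR, RAW, WAR and WAW terminology, the sketch below classifies two consecutive accesses to the same buffer. It is plain C++ for illustration only; Access, hazard, needs_ordering and check_read_read_example are hypothetical names and are not part of SYCL or of any scheduler discussed here.

```cpp
#include <cassert>
#include <string>

// Simplified access kinds; real SYCL access modes are richer than this.
enum class Access { Read, Write };

// Names the dependency when `second` follows `first` on the same buffer.
// "RAW" means "read after write": the first access writes, the second reads.
std::string hazard(Access first, Access second) {
    if (first == Access::Read && second == Access::Read) return "RAR";
    if (first == Access::Write && second == Access::Read) return "RAW";
    if (first == Access::Read && second == Access::Write) return "WAR";
    return "WAW";
}

// Under the usual rules, only a read-after-read pair needs no ordering.
bool needs_ordering(Access first, Access second) {
    return hazard(first, second) != "RAR";
}

// A live host read followed by a device read is a RAR pair,
// so under these rules it would not need to block.
void check_read_read_example() {
    assert(!needs_ordering(Access::Read, Access::Read));
}
```

Under this classification, a host read followed by a device write is a WAR pair and has to be ordered, whereas two reads can overlap.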
@keryell Could you please clarify why the Intel implementation is too conservative?
    #include <CL/sycl.hpp>
    #include <iostream>
    using namespace cl::sycl;
    int main() {
      queue q;
      // Named host array: it must outlive the buffer that is built on top of it.
      int data[1] = {0};
      buffer<int, 1> ob(data, 1);
      // First kernel: read-write device access.
      q.submit([&](handler &cgh) {
        auto wb = ob.get_access<access::mode::read_write>(cgh);
        cgh.single_task<class k1>([=]() {
          wb[0] += 1;
        });
      });
      // Host read accessor; it stays alive until the end of main().
      auto rb = ob.get_access<access::mode::read>();
      std::cout << rb[0] << "\n";
      // Second kernel: read-only device access while rb is still alive.
      q.submit([&](handler &cgh) {
        //auto wb = ob.get_access<access::mode::read_write>(cgh);
        auto wb = ob.get_access<access::mode::read>(cgh);
        cgh.single_task<class k2>([=]() {
          //wb[0] += 1;
          (void)wb[0];
        });
      });
      auto rb2 = ob.get_access<access::mode::read>();
      std::cout << rb2[0] << "\n";
      return 0;
    }
The code above will be lowered to the following OCL API calls:
Great! So you have implemented what I think has to be implemented, then. :-)
@agozillon if you are good with this, I guess you can close this.
So I have two example snippets of code that write to a single value in a buffer twice using two kernels and use the accessor functionality to read on host and write on device (I believe both snippets are legal SYCL code, but please do correct me if I am wrong and making some incorrect assumptions).
The first doesn't work but the second does; the only difference (from a user perspective) is the braces { } around the submit calls, which I believe force a wait/synchronization event in SYCL (perhaps I am misunderstanding, however). They both work with the old scheduler when -DSCHEDULER_10 is passed to the compiler, which leads me to think that it's less the legality of the two examples and more some incorrect synchronization event.
Tested with the following command and an unaltered top of the tree (as of May 14th):
$ISYCL_BIN_DIR/clang++ -std=c++11 -fsycl scheduler_2_buffer_block.cpp -o scheduler_2_buffer_block -lOpenCL
I tinkered with this for a while; from what I've found:
- If the get_access inside the second kernel is just a read accessor, it won't block.
- [...] clEnqueueUnmapMemObject invocation from memory_manager.cpp (you can comment out the contents of unmap and the non-working snippet should work).
- [...] get_access with a queue wait and it'll still block.
Before I decide to dig any deeper, I thought it might be worth finding out if this is a bug or a misconception/silliness on my end, and if you guys are already aware and working on it!
Invalid, blocks when trying to wait for second kernel submit:
Valid, no block: