-
Notifications
You must be signed in to change notification settings - Fork 767
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] Force instantiation of parallel_for before use of KernelInfo #3886
Conversation
With the new __builtin_sycl_unique_stable_name implementation, we need to make sure we instantiate the kernel call BEFORE we use the builtin via KernelInfo. We can do this by naming the function (but only if it is a static/global, so this also changes those functions to static). Additionally, we need to make sure we do this in the same way that parallel_for_wrapper does, otherwise we'll end up with 2 kernels instantiated with the same name.
Note that @AlexeySachkov is going to add a test to this review, so that we catch this issue more quickly. There is PERHAPS a way to extract this to a separate function/type/etc, but I believe this is the only time we need to do this, so I hope this is sufficient. @romanovvlad : WDYT? |
/summary:run |
Just noticed that some of the checks failed with this! I 'll have to double-check what is going on. That said, i just pushed a new patch that extracts the logic into its own type. This cannot be a function as that doesn't necessarily force instantiation immediately like going to a type does. |
Welp... fixed my build issue, but it seems that I consistently have 3 check-sycl tests that fail with blank-outputs, so that is new. and not particularly helpful. I'm just about at the end of my work-week though, so hopefully @romanovvlad or @AlexeySachkov can either come up with a better solution (like removing the need to use KernelInfo in this function?), or figure out the issue. SYCL::accessor.cpp |
A little more info... I was able to actually run the accessor test (though, not all of check-sycl), and get just a 'segmentation fault'. Loading in the debugger and getting the stack trace shows: #3 0x0000000000410c77 in ZN2cl4sycl6detail17NDLoopIterateImplILi1ELi0ENS0_5rangeEZNS1_10HostKernelIZZ13accessor_testIiEvNS0_5queueEmENKUlRNS0_7handlerEE_clES8_EUlNS0_4itemILi1ELb1EEEE_SB_Li1ESC_E9runOnHostISB_EENSt9enable_ifIXsr3std7is_sameIT_SB_EE5valueEvE4typeERKNS1_8NDRDescTEEUlRKNS0_2idILi1EEEE_SM_EC2ESP_RKNS3_ILi1EEESU_SQ_RSN () #4 0x0000000000410bdb in cl::sycl::detail::HostKernel<accessor_test(cl::sycl::queue, unsigned long)::{lambda(cl::sycl::handler&)#1}::operator()(cl::sycl::handler&) const::{lambda(cl::sycl::item<1, true>)#1}, cl::sycl::item, 1, cl::sycl::item<1, true> >::runOnHostcl::sycl::item () Which is of course interesting, since it seems to not have anything to do with the handler parallel_for at all! So I have no idea what it could be here... |
DISABLE_PARALLEL_FOR_RANGE_ROUNDING feature.
/summary:run |
(KName.find("SYCL_DISABLE_PARALLEL_FOR_RANGE_ROUNDING") != | ||
std::string::npos) || | ||
(KI::getName() == nullptr || KI::getName()[0] == '\0') || | ||
(KI::callsThisItem()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned about this particular change. Not calling this_item
free functions is a requirement for range rounding feature. Having all tests passed means that we don't have coverage for this.
@romanovvlad, @rdeodhar, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test SYCL/Basic/free_function_queries/free_function_queries.cpp tested this. However, for performance reasons, the heuristics for range rounding were later adjusted so that rounding is only done when the range >= 1024. The test uses a range of 10 so the kernel that uses this_item is not subject to range rounding and the test passes.
The test will need to be modified to increase the range (I will do it separately).
So removing that check from handler.hpp is not correct.
@erichkeane, I've added a few test cases we want to fix/check in 70d05d1. I think that |
Thanks! I see the build did some bad things (though only the range-rounding failing on Windows seems valid). Can you take a look? |
I've restarted those builds - they failed to checkout some sources/dependencies, i.e. infrastructure issue, not a new regression |
Linux one did, but the windows actually failed on the test: |
This failure is kind of expected, see my concern here: #3886 (comment) |
Ah, I missed that comment! I'm not sure what we can do in that case, the "KI" object is troublesome, since it is what is calling the builtin_unique_stable_name too early. |
I have an idea here: we could propagate that info through device image properties instead of using integration header. This will for sure require a bit more efforts from FE, RT and Tools, but will help avoid triggering the issue with extra kernel instantiation dur to use of It also seems a bit weird to not emit the info which is immediately available through FE-RT interface and go through properties and @romanovvlad, @erichkeane, what do you think? |
I think I agree with BOTH of your points here, which makes me fear that ends up with a worse user-experience. At the same time, it DOES fit very well in properties.... But that isn't implemented yet, right? I'm not sure what we can end up doing here to use it as a property yet. |
Also see: #3894 as a potential alternative. |
After talking with Sachkov, I think #3894 is a better solution, so closing this. |
@@ -0,0 +1,10 @@ | |||
// RUN: %clangxx -fsycl -fsycl-unnamed-lambda -fsycl-device-only -c %s -o %t.temp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@erichkeane, this test should probably be copied into #3894 to ensure that we have fixed the original problem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, should be in the latest patch.
With the new __builtin_sycl_unique_stable_name implementation, we need
to make sure we instantiate the kernel call BEFORE we use the builtin
via KernelInfo. We can do this by naming the function (but only if it is
a static/global, so this also changes those functions to static).
Additionally, we need to make sure we do this in the same way that
parallel_for_wrapper does, otherwise we'll end up with 2 kernels
instantiated with the same name.