Used fma in linsequence_affine kernel #1034
View rendered docs @ https://intelpython.github.io/dpctl/pulls/1034/index.html
Array API standard conformance tests for dpctl=0.14.1dev1=py310h76be34b_14 ran successfully.
Array API standard conformance tests for dpctl=0.14.1dev1=py310h76be34b_15 ran successfully.
Array API standard conformance tests for dpctl=0.14.1dev1=py310h76be34b_16 ran successfully.
afcccfd to 5a126fd
@npolina4 I fixed the issue by using |
Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞
This PR changes `_tensor_impl` to use the `sycl::fma` function to work around aggressive compiler optimizations that reorder multiplications and cause overflows. This could also be addressed by applying the `-fno-associative-math` flag (see https://clang.llvm.org/docs/UsersManual.html#controlling-floating-point-behavior for how to control FP behavior in clang), which helps to address the issue on Linux, but not on Windows.

This fixes the output of `dpt.linspace(dpt.finfo('f4').max, dpt.finfo('f4').max, num=16, dtype='f4')`, which unexpectedly contained `nan` values, as discovered by @npolina4.
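
To illustrate why evaluation order matters here, below is a minimal pure-Python sketch. It is not the dpctl kernel and does not use `sycl::fma`. The two functions, their names, and the formulas are assumptions made for illustration, and Python's double-precision floats stand in for `'f4'`. Both expressions are algebraically equal, but the reassociated one multiplies a huge value by an integer before dividing, so the intermediate overflows to infinity. The sketch uses half of the largest finite double rather than the maximum itself so that the weight-first variant stays safely in range.

```python
import math
import sys


def affine_weights_first(start, end, i, n):
    """Blend start and end using weights in [0, 1] (illustrative formula)."""
    wc = i / (n - 1)            # weight of `end`
    w = (n - 1 - i) / (n - 1)   # weight of `start`
    return start * w + end * wc


def affine_reassociated(start, end, i, n):
    """Hypothetical reordering: multiply by integer numerators, divide last."""
    return (start * (n - 1 - i) + end * i) / (n - 1)


# Half of the largest finite double: large enough that multiplying by 14 or 15
# overflows, small enough that the weighted blend cannot.
big = sys.float_info.max / 2
n = 16

safe = [affine_weights_first(big, big, i, n) for i in range(n)]
reordered = [affine_reassociated(big, big, i, n) for i in range(n)]

print(all(math.isfinite(x) for x in safe))      # every blended value is finite
print(all(math.isinf(x) for x in reordered))    # every intermediate overflowed

# Once an infinity appears, further IEEE 754 arithmetic can yield NaN.
print(math.isnan(math.inf - math.inf))
```

Under these assumptions the weight-first form keeps each product no larger than `big`, while the reassociated form overflows at every index. Which reordering the compiler actually applied in the kernel is not stated in this PR.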