Used fma in linsequence_affine kernel #1034

Merged: 2 commits, Jan 12, 2023

@oleksandr-pavlyk (Collaborator) commented Jan 10, 2023

This PR changes _tensor_impl to use the sycl::fma function, working around aggressive compiler optimizations that reorder multiplications and cause overflows. The problem could also be addressed by applying the -fno-associative-math flag (see https://clang.llvm.org/docs/UsersManual.html#controlling-floating-point-behavior for how to control floating-point behavior in clang), which helps on Linux but not on Windows.

This fixes the output of dpt.linspace(dpt.finfo('f4').max, dpt.finfo('f4').max, num=16, dtype='f4'), which unexpectedly contained nan values, as discovered by @npolina4.
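
To illustrate why the order of operations matters here, the following is a minimal sketch in plain Python, not the dpctl kernel itself. It uses double precision as a stand-in for the 'f4' case. The function names, the value BIG, and the particular reassociated form are assumptions made for illustration; the PR does not show which rewrite the compiler actually performed.

```python
import math

# Hypothetical large endpoint close to the double-precision maximum (~1.797e308).
BIG = 1e308


def affine_weighted(start, end, i, n):
    """Interpolate with the weights computed first.

    Each weight lies in [0, 1], so every product stays within the
    magnitude of the endpoints.
    """
    w_start = (n - i) / n
    w_end = i / n
    return start * w_start + end * w_end


def affine_reassociated(start, end, i, n):
    """Mathematically the same value, but multiply first and divide last.

    This mimics one possible regrouping (an assumption, not the observed
    compiler output): the intermediate products can exceed the largest
    finite value.
    """
    return (start * (n - i) + end * i) / n


if __name__ == "__main__":
    n = 15  # number of intervals for a 16-point sequence
    for i in (0, 5, 15):
        print(i, affine_weighted(BIG, BIG, i, n), affine_reassociated(BIG, BIG, i, n))
    # Endpoints of opposite sign: the regrouped form adds +inf and -inf.
    print(affine_weighted(BIG, -BIG, 5, n), affine_reassociated(BIG, -BIG, 5, n))
```

In this sketch the regrouped form overflows to inf for equal endpoints and yields nan for endpoints of opposite sign, while the weight-first form stays finite. A fused multiply-add such as sycl::fma evaluates a*b + c with a single rounding, and according to this PR it also keeps the compiler from regrouping the terms.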

  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?

@coveralls (Collaborator) commented Jan 10, 2023

Coverage: 82.23% (+0.04%) from 82.191% when pulling 5a126fd on use-no-associative-math into 47e4ae4 on master.

@github-actions

Array API standard conformance tests for dpctl=0.14.1dev1=py310h76be34b_14 ran successfully.
Passed: 33
Failed: 801
Skipped: 280

@github-actions

Array API standard conformance tests for dpctl=0.14.1dev1=py310h76be34b_15 ran successfully.
Passed: 33
Failed: 801
Skipped: 280

@github-actions

Array API standard conformance tests for dpctl=0.14.1dev1=py310h76be34b_16 ran successfully.
Passed: 33
Failed: 801
Skipped: 280

@oleksandr-pavlyk (Collaborator, Author)

@npolina4 I fixed the issue by using sycl::fma to prevent the compiler from reordering the terms of the multiplication and creating sub-expressions prone to overflow.

@oleksandr-pavlyk merged commit 6364c08 into master Jan 12, 2023
@github-actions

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞

@oleksandr-pavlyk deleted the use-no-associative-math branch January 12, 2023 17:59
@oleksandr-pavlyk changed the title from "Use no associative math" to "Used fma in linsequence_affine kernel" Jan 13, 2023