[DFT] Synced mklcpu backend + Real_Real tests#288
[DFT] Synced mklcpu backend + Real_Real tests#288anantsrivastava30 merged 67 commits intouxlfoundation:developfrom
Conversation
|
CPU descriptor and compute tests : |
eb716f2 to
1150033
Compare
examples/dft/compile_time_dispatching/complex_fwd_buffer_mklcpu.cpp
Outdated
Show resolved
Hide resolved
examples/dft/compile_time_dispatching/complex_fwd_buffer_mklcpu.cpp
Outdated
Show resolved
Hide resolved
examples/dft/compile_time_dispatching/complex_fwd_buffer_mklcpu.cpp
Outdated
Show resolved
Hide resolved
examples/dft/compile_time_dispatching/complex_fwd_buffer_mklcpu.cpp
Outdated
Show resolved
Hide resolved
|
You're missing something similar to |
tests/unit_tests/dft/include/compute_out_of_place_real_real.hpp
Outdated
Show resolved
Hide resolved
tests/unit_tests/dft/include/compute_out_of_place_real_real.hpp
Outdated
Show resolved
Hide resolved
tests/unit_tests/dft/include/compute_out_of_place_real_real.hpp
Outdated
Show resolved
Hide resolved
tests/unit_tests/dft/include/compute_out_of_place_real_real.hpp
Outdated
Show resolved
Hide resolved
so compute signature is alreday a part of this work and was introduced in the previous PR's that have been merged. |
src/dft/backends/mklcpu/commit.cpp
Outdated
| MKL_LONG status[2] = { DFTI_BAD_DESCRIPTOR }; | ||
|
|
||
| for (auto dir : { DIR::fwd, DIR::bwd }) | ||
| status[dir] = DftiCommitDescriptor(bidir_handle_obj[0][dir]); | ||
|
|
||
| if (status[0] != DFTI_NO_ERROR && status[1] != DFTI_NO_ERROR) { | ||
| std::string err = std::string("DftiCommitDescriptor failed with status : ") + | ||
| DftiErrorMessage(status[0]) + std::string(", ") + | ||
| DftiErrorMessage(status[1]); | ||
| throw oneapi::mkl::exception("dft/backends/mklcpu", "commit", err); | ||
| } |
There was a problem hiding this comment.
| MKL_LONG status[2] = { DFTI_BAD_DESCRIPTOR }; | |
| for (auto dir : { DIR::fwd, DIR::bwd }) | |
| status[dir] = DftiCommitDescriptor(bidir_handle_obj[0][dir]); | |
| if (status[0] != DFTI_NO_ERROR && status[1] != DFTI_NO_ERROR) { | |
| std::string err = std::string("DftiCommitDescriptor failed with status : ") + | |
| DftiErrorMessage(status[0]) + std::string(", ") + | |
| DftiErrorMessage(status[1]); | |
| throw oneapi::mkl::exception("dft/backends/mklcpu", "commit", err); | |
| } | |
| for (auto dir : { DIR::fwd, DIR::bwd }) { | |
| auto status = DftiCommitDescriptor(bidir_handle_obj[0][dir]); | |
| if (status != DFTI_NO_ERROR) { | |
| std::string err = std::string("DftiCommitDescriptor failed with status : ") + | |
| DftiErrorMessage(status); | |
| throw oneapi::mkl::exception("dft/backends/mklcpu", "commit", err); | |
| } | |
| } |
I'm not sure we should continue trying after the first one failed
There was a problem hiding this comment.
no this cannot be done, as in case of real batched transformation, where resetting strides are essential, the 2nd descriptor would be of inconsistent configuration (because of the strides) and would error out.
the first one can be used for compute_forward, then we recommit and reset the strides, which invalidates the first descriptor and commits the second one.
@hjabird the reason we can't use the same logic to reset the strides is to maintain consistency with the gpu backend (which as it does not have 2 different descriptors does not have that capability)
There was a problem hiding this comment.
It is a tricky issue.
Best I can suggest is that if 1 of the 2 fail, we save the exception and throw it if the user tries to use that. I'm happy for that to be another PR though.
There was a problem hiding this comment.
That is a very awkward problem. Please add a comment describing this to your solution and also remember to revert the change from && to || if needed.
There was a problem hiding this comment.
I think there is a similar issue in the cuFFT backend (I added a comment there) and forcing the two descriptors to be valid at the same time might result in failure to commit for valid real FFT problems.
One idea, @raphael-egan, @hparks-intel please let me know if you think of a case this would not work.
Can we try to commit the fwd descriptor with the given input/output stride, if successful commit the bwd descriptor with swapped input/output strides and check that both descriptor are valid?
Similarly if the first commit for forward descriptor fails, try again to commit fwd descriptor with swapped input/output strides and commit backward descriptor with original input/output stride. Fail to commit if both descriptors aren't valid?
Also I agree this might be better addressed in a separate PR.
We should definitely address this discrepancy in the oneMKL SYCL specification to have fwd and bwd strides instead of input and output stride.
There was a problem hiding this comment.
I think it might be a bad idea to swap the input/output strides since they may be designed for one direction but valid in both (NB I'm not 100% sure of this since it's 6:30pm on a Friday). I think a scenario where the data is densely packed in the backward domain but has spaces between the data in the complex domain might do it.
There was a problem hiding this comment.
@lhuot from my reading, you don't want any changes now so I will resolve this thread.
There was a problem hiding this comment.
This is indeed a fairly awkward issue that underlines that there is room for major improvement to the specs' and product's APIs: the possibility of having a single set of configuration parameters unambiguously usable for either compute direction would require the {IN,OUT}PUT_STRIDES configuration parameters to be deprecated for the SYCL APIs and substituted by (new, yet-to-be-approved-and-implemented) {F,B}WD_STRIDES parameters. This is suggested in MKLD-15060.
As @FMarno points out, there might be cases where either direction is valid with sufficient padding. Consider a 4x4 real transforms with CCE storage, with input strides {5, 1, 0} and output strides {4,. 1, 0}, the user may be configuring the descriptor for a rfo4:5:4:x4:1:1 (padding of 1 real data point in fwd domain and 1 complex data point in bwd domain) or for a rbo4:5:4:x4:1:1 (padding of 2 complex data points in bwd domain, no padding in fwd domain) and either would be valid. Therefore, swapping strides for the bwd descriptor if the fwd descriptor committed successfully would result in incorrectness if the user actually meant to target rbo4:5:4x4:1:1 in that example.
In my personal opinion, it is best to not swap the strides and allow for one (not both!) of the descriptors to fail to commit and check thereafter that it is never used if it didn't commit successfully prior.
hjabird
left a comment
There was a problem hiding this comment.
Re-reviewing since there has been a lot of change since the original review.
Only minor comments/questions!
examples/dft/compile_time_dispatching/complex_fwd_buffer_mklcpu.cpp
Outdated
Show resolved
Hide resolved
hjabird
left a comment
There was a problem hiding this comment.
There are quite a few trivial compiler warnings that I think should be sorted out. I've only highlighted one here.
Co-authored-by: Finlay <finlay.marno@codeplay.com>
| list(APPEND ONEMKL_LIBRARIES_${domain} onemkl_${domain}_mklgpu) | ||
|
|
||
| set(ONEMKL_LIBRARIES_${domain} "") | ||
| if(domain STREQUAL "dft") |
There was a problem hiding this comment.
is domain ever not dft?
That applies to a lot of this file.
Otherwise it looks a lot better :)
|
@anantsrivastava30 can you address the merge conflict please. |
FMarno
left a comment
There was a problem hiding this comment.
LGTM. Can you attach a test log and format these files before merging please:
src/dft/backends/mklcpu/backward.cpp
src/dft/backends/mklcpu/commit.cpp
src/dft/backends/mklcpu/commit_derived_impl.hpp
src/dft/backends/mklcpu/forward.cpp
* FFT_MKLCPU_staging + sync uxlfoundation#261 * fix real_real failures * tests for CPU backend * [DFT] fix rebase issues * [DFT] address reviews * command group handles only the DftiCommitDescriptor * set the distance to true default * fix testing issue with complex_complex * change real_real test to work for md, batch transforms * adderssing minor comments * add error messages for ordering, transpose, workspace * change error message from cpu to gpu * move the desc_buffer to the commit class to avoid descructor invoke * refactor forward declared derived class * extend buffer lifetime for backward * add error messaging for compute * patchfix for transform in/out distances to fwd/bwd * revert resetting dist in roundtrip * revert tests to match specs * clang-format * fix ct adding both backends to the linkline * FFT_MKLCPU_staging + sync uxlfoundation#261 * fix real_real failures * tests for CPU backend * [DFT] fix rebase issues * [DFT] address reviews * minor changes and error checks * command group handles only the DftiCommitDescriptor * move queue to pass by reference * set the distance to true default * fix testing issue with complex_complex * change real_real test to work for md, batch transforms * adderssing minor comments * add error messages for ordering, transpose, workspace * change error message from cpu to gpu * move the desc_buffer to the commit class to avoid descructor invoke * refactor forward declared derived class * extend buffer lifetime for backward * add error messaging for compute * patchfix for transform in/out distances to fwd/bwd * revert resetting dist in roundtrip * revert tests to match specs * fix ct adding both backends to the linkline * address reviews * move the derived class to its own file * address warnings, and change buffer type * Update src/dft/backends/mklcpu/commit.cpp * resovle dynamic dispatch issue * merge cmake for both device * remove bad file * removed old comment * clang-format
Description
This PR enables the mkl_cpu backend for the FFT domain to be introduced in #259. This enables the DFT domain to support CPU device conforming to the Spec OneAPI.
Supports
COMPLEX_STORAGE=COMPLEX_COMPLEX, REAL_REALfor COMPLEX transforms;REAL STORAGE is not used anymore and instead CONJUGATE_EVEN_STORAGE is used for REAL transforms,
Tested with:
Checklist
All Submissions
New interfaces
it was accepted? # (RFC)
New features
Bug fixes
GitHub issue or in this PR)?