Skip to content

Comments

[DFT] Synced mklcpu backend + Real_Real tests#288

Merged
anantsrivastava30 merged 67 commits intouxlfoundation:developfrom
anantsrivastava30:synced_mklcpu_backend
Apr 28, 2023
Merged

[DFT] Synced mklcpu backend + Real_Real tests#288
anantsrivastava30 merged 67 commits intouxlfoundation:developfrom
anantsrivastava30:synced_mklcpu_backend

Conversation

@anantsrivastava30
Copy link
Contributor

@anantsrivastava30 anantsrivastava30 commented Mar 6, 2023

Description

This PR enables the mkl_cpu backend for the FFT domain to be introduced in #259. This enables the DFT domain to support CPU device conforming to the Spec OneAPI.
Supports COMPLEX_STORAGE=COMPLEX_COMPLEX, REAL_REAL for COMPLEX transforms;

REAL STORAGE is not used anymore and instead CONJUGATE_EVEN_STORAGE is used for REAL transforms,

Tested with:

Checklist

All Submissions

  • Do all unit tests pass locally? Attach a log.
  • Have you formatted the code using clang-format?

New interfaces

  • Have you provided motivation for adding a new feature as part of RFC and
    it was accepted? # (RFC)
  • What version of oneAPI spec the interface is targeted?
  • Complete New features checklist

New features

  • Have you provided motivation for adding a new feature?
  • Have you added relevant tests?

Bug fixes

  • Have you added relevant regression tests?
  • Have you included information on how to reproduce the issue (either in a
    GitHub issue or in this PR)?

@anantsrivastava30 anantsrivastava30 mentioned this pull request Mar 6, 2023
9 tasks
@anantsrivastava30
Copy link
Contributor Author

CPU descriptor and compute tests :
CPUTest.log

@anantsrivastava30 anantsrivastava30 changed the title Synced mklcpu backend [DFT] Synced mklcpu backend + Real_Real tests Mar 6, 2023
@hjabird
Copy link
Contributor

hjabird commented Mar 14, 2023

You're missing something similar to compute_signatures.cpp in the MKLGPU backend.

@anantsrivastava30
Copy link
Contributor Author

You're missing something similar to compute_signatures.cpp in the MKLGPU backend.

so compute signature is alreday a part of this work and was introduced in the previous PR's that have been merged.

Comment on lines 87 to 97
MKL_LONG status[2] = { DFTI_BAD_DESCRIPTOR };

for (auto dir : { DIR::fwd, DIR::bwd })
status[dir] = DftiCommitDescriptor(bidir_handle_obj[0][dir]);

if (status[0] != DFTI_NO_ERROR && status[1] != DFTI_NO_ERROR) {
std::string err = std::string("DftiCommitDescriptor failed with status : ") +
DftiErrorMessage(status[0]) + std::string(", ") +
DftiErrorMessage(status[1]);
throw oneapi::mkl::exception("dft/backends/mklcpu", "commit", err);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
MKL_LONG status[2] = { DFTI_BAD_DESCRIPTOR };
for (auto dir : { DIR::fwd, DIR::bwd })
status[dir] = DftiCommitDescriptor(bidir_handle_obj[0][dir]);
if (status[0] != DFTI_NO_ERROR && status[1] != DFTI_NO_ERROR) {
std::string err = std::string("DftiCommitDescriptor failed with status : ") +
DftiErrorMessage(status[0]) + std::string(", ") +
DftiErrorMessage(status[1]);
throw oneapi::mkl::exception("dft/backends/mklcpu", "commit", err);
}
for (auto dir : { DIR::fwd, DIR::bwd }) {
auto status = DftiCommitDescriptor(bidir_handle_obj[0][dir]);
if (status != DFTI_NO_ERROR) {
std::string err = std::string("DftiCommitDescriptor failed with status : ") +
DftiErrorMessage(status);
throw oneapi::mkl::exception("dft/backends/mklcpu", "commit", err);
}
}

I'm not sure we should continue trying after the first one failed

Copy link
Contributor Author

@anantsrivastava30 anantsrivastava30 Apr 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no this cannot be done, as in case of real batched transformation, where resetting strides are essential, the 2nd descriptor would be of inconsistent configuration (because of the strides) and would error out.
the first one can be used for compute_forward, then we recommit and reset the strides, which invalidates the first descriptor and commits the second one.

@hjabird the reason we can't use the same logic to reset the strides is to maintain consistency with the gpu backend (which as it does not have 2 different descriptors does not have that capability)

Copy link
Contributor

@hjabird hjabird Apr 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a tricky issue.

Best I can suggest is that if 1 of the 2 fail, we save the exception and throw it if the user tries to use that. I'm happy for that to be another PR though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a very awkward problem. Please add a comment describing this to your solution and also remember to revert the change from && to || if needed.

Copy link
Contributor

@lhuot lhuot Apr 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a similar issue in the cuFFT backend (I added a comment there) and forcing the two descriptors to be valid at the same time might result in failure to commit for valid real FFT problems.

One idea, @raphael-egan, @hparks-intel please let me know if you think of a case this would not work.
Can we try to commit the fwd descriptor with the given input/output stride, if successful commit the bwd descriptor with swapped input/output strides and check that both descriptor are valid?
Similarly if the first commit for forward descriptor fails, try again to commit fwd descriptor with swapped input/output strides and commit backward descriptor with original input/output stride. Fail to commit if both descriptors aren't valid?

Also I agree this might be better addressed in a separate PR.

We should definitely address this discrepancy in the oneMKL SYCL specification to have fwd and bwd strides instead of input and output stride.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be a bad idea to swap the input/output strides since they may be designed for one direction but valid in both (NB I'm not 100% sure of this since it's 6:30pm on a Friday). I think a scenario where the data is densely packed in the backward domain but has spaces between the data in the complex domain might do it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lhuot from my reading, you don't want any changes now so I will resolve this thread.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed a fairly awkward issue that underlines that there is room for major improvement to the specs' and product's APIs: the possibility of having a single set of configuration parameters unambiguously usable for either compute direction would require the {IN,OUT}PUT_STRIDES configuration parameters to be deprecated for the SYCL APIs and substituted by (new, yet-to-be-approved-and-implemented) {F,B}WD_STRIDES parameters. This is suggested in MKLD-15060.

As @FMarno points out, there might be cases where either direction is valid with sufficient padding. Consider a 4x4 real transforms with CCE storage, with input strides {5, 1, 0} and output strides {4,. 1, 0}, the user may be configuring the descriptor for a rfo4:5:4:x4:1:1 (padding of 1 real data point in fwd domain and 1 complex data point in bwd domain) or for a rbo4:5:4:x4:1:1 (padding of 2 complex data points in bwd domain, no padding in fwd domain) and either would be valid. Therefore, swapping strides for the bwd descriptor if the fwd descriptor committed successfully would result in incorrectness if the user actually meant to target rbo4:5:4x4:1:1 in that example.

In my personal opinion, it is best to not swap the strides and allow for one (not both!) of the descriptors to fail to commit and check thereafter that it is never used if it didn't commit successfully prior.

Copy link
Contributor

@hjabird hjabird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewing since there has been a lot of change since the original review.

Only minor comments/questions!

Copy link
Contributor

@hjabird hjabird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are quite a few trivial compiler warnings that I think should be sorted out. I've only highlighted one here.

anantsrivastava30 and others added 2 commits April 20, 2023 10:11
list(APPEND ONEMKL_LIBRARIES_${domain} onemkl_${domain}_mklgpu)

set(ONEMKL_LIBRARIES_${domain} "")
if(domain STREQUAL "dft")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is domain ever not dft?
That applies to a lot of this file.
Otherwise it looks a lot better :)

@FMarno
Copy link
Contributor

FMarno commented Apr 27, 2023

@anantsrivastava30 can you address the merge conflict please.

Copy link

@raphael-egan raphael-egan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THanks

Copy link
Contributor

@FMarno FMarno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Can you attach a test log and format these files before merging please:
src/dft/backends/mklcpu/backward.cpp
src/dft/backends/mklcpu/commit.cpp
src/dft/backends/mklcpu/commit_derived_impl.hpp
src/dft/backends/mklcpu/forward.cpp

@anantsrivastava30 anantsrivastava30 merged commit 4c7e7eb into uxlfoundation:develop Apr 28, 2023
normallytangent pushed a commit to normallytangent/oneMKL that referenced this pull request Aug 6, 2024
* FFT_MKLCPU_staging + sync uxlfoundation#261

* fix real_real failures

* tests for CPU backend

* [DFT] fix rebase issues

* [DFT] address reviews

* command group handles only the DftiCommitDescriptor

* set the distance to true default

* fix testing issue with complex_complex

* change real_real test to work for md, batch transforms

* adderssing minor comments

* add error messages for ordering, transpose, workspace

* change error message from cpu to gpu

* move the desc_buffer to the commit class to avoid descructor invoke

* refactor forward declared derived class

* extend buffer lifetime for backward

* add error messaging for compute

* patchfix for transform in/out distances to fwd/bwd

* revert resetting dist in roundtrip

* revert tests to match specs

* clang-format

* fix ct adding both backends to the linkline

* FFT_MKLCPU_staging + sync uxlfoundation#261

* fix real_real failures

* tests for CPU backend

* [DFT] fix rebase issues

* [DFT] address reviews

* minor changes and error checks

* command group handles only the DftiCommitDescriptor

* move queue to pass by reference

* set the distance to true default

* fix testing issue with complex_complex

* change real_real test to work for md, batch transforms

* adderssing minor comments

* add error messages for ordering, transpose, workspace

* change error message from cpu to gpu

* move the desc_buffer to the commit class to avoid descructor invoke

* refactor forward declared derived class

* extend buffer lifetime for backward

* add error messaging for compute

* patchfix for transform in/out distances to fwd/bwd

* revert resetting dist in roundtrip

* revert tests to match specs

* fix ct adding both backends to the linkline

* address reviews

* move the derived class to its own file

* address warnings, and change buffer type

* Update src/dft/backends/mklcpu/commit.cpp

* resovle dynamic dispatch issue

* merge cmake for both device

* remove bad file

* removed old comment

* clang-format
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants