Skip to content

Comments

Add unit-tests for the dft interface#261

Merged
mkrainiuk merged 22 commits intouxlfoundation:developfrom
codeplaysoftware:dft-tests
Feb 15, 2023
Merged

Add unit-tests for the dft interface#261
mkrainiuk merged 22 commits intouxlfoundation:developfrom
codeplaysoftware:dft-tests

Conversation

@fabiomestre
Copy link

@fabiomestre fabiomestre commented Dec 30, 2022

Description

Adds tests for the dft interface implemented on: #259.
Note: That PR should be merged before this one.

Current tests cover the following:

  • descriptor getters / setters
  • descriptor constructor
  • descriptor commit
  • compute_forward overloads using different sizes (in 1D), precisions and domains.
  • compute_backward overloads using different sizes (in 1D), precisions and domains.

Additional tests will be added in a future PR to extend the test coverage.

Checklist

All Submissions

  • [N/A] Do all unit tests pass locally? This PR implements unit tests for another PR
  • Have you formatted the code using clang-format?

Copy link
Contributor

@FMarno FMarno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think most (if not all) of the places you've used std::cout std::cerr would be more appropriate. Other than that there is some repeated code but overall looks good.

Copy link
Contributor

@Rbiessy Rbiessy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you let us know if we need any more changes to this PR?

Comment on lines 44 to 46
using InputType = typename std::conditional_t<domain == oneapi::mkl::dft::domain::REAL,
PrecisionType, std::complex<PrecisionType>>;
using OutputType = std::complex<PrecisionType>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason this is so specific to forward compute?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We test the forward transform followed by the backward transform. Since we keep the input to the forward transform, there is little reason to compute the inverse for reference purposes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. What do you think of renaming them fwdInputType and fwdOutputType?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! I've followed your suggestion in 2fbfcb6.

PrecisionType, std::complex<PrecisionType>>;
using OutputType = std::complex<PrecisionType>;

enum class TestType { buffer, usm };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could have a more specific name for this than TestType.

Copy link
Contributor

@hjabird hjabird Jan 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to MemoryAccessModel in 2fbfcb6.


const std::int64_t size;
const std::int64_t conjugate_even_size;
static constexpr int error_margin = 10;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just a heuristic?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want that margin to be proportional to the logarithm of the size of the transform?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think it would be better to have a heuristic based on the problem size, like in the MKL examples. And I think we should have a comment that it's a heuristic, average-case error bound.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've set the error margin based on the problem size in 75c0fc5

return 1e-12;
}
else {
return (error_mag * num_components<fp>() * std::numeric_limits<fp_real>::epsilon());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand why we define the error_margin heuristic at a much higher level instead of just defining/using error_mag = 10 here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall we've followed the design from the blas domain here: https://github.com/oneapi-src/oneMKL/blob/develop/tests/unit_tests/blas/include/test_common.hpp#L441
It re-uses some of the same code but we figured this duplication helps keeping the domain separate.
We'll do the change to set the error_margin based on the problem size so it makes sense to set it at a higher level I think.

sycl::buffer<InputType, 1> inout_dev{ sycl::range<1>(container_size) };
std::vector<InputType> out_host(container_size);

copy_to_device(sycl_queue, input, inout_dev);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain a little bit why you chose to use these explicit copies?

Copy link
Contributor

@hjabird hjabird Jan 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a result of being influenced too much by HIP and CUDA. I've changed the buffer tests to use SYCL's implicit buffer data movement in d39fe50.

Comment on lines 61 to 62
EXPECT_TRUE(check_equal_vector(out_host.data(), out_host_ref_conjugate.data(),
out_host.size(), error_margin, std::cout));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check doesn't really make use of out_host being an std::vector. Why not use inout_dev.get_access<read>().get_pointer() instead of copying to out_host and then using out_host.data()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now uses a host accessor as suggested in d39fe50.

using ref_t = long double; /* Do the calculations using long double */
static_assert(is_complex<TypeOut>(), "Output type of DFT must be complex");

const ref_t TWOPI = ref_t{ 2.0 } * std::atan(ref_t{ 1.0 }) * ref_t{ 4.0 };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was there a reason you went with this calculation over a constant?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now a literal in 2fbfcb6.


const auto aerr = std::abs(x - x_ref);
const auto rerr = aerr / std::abs(x_ref);
const bool ok = (rerr <= bound) || (aerr <= bound);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the same error bound on relative and absolute errors here: is that intentional? What if std::abs(x_ref) ~ 10^16?

Copy link
Contributor

@Rbiessy Rbiessy Jan 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is intentional and following the decision from the blas domain: https://github.com/oneapi-src/oneMKL/blob/develop/tests/unit_tests/blas/include/test_common.hpp#L449
This is the issue with relative errors, they're not reliable for small inputs. In that case we rely on the absolute error to give a meaningful answer (i.e. aerr <= bound).
Using both absolute and relative errors like this is a pretty forgiving way of testing and this relates to the discussion in this comment #261 (comment). We also discussed this issue and concluded that it is not necessary for oneMKL to require a very high accuracy for all backends. Feel free to let us know your thoughts on this!

}();

const auto aerr = std::abs(x - x_ref);
const auto rerr = aerr / std::abs(x_ref);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we have additional checks (at least on std::abs(x_ref) being above a numerical threshold for instance) before we set on this definition of rerr for "relative error"? Also, is it safe to define such a relative error to be satisfied component-wise when checking the results of a DFT?
For instance, what if

  • x_ref is exactly 0.0 (maybe unlikely but not totally impossible);
  • we're checking "equality" between entries of a vector whose largest entry is of order 1, say, and those particular entries are (both) equivalent to 0.0 in comparison, numerically? In that case, x and x_ref may be both anywhere between -threshold and +threshold (where there exists some 0 < threshold << 1) without invalidating the overall result. However, the current rerr for such a particular entry could be much larger than the bound (it could be of order 1 if not much much larger if std::abs(x_ref) << std::abs(x) while both being under the threshold value).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the comments on small values of x_ref I think I have answered my opinion on this above. I don't see the issue of having a very large rerr, even an infinite one, since the absolute error should be used in this case. I think having a threshold for when to use absolute error or relative error is pretty arbitrary too so this is the most elegant solution to me.

I'm not sure I follow how the relative error would be unsafe for component-wise checking. Would you suggest to remove the relative error altogether? I'm worried that large values could easily fail in that case.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My apologies for splitting the conversation in separate comments, let's keep it going here, I'll mark the other comment as "resolved".

Regarding the comment above on small (relative) values: for some reason I had read that the check was using a logical AND instead of a logical OR when reviewing, my bad... Thank you for the comment and for pointing out the logic in such cases.

In light of that, I believe that the current check is alright in the current usage and for the sizes considered herein, however I think it could become problematic if testing non-unit scaling factor and/or much larger sizes. I am certainly not suggesting to get rid of the relative error (please keep it!) but I believe the absolute error probably needs to be refined and/or made more robust with respect to the reference solution. I believe it currently assumes that some norm of the reference solution is order 1 but we probably want the bound on aerr to be proportional to that norm, in general.

For instance, the current check on absolute error would be way too forgiving when testing a DFT with a very small scaling factor and/or with an input signal of small magnitude, instead. If we consider input data as a constant vector of values alpha/N where N is the length of the transform, then the reference output is an impulse signal of amplitude alpha. Now, take alpha much less than the current bound, then any backend FFT tool producing an output signal roughly bounded by alpha will be approved based on the check on absolute error. That includes incorrect solutions like an output of constant values alpha, an output signal as a perfect cosine wave of amplitude alpha, ...
The problem in this example is that the incorrect entries of the output would be approved based on the absolute error check although the relative error does make sense in such cases and should be taken into account, in fact: while the output entries are indeed "small compared to 1", they are not "small compared to some norm of the reference solution" (floating point arithmetic errors do not explain the differences). Conversely, the current absolute error check becomes rather meaningless for very large alpha and this may report false negatives in such cases, as is.

I believe this verification check would be more robust by making the bound on aerr proportional to "some norm" of the reference solution (my personal preference would be the max norm). Currently, the input data is random between -1 and +1 unless I'm mistaken, so alpha ~ N which could become a problem for very large N.

I leave it up to you whether you want to scale in the input data by 1/N (to make the reference output roughly of order 1) or if we want to make the bound on aerr proportional to some norm of the reference solution, instead. My preference would be the latter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have introduced a new absolute error margin to follow your suggestion in 925efb7. I assumed it should still depend on the problem size as well?
Here is a log running with the MKLGPU backend to show some of the values, error margins and bounds:
verbose_log.txt

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for fragmenting the error bound discussion so much. The error bound taken from MKL examples that's a multiple of log2(N) is an absolute error bound in the context of input in [-1/N, 1/N]. I have no objections to Raphael's suggestion of absolute error bound here, but I think the relative error bound should be some small fraction right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error margins are multiplied by a small epsilon here so the 2 bounds ends up in the range 1e-6 to 1e-15 in the current tests. Are you suggesting we should tighten more the relative error margin? There seems to be some room for a smaller margin by a factor of 10 maximum currently.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the changes. Can you confirm that

  1. the log you attached is only for illustration purposes, i.e., temporarily disabling the if (!ok) here below, right? (but no actual false negative caught and reported otherwise)
  2. the 1e-6 and 1e-15 range you're referring to in your last comment are the for the bounds in relative errors for single and double precision cases, respectively?
    If so, I am satisfied with the suggested changes, personally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and yes, exactly.

for (std::size_t k = 0; k < N; ++k) {
std::complex<ref_t> out_temp = 0;
const auto partial_expo = (static_cast<ref_t>(k) * TWOPI) / static_cast<ref_t>(N);
for (std::size_t n = 0; n < N; ++n) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this upper bound is correct for real transforms checks if the over-allocation mentioned above is resolved. It should probably be n < in.size() (for standard layouts with out-of-place) or n < in.size() - 2 (for standard layouts with in-place)...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the design here is for this reference computation to not distinguish between real and complex input and always compute the full length output. There's a cast of the input to complex type below and some code in the compute_*.hpp files to construct a reference output of length conjugate_even_size from this initial reference output in the case of real.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarification: my concern was not with computing the full length vs. exploiting CCE in the expected output, nor with the intrinsic consistency of the construction of the reference output herein; but with the particular usage presented here which I find confusing and/or prone to issues in the future (especially if this related comment is addressed).

Basically, there is an assumption mapping the size of out (not in!) to the size of the transform for which a reference output is being constructed, and that assumption is not the safest, in my opinion.

When reading this function independently of any other, I was expecting out to be a vector of complex floating point values whose size would be (in.size()/2 + 1) (or in.size()/2 if testing in-place), when constructing the reference solution for testing real transforms. I believe this is not the case for now only because of this related over-allocation comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me that over-allocation just looks like a design decision to allocate a full in.size() complex numbers for the reference output and compute the full output in the ref implementation, presumably so the reference doesn't have to have any conditionals on input type. I see your point about it being potentially confusing, but I don't see anything unsafe.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is nothing "unsafe" in the current implementation (wrong wording from myself), but my opinion was that this function is susceptible to produce an incorrect reference solution in case of slightly modified usage although such usage would seem acceptable from the signature alone. The complex roots of unity are wrong if out.size() is not equal to the length of the transform being tested.

This usage assumption is not broken anywhere in the current usage/PR but it's not guarded anywhere, either. I don't think it would be detrimental in any way to have the length of the DFT under test an extra argument to this function, say dft_length (unless we no longer used N = out.size()), and have
const auto partial_expo = (static_cast<ref_t>(k) * TWOPI) / static_cast<ref_t>(dft_length);
and
for (std::size_t n = 0; n < dft_length; ++n)
in the body of this reference implementation, with another appropriate guard on in.size() >= dft_length herein. That would make it valid in the current usage but also in possible other use cases as I was foreseeing when I started this conversation, unless I'm mistaken.

This being said, the descriptive comment added in 774e322901b78a5bee2f1021f6e120559728b49d alleviates some of my concern.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I certainly am not opposed to any of those suggestions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand it, the concern is that there is scope to compute an incorrect result when in.size() != out.size(). I've added an exception for this in 3cd6090. This function is only used in the DFT_test init, so hopefully there is limited scope for misuse.

I think this function gains more significant changes when multi-dimensional DFT tests are implemented in a later PR.

out[k] = static_cast<TypeOut>(out_temp);
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you confirm that this reference forward implementation is for

  • single-batch transforms;
  • 1D only;
  • unit strides with no offset;
  • no scaling factor;
  • ... more assumptions?

If correct, could we explicitly point that out as a comment herein, maybe?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - in the form of this current PR the above is correct.

I've added doxygen stating as much in 774e322.


const std::int64_t size;
const std::int64_t conjugate_even_size;
static constexpr int error_margin = 10;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want that margin to be proportional to the logarithm of the size of the transform?

if constexpr (sizeof(double) == sizeof(long double) && std::is_same_v<fp_real, double>) {
// The reference DFT uses long double to maintain accuracy
// when this isn't possible, lower the accuracy requirements
return 1e-12;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this absolute/relative error threshold for double-precision to be satisfied for real and complex fp type? Also, why such a distinction between single and double precision?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the challenges in the reference DFT was accuracy. Originally, we computed the reference DFT results in double precision. This meant that we could get an accurate result for single precision, and but there was numerical error introduced in the reference double-precision result. This error in the double-precision reference required us to set a larger error margin than we would have liked. The solution to this is to use long double for computing the reference.

However, there is a problem in our solution: long double is equivalent to double on some compilers. Consequently, we may not always be able to use the extended precision to compute more accurate reference results for double precision. The solution to this is to allow increased error margins when using long double is double and we are checking the accuracy of double-precision results.

Yes, this is an error threshold for both real and complex types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speaking of accuracy of the reference impl, I think we would probably still have issues with a naive implementation like this if the FFT size is large enough. Whether that matters depends on the purpose of providing these unit tests. I've assumed the purpose is to add a few checks for small sizes mostly to guarantee the interface layer between these oneMKL APIs and the backend is functioning, and that any backend should have its own much larger suite of tests that would cover accuracy at larger sizes. Is that assumption correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we discussed this issue too. As you said we think it's more important to test that the interface is working and tolerate a large margin of error. That will also simplify adding more backends and supporting more devices.

Comment on lines 39 to 41
sycl::buffer<InputType, 1> in_dev{ sycl::range<1>(size) };
sycl::buffer<OutputType, 1> out_dev{ sycl::range<1>(size) };
sycl::buffer<InputType, 1> out_back_dev{ sycl::range<1>(size) };

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested renaming: fwd_data_dev, bwd_data_dev, roundtrip_data_dev for these three. (Same for USM tests here below, if you agree to the suggestion)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in b18bbe6.

input_re = std::vector<PrecisionType>(size);
input_im = std::vector<PrecisionType>(size);

out_host_ref = std::vector<OutputType>(size);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OutputType is always complex for fwd transforms so this is over-allocated for real-transforms, I believe: is that intentional?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your point being that there is redundant data in the output of the real forward DFT, that we are storing unnecessarily. I think the aim here is to KISS. Whilst that does entail additional computational cost and storage, I don't think this a major concern for the tests.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No major concern in the current context indeed and I'm not opposed to keeping it as is, but it might be a desirable change when/if extending testing scope to larger sets of tests and/or much larger test sizes. Maybe we can add a note/comment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment added in a67cc38.

Copy link
Contributor

@mkrainiuk mkrainiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I have two questions for changes in the common helper.
Also, do you plan to remove unit_test placeholder from #259 ?

Comment on lines 38 to 39
std::vector<FwdInputType> inout_host(input);
sycl::buffer<FwdInputType, 1> inout_buf{ inout_host.data(), sycl::range<1>(container_size) };

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the size of inout_host less than container_size for real transforms? (size vs 2 * (size / 2 + 1))
This triggers reads past bounds when initializing the buffer data and writes past bound into inout_host upon destruction of inout_buf

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for spotting that. I've corrected inout_host to be equal to container_size in 6a6d3a7.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, doesn't that trigger the throwing of the newly added invalid_argument exception in reference_forward_dft, then?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't because the reference_forward_dft uses input and out_host_ref as parameters, instead of inout_host. The contents of input are copied into inout_host.

@hjabird
Copy link
Contributor

hjabird commented Feb 1, 2023

Also, do you plan to remove unit_test placeholder from #259 ?

We do. I'll aim to do that when we rebase #259 onto the tests once merged. I've added a comment to this effect here.

Copy link
Contributor

@mkrainiuk mkrainiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing my comments. Overall DFT tests integration looks good to me.

Copy link
Contributor

@hparks-intel hparks-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the discussion. Everything looks good to me now.

@FMarno
Copy link
Contributor

FMarno commented Feb 2, 2023

Due to this comment on #259, I enabled real-complex out-of-place transforms and found that they should be using the conjugate even form, so I had to update the tests to account for that.

descriptor.get_value(oneapi::mkl::dft::config_param::LENGTHS, lengths_value.data());
descriptor.get_value(oneapi::mkl::dft::config_param::DIMENSION, &dimensions_after_set);

EXPECT_EQ(new_lengths, lengths_value);
Copy link
Contributor

@anantsrivastava30 anantsrivastava30 Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are testing if the opensource descriptor can set/get the Length but are we ensuring that is being communicated to the intel oneMKL descriptor
for instance setting LENGTHS is not allowed on GPU/CPU, and would resutl in a ERR_BAD_OPERATION
so either,

  1. we remove the set length, I believe we do take care for other readonly config_params (this one is not explicitly mentioned in the spec),
  2. in the initial implimentation get_value ensure that the value stored in the config_value is the same as the one returned by the intel oneMKL descriptor.

Copy link
Contributor

@Rbiessy Rbiessy Feb 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are only testing the open source descriptor in this test. Whether the values are properly communicated to the backend can be tested when running the computational tests.
The backend descriptor is only created once the open-source descriptor is committed and the user cannot set values if the descriptor is already committed (see here) so there are no risks to have the LENGTHS read-write as the spec defines.

Copy link
Contributor

@anantsrivastava30 anantsrivastava30 Feb 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. So you are saying that the compute tests passsing is indicative of a proper communitation to the backend desc?
  2. And so what would happen if say I need to reset the strides of a real problem, and recommit before launching backward transform after running a forward transform (e.g); if we can't set value on a commited descriptor then how do we impliment a real outofplace roundtrip with no padded data?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Yes, I can't imagine a scenario where all tests pass without the backend getting the right configuration.
  2. The case you are describing could work with 2 descriptors. This was our understanding from the spec: All calls to the set_value function to change configuration parameters of a descriptor need to happen after the constructor call for the descriptor class and before a call to commit. Maybe this is just confusing wording? Do we want to allow the user to set_value again as long it is committed again? Even if that is the case, the LENGTHS can still be writeable as we create a new backend descriptor when commit is called.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Thank you, yeah that sounds fair,
  2. Good point the spec does say that but as you can see in the examples we can keep the same descriptor and reuse it, for multiple cases. We should not recreate the discriptor unless its out of scope and the unique_ptr naturally destroy's it self. More on Lenghts (specific to the C Api)
    DFTI_DIMENSION and DFTI_LENGTHS do not have a default value. To set them, use the DftiCreateDescriptor function and not the DftiSetValue function. (ref)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would prefer if we can update the open-source spec first if you need to make it closer to the closed-source MKL spec. I'm not seeing any real reason why we should change the current spec though. The descriptor likely needs to prepare again from scratch for any new commit, doesn't it? Also we're wondering if there could be complications if the user commits a second time with a different queue.

descriptor.get_value(oneapi::mkl::dft::config_param::BWD_DISTANCE,
&bwd_distance_before_set);
EXPECT_EQ(1, bwd_distance_before_set);
descriptor.set_value(oneapi::mkl::dft::config_param::BWD_DISTANCE, bwd_distance_set_value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting this to be the same as the forward distance with COMPLEX_CONJUGATE (for real inplace problems), would result in a inconsistent configuration error as the bwd_distance, should be half the fwd_distance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are testing real inplace transforms in a next MR. This sets the same value for forward and backward distance and passes. The spec only says that the 2 distances should be consistent. Are you talking about another case that we are not testing yet?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fairly ambiguous "consistency" is a (hidden) reference to "For in-place transforms, ensure the first element of the input data has the same location as the first element of the output data for each dimension." from this page of our docs. For in-place complex transforms, this means equal forward and backward distances, indeed. However, for real transform with DFTI_CONJUGATE_EVEN_STORAGE=DFTI_COMPLEX_COMPLEX, the (implicit) units used for strides and distances are real floating point values in forward domain and complex floating point values in backward domains (more details on the implicit "units" in either domain here). So, the backward distance must be half the forward distance in that case (a padding of two real floating point values is required in the forward domain).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why the code @Rbiessy linked works then, even for cases with NUMBER_OF_TRANSFORMS > 1? Is BWD_DISTANCE simply ignored?

Copy link
Contributor

@anantsrivastava30 anantsrivastava30 Feb 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the test lines linked by @Rbiessy works becasue the container size is setup correctly, I am talking about this tests specifically /DescriptorTestSuite/DescriptorTests.DescriptorTestsReal{Single|Double}/ we are setting up both the input and output distance as 12 (with a bunch of other parameters) and the eventual recipe is commited (line 344), that commit will fail, because of what @raphael-egan explanations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the open-source spec CONJUGATE_EVEN_STORAGE only supports COMPLEX_COMPLEX so this is implicitly set here. This makes sense since there is a note saying DFTI_CONJUGATE_EVEN_STORAGE=DFTI_COMPLEX_REAL is deprecated (and will be removed for multi-dimensional FFTs).
We have noticed a couple of strange behaviors with the MKLGPU backend:

  • I was not able to query the backend's fwd and bwd distance. It throws an exception with FFT_INVALID_DESCRIPTOR for all of our tests. Is that expected?
  • It seems the backend ignores the value set for BWD_DISTANCE as our multi-batches test pass for any value we set. I can half the BWD_DISTANCE for real transforms but it seems there is backend's bug?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • FFT_INVALID_DESCRIPTOR being returned when querying fwd/bwd distance used to be a bug on our side: MKLD-13592. It was fixed on 11/03/2022 for oneMKL2023.0 release I believe, in PR#2163;
  • regarding the BWD_DISTANCE value apparently ignored, I believe you are right although that depends on the problem configuration and the targeted GPU device (could you provide more details?). Unless you're testing an r2c of size 2*N <= 4096 on PVC, the default code path is taken and the bwd distance is indeed ignored for in-place operation in that case. It is instead coerced into the value that's required for consistency. I agree that it should at least check that the user did that as well and error out if not: this is part of something I have intended to address for quite a while now (MKLD-13390) but not in our top priorities so far unfortunately

Copy link
Contributor

@Rbiessy Rbiessy Feb 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for clarifying.

  • I am using the 2023.0 release, maybe it will be fixed for the next release?
  • We are testing some rather old integrated GPU (HD Graphics 530 in my case) and the configuration is real, in-place, single precision, size=2x2x2, batch=2, backward scale=1/8, fwd_distance = bwd_distance = 16. How can the backend force a bwd_distance? Isn't the user allowed to set a larger bwd_distance in such a case? This is what we are currently testing (by mistake) but the output doesn't change with any of the bwd_distance I tried (0, 8, 16, 60).

I have removed the commit of the descriptor for this particular to make it clearer that this is only testing the open-source descriptor (1961d9c) here.
I have halved the distance for this descriptor even if that shouldn't matter anymore (2aba40d).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • My bad, I believe PR#2163 missed the code freeze by one day so the FFT_INVALID_DESCRIPTOR is to be expected with the 2023.0 release.
  • The bwd distance is enforced to half the fwd distance for real in-place operations at the moment (except for the more recent paths implemented recently which are taken only on PVC).

This being said, I'm still surprised you're not getting incorrect results for srfi2x2x22 with a fwd distance of 8... Shouldn't it be a forward distance of 16 in that case (backward distance of 8)? 22*(2*(2/2 + 1)) = 16, (a 3D in-place real-transform of size AxBxC requires some padding in forward domain, with a forward distance of AB(2*(C/2 + 1)) to accommodate the bwd domain's data in the same allocated chunk)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I edited my previous message, the distances are set to 16 indeed. I'd like to discuss in our next meeting if we can clarify the situation for the users regarding bwd_distance. That conversation got a bit off topic!

Copy link
Contributor

@anantsrivastava30 anantsrivastava30 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good

@Rbiessy
Copy link
Contributor

Rbiessy commented Feb 15, 2023

@mkrainiuk @akabalov we are not able to hit the merge button. Are we waiting for something else before this can be merged?

@mkrainiuk
Copy link
Contributor

@mkrainiuk @akabalov we are not able to hit the merge button. Are we waiting for something else before this can be merged?

Let me know if I can merge it now.

@Rbiessy
Copy link
Contributor

Rbiessy commented Feb 15, 2023

Let me know if I can merge it now.

Yes please do, thank you!

@mkrainiuk mkrainiuk merged commit 245fd11 into uxlfoundation:develop Feb 15, 2023
anantsrivastava30 added a commit to anantsrivastava30/oneMKL that referenced this pull request Feb 28, 2023
@Rbiessy Rbiessy deleted the dft-tests branch March 9, 2023 11:18
anantsrivastava30 added a commit to anantsrivastava30/oneMKL that referenced this pull request Mar 13, 2023
anantsrivastava30 added a commit to anantsrivastava30/oneMKL that referenced this pull request Apr 20, 2023
anantsrivastava30 added a commit that referenced this pull request Apr 28, 2023
* FFT_MKLCPU_staging + sync #261

* fix real_real failures

* tests for CPU backend

* [DFT] fix rebase issues

* [DFT] address reviews

* command group handles only the DftiCommitDescriptor

* set the distance to true default

* fix testing issue with complex_complex

* change real_real test to work for md, batch transforms

* adderssing minor comments

* add error messages for ordering, transpose, workspace

* change error message from cpu to gpu

* move the desc_buffer to the commit class to avoid descructor invoke

* refactor forward declared derived class

* extend buffer lifetime for backward

* add error messaging for compute

* patchfix for transform in/out distances to fwd/bwd

* revert resetting dist in roundtrip

* revert tests to match specs

* clang-format

* fix ct adding both backends to the linkline

* FFT_MKLCPU_staging + sync #261

* fix real_real failures

* tests for CPU backend

* [DFT] fix rebase issues

* [DFT] address reviews

* minor changes and error checks

* command group handles only the DftiCommitDescriptor

* move queue to pass by reference

* set the distance to true default

* fix testing issue with complex_complex

* change real_real test to work for md, batch transforms

* adderssing minor comments

* add error messages for ordering, transpose, workspace

* change error message from cpu to gpu

* move the desc_buffer to the commit class to avoid descructor invoke

* refactor forward declared derived class

* extend buffer lifetime for backward

* add error messaging for compute

* patchfix for transform in/out distances to fwd/bwd

* revert resetting dist in roundtrip

* revert tests to match specs

* fix ct adding both backends to the linkline

* address reviews

* move the derived class to its own file

* address warnings, and change buffer type

* Update src/dft/backends/mklcpu/commit.cpp

* resovle dynamic dispatch issue

* merge cmake for both device

* remove bad file

* removed old comment

* clang-format
normallytangent pushed a commit to normallytangent/oneMKL that referenced this pull request Aug 6, 2024
* FFT_MKLCPU_staging + sync uxlfoundation#261

* fix real_real failures

* tests for CPU backend

* [DFT] fix rebase issues

* [DFT] address reviews

* command group handles only the DftiCommitDescriptor

* set the distance to true default

* fix testing issue with complex_complex

* change real_real test to work for md, batch transforms

* adderssing minor comments

* add error messages for ordering, transpose, workspace

* change error message from cpu to gpu

* move the desc_buffer to the commit class to avoid descructor invoke

* refactor forward declared derived class

* extend buffer lifetime for backward

* add error messaging for compute

* patchfix for transform in/out distances to fwd/bwd

* revert resetting dist in roundtrip

* revert tests to match specs

* clang-format

* fix ct adding both backends to the linkline

* FFT_MKLCPU_staging + sync uxlfoundation#261

* fix real_real failures

* tests for CPU backend

* [DFT] fix rebase issues

* [DFT] address reviews

* minor changes and error checks

* command group handles only the DftiCommitDescriptor

* move queue to pass by reference

* set the distance to true default

* fix testing issue with complex_complex

* change real_real test to work for md, batch transforms

* adderssing minor comments

* add error messages for ordering, transpose, workspace

* change error message from cpu to gpu

* move the desc_buffer to the commit class to avoid descructor invoke

* refactor forward declared derived class

* extend buffer lifetime for backward

* add error messaging for compute

* patchfix for transform in/out distances to fwd/bwd

* revert resetting dist in roundtrip

* revert tests to match specs

* fix ct adding both backends to the linkline

* address reviews

* move the derived class to its own file

* address warnings, and change buffer type

* Update src/dft/backends/mklcpu/commit.cpp

* resovle dynamic dispatch issue

* merge cmake for both device

* remove bad file

* removed old comment

* clang-format
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants