Add unit-tests for the dft interface by fabiomestre · Pull Request #261 · uxlfoundation/oneMath

fabiomestre · 2022-12-30T11:34:34Z

Description

Adds tests for the dft interface implemented on: #259.
Note: That PR should be merged before this one.

Current tests cover the following:

descriptor getters / setters
descriptor constructor
descriptor commit
compute_forward overloads using different sizes (in 1D), precisions and domains.
compute_backward overloads using different sizes (in 1D), precisions and domains.

Additional tests will be added in a future PR to extend the test coverage.

Checklist

All Submissions

[N/A] Do all unit tests pass locally? This PR implements unit tests for another PR
Have you formatted the code using clang-format?

FMarno

I think most (if not all) of the places you've used std::cout std::cerr would be more appropriate. Other than that there is some repeated code but overall looks good.

tests/unit_tests/dft/include/compute_tester.hpp

tests/unit_tests/dft/include/reference_dft.hpp

tests/unit_tests/dft/include/test_common.hpp

tests/unit_tests/dft/include/reference_dft.hpp

tests/unit_tests/dft/include/test_common.hpp

Rbiessy

Can you let us know if we need any more changes to this PR?

hparks-intel · 2023-01-30T14:41:53Z

tests/unit_tests/dft/include/compute_tester.hpp

+    using InputType = typename std::conditional_t<domain == oneapi::mkl::dft::domain::REAL,
+                                                  PrecisionType, std::complex<PrecisionType>>;
+    using OutputType = std::complex<PrecisionType>;


Is there a reason this is so specific to forward compute?

Yes. We test the forward transform followed by the backward transform. Since we keep the input to the forward transform, there is little reason to compute the inverse for reference purposes.

Thanks. What do you think of renaming them fwdInputType and fwdOutputType?

Good idea! I've followed your suggestion in 2fbfcb6.

hparks-intel · 2023-01-30T14:43:19Z

tests/unit_tests/dft/include/compute_tester.hpp

+                                                  PrecisionType, std::complex<PrecisionType>>;
+    using OutputType = std::complex<PrecisionType>;
+
+    enum class TestType { buffer, usm };


I think we could have a more specific name for this than TestType.

Changed to MemoryAccessModel in 2fbfcb6.

hparks-intel · 2023-01-30T14:55:25Z

tests/unit_tests/dft/include/compute_tester.hpp

+
+    const std::int64_t size;
+    const std::int64_t conjugate_even_size;
+    static constexpr int error_margin = 10;


Is this just a heuristic?

Don't we want that margin to be proportional to the logarithm of the size of the transform?

Yeah, I think it would be better to have a heuristic based on the problem size, like in the MKL examples. And I think we should have a comment that it's a heuristic, average-case error bound.

I've set the error margin based on the problem size in 75c0fc5

hparks-intel · 2023-01-30T14:56:47Z

tests/unit_tests/dft/include/test_common.hpp

+            return 1e-12;
+        }
+        else {
+            return (error_mag * num_components<fp>() * std::numeric_limits<fp_real>::epsilon());


I don't really understand why we define the error_margin heuristic at a much higher level instead of just defining/using error_mag = 10 here.

Overall we've followed the design from the blas domain here: https://github.com/oneapi-src/oneMKL/blob/develop/tests/unit_tests/blas/include/test_common.hpp#L441
It re-uses some of the same code but we figured this duplication helps keeping the domain separate.
We'll do the change to set the error_margin based on the problem size so it makes sense to set it at a higher level I think.

hparks-intel · 2023-01-30T16:03:23Z

tests/unit_tests/dft/include/compute_inplace.hpp

+    sycl::buffer<InputType, 1> inout_dev{ sycl::range<1>(container_size) };
+    std::vector<InputType> out_host(container_size);
+
+    copy_to_device(sycl_queue, input, inout_dev);


Can you explain a little bit why you chose to use these explicit copies?

This was a result of being influenced too much by HIP and CUDA. I've changed the buffer tests to use SYCL's implicit buffer data movement in d39fe50.

hparks-intel · 2023-01-30T16:07:19Z

tests/unit_tests/dft/include/compute_inplace.hpp

+        EXPECT_TRUE(check_equal_vector(out_host.data(), out_host_ref_conjugate.data(),
+                                       out_host.size(), error_margin, std::cout));


The check doesn't really make use of out_host being an std::vector. Why not use inout_dev.get_access<read>().get_pointer() instead of copying to out_host and then using out_host.data()?

This now uses a host accessor as suggested in d39fe50.

hparks-intel · 2023-01-30T16:12:47Z

tests/unit_tests/dft/include/reference_dft.hpp

+    using ref_t = long double; /* Do the calculations using long double */
+    static_assert(is_complex<TypeOut>(), "Output type of DFT must be complex");
+
+    const ref_t TWOPI = ref_t{ 2.0 } * std::atan(ref_t{ 1.0 }) * ref_t{ 4.0 };


Was there a reason you went with this calculation over a constant?

This is now a literal in 2fbfcb6.

raphael-egan · 2023-01-30T18:12:46Z

tests/unit_tests/dft/include/test_common.hpp

+
+    const auto aerr = std::abs(x - x_ref);
+    const auto rerr = aerr / std::abs(x_ref);
+    const bool ok = (rerr <= bound) || (aerr <= bound);


Using the same error bound on relative and absolute errors here: is that intentional? What if std::abs(x_ref) ~ 10^16?

Yes this is intentional and following the decision from the blas domain: https://github.com/oneapi-src/oneMKL/blob/develop/tests/unit_tests/blas/include/test_common.hpp#L449
This is the issue with relative errors, they're not reliable for small inputs. In that case we rely on the absolute error to give a meaningful answer (i.e. aerr <= bound).
Using both absolute and relative errors like this is a pretty forgiving way of testing and this relates to the discussion in this comment #261 (comment). We also discussed this issue and concluded that it is not necessary for oneMKL to require a very high accuracy for all backends. Feel free to let us know your thoughts on this!

raphael-egan · 2023-01-30T18:26:58Z

tests/unit_tests/dft/include/test_common.hpp

+    }();
+
+    const auto aerr = std::abs(x - x_ref);
+    const auto rerr = aerr / std::abs(x_ref);


Shouldn't we have additional checks (at least on std::abs(x_ref) being above a numerical threshold for instance) before we set on this definition of rerr for "relative error"? Also, is it safe to define such a relative error to be satisfied component-wise when checking the results of a DFT?
For instance, what if

x_ref is exactly 0.0 (maybe unlikely but not totally impossible);

we're checking "equality" between entries of a vector whose largest entry is of order 1, say, and those particular entries are (both) equivalent to 0.0 in comparison, numerically? In that case, x and x_ref may be both anywhere between -threshold and +threshold (where there exists some 0 < threshold << 1) without invalidating the overall result. However, the current rerr for such a particular entry could be much larger than the bound (it could be of order 1 if not much much larger if std::abs(x_ref) << std::abs(x) while both being under the threshold value).

Regarding the comments on small values of x_ref I think I have answered my opinion on this above. I don't see the issue of having a very large rerr, even an infinite one, since the absolute error should be used in this case. I think having a threshold for when to use absolute error or relative error is pretty arbitrary too so this is the most elegant solution to me.

I'm not sure I follow how the relative error would be unsafe for component-wise checking. Would you suggest to remove the relative error altogether? I'm worried that large values could easily fail in that case.

My apologies for splitting the conversation in separate comments, let's keep it going here, I'll mark the other comment as "resolved".

Regarding the comment above on small (relative) values: for some reason I had read that the check was using a logical AND instead of a logical OR when reviewing, my bad... Thank you for the comment and for pointing out the logic in such cases.

In light of that, I believe that the current check is alright in the current usage and for the sizes considered herein, however I think it could become problematic if testing non-unit scaling factor and/or much larger sizes. I am certainly not suggesting to get rid of the relative error (please keep it!) but I believe the absolute error probably needs to be refined and/or made more robust with respect to the reference solution. I believe it currently assumes that some norm of the reference solution is order 1 but we probably want the bound on aerr to be proportional to that norm, in general.

For instance, the current check on absolute error would be way too forgiving when testing a DFT with a very small scaling factor and/or with an input signal of small magnitude, instead. If we consider input data as a constant vector of values alpha/N where N is the length of the transform, then the reference output is an impulse signal of amplitude alpha. Now, take alpha much less than the current bound, then any backend FFT tool producing an output signal roughly bounded by alpha will be approved based on the check on absolute error. That includes incorrect solutions like an output of constant values alpha, an output signal as a perfect cosine wave of amplitude alpha, ...
The problem in this example is that the incorrect entries of the output would be approved based on the absolute error check although the relative error does make sense in such cases and should be taken into account, in fact: while the output entries are indeed "small compared to 1", they are not "small compared to some norm of the reference solution" (floating point arithmetic errors do not explain the differences). Conversely, the current absolute error check becomes rather meaningless for very large alpha and this may report false negatives in such cases, as is.

I believe this verification check would be more robust by making the bound on aerr proportional to "some norm" of the reference solution (my personal preference would be the max norm). Currently, the input data is random between -1 and +1 unless I'm mistaken, so alpha ~ N which could become a problem for very large N.

I leave it up to you whether you want to scale in the input data by 1/N (to make the reference output roughly of order 1) or if we want to make the bound on aerr proportional to some norm of the reference solution, instead. My preference would be the latter.

I have introduced a new absolute error margin to follow your suggestion in 925efb7. I assumed it should still depend on the problem size as well?
Here is a log running with the MKLGPU backend to show some of the values, error margins and bounds:
verbose_log.txt

Sorry for fragmenting the error bound discussion so much. The error bound taken from MKL examples that's a multiple of log2(N) is an absolute error bound in the context of input in [-1/N, 1/N]. I have no objections to Raphael's suggestion of absolute error bound here, but I think the relative error bound should be some small fraction right?

The error margins are multiplied by a small epsilon here so the 2 bounds ends up in the range 1e-6 to 1e-15 in the current tests. Are you suggesting we should tighten more the relative error margin? There seems to be some room for a smaller margin by a factor of 10 maximum currently.

Thank you for the changes. Can you confirm that

the log you attached is only for illustration purposes, i.e., temporarily disabling the if (!ok) here below, right? (but no actual false negative caught and reported otherwise)

the 1e-6 and 1e-15 range you're referring to in your last comment are the for the bounds in relative errors for single and double precision cases, respectively?
If so, I am satisfied with the suggested changes, personally.

Yes and yes, exactly.

raphael-egan · 2023-01-30T18:46:15Z

tests/unit_tests/dft/include/reference_dft.hpp

+    for (std::size_t k = 0; k < N; ++k) {
+        std::complex<ref_t> out_temp = 0;
+        const auto partial_expo = (static_cast<ref_t>(k) * TWOPI) / static_cast<ref_t>(N);
+        for (std::size_t n = 0; n < N; ++n) {


I don't think this upper bound is correct for real transforms checks if the over-allocation mentioned above is resolved. It should probably be n < in.size() (for standard layouts with out-of-place) or n < in.size() - 2 (for standard layouts with in-place)...

I think the design here is for this reference computation to not distinguish between real and complex input and always compute the full length output. There's a cast of the input to complex type below and some code in the compute_*.hpp files to construct a reference output of length conjugate_even_size from this initial reference output in the case of real.

Clarification: my concern was not with computing the full length vs. exploiting CCE in the expected output, nor with the intrinsic consistency of the construction of the reference output herein; but with the particular usage presented here which I find confusing and/or prone to issues in the future (especially if this related comment is addressed).

Basically, there is an assumption mapping the size of out (not in!) to the size of the transform for which a reference output is being constructed, and that assumption is not the safest, in my opinion.

When reading this function independently of any other, I was expecting out to be a vector of complex floating point values whose size would be (in.size()/2 + 1) (or in.size()/2 if testing in-place), when constructing the reference solution for testing real transforms. I believe this is not the case for now only because of this related over-allocation comment.

To me that over-allocation just looks like a design decision to allocate a full in.size() complex numbers for the reference output and compute the full output in the ref implementation, presumably so the reference doesn't have to have any conditionals on input type. I see your point about it being potentially confusing, but I don't see anything unsafe.

There is nothing "unsafe" in the current implementation (wrong wording from myself), but my opinion was that this function is susceptible to produce an incorrect reference solution in case of slightly modified usage although such usage would seem acceptable from the signature alone. The complex roots of unity are wrong if out.size() is not equal to the length of the transform being tested.

This usage assumption is not broken anywhere in the current usage/PR but it's not guarded anywhere, either. I don't think it would be detrimental in any way to have the length of the DFT under test an extra argument to this function, say dft_length (unless we no longer used N = out.size()), and have
const auto partial_expo = (static_cast<ref_t>(k) * TWOPI) / static_cast<ref_t>(dft_length);
and
for (std::size_t n = 0; n < dft_length; ++n)
in the body of this reference implementation, with another appropriate guard on in.size() >= dft_length herein. That would make it valid in the current usage but also in possible other use cases as I was foreseeing when I started this conversation, unless I'm mistaken.

This being said, the descriptive comment added in 774e322901b78a5bee2f1021f6e120559728b49d alleviates some of my concern.

Thanks. I certainly am not opposed to any of those suggestions.

As I understand it, the concern is that there is scope to compute an incorrect result when in.size() != out.size(). I've added an exception for this in 3cd6090. This function is only used in the DFT_test init, so hopefully there is limited scope for misuse.

I think this function gains more significant changes when multi-dimensional DFT tests are implemented in a later PR.

raphael-egan · 2023-01-30T18:50:35Z

tests/unit_tests/dft/include/reference_dft.hpp

+        out[k] = static_cast<TypeOut>(out_temp);
+    }
+}
+


Could you confirm that this reference forward implementation is for

single-batch transforms;

1D only;

unit strides with no offset;

no scaling factor;

... more assumptions?

If correct, could we explicitly point that out as a comment herein, maybe?

Yes - in the form of this current PR the above is correct.

I've added doxygen stating as much in 774e322.

raphael-egan · 2023-01-30T18:57:40Z

tests/unit_tests/dft/include/compute_tester.hpp

+
+    const std::int64_t size;
+    const std::int64_t conjugate_even_size;
+    static constexpr int error_margin = 10;


Don't we want that margin to be proportional to the logarithm of the size of the transform?

raphael-egan · 2023-01-30T19:00:35Z

tests/unit_tests/dft/include/test_common.hpp

+        if constexpr (sizeof(double) == sizeof(long double) && std::is_same_v<fp_real, double>) {
+            // The reference DFT uses long double to maintain accuracy
+            // when this isn't possible, lower the accuracy requirements
+            return 1e-12;


Is this absolute/relative error threshold for double-precision to be satisfied for real and complex fp type? Also, why such a distinction between single and double precision?

One of the challenges in the reference DFT was accuracy. Originally, we computed the reference DFT results in double precision. This meant that we could get an accurate result for single precision, and but there was numerical error introduced in the reference double-precision result. This error in the double-precision reference required us to set a larger error margin than we would have liked. The solution to this is to use long double for computing the reference.

However, there is a problem in our solution: long double is equivalent to double on some compilers. Consequently, we may not always be able to use the extended precision to compute more accurate reference results for double precision. The solution to this is to allow increased error margins when using long double is double and we are checking the accuracy of double-precision results.

Yes, this is an error threshold for both real and complex types.

Speaking of accuracy of the reference impl, I think we would probably still have issues with a naive implementation like this if the FFT size is large enough. Whether that matters depends on the purpose of providing these unit tests. I've assumed the purpose is to add a few checks for small sizes mostly to guarantee the interface layer between these oneMKL APIs and the backend is functioning, and that any backend should have its own much larger suite of tests that would cover accuracy at larger sizes. Is that assumption correct?

Yes, we discussed this issue too. As you said we think it's more important to test that the interface is working and tolerate a large margin of error. That will also simplify adding more backends and supporting more devices.

raphael-egan · 2023-01-30T19:05:27Z

tests/unit_tests/dft/include/compute_out_of_place.hpp

+    sycl::buffer<InputType, 1> in_dev{ sycl::range<1>(size) };
+    sycl::buffer<OutputType, 1> out_dev{ sycl::range<1>(size) };
+    sycl::buffer<InputType, 1> out_back_dev{ sycl::range<1>(size) };


Suggested renaming: fwd_data_dev, bwd_data_dev, roundtrip_data_dev for these three. (Same for USM tests here below, if you agree to the suggestion)

Done in b18bbe6.

raphael-egan · 2023-01-30T19:12:34Z

tests/unit_tests/dft/include/compute_tester.hpp

+        input_re = std::vector<PrecisionType>(size);
+        input_im = std::vector<PrecisionType>(size);
+
+        out_host_ref = std::vector<OutputType>(size);


OutputType is always complex for fwd transforms so this is over-allocated for real-transforms, I believe: is that intentional?

I understand your point being that there is redundant data in the output of the real forward DFT, that we are storing unnecessarily. I think the aim here is to KISS. Whilst that does entail additional computational cost and storage, I don't think this a major concern for the tests.

No major concern in the current context indeed and I'm not opposed to keeping it as is, but it might be a desirable change when/if extending testing scope to larger sets of tests and/or much larger test sizes. Maybe we can add a note/comment?

Comment added in a67cc38.

mkrainiuk

Thanks for the PR! I have two questions for changes in the common helper.
Also, do you plan to remove unit_test placeholder from #259 ?

tests/unit_tests/include/test_helper.hpp

raphael-egan · 2023-01-31T19:29:07Z

tests/unit_tests/dft/include/compute_inplace.hpp

+    std::vector<FwdInputType> inout_host(input);
+    sycl::buffer<FwdInputType, 1> inout_buf{ inout_host.data(), sycl::range<1>(container_size) };


Isn't the size of inout_host less than container_size for real transforms? (size vs 2 * (size / 2 + 1))
This triggers reads past bounds when initializing the buffer data and writes past bound into inout_host upon destruction of inout_buf

Thank you for spotting that. I've corrected inout_host to be equal to container_size in 6a6d3a7.

Well, doesn't that trigger the throwing of the newly added invalid_argument exception in reference_forward_dft, then?

It doesn't because the reference_forward_dft uses input and out_host_ref as parameters, instead of inout_host. The contents of input are copied into inout_host.

hjabird · 2023-02-01T12:53:55Z

Also, do you plan to remove unit_test placeholder from #259 ?

We do. I'll aim to do that when we rebase #259 onto the tests once merged. I've added a comment to this effect here.

mkrainiuk

Thanks for addressing my comments. Overall DFT tests integration looks good to me.

hparks-intel

Thanks for the discussion. Everything looks good to me now.

FMarno · 2023-02-02T18:11:58Z

Due to this comment on #259, I enabled real-complex out-of-place transforms and found that they should be using the conjugate even form, so I had to update the tests to account for that.

anantsrivastava30 · 2023-02-07T21:37:35Z

tests/unit_tests/dft/source/descriptor_tests.cpp

+        descriptor.get_value(oneapi::mkl::dft::config_param::LENGTHS, lengths_value.data());
+        descriptor.get_value(oneapi::mkl::dft::config_param::DIMENSION, &dimensions_after_set);
+
+        EXPECT_EQ(new_lengths, lengths_value);


We are testing if the opensource descriptor can set/get the Length but are we ensuring that is being communicated to the intel oneMKL descriptor
for instance setting LENGTHS is not allowed on GPU/CPU, and would resutl in a ERR_BAD_OPERATION
so either,

we remove the set length, I believe we do take care for other readonly config_params (this one is not explicitly mentioned in the spec),

in the initial implimentation get_value ensure that the value stored in the config_value is the same as the one returned by the intel oneMKL descriptor.

We are only testing the open source descriptor in this test. Whether the values are properly communicated to the backend can be tested when running the computational tests.
The backend descriptor is only created once the open-source descriptor is committed and the user cannot set values if the descriptor is already committed (see here) so there are no risks to have the LENGTHS read-write as the spec defines.

So you are saying that the compute tests passsing is indicative of a proper communitation to the backend desc?

And so what would happen if say I need to reset the strides of a real problem, and recommit before launching backward transform after running a forward transform (e.g); if we can't set value on a commited descriptor then how do we impliment a real outofplace roundtrip with no padded data?

Yes, I can't imagine a scenario where all tests pass without the backend getting the right configuration.

The case you are describing could work with 2 descriptors. This was our understanding from the spec: All calls to the set_value function to change configuration parameters of a descriptor need to happen after the constructor call for the descriptor class and before a call to commit. Maybe this is just confusing wording? Do we want to allow the user to set_value again as long it is committed again? Even if that is the case, the LENGTHS can still be writeable as we create a new backend descriptor when commit is called.

Thank you, yeah that sounds fair,

Good point the spec does say that but as you can see in the examples we can keep the same descriptor and reuse it, for multiple cases. We should not recreate the discriptor unless its out of scope and the unique_ptr naturally destroy's it self. More on Lenghts (specific to the C Api)
DFTI_DIMENSION and DFTI_LENGTHS do not have a default value. To set them, use the DftiCreateDescriptor function and not the DftiSetValue function. (ref)

We would prefer if we can update the open-source spec first if you need to make it closer to the closed-source MKL spec. I'm not seeing any real reason why we should change the current spec though. The descriptor likely needs to prepare again from scratch for any new commit, doesn't it? Also we're wondering if there could be complications if the user commits a second time with a different queue.

anantsrivastava30 · 2023-02-08T01:02:14Z

tests/unit_tests/dft/source/descriptor_tests.cpp

+        descriptor.get_value(oneapi::mkl::dft::config_param::BWD_DISTANCE,
+                             &bwd_distance_before_set);
+        EXPECT_EQ(1, bwd_distance_before_set);
+        descriptor.set_value(oneapi::mkl::dft::config_param::BWD_DISTANCE, bwd_distance_set_value);


Setting this to be the same as the forward distance with COMPLEX_CONJUGATE (for real inplace problems), would result in a inconsistent configuration error as the bwd_distance, should be half the fwd_distance.

We are testing real inplace transforms in a next MR. This sets the same value for forward and backward distance and passes. The spec only says that the 2 distances should be consistent. Are you talking about another case that we are not testing yet?

The fairly ambiguous "consistency" is a (hidden) reference to "For in-place transforms, ensure the first element of the input data has the same location as the first element of the output data for each dimension." from this page of our docs. For in-place complex transforms, this means equal forward and backward distances, indeed. However, for real transform with DFTI_CONJUGATE_EVEN_STORAGE=DFTI_COMPLEX_COMPLEX, the (implicit) units used for strides and distances are real floating point values in forward domain and complex floating point values in backward domains (more details on the implicit "units" in either domain here). So, the backward distance must be half the forward distance in that case (a padding of two real floating point values is required in the forward domain).

Can you explain why the code @Rbiessy linked works then, even for cases with NUMBER_OF_TRANSFORMS > 1? Is BWD_DISTANCE simply ignored?

I believe the test lines linked by @Rbiessy works becasue the container size is setup correctly, I am talking about this tests specifically /DescriptorTestSuite/DescriptorTests.DescriptorTestsReal{Single|Double}/ we are setting up both the input and output distance as 12 (with a bunch of other parameters) and the eventual recipe is commited (line 344), that commit will fail, because of what @raphael-egan explanations.

In the open-source spec CONJUGATE_EVEN_STORAGE only supports COMPLEX_COMPLEX so this is implicitly set here. This makes sense since there is a note saying DFTI_CONJUGATE_EVEN_STORAGE=DFTI_COMPLEX_REAL is deprecated (and will be removed for multi-dimensional FFTs).
We have noticed a couple of strange behaviors with the MKLGPU backend:

I was not able to query the backend's fwd and bwd distance. It throws an exception with FFT_INVALID_DESCRIPTOR for all of our tests. Is that expected?

It seems the backend ignores the value set for BWD_DISTANCE as our multi-batches test pass for any value we set. I can half the BWD_DISTANCE for real transforms but it seems there is backend's bug?

FFT_INVALID_DESCRIPTOR being returned when querying fwd/bwd distance used to be a bug on our side: MKLD-13592. It was fixed on 11/03/2022 for oneMKL2023.0 release I believe, in PR#2163;

regarding the BWD_DISTANCE value apparently ignored, I believe you are right although that depends on the problem configuration and the targeted GPU device (could you provide more details?). Unless you're testing an r2c of size 2*N <= 4096 on PVC, the default code path is taken and the bwd distance is indeed ignored for in-place operation in that case. It is instead coerced into the value that's required for consistency. I agree that it should at least check that the user did that as well and error out if not: this is part of something I have intended to address for quite a while now (MKLD-13390) but not in our top priorities so far unfortunately

Thank you for clarifying.

I am using the 2023.0 release, maybe it will be fixed for the next release?

We are testing some rather old integrated GPU (HD Graphics 530 in my case) and the configuration is real, in-place, single precision, size=2x2x2, batch=2, backward scale=1/8, fwd_distance = bwd_distance = 16. How can the backend force a bwd_distance? Isn't the user allowed to set a larger bwd_distance in such a case? This is what we are currently testing (by mistake) but the output doesn't change with any of the bwd_distance I tried (0, 8, 16, 60).

I have removed the commit of the descriptor for this particular to make it clearer that this is only testing the open-source descriptor (1961d9c) here.
I have halved the distance for this descriptor even if that shouldn't matter anymore (2aba40d).

My bad, I believe PR#2163 missed the code freeze by one day so the FFT_INVALID_DESCRIPTOR is to be expected with the 2023.0 release.

The bwd distance is enforced to half the fwd distance for real in-place operations at the moment (except for the more recent paths implemented recently which are taken only on PVC).

This being said, I'm still surprised you're not getting incorrect results for srfi2x2x22 with a fwd distance of 8... Shouldn't it be a forward distance of 16 in that case (backward distance of 8)? 22*(2*(2/2 + 1)) = 16, (a 3D in-place real-transform of size AxBxC requires some padding in forward domain, with a forward distance of AB(2*(C/2 + 1)) to accommodate the bwd domain's data in the same allocated chunk)

Sorry, I edited my previous message, the distances are set to 16 indeed. I'd like to discuss in our next meeting if we can clarify the situation for the users regarding bwd_distance. That conversation got a bit off topic!

anantsrivastava30

Changes look good

Rbiessy · 2023-02-15T15:52:11Z

@mkrainiuk @akabalov we are not able to hit the merge button. Are we waiting for something else before this can be merged?

mkrainiuk · 2023-02-15T16:48:24Z

@mkrainiuk @akabalov we are not able to hit the merge button. Are we waiting for something else before this can be merged?

Let me know if I can merge it now.

Rbiessy · 2023-02-15T16:51:45Z

Let me know if I can merge it now.

Yes please do, thank you!

* FFT_MKLCPU_staging + sync #261 * fix real_real failures * tests for CPU backend * [DFT] fix rebase issues * [DFT] address reviews * command group handles only the DftiCommitDescriptor * set the distance to true default * fix testing issue with complex_complex * change real_real test to work for md, batch transforms * adderssing minor comments * add error messages for ordering, transpose, workspace * change error message from cpu to gpu * move the desc_buffer to the commit class to avoid descructor invoke * refactor forward declared derived class * extend buffer lifetime for backward * add error messaging for compute * patchfix for transform in/out distances to fwd/bwd * revert resetting dist in roundtrip * revert tests to match specs * clang-format * fix ct adding both backends to the linkline * FFT_MKLCPU_staging + sync #261 * fix real_real failures * tests for CPU backend * [DFT] fix rebase issues * [DFT] address reviews * minor changes and error checks * command group handles only the DftiCommitDescriptor * move queue to pass by reference * set the distance to true default * fix testing issue with complex_complex * change real_real test to work for md, batch transforms * adderssing minor comments * add error messages for ordering, transpose, workspace * change error message from cpu to gpu * move the desc_buffer to the commit class to avoid descructor invoke * refactor forward declared derived class * extend buffer lifetime for backward * add error messaging for compute * patchfix for transform in/out distances to fwd/bwd * revert resetting dist in roundtrip * revert tests to match specs * fix ct adding both backends to the linkline * address reviews * move the derived class to its own file * address warnings, and change buffer type * Update src/dft/backends/mklcpu/commit.cpp * resovle dynamic dispatch issue * merge cmake for both device * remove bad file * removed old comment * clang-format

* FFT_MKLCPU_staging + sync uxlfoundation#261 * fix real_real failures * tests for CPU backend * [DFT] fix rebase issues * [DFT] address reviews * command group handles only the DftiCommitDescriptor * set the distance to true default * fix testing issue with complex_complex * change real_real test to work for md, batch transforms * adderssing minor comments * add error messages for ordering, transpose, workspace * change error message from cpu to gpu * move the desc_buffer to the commit class to avoid descructor invoke * refactor forward declared derived class * extend buffer lifetime for backward * add error messaging for compute * patchfix for transform in/out distances to fwd/bwd * revert resetting dist in roundtrip * revert tests to match specs * clang-format * fix ct adding both backends to the linkline * FFT_MKLCPU_staging + sync uxlfoundation#261 * fix real_real failures * tests for CPU backend * [DFT] fix rebase issues * [DFT] address reviews * minor changes and error checks * command group handles only the DftiCommitDescriptor * move queue to pass by reference * set the distance to true default * fix testing issue with complex_complex * change real_real test to work for md, batch transforms * adderssing minor comments * add error messages for ordering, transpose, workspace * change error message from cpu to gpu * move the desc_buffer to the commit class to avoid descructor invoke * refactor forward declared derived class * extend buffer lifetime for backward * add error messaging for compute * patchfix for transform in/out distances to fwd/bwd * revert resetting dist in roundtrip * revert tests to match specs * fix ct adding both backends to the linkline * address reviews * move the derived class to its own file * address warnings, and change buffer type * Update src/dft/backends/mklcpu/commit.cpp * resovle dynamic dispatch issue * merge cmake for both device * remove bad file * removed old comment * clang-format

Add unit-tests for the dft interface

3b26ba2

hjabird mentioned this pull request Dec 30, 2022

[DFT] Initial DFT work & MKLGPU backend #259

Merged

1 task

FMarno reviewed Jan 6, 2023

View reviewed changes

tests/unit_tests/dft/include/compute_tester.hpp Outdated Show resolved Hide resolved

tests/unit_tests/dft/include/reference_dft.hpp Outdated Show resolved Hide resolved

tests/unit_tests/dft/include/test_common.hpp Outdated Show resolved Hide resolved

FMarno reviewed Jan 10, 2023

View reviewed changes

FMarno requested changes Jan 12, 2023

View reviewed changes

FMarno and others added 4 commits January 13, 2023 12:47

increased accuracy in reference dft

a55e89e

Move DFT_Test function definitions into class definition

7b4ca33

Address stylistic comments

56d18e6

Clang-format; Stylistic adjustments; Remove compare vec unused arg.

a306d18

FMarno approved these changes Jan 25, 2023

View reviewed changes

Merge remote-tracking branch 'intel/develop' into HEAD

3999a5e

Rbiessy approved these changes Jan 26, 2023

View reviewed changes

t4c1 approved these changes Jan 27, 2023

View reviewed changes

hparks-intel reviewed Jan 30, 2023

View reviewed changes

raphael-egan reviewed Jan 30, 2023

View reviewed changes

hjabird added 4 commits January 31, 2023 10:03

Rename types to improve clarity

2fbfcb6

Use implicit data movement with buffer tests

d39fe50

Variable renaming to improve clarity

b18bbe6

Document reference DFT

774e322

mkrainiuk reviewed Jan 31, 2023

View reviewed changes

tests/unit_tests/include/test_helper.hpp Outdated Show resolved Hide resolved

tests/unit_tests/include/test_helper.hpp Show resolved Hide resolved

Rbiessy and others added 2 commits January 31, 2023 18:17

Set error margin based on problem size

75c0fc5

Throw exception on in.size() != out.size()

3cd6090

raphael-egan reviewed Jan 31, 2023

View reviewed changes

hjabird added 2 commits February 1, 2023 12:00

Fix incorrectly sized inout_host vector bug

6a6d3a7

Remove unused macros for TEST_RUN_XXX_SELECT_NO_ARGS

f7566e1

Comment on redundant information in Real-DFT out_host_ref

a67cc38

mkrainiuk approved these changes Feb 1, 2023

View reviewed changes

Rbiessy added 2 commits February 1, 2023 17:59

Remove num_components factor from the error_margin

4fdc370

Remove unused common functions

d4ab87a

Introduce different absolute and relative error bound

925efb7

FMarno mentioned this pull request Feb 2, 2023

[DFT] Multi-Dimension Many-Batch DFT tests #275

Merged

2 tasks

hparks-intel approved these changes Feb 2, 2023

View reviewed changes

raphael-egan approved these changes Feb 2, 2023

View reviewed changes

handle conjugate even form in real-complex out_of_place transforms

43f7758

anantsrivastava30 reviewed Feb 7, 2023

View reviewed changes

anantsrivastava30 reviewed Feb 8, 2023

View reviewed changes

Rbiessy added 2 commits February 9, 2023 16:52

Halve bwd_distance

2aba40d

Do not commit descriptor for descriptor test

1961d9c

FMarno mentioned this pull request Feb 10, 2023

[DFT] Introduce a cuFFT backend for the DFT interface #278

Closed

2 tasks

Fix bwd_distance

79b0374

anantsrivastava30 approved these changes Feb 13, 2023

View reviewed changes

akabalov approved these changes Feb 13, 2023

View reviewed changes

mkrainiuk merged commit 245fd11 into uxlfoundation:develop Feb 15, 2023

anantsrivastava30 added a commit to anantsrivastava30/oneMKL that referenced this pull request Feb 28, 2023

FFT_MKLCPU_staging + sync uxlfoundation#261

8f08ea7

anantsrivastava30 mentioned this pull request Mar 6, 2023

[DFT] Synced mklcpu backend + Real_Real tests #288

Merged

11 tasks

Rbiessy mentioned this pull request Mar 8, 2023

[DFT] Split descriptor tests that do not need a device #291

Merged

9 tasks

Rbiessy deleted the dft-tests branch March 9, 2023 11:18

anantsrivastava30 added a commit to anantsrivastava30/oneMKL that referenced this pull request Mar 13, 2023

FFT_MKLCPU_staging + sync uxlfoundation#261

68a98b1

anantsrivastava30 added a commit to anantsrivastava30/oneMKL that referenced this pull request Apr 20, 2023

FFT_MKLCPU_staging + sync uxlfoundation#261

51bb1e8

		EXPECT_TRUE(check_equal_vector(out_host.data(), out_host_ref_conjugate.data(),
		out_host.size(), error_margin, std::cout));

		std::vector<FwdInputType> inout_host(input);
		sycl::buffer<FwdInputType, 1> inout_buf{ inout_host.data(), sycl::range<1>(container_size) };

Comments

Conversation

fabiomestre commented Dec 30, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Checklist

All Submissions

Uh oh!

FMarno left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Rbiessy left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hjabird Jan 31, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hjabird Jan 31, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Rbiessy Jan 31, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fabiomestre commented Dec 30, 2022 •

edited

Loading

hjabird Jan 31, 2023 •

edited

Loading

hjabird Jan 31, 2023 •

edited

Loading

Rbiessy Jan 31, 2023 •

edited

Loading