-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make cufinufft compiles with CUDA 12 on Windows #577
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea and good work. Only some changes.
#define WIN32_LEAN_AND_MEAN | ||
#define NOMINMAX | ||
#include <windows.h> | ||
#else | ||
#include <sys/time.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should just use chrono as per CPU version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it should share the timer implementation with finufft.
|
||
#include <cuda.h> | ||
#include <type_traits> | ||
|
||
#include <thrust/extrema.h> | ||
|
||
#define _USE_MATH_DEFINES |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe checking that is not defined? can it cause double definition warnings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I will add it.
@@ -118,8 +131,8 @@ template<typename T> T infnorm(int n, std::complex<T> *a) { | |||
*/ | |||
|
|||
template<typename T> | |||
static __forceinline__ __device__ void atomicAddComplexShared(cuda_complex<T> *address, | |||
cuda_complex<T> res) { | |||
static __forceinline__ __device__ void atomicAddComplexShared( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it the clang format changing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is the clang-format change
@@ -22,8 +22,7 @@ set(CUFINUFFT_INCLUDE_DIRS | |||
${PROJECT_SOURCE_DIR}/include | |||
${PROJECT_SOURCE_DIR}/contrib | |||
$<TARGET_PROPERTY:CUDA::cudart,INTERFACE_INCLUDE_DIRECTORIES> | |||
$<TARGET_PROPERTY:CUDA::cufft,INTERFACE_INCLUDE_DIRECTORIES> | |||
$<TARGET_PROPERTY:CUDA::nvToolsExt,INTERFACE_INCLUDE_DIRECTORIES>) | |||
$<TARGET_PROPERTY:CUDA::cufft,INTERFACE_INCLUDE_DIRECTORIES>) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this deletion cause compilation issues on older cuda versions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about which header from CUDA::nvToolsExt
is included, do you have CUDA 11 machine to test it?
@@ -25,7 +25,14 @@ CUFINUFFT_BIGINT next235beven(CUFINUFFT_BIGINT n, CUFINUFFT_BIGINT b) | |||
|
|||
// ----------------------- helpers for timing (always stay double prec)... | |||
|
|||
void CNTime::start() { gettimeofday(&initial, 0); } | |||
void CNTime::start() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cpu library is already windows compatible in this case.
I suggest moving the CPU utils in a global folder that can be included by both (cpu and cuda) and using that. We could wrap all that functionality in a library that is linked by both as the reason for this duplication is that cuFINUFFT was in a separate repo before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind putting the timer implementation in the util header of finufft and using it directly in cufinufft?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather move to a separate lib. Once we move the cpu library to templates we plan on grouping the duplicated code anyway.
I make the following changes to make cufinufft compilable with CUDA 12 on Windows
windows.h
to replacesys/time.h
on Windows._USE_MATH_DEFINES
to ensureM_PI
is defined.CUDA::nvToolsExt
in CMake file because it does not exist in CUDA 12, see link1 and link2.cufinufft
in CMake, because originalcufinufft
just merges two object libraries to a static or shared library. It will lead to linking error for MSVC if a library with no.cu
file links multiple libraries with CUDA. It is a bug reported here and there.