Make Sapporo2 work with more device types #5

Open
rieder opened this issue Feb 4, 2020 · 10 comments

Comments

@rieder
Contributor

rieder commented Feb 4, 2020

Tracker issue: OpenCL support currently doesn't always seem to work (at least on macOS, for which PR #4 is an initial fix). It would be nice if it did.
I will report on progress / problems here.

@rieder
Contributor Author

rieder commented Feb 4, 2020

One bit of trouble on macOS is that OpenCL is deprecated there, in favour of Metal. OpenCL 1.2 still works, but newer versions are not supported. Not sure if there is a workaround for this.
For Linux this should not be a problem.

@rieder rieder changed the title Make Sapporo2 work with OpenCL on macOS Make Sapporo2 work with OpenCL Feb 4, 2020
@ymeiron

ymeiron commented Feb 17, 2021

I'm on Linux and having trouble with Sapporo/OpenCL. The output is:

sapporo2::open - no config file is found 
Integration order used: 1 (0=GRAPE5, 1=4th, 2=6th, 3=8th)
Integration precision used: 1 (0=FLOAT, 1 = DOUBLESINGLE, 2=DOUBLE)
Getting list of OpenCL devices ...
0: AMD Accelerated Parallel Processing
Using platform 0 
Found 1 suitable devices: 
0: gfx906      Vendor: Advanced Micro Devices, Inc.
Number of cpus available: 96
Number of gpus available: 1
integrationOrder : 1
Getting list of OpenCL devices ...
0: AMD Accelerated Parallel Processing
Using platform 0 
Found 1 suitable devices: 
0: gfx906      Vendor: Advanced Micro Devices, Inc.
Using device: 0
Device has: 60   multiprocessors 
Using  2 blocks per multi-processor for a total of : 120
Loading file:  OpenCL/kernels4th.cl 
Opening kernel file: OpenCL/kernels4th.cl
Found compiled in version of file: OpenCL/kernels4th.cl
Loading file:  OpenCL/kernels4th.cl 
Opening kernel file: OpenCL/kernels4th.cl
Found compiled in version of file: OpenCL/kernels4th.cl
Loading file:  OpenCL/kernels4th.cl 
Opening kernel file: OpenCL/kernels4th.cl
Found compiled in version of file: OpenCL/kernels4th.cl
Loading file:  OpenCL/kernels4th.cl 
Opening kernel file: OpenCL/kernels4th.cl
Found compiled in version of file: OpenCL/kernels4th.cl
Kernel files found .. building compute kernels! 
Creating kernel dev_copy_particles 
Maximum work group size: 256 Optimal work group multiple: 64 
Creating kernel dev_predictor 
Maximum work group size: 256 Optimal work group multiple: 64 
Creating kernel dev_evaluate_gravity_fourth_DS 
Maximum work group size: 256 Optimal work group multiple: 64 
Creating kernel dev_reset_buffers 
Maximum work group size: 256 Optimal work group multiple: 64 
oclSafeCall() Runtime API error in file <./include/ocldev.h>, line 885 : Invalid work group size
. Kernel name: dev_evaluate_gravity_fourth_DS

I checked the offending line, and the kernel launch that fails has a global_work_size of 30720 and a local_work_size of 256. I don't know much about OpenCL, or how to find the maximum work group size and tell Sapporo not to exceed it. Any help is appreciated. The device is an AMD Radeon MI50 and I believe it supports OpenCL 2.0.
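
For reference, here is a minimal sketch of the main conditions under which an OpenCL launch (clEnqueueNDRangeKernel) is rejected because of its work sizes. It is not Sapporo code: `LaunchLimits`, `LaunchCheck` and `check_launch_2d` are hypothetical names, and the limits are assumed to have been queried from the runtime beforehand (CL_KERNEL_WORK_GROUP_SIZE via clGetKernelWorkGroupInfo, CL_DEVICE_MAX_WORK_ITEM_SIZES via clGetDeviceInfo). The divisibility rule is the OpenCL 1.x behaviour; a kernel compiled with a `reqd_work_group_size` attribute adds a further constraint that is not modelled here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical container for limits that a host program would query at runtime.
struct LaunchLimits {
    std::size_t kernel_max_work_group_size;          // CL_KERNEL_WORK_GROUP_SIZE
    std::array<std::size_t, 2> max_work_item_sizes;  // CL_DEVICE_MAX_WORK_ITEM_SIZES (first two dims)
};

enum class LaunchCheck { Ok, ZeroSize, NotDivisible, ExceedsDimLimit, ExceedsGroupLimit };

// Checks a 2D launch against the limits above.
// NotDivisible and ExceedsGroupLimit correspond to CL_INVALID_WORK_GROUP_SIZE;
// ExceedsDimLimit corresponds to CL_INVALID_WORK_ITEM_SIZE.
inline LaunchCheck check_launch_2d(const std::array<std::size_t, 2>& global,
                                   const std::array<std::size_t, 2>& local,
                                   const LaunchLimits& limits) {
    assert(limits.kernel_max_work_group_size > 0);
    std::size_t group_size = 1;  // product of the local dimensions
    for (std::size_t d = 0; d < 2; ++d) {
        if (global[d] == 0 || local[d] == 0) return LaunchCheck::ZeroSize;
        if (global[d] % local[d] != 0) return LaunchCheck::NotDivisible;
        if (local[d] > limits.max_work_item_sizes[d]) return LaunchCheck::ExceedsDimLimit;
        group_size *= local[d];
    }
    if (group_size > limits.kernel_max_work_group_size) return LaunchCheck::ExceedsGroupLimit;
    return LaunchCheck::Ok;
}
```

Note that the limit applies to the product of the local dimensions. Under these assumptions a local size of (256, 1) passes against a limit of 256, while (256, 2) would not, so for a 2D launch it is worth checking both dimensions rather than only the first.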

@jbedorf
Collaborator

jbedorf commented Feb 17, 2021

I'm not familiar with that device, nor what the optimum settings are. But it looks like too many blocks are launched. What you can try is to change this line, to look as follows:
sapdevice->evalgravKernelTemplate.setWork_threadblock2D(p, q, 60, 1); //Default

Or change the NTHREAD values here.

It might require some trial and error to get that right and work with your device.
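
To narrow down the trial and error, one option is to enumerate work-group sizes from the two numbers the log prints for each kernel ("Maximum work group size: 256 Optimal work group multiple: 64"). The sketch below is only an illustration of that arithmetic; `candidate_local_sizes` and `global_size` are hypothetical helpers, not part of Sapporo2, and which candidate actually works or performs best still has to be tested on the device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Work-group sizes worth trying: every multiple of the preferred multiple
// that does not exceed the kernel's maximum work-group size.
inline std::vector<std::size_t> candidate_local_sizes(std::size_t max_group_size,
                                                      std::size_t preferred_multiple) {
    assert(preferred_multiple > 0);
    std::vector<std::size_t> sizes;
    for (std::size_t n = preferred_multiple; n <= max_group_size; n += preferred_multiple) {
        sizes.push_back(n);
    }
    return sizes;
}

// Global size in one dimension for a given work-group size and number of work-groups (blocks).
inline std::size_t global_size(std::size_t local_size, std::size_t num_blocks) {
    return local_size * num_blocks;
}
```

With the values from the log, `candidate_local_sizes(256, 64)` gives 64, 128, 192 and 256, and `global_size(256, 120)` reproduces the reported global_work_size of 30720 (120 blocks of 256 work-items).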

@rieder
Contributor Author

rieder commented Nov 12, 2021

This issue seems increasingly relevant, with GPUs other than Nvidia-built ones becoming more prominent (e.g. Apple's M1 series processors).
Should we write a proposal to work on this? Would you have any interest in this, @spzwart, @stevemcmillan?

@rieder rieder changed the title Make Sapporo2 work with OpenCL Make Sapporo2 work with Vulkan May 2, 2022
@rieder
Contributor Author

rieder commented May 2, 2022

Renamed the issue - I think adding Vulkan support would be a great goal, since this is the most widely supported GPU API (it is also supported on macOS via MoltenVK, which translates it to Metal).
I still don't know who could do this, but it would be a real nice thing to have!

@rieder
Contributor Author

rieder commented Apr 17, 2023

Maybe SYCL is the way to go these days? https://sycl.tech

@rieder rieder changed the title Make Sapporo2 work with Vulkan Make Sapporo2 work with more GPU devices Apr 17, 2023
@rieder rieder changed the title Make Sapporo2 work with more GPU devices Make Sapporo2 work with more device types Apr 17, 2023
@ymeiron

ymeiron commented Apr 17, 2023

It is for sure if Sapporo is to take advantage of upcoming Intel HPC GPUs. There's also SYCLomatic, which is supposed to help with converting CUDA to SYCL, but I bet it won't be too easy for codes like Sapporo that (if I remember correctly) use the CUDA driver (as opposed to runtime) API.

@rieder
Contributor Author

rieder commented Apr 26, 2023

Probably not easy, no. But I think it's essential if we want to use Sapporo in the future.

@rieder
Contributor Author

rieder commented May 1, 2023

I was discussing migrating Sapporo to using SYCL with Kentaro Nomura (now at Intel, formerly at RIKEN), perhaps he can help us with this.

@LourensVeen

AMD now has HIP, which is essentially a clone of the CUDA API backed by either CUDA (if you have Nvidia hardware) or ROCm (if you have AMD). Supposedly it is easy to port to, but support for other platforms is an open question.

Kokkos also looks interesting. It takes a pure C++ approach, and has a variety of backends, although I can't find one for Metal. It does apparently give you less low-level control than SYCL, but with the resources we have, that's probably fine. It also involves writing modern C++, which is a good idea but may require some learning.

Still doesn't look like there's a clear winner...

4 participants