Identify feature/options necessary to support OpenCL and OpenMP #14858
Comments
Moved from PR:
Perhaps OpenCV should be part of the checklist? (From …)
From a basic test program that uses OpenCV and OpenCL, I don't see any problems combining OpenCV (built with OpenCL enabled) and OpenCL.
To elaborate, right now I manually test changes to the OpenCL implementations against Nvidia, AMD, and Intel platforms. Nvidia and AMD platforms are amenable to testing through AWS via something like G4ad (AMD) and G4dn (Nvidia) instances, but I'm not aware of any instances that use Intel GPUs.
Factoring in hardware and software revisions, we are nearing an intractable number of implementations. If we can support two implementations of a given version of OpenCL we are probably doing well. Realistically, the Intel version would be at the bottom of my heap of versions to test. Budget is going to play into what we test too. We have some slack, but there are only so many G4 variant instances we could run in a weekly cycle (I would prioritize NVIDIA, not least because they own ARM).
I don't think we need to plan around testing on a range of (hardware x software) revisions - the most important part of testing on multiple platforms is to confirm that the OpenCL kernels build and something platform/implementation-specific doesn't sneak in. I think we can achieve that fine with a single example each of AMD and Nvidia.
Unfortunately, it's quite possible that this is the most-used implementation due to laptops. That said, I've only run into an Intel implementation-specific issue once (an ambiguous call to …)
Yes, I just wouldn't want anyone to get a false sense of security from a given AWS instance type. OpenGL is hard enough, let alone OpenCL.
True, but they probably have the least to gain from using OpenCL?
I have seen pretty solid speedups on NUCs and laptops for point cloud voxelization and roadmap updating with OpenCL, especially on machines with fewer cores.
Cool, nice to be proven wrong. Are there good gains with both the GPU and CPU implementations?
I haven't tried them against Intel's OpenCL-on-CPU implementation, only their two GPU implementations (older …)
Yes. FWIW, that may be a configuration we can handle on AWS.
cc @xuchenhan-tri FYI, as this might relate to FEM simulations in the future as well.
I can see this is a big change, but I believe it will open Drake to new fruitful opportunities. Cheers!
A few more notes from my digging... For users who might use MKL's libblas (instead of Ubuntu libblas) at load time, it seems like the obvious and good things will happen by default, and we can rely on OpenMP to sort out the details, per the MKL Developer Guide. Mosek currently uses Cilk for the thread pool, but
For background docs and good tips, see:
When solving, possibly we should detect if we're within a parallel section (per omp_in_parallel) and then set MSK_IPAR_INTPNT_MULTI_THREAD to OFF automatically, or maybe we should just document the caveat and let users configure what they need. Maybe in MOSEK 10 it will be easier. Gurobi also consumes all threads on the machine by default; I haven't yet found what kind of thread pool it's using.
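The omp_in_parallel check suggested in the comment above can be sketched as follows. This is a minimal illustration under stated assumptions, not Drake or MOSEK code: InnerSolverMayUseThreads, InnerSolverThreadCount, and SolveMany are hypothetical helpers, and actually applying the result (for MOSEK, setting MSK_IPAR_INTPNT_MULTI_THREAD) is left to the caller. The _OPENMP guard lets the same file build with or without OpenMP.

```cpp
#include <cassert>

#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper (not part of Drake or MOSEK): returns false when called
// from inside an active OpenMP parallel region, where an inner solver that
// spawns its own threads would compete with the OpenMP team for cores.
// omp_in_parallel() is nonzero only for an *active* region (team size > 1).
bool InnerSolverMayUseThreads() {
#ifdef _OPENMP
  return omp_in_parallel() == 0;
#else
  // Built without OpenMP: there is no enclosing OpenMP team to compete with.
  return true;
#endif
}

// Hypothetical helper: the thread count to hand to an inner solver. Returns 1
// inside an active parallel region; a caller could then turn off the solver's
// own multithreading (for MOSEK, MSK_IPAR_INTPNT_MULTI_THREAD) before solving.
int InnerSolverThreadCount(int requested) {
  assert(requested >= 1);
  return InnerSolverMayUseThreads() ? requested : 1;
}

// Hypothetical call site: each iteration of an OpenMP loop decides how many
// threads its (stand-in) solve may use and records the decision.
void SolveMany(int num_problems, int* threads_used) {
#pragma omp parallel for
  for (int i = 0; i < num_problems; ++i) {
    threads_used[i] = InnerSolverThreadCount(4);
  }
}
```

In a serial context the requested count passes through unchanged; inside a multi-threaded OpenMP loop each solve is limited to one thread, which avoids oversubscription at the cost of leaving the solver's internal parallelism unused.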
In service of #14431, we would like to add support for OpenCL (cross-platform GPU acceleration) and OpenMP (directive-based parallelization). Both of these dependencies add runtime components, which may be more (OpenMP) or less (OpenCL) problematic for Drake users.
OpenCL
#14843 adds OpenCL support for Ubuntu and Mac, used by a test in the external voxelized_geometry_tools. We expect that in the future OpenCL will be used by planning code moved into Drake, and thus be shipped in some or all binary forms of Drake. Broadly, our OpenCL code uses the Installable Client Driver (ICD) mechanism: our code links to the ICD loader and, at runtime, enumerates the available OpenCL platforms and devices. If no OpenCL platform/device is available, our code falls back to a different implementation, so Drake will not require OpenCL execution to be available.
Concerns/risks:
We believe that the runtime element should be minimal in the case of code that doesn't use OpenCL and that it shouldn't conflict with other software users want to integrate with Drake, but have not confirmed this yet (and doing so will require some feedback from the community).
The OpenCL execution model means that kernels are not compiled until run by a specific platform. So long as planning tools and externals are tested outside of Drake against a number of OpenCL implementations, we will not necessarily require that Drake CI support OpenCL execution (i.e. we don't need instances with GPUs). This could change in the future.
Apple has officially deprecated both OpenGL and OpenCL; however, support for both continues to be available on Big Sur. Should this change in the future, we will need to remove support for OpenCL on Mac and potentially add Mac-specific implementation(s) of our planning tools.
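The ICD-based fallback described at the start of this section amounts to a runtime dispatch. The sketch below uses hypothetical names (Backend, OpenClProbe, SelectBackend) and injects the probes so it builds without OpenCL headers; in real code has_device would wrap clGetPlatformIDs()/clGetDeviceIDs() and kernels_build would wrap clBuildProgram(). Because kernels are only compiled when run on a specific platform, a failed build is treated here as one more reason to fall back.

```cpp
#include <cassert>
#include <functional>

// Which implementation the dispatcher chose.
enum class Backend { kOpenCl, kCpu };

// Hypothetical probes, injected so the sketch needs no OpenCL SDK.
struct OpenClProbe {
  // Stand-in for "clGetPlatformIDs()/clGetDeviceIDs() found a usable device".
  std::function<bool()> has_device;
  // Stand-in for "clBuildProgram() compiled our kernels on that device".
  std::function<bool()> kernels_build;
};

// Prefer OpenCL only when a device exists AND the kernels build on it;
// otherwise use the CPU implementation. Missing (empty) probes count as
// "not available", so the CPU path is always the safe default.
Backend SelectBackend(const OpenClProbe& probe) {
  const bool device_ok = probe.has_device && probe.has_device();
  if (device_ok && probe.kernels_build && probe.kernels_build()) {
    return Backend::kOpenCl;
  }
  return Backend::kCpu;
}
```

Keeping the CPU path as the default means a machine with no OpenCL runtime, or with a runtime that rejects a kernel, still gets a working result, which matches the goal of not requiring OpenCL execution.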
OpenMP
OpenMP requires both compiler support and a runtime component. On Ubuntu platforms this is quite easy to integrate with a set of compile and link flags (although these flags differ somewhat between GCC and Clang). However, Apple does not provide the runtime library and partially disables OpenMP support in its compiler. At the very least, OpenMP support must be opt-out; whether it should instead be opt-in is an open question.
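As a sketch of keeping OpenMP opt-out at build time, the code below compiles either way: with GCC's -fopenmp (or Clang's -fopenmp plus the LLVM libomp runtime) the loop runs in parallel, while without those flags _OPENMP is undefined, the pragma is ignored, and the loop runs serially. SumOfSquares and MaxThreads are hypothetical examples, not Drake functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#ifdef _OPENMP
#include <omp.h>
#endif

// Sum of squares. With OpenMP enabled, iterations are split across threads
// and the partial sums are combined by the reduction clause; without OpenMP
// the pragma is ignored and this is an ordinary serial loop.
double SumOfSquares(const std::vector<double>& values) {
  double total = 0.0;
  const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(values.size());
#pragma omp parallel for reduction(+ : total)
  for (std::ptrdiff_t i = 0; i < n; ++i) {
    const double v = values[static_cast<std::size_t>(i)];
    total += v * v;
  }
  return total;
}

// Number of threads OpenMP would use for a parallel region; 1 without OpenMP.
int MaxThreads() {
#ifdef _OPENMP
  return omp_get_max_threads();
#else
  return 1;
#endif
}
```

A parallel reduction may add floating-point values in a different order than the serial loop, so results can differ in the last bits between OpenMP and non-OpenMP builds; tests that compare the two should use a tolerance.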
Concerns/risks
OpenMP directives in our code interact with Eigen's own OpenMP integration. Conservatively, combining them safely relies on the EIGEN_DONT_PARALLELIZE define to disable Eigen's built-in uses of OpenMP.
OpenMP may interact or conflict with commercial solvers such as Gurobi and Mosek. We use OpenMP with Snopt internally and have patched interaction issues that arose, but have not extensively used it with the other commercial solvers. Mosek uses Cilk, which shouldn't directly conflict with OpenMP in terms of shared memory, but will definitely cause some sort of resource contention if someone puts a call to Mosek in the body of a #pragma omp parallel for loop.
If we want to add Mac support, doing so would require either a different compiler (e.g., GCC or upstream Clang from Homebrew) or the use of the -Xclang option to Apple's compiler together with a separately provided, release-specific version of the OpenMP runtime library.
CI and release implications
@jwnimmer-tri has enumerated some of the support matrix we'll need to consider, accounting for user channel and build options
User channel:
Build configs:
We need to decide which channels will support (or require) the various build option permutations and what coverage must exist in CI. I am putting together a survey to gather feedback on which combinations of channel/build should be supported and tested.
cc @ggould-tri @jwnimmer-tri @jamiesnape @sherm1