Identify feature/options necessary to support OpenCL and OpenMP #14858

calderpg-tri · 2021-04-01T22:38:57Z

In service of #14431, we would like to add support for OpenCL (cross-platform GPU acceleration) and OpenMP (directive-based parallelization). Both of these dependencies add runtime components, which may be more (OpenMP) or less (OpenCL) problematic for Drake users.

OpenCL

#14843 adds OpenCL support for Ubuntu and Mac, used by a test in the external voxelized_geometry_tools. We expect that, in the future, OpenCL will be used as part of planning code moved to Drake and thus be shipped in some or all binary forms of Drake. Broadly, our OpenCL code uses the Installable Client Driver (ICD) mechanism, by which our code links to the ICD loader and at runtime enumerates the available OpenCL platforms and devices. If no OpenCL platform/device is available, our code falls back to a different implementation, so Drake will not require OpenCL execution to be available.

Concerns/risks:

We believe the runtime footprint should be minimal for code that doesn't use OpenCL, and that it shouldn't conflict with other software users want to integrate with Drake, but we have not confirmed this yet (doing so will require some feedback from the community).
The OpenCL execution model means that kernels are not compiled until they are run on a specific platform. So long as planning tools and externals are tested outside of Drake against a number of OpenCL implementations, we will not necessarily require that Drake CI support OpenCL execution (i.e., we don't need instances with GPUs). This could change in the future.
Apple has officially deprecated both OpenGL and OpenCL; however, support for both is still available on Big Sur. Should this change in the future, we will need to remove support for OpenCL on Mac and potentially add additional Mac-specific implementation(s) of our planning tools.

OpenMP

OpenMP requires both compiler support and a runtime component. On Ubuntu platforms this is quite easy to integrate with a set of compile and link flags (although these flags differ somewhat between GCC and Clang). However, Apple does not provide the runtime library and partially disables OpenMP support in its compiler. At the very least, OpenMP support must be optional (opt-out); whether it should instead be opt-in is an open question.

Concerns/risks:

OpenMP directives in our code interact with Eigen's own OpenMP integration. Conservatively, combining them safely relies on defining EIGEN_DONT_PARALLELIZE to disable Eigen's built-in parallelization.
OpenMP may interact or conflict with commercial solvers such as Gurobi and Mosek. We use OpenMP with Snopt internally and have patched the interaction issues that arose, but have not used it extensively with the other commercial solvers. Mosek uses Cilk, which shouldn't directly conflict with OpenMP in terms of shared memory, but will certainly cause resource contention if someone calls Mosek inside the body of a #pragma omp parallel for loop.
If we want to add Mac support, doing so would require either a different compiler (e.g., GCC or upstream Clang from Homebrew) or passing the -Xclang option to Apple's compiler together with a separately provided, release-specific build of the OpenMP runtime library.

CI and release implications

@jwnimmer-tri has enumerated part of the support matrix we'll need to consider, covering user channels and build options:

User channel:

Source build (nightly, monthly)
GitHub binary tarball (nightly, monthly)
Homebrew binary cask (monthly)
Docker binary image (nightly, monthly)
Debian PPA binary w/sources (monthly)
Colab notebooks, likely via Debian PPA (monthly)

Build configs:

Gurobi on/off -- must be off for first-party binaries
Mosek on/off -- must be off for first-party binaries
Snopt on/off -- n.b. our first-party binaries turn this on, shrouded
Debug / Release / Coverage / Dynamic Analysis
Clang / GCC
Bionic / Focal / Catalina / Big Sur
OpenMP on/off
OpenCL on/off

We need to decide which channels will support (or require) the various build option permutations and what coverage must exist in CI. I am putting together a survey to gather feedback on which combinations of channel/build should be supported and tested.

cc @ggould-tri @jwnimmer-tri @jamiesnape @sherm1


EricCousineau-TRI · 2021-04-02T01:49:06Z

Moved from PR:

[...] but a quick sanity check probably is still worthwhile. (If it does have downsides, we might need the option to disable it.)

Perhaps OpenCV should be part of the checklist? (From ~~brief~~ shallow investigations here, I think it enables OpenCL by default; dunno about static vs. dynamic linking: repro/.../opencv_cvtcolor_slow)

calderpg-tri · 2021-04-02T16:49:38Z

From a basic test program that uses OpenCV and OpenCL, I don't see any problems combining OpenCV (with its OpenCL support enabled) and OpenCL.

calderpg-tri · 2021-04-06T20:40:56Z

So long as planning tools and externals are tested outside of Drake against a number of OpenCL implementations

To elaborate, right now I manually test changes to the OpenCL implementations against Nvidia, AMD, and Intel platforms. Nvidia and AMD platforms are amenable to testing through AWS via something like G4ad (AMD) and G4dn (Nvidia) instances, but I'm not aware of any instances that use Intel GPUs.

jamiesnape · 2021-04-14T20:21:35Z

So long as planning tools and externals are tested outside of Drake against a number of OpenCL implementations

To elaborate, right now I manually test changes to the OpenCL implementations against Nvidia, AMD, and Intel platforms. Nvidia and AMD platforms are amenable to testing through AWS via something like G4ad (AMD) and G4dn (Nvidia) instances, but I'm not aware of any instances that use Intel GPUs.

Factoring in hardware and software revisions, we are nearing an intractable number of implementations. If we can support two implementations of a given version of OpenCL we are probably doing well. Realistically, the Intel version would be at the bottom of my heap of versions to test. Budget is going to play into what we test too. We have some slack, but there are only so many G4 variant instances we could run in a weekly cycle (I would prioritize NVIDIA, not least because they own ARM).

calderpg-tri · 2021-04-14T20:44:00Z

Factoring in hardware and software revisions, we are nearing an intractable number of implementations. If we can support two implementations of a given version of OpenCL we are probably doing well.

I don't think we need to plan around testing on a range of (hardware x software) revisions - the most important part of testing on multiple platforms is to confirm that the OpenCL kernels build and something platform/implementation-specific doesn't sneak in. I think we can achieve that fine with a single example each of AMD and Nvidia.

Realistically, the Intel version would be at the bottom of my heap of versions to test.

Unfortunately, it's quite possible that this is the most-used implementation due to laptops. That said, I've only run into an Intel implementation-specific issue once (an ambiguous call to sqrt), so I think for now we could require that the rare changes to OpenCL kernels get manually tested on Intel instead.

jamiesnape · 2021-04-14T20:51:28Z

I don't think we need to plan around testing on a range of (hardware x software) revisions...

Yes, I just wouldn't want anyone to get a false sense of security from a given AWS instance type. OpenGL is hard enough, let alone OpenCL.

Unfortunately, it's quite possible that this is the most-used implementation due to laptops.

True, but they probably have the least to gain from using OpenCL?

calderpg-tri · 2021-04-14T21:10:40Z

Unfortunately, it's quite possible that this is the most-used implementation due to laptops.

True, but they probably have the least to gain from using OpenCL?

I have seen pretty solid speedups on NUCs and laptops for point cloud voxelization and roadmap updating with OpenCL, especially on machines with fewer cores.

jamiesnape · 2021-04-14T21:16:31Z

Cool, nice to be proven wrong. Are there good gains with both the GPU and CPU implementations?

calderpg-tri · 2021-04-14T22:07:46Z

Are there good gains with both the GPU and CPU implementations?

I haven't tried them against Intel's OpenCL-on-CPU implementation, only their two GPU implementations (the older Beignet and the newer NEO/GCR), if that's what you're asking.

jamiesnape · 2021-04-15T13:30:58Z

Yes. FWIW, that may be a configuration we can handle on AWS.

jwnimmer-tri · 2021-12-17T17:42:03Z

CC @xuchenhan-tri FYI, as this might relate to FEM simulations in the future as well.

DamrongGuoy · 2022-02-03T04:28:56Z

I can see this is a big change, but I believe it will open Drake to new fruitful opportunities. Cheers!

jwnimmer-tri · 2022-02-16T02:20:34Z

A few more notes from my digging...

For users who might use MKL's libblas (instead of Ubuntu's libblas) at load time, it seems like the obvious and good things will happen by default, and we can rely on OpenMP to sort out the details, per the MKL Developer Guide.

Mosek currently uses Cilk for the thread pool, but

Mosek version 10 will no longer employ Cilk but most likely oneTBB. This will allow for a more fine grained control on threading.
-- https://groups.google.com/g/mosek/c/x2pZnW0OJEo

For background docs and good tips, see:

When solving, we could detect whether we're within a parallel region (per omp_in_parallel) and then automatically set MSK_IPAR_INTPNT_MULTI_THREAD to OFF, or we could just document the caveat and let users configure what they need. Maybe this will be easier in MOSEK 10.

Gurobi also consumes all threads on the machine by default:
https://www.gurobi.com/documentation/9.5/refman/threads.html

See also:
https://support.gurobi.com/hc/en-us/community/posts/360055837711-Solving-different-models-in-parallel-C-OpenMP-

I haven't yet found what kind of thread pool it's using.

calderpg-tri added the component: distribution, component: continuous integration, and component: build system labels Apr 1, 2021

calderpg-tri self-assigned this Apr 1, 2021

EricCousineau-TRI mentioned this issue Apr 2, 2021: Add OpenCL support #14843 (merged)

jwnimmer-tri added the unused team: kitware label Apr 2, 2021

jwnimmer-tri added the priority: low label and removed the component: continuous integration and component: build system labels Nov 10, 2021

jwnimmer-tri added priority: medium and removed priority: low labels Feb 16, 2022

jwnimmer-tri mentioned this issue Feb 16, 2022: [tools] Add opt-in build flags (and unit test) for OpenMP #16606 (merged)

jwnimmer-tri removed the unused team: kitware label May 3, 2022

jwnimmer-tri mentioned this issue May 11, 2022: [tools] Add OpenMP to "Everything" CI config #17154 (merged)

jwnimmer-tri mentioned this issue Feb 20, 2023: Enable OpenMP in monthly releases #18828 (closed)

jwnimmer-tri added the type: feature request label May 22, 2023

jwnimmer-tri added this to #dynamics (Drake board) Feb 15, 2024

jwnimmer-tri moved this to Backlog in #dynamics (Drake board) Feb 15, 2024

jwnimmer-tri removed this from #dynamics (Drake board) Feb 19, 2024
