Add initial OpenCV CUDA bindings #416

SamCarlberg · 2017-05-05T00:29:45Z

Creates bindings for:

Header	Class
`opencv2/core/cuda.hpp`	`opencv_cuda`
`opencv2/cudaimgproc.hpp`	`opencv_cudaimgproc`
`opencv2/cudafilters.hpp`	`opencv_cudafilters`

Addresses (but doesn't solve) #364, bytedeco/javacv#481

Bindings for opencv2/cuda.hpp, opencv2/cudaimgproc.hpp, opencv2/cudafilters.hpp Update cppbuild.sh to use all available cores (was hardcoded to 4) Removes the generated opencv_core.Stream class, but that was generated from a forward declaration in the header file so it wasn't useful anyway

saudet · 2017-05-05T10:13:54Z

Looking good thanks! There are few things to fix before we can merge this though:

Since CUDA is available only on some platforms, we need to list them in the annotations. Something like the following should work:

@Platform(value = {"linux-x86_64", "linux-arm64", "linux-ppc64le", "macosx-x86_64", "windows-x86_64"}, ...

And so -DWITH_CUDA=ON cannot be used on other platforms in the cppbuild.sh script.
The content of opencv_photo and opencv_shape changes, I wonder why...?

SamCarlberg · 2017-05-05T17:22:31Z

I think opencv_photo and opencv_shape were changed because I removed the generated class for the forward declaration of cv::cuda::Stream in opencv_core I guess I could change those presets to inherit from opencv_cuda to get the real class declaration.

saudet · 2017-05-06T01:13:48Z

I don't think that's it, all the other modules don't change. Anyway, I'll look into that. Make the other changes and it'll be fine! One other thing though, to reduce build times, we should disable modules we don't use...

CUDA remains enabled for linux x86_64, linux arm64, linux ppc64le, mac os, windows x86_64

SamCarlberg · 2017-05-06T20:28:04Z

I'm not too happy about stuff like this:

public native void detectMultiScale(@ByVal Mat image,
                                    @ByVal Mat objects);
public native void detectMultiScale(@ByVal UMat image,
                                    @ByVal UMat objects);
public native void detectMultiScale(@ByVal GpuMat image,
                                    @ByVal GpuMat objects);

that should really only accept GpuMat parameters like this:

public native void detectMultiScale(@ByVal GpuMat image,
                                    @ByVal GpuMat objects);

FWIW, this method is generated from this declaration in opencv/cudaobjdetect.hpp:

virtual void detectMultiScale(InputArray image,
                              OutputArray objects,
                              Stream& stream = Stream::Null()) = 0;

But InputArray and OutputArray need to be aliased as Mat or UMat for some conversion code, as well as the methods needed for uploading and downloading mats in host memory to device memory (GpuMat::upload and GpuMat::download). What would be a good solution to this problem?

saudet · 2017-05-07T00:05:05Z

We can remove "Mat" and "UMat" from the redefinition, or do we need them somewhere? If it's just one place or two, we can use Info.javaText() for those.

BTW, the build is failing, on Windows:

Failed to execute JavaCPP Builder: Could not parse "opencv2\cudafilters.hpp": File does not exist

saudet · 2017-05-11T00:36:24Z

Ouch, it looks like OpenCV builds for all these archs by default:

--     NVIDIA GPU arch:             20 30 35 37 50 52 60 61

Limiting that to say 30 will reduce compile time by 8. And we could also use CUDA_ARCH_PTX to create binaries that will still work on all architectures.

saudet · 2017-05-11T00:38:37Z

/cc @vb216 It looks like it's failing on AppVeyor because CUDA isn't installed when building OpenCV.

SamCarlberg · 2017-05-11T00:42:58Z

Limiting that to say 30 will reduce compile time by 8. And we could also use CUDA_ARCH_PTX to create binaries that will still work on all architectures.

That sounds good. I'll try tackling that later this week.

Currently requires custom CUDA bindings from javaccp-presets (see bytedeco/javacpp-presets#416) Trying to use CUDA-accelerated operations without an nvidia GPU and running nvidia drivers will probably crash the app

This should speed up compilation of the CUDA modules by about 6x and cut down the size of the output jar by about 60% (vs all CUDA platorms being buit)

SamCarlberg · 2017-05-13T22:16:25Z

21c4f4d only compiles the CUDA stuff for 3.0 + PTX. Everything should still work, but I had some weird issues with double frees by the PTX JIT compiler. Full dump here. I'm not sure how robust the PTX stuff is.

saudet · 2017-05-14T01:33:35Z

Thanks! Now the build finishes on time for Linux, but still not for Mac :/

@vb216 is looking into stages, yes. It won't work for Windows with AppVeyor though :(

So, recent versions of CUDA have issues compiling older PTX versions? That's unfortunate...

SamCarlberg · 2017-05-14T02:20:30Z

Looks like the mac build was cancelled because the log got too big from warnings about unused parameters and functions in the core and cudev modules (mostly from stubs as far as I can tell)

And it's a problem with the PTX JIT. Caused that double free and probably a bunch of other issues I'm randomly getting when the cuda module gets loaded. Never had any issues with the build my machine uses natively (sm_50, GTX 950m)

saudet · 2017-05-16T05:29:27Z

@vb216 What do you think? Should we merge this?

vb216 · 2017-05-16T06:15:47Z

Yeah fine with me. Nearly got nexus AWS instance working which would let us rejig the jobs to make up some time too

…

On 16 May 2017 06:29, "Samuel Audet" ***@***.***> wrote: @vb216 <https://github.com/vb216> What do you think? Should we merge this? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#416 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AMYRJwBEZW4iwOWMaEWFZZsD6D-H-maUks5r6TQ4gaJpZM4NRX2h> .

saudet · 2017-05-16T07:34:07Z

I see why some modules like opencv_photo change. It's because we need to update their dependencies. In the case of opencv_photo, it's easy. We just need to replace opencv_imgproc.class by opencv_cudaimgproc.class, but others like opencv_stitching depend on modules that are not currently mapped:

$ readelf -d libopencv_stitching.so.3.2 | grep libopencv_cuda
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_cudafeatures2d.so.3.2]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_cudalegacy.so.3.2]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_cudawarping.so.3.2]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_cudaimgproc.so.3.2]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_cudafilters.so.3.2]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_cudaarithm.so.3.2]

We'll need to add those to the preload list so they get properly loaded...

saudet · 2017-05-31T08:39:15Z

Not sure how I missed that, but when built with CUDA, it looks like all libraries become dependent on CUDA! Check the output of ldd libopencv_core.so | grep cu:

libopencv_cudev.so.3.2 => .../javacpp-presets/opencv/cppbuild/linux-x86_64/lib/libopencv_cudev.so.3.2 (0x00007fc1b5277000)
libcudart.so.8.0 => /usr/local/cuda/lib64/libcudart.so.8.0 (0x00007fc1b49bf000)
libnppc.so.8.0 => /usr/local/cuda/lib64/libnppc.so.8.0 (0x00007fc1b474f000)
libnppi.so.8.0 => /usr/local/cuda/lib64/libnppi.so.8.0 (0x00007fc1add2b000)
libnpps.so.8.0 => /usr/local/cuda/lib64/libnpps.so.8.0 (0x00007fc1ad335000)
libcufft.so.8.0 => /usr/local/cuda/lib64/libcufft.so.8.0 (0x00007fc1a44e6000)

We can't have that. How should we deal with this?

SamCarlberg · 2017-06-30T00:21:13Z

@saudet From my understanding, most modules will compile with CUDA support if the flag is set. If OpenCV loads the CUDA library only when a CUDA function is called (ie not at startup), like how Java does it, then I think it's okay. But if not, I think there would have to be separate builds for CUDA and non-CUDA versions

saudet · 2017-06-30T00:27:54Z

All the libraries link statically with CUDA, so users would have to install it. It's ok for something like Caffe where most users use a GPU anyway, but that's not the case with OpenCV.

jcfrk101 · 2017-08-01T06:08:43Z

Has anyone decided how to proceed? A separate CUDA build or not?

saudet · 2017-08-01T06:58:41Z

We'll need another build, yes. But to make releases, I'd like to be able to use Travis CI and AppVeyor. Could you help with that?

saudet · 2017-09-08T02:39:40Z

I've added an "extensions" feature to JavaCPP to allow us to have more than one set of binaries per platform: bytedeco/javacpp@35463a6. We still need to modify the pom.xml files of the presets a bit, but it should work as follows:

Execute the build as usual, "mvn install"
Reexecute once more, but with the new "extension" plugin property set to say "-cuda". (This will build or pick up libraries in cppbuild/<platform>-cuda subdirectories.), say "mvn install -Djavacpp.extension=-cuda".

Furthermore, we should be able to create Maven profiles in the parent pom.xml to override the list of modules to only list the submodules that support say "cuda", and another list for say "avx512", etc.

At runtime, JavaCPP will look at the @Platform(extensions={"-cuda", "-avx512", ...} ... annotation value, and try to load them one by one, only trying the default one without extension name at the end if all else fails. The idea is that users wanting to use the "-cuda" one would have to put that artifact in their class path manually, if not only the default one would get added by the "-platform" artifact.

Thoughts?

/cc @akdeoras

SamCarlberg · 2017-09-08T15:23:38Z

That's probably the best solution since OpenCV will break if built with CUDA and is loaded on a system that doesn't have CUDA support.

Side note, Travis has beta support for build stages now so it should be possible to make the mac builds not time out any more. https://docs.travis-ci.com/user/build-stages/

saudet · 2017-09-08T21:00:45Z

I seem to recall there was still an issue with using stages and that's why we're not using them, @vb216 ? Also, it's not available on AppVeyor...

vb216 · 2017-09-09T08:14:40Z

Yeah problem is that there's no means of sharing output between build stages, there's issues open with Travis and Appveyor on this as alot of people are keen on that functionality.

Thinking of using S3 to upload build output to, making it available to dependencies and then splitting out the long opencv chain.

saudet · 2017-12-19T11:16:47Z

Thanks for the contribution @SamCarlberg! Let me know if you encounter any problems using those functions.

Currently requires custom CUDA bindings from javaccp-presets (see bytedeco/javacpp-presets#416) Trying to use CUDA-accelerated operations without an nvidia GPU and running nvidia drivers will probably crash the app

SamCarlberg added 2 commits May 4, 2017 20:19

Update cppbuild.sh to use MAKEJ flag

0c41db7

SamCarlberg added 2 commits May 6, 2017 15:35

-DWITH_CUDA=OFF for linux x86/arm/armhf, windows x86

f7d048d

CUDA remains enabled for linux x86_64, linux arm64, linux ppc64le, mac os, windows x86_64

Add cudaobjdetect and cudaoptflow

bc6f316

Target CUDA compute 3.0

21c4f4d

This should speed up compilation of the CUDA modules by about 6x and cut down the size of the output jar by about 60% (vs all CUDA platorms being buit)

Fix OpenCV build with CUDA and some versions of CMake on Linux

eec1075

SamCarlberg and others added 2 commits May 16, 2017 13:40

Add bindings for cudaarithm

f9da02f

Work around issue with NVCC and GCC 6

502fc3a

saudet mentioned this pull request Jun 3, 2017

tensorflow-1.0.0-1.3 requires CUDA #396

Closed

saudet force-pushed the master branch 3 times, most recently from a1fa858 to 22f9ebd Compare September 24, 2017 11:23

saudet mentioned this pull request Dec 6, 2017

Performance problem with UMat #488

Closed

saudet mentioned this pull request Dec 18, 2017

CUDA openCV 3.1 Object Detection code unavailable in JavaCV 1.2 bytedeco/javacv#481

Closed

saudet added 2 commits December 18, 2017 16:52

Merge branch 'master' into opencv-cuda

8c99101

Fix and update CUDA bindings for OpenCV 3.3.1

1709496

saudet merged commit 7443377 into bytedeco:master Dec 19, 2017

SamCarlberg deleted the opencv-cuda branch January 18, 2018 00:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add initial OpenCV CUDA bindings #416

Add initial OpenCV CUDA bindings #416

SamCarlberg commented May 5, 2017 •

edited

Loading

saudet commented May 5, 2017

SamCarlberg commented May 5, 2017

saudet commented May 6, 2017 via email

SamCarlberg commented May 6, 2017 •

edited

Loading

saudet commented May 7, 2017

saudet commented May 11, 2017

saudet commented May 11, 2017

SamCarlberg commented May 11, 2017

SamCarlberg commented May 13, 2017

saudet commented May 14, 2017

SamCarlberg commented May 14, 2017 •

edited

Loading

saudet commented May 16, 2017

vb216 commented May 16, 2017 via email

saudet commented May 16, 2017

saudet commented May 31, 2017

SamCarlberg commented Jun 30, 2017

saudet commented Jun 30, 2017

jcfrk101 commented Aug 1, 2017

saudet commented Aug 1, 2017 via email

saudet commented Sep 8, 2017

SamCarlberg commented Sep 8, 2017

saudet commented Sep 8, 2017

vb216 commented Sep 9, 2017

saudet commented Dec 19, 2017

Add initial OpenCV CUDA bindings #416

Add initial OpenCV CUDA bindings #416

Conversation

SamCarlberg commented May 5, 2017 • edited Loading

saudet commented May 5, 2017

SamCarlberg commented May 5, 2017

saudet commented May 6, 2017 via email

SamCarlberg commented May 6, 2017 • edited Loading

saudet commented May 7, 2017

saudet commented May 11, 2017

saudet commented May 11, 2017

SamCarlberg commented May 11, 2017

SamCarlberg commented May 13, 2017

saudet commented May 14, 2017

SamCarlberg commented May 14, 2017 • edited Loading

saudet commented May 16, 2017

vb216 commented May 16, 2017 via email

saudet commented May 16, 2017

saudet commented May 31, 2017

SamCarlberg commented Jun 30, 2017

saudet commented Jun 30, 2017

jcfrk101 commented Aug 1, 2017

saudet commented Aug 1, 2017 via email

saudet commented Sep 8, 2017

SamCarlberg commented Sep 8, 2017

saudet commented Sep 8, 2017

vb216 commented Sep 9, 2017

saudet commented Dec 19, 2017

SamCarlberg commented May 5, 2017 •

edited

Loading

SamCarlberg commented May 6, 2017 •

edited

Loading

SamCarlberg commented May 14, 2017 •

edited

Loading