classification segmentation fault when calling Caffe::SetDevices #99

Closed
kindloaf opened this issue Apr 18, 2017 · 14 comments

@kindloaf

I'm testing Caffe with OpenCL on an Android device. When I ran the classification program, it crashed with a segmentation fault.
Here is how I compiled and ran the program:

ck compile program:caffe-classification-opencl --target_os=android21-arm64
ck run program:caffe-classification-opencl --target_os=android21-arm64

Here is the stack trace from the segmentation fault:

Stack frame #03 pc 00000000007ecb50  /data/local/tmp/libOpenCL.so (clCreateProgramWithBinary+268)
Stack frame #04 pc 00000000004a4c30  /data/local/tmp/tmp/libcaffe.so (viennacl::ocl::context::add_program(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+5040)
Stack frame #05 pc 000000000049dd2c  /data/local/tmp/tmp/libcaffe.so (caffe::RegisterKernels(viennacl::ocl::context*)+2032)
Stack frame #06 pc 000000000049c130  /data/local/tmp/tmp/libcaffe.so (caffe::device::SetProgram()+28)
Stack frame #07 pc 000000000049c290  /data/local/tmp/tmp/libcaffe.so (caffe::device::Init()+256)
Stack frame #08 pc 000000000048d240  /data/local/tmp/tmp/libcaffe.so (caffe::Caffe::SetDevices(std::vector<int, std::allocator<int> >)+3264)
Stack frame #09 pc 000000000001c81c  /data/local/tmp/tmp/classification (Classifier::Classifier(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+404)
Stack frame #10 pc 000000000001ec4c  /data/local/tmp/tmp/classification (main+728)

Any advice?

@DVEfremov
Contributor

Have you tried running it under valgrind?

@gfursin
Contributor

gfursin commented Apr 18, 2017

By the way, what is the device?

@gfursin
Contributor

gfursin commented Apr 18, 2017

And which Android NDK version are you using?

@DVEfremov
Contributor

I use valgrind compiled for my device (ARM-v7a) as described here:
http://valgrind.org/docs/manual/dist.readme-android.html

@gfursin
Contributor

gfursin commented Apr 18, 2017

I will try to rebuild and run a clean ck-caffe version on my Samsung S7 soon.

@kindloaf
Author

@DVEfremov I have not tried valgrind yet. I will update this thread once I have.

@kindloaf
Author

@gfursin
The NDK version is r14b.
By the way, libOpenCL.so is provided by the device. Calling clGetPlatformInfo with CL_PLATFORM_VERSION reports OpenCL 1.2 for that library.
Judging from the ck-caffe log, I assume the headers used for the build are the same version:

Found OpenCL include: /home/.../CK-TOOLS/lib-opencl-stubs-1.2-android-ndk-4.9.x-android21-arm64/include
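
For reference, the OpenCL specification defines the CL_PLATFORM_VERSION string as "OpenCL", a space, the major.minor version, a space, and then platform-specific information. The following is only a sketch of how that string could be compared with the header version; the function name and the sample string are hypothetical, and the code parses text rather than calling OpenCL itself.

```python
import re


def parse_platform_version(version_string):
    """Return (major, minor) from a CL_PLATFORM_VERSION style string."""
    # Layout per the OpenCL spec:
    # "OpenCL <major>.<minor> <platform-specific information>"
    match = re.match(r"OpenCL (\d+)\.(\d+)(?: |$)", version_string)
    if match is None:
        raise ValueError("unexpected version string: %r" % version_string)
    return int(match.group(1)), int(match.group(2))


# Hypothetical sample string; a real device returns its own vendor suffix.
HEADER_VERSION = (1, 2)  # assumption: taken from the "lib-opencl-stubs-1.2" path
reported = parse_platform_version("OpenCL 1.2 hypothetical-vendor-build")
print("match" if reported == HEADER_VERSION else "mismatch")
```

A matching version only rules out one possible cause; it says nothing about whether a cached program binary is still valid for the driver.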

@kindloaf
Author

@gfursin
By the way, in the sh1r0/caffe-android-lib#23 thread, for the command lines that compile and run the program, did you mean program:caffe-classification-opencl instead of program:caffe-classification?

@kindloaf
Author

I just found the issue: a file named /data/local/tmp/viennacl_cache_0f45121d68e15d6052d1a913db3647b1fc0fc609 was generated after running classification. When I removed the file, the segmentation fault went away.
If the file is present, the program crashes.

I'm not sure what the real culprit is, but this solves my problem for now.

Thanks for the quick reply - I will close the issue.
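
The workaround above amounts to deleting every viennacl_cache_* file before launching the program. A minimal sketch of that cleanup, assuming it runs on a machine that can see the directory directly (on the Android device the directory was /data/local/tmp, so the same deletion would have to be done on the device, for example through adb shell). The function name is hypothetical and is not part of CK or ViennaCL.

```python
from pathlib import Path

CACHE_PREFIX = "viennacl_cache_"  # prefix seen in the file name reported above


def remove_viennacl_caches(directory):
    """Delete cache files in `directory` and return the names removed."""
    removed = []
    for path in sorted(Path(directory).glob(CACHE_PREFIX + "*")):
        if path.is_file():  # leave any directories with a similar name alone
            path.unlink()
            removed.append(path.name)
    return removed


if __name__ == "__main__":
    import tempfile

    # Demonstrate on a throwaway directory rather than a real cache location.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / (CACHE_PREFIX + "example")).write_bytes(b"")
        (Path(tmp) / "unrelated.txt").write_text("keep me")
        print(remove_viennacl_caches(tmp))
```

Only files whose names start with the cache prefix are touched, so other files in the same directory are left in place.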

@gfursin
Contributor

gfursin commented Apr 18, 2017

This file is related to OpenCL kernel caching via ViennaCL. After deleting this file and running classification again, do you see a newly generated viennacl_cache_{some hash} file?
If you run classification several times now, does it still work?
I am asking because it looks like the kernels have changed but were not recompiled on your system. I have CC'd @psyhtest since he was trying to improve the ViennaCL caching mechanism.

@gfursin
Contributor

gfursin commented Apr 18, 2017

Also, we may need to add an option in CK to clear such cache files (or maybe remove them automatically during ck-caffe reinstallation for Android). I need to think about that.

@kindloaf
Author

kindloaf commented Apr 18, 2017

@gfursin and @psyhtest What I did was the following:
(1) I used ck run program:caffe-classification-opencl --target_os=android21-arm64 to run the test program. The very first run after compilation was successful.
(2) When I invoked ck run again, or used adb shell ... to launch the test program, it crashed with a segmentation fault.
(3) I then removed the cache file and used adb shell ... to launch the test program. It succeeded, and a new cache file was generated.
(4) I repeated (3) a few times. My observation is that if I don't remove the cache file before launching the test program, it always crashes with a segmentation fault.
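
The steps above can be sketched as a small repeat-run harness that records the exit code of each launch, with and without clearing the cache first. This is only an illustration under stated assumptions: the function name is hypothetical, the command is whatever launches the test program, and the cache directory must be reachable from where the script runs. When a process is launched directly on a POSIX system, Python reports a process killed by a signal as a negative return code; how the exit status comes back through adb shell may differ.

```python
import subprocess
from pathlib import Path


def run_trials(command, cache_dir, runs, clear_cache):
    """Run `command` `runs` times and return the list of exit codes."""
    exit_codes = []
    for _ in range(runs):
        if clear_cache:
            # Step (3): remove any cache file left by the previous run.
            for stale in Path(cache_dir).glob("viennacl_cache_*"):
                stale.unlink()
        result = subprocess.run(command)
        exit_codes.append(result.returncode)
    return exit_codes
```

Comparing the two lists, one collected with clear_cache=False and one with clear_cache=True, shows whether the failure depends only on the presence of the cache file.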

@gfursin
Contributor

gfursin commented Apr 18, 2017

Thanks a lot for reporting - that's quite strange though. I will test it tonight on my S7.
By the way, if you happen to have some time, you could try running this app:
https://play.google.com/store/apps/details?id=openscience.crowdsource.video.experiments
There, you can select Caffe OpenCL and run it several times, for example using GoogleNet. I am curious whether you will see the same issue. (Note that this app sends anonymous info about your platform and classification time to cknowledge.org/repo.)
The scenarios for this app were prepared in exactly the same way as you did above, but using my desktop machine. However, they were prepared a few months ago, so if they work, it's likely that some recent change in Caffe/ViennaCL/CLBlast is causing the problem.
Thanks again!

@kindloaf
Author

I will give it a try.


5 participants