Inference time issue #13
Comments
I'm using a gcc 5 toolchain in the travis tests, which is working fine: Line 46 in 17c2689
If you need gcc 4.9, it should be easy to create one from this recipe: https://github.com/ruslo/polly/blob/master/gcc-5-pic-hid-sections-lto.cmake
I will try adding libcxx to the CI tests: #14 UPDATE: Clang 3.8 is now building fine in the travis Ubuntu Trusty (14.04) image: https://travis-ci.org/elucideye/acf/jobs/287454354
TL;DR: The shader implementation is geared towards optimized feature computation on mobile GPUs. The detection itself doesn't map well to simple GLSL processing, so the features must be transferred from GPU->CPU (slow) for CPU based detection (fast). On a desktop, the full process could be executed on the GPU.
The console app doesn't currently use the OpenGL ES 2.0 shader acceleration, so I'm sure you are running a CPU only benchmark.
I recently migrated this stuff from drishti for general purpose use and improvements, and it will be added to the Hunter package manager once it is cleaned up a little more. I originally needed this for mobile platforms, so OpenGL ES 2.0 was the lowest common denominator that could support both iOS and Android platforms. The main drawback with this approach is the 8 bit channel output limitation (it can be improved with 4x8 -> 32 bit packing).
Caveat: Due to the above mentioned limitation, the GLSL output is currently only an approximation of the CPU floating point output, and it needs to be improved (there will be a measurable performance hit). For desktop use, it is probably better to write it in OpenCL or something higher level that doesn't have these limitations. (I recently came across Halide, which seems like an excellent path for cross platform optimization, but I currently have no experience with it.)
The GLSL code is all in this file: https://github.com/elucideye/acf/blob/master/src/lib/acf/GPUACF.h (currently separate from the ACF detection class). To use that class, you will need to manage your own OpenGL context. It uses https://github.com/hunter-packages/ogles_gpgpu to manage a shader pipeline that computes the features.
The expensive part on mobile platforms is the GPU->CPU transfer, so one frame of latency is added to the pipeline, such that ACF pyramids can be computed on the GPU for frame N ("for free"), and they are available for processing at time N+1 with no added CPU cost.
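The one frame latency scheme described above can be sketched roughly as follows. This is only an illustration of the scheduling idea, not the code in GPUACF.h: `Pyramid`, `computePyramid`, `detect`, and `runDelayedPipeline` are hypothetical names, and the "GPU" step here is a plain synchronous function standing in for the shader pipeline and readback.

```cpp
#include <utility>
#include <vector>

// Hypothetical stand-in for an ACF pyramid read back from the GPU.
struct Pyramid
{
    int frameIndex;
};

// Hypothetical placeholder for the GPU feature computation of one frame.
Pyramid computePyramid(int frameIndex)
{
    return Pyramid{ frameIndex };
}

// Hypothetical placeholder for CPU detection on a precomputed pyramid.
// It returns the index of the frame the pyramid was computed from.
int detect(const Pyramid& pyramid)
{
    return pyramid.frameIndex;
}

// Runs detection with one frame of latency: while frame N is submitted,
// detection runs on the pyramid that was produced for frame N-1.
// Each entry of the result is (submitted frame, frame used for detection).
std::vector<std::pair<int, int>> runDelayedPipeline(int frameCount)
{
    std::vector<std::pair<int, int>> schedule;
    Pyramid previous{ -1 };
    bool hasPrevious = false;

    for (int n = 0; n < frameCount; ++n)
    {
        // In a real pipeline this step would overlap with the CPU work below.
        const Pyramid current = computePyramid(n);

        if (hasPrevious)
        {
            schedule.emplace_back(n, detect(previous));
        }

        previous = current;
        hasPrevious = true;
    }
    return schedule;
}
```

With four input frames, this sketch produces three detections, each one frame behind the frame being submitted; the first frame yields no detection because no pyramid is available yet.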
In this workflow, the precomputed ACF pyramid is passed in for detection in place of the RGB image. The face detection/search on the precomputed pyramids then runs in a few milliseconds on an iPhone 7. For pedestrian detection the extra frame of latency might not be suitable. The SDK call is shown here: Line 392 in 17c2689
There is a small unit test that illustrates what the basic process would look like: 1) compute acf/src/lib/acf/ut/test-acf.cpp Lines 444 to 464 in 17c2689
The above test uses the Hunter That test could be used for some initial benchmarks, and perhaps it could be added to the console application for additional testing. I'll try to take a look in the next few days, unless you want to try it sooner. It would be nice to automate the GPGPU processing at the API level. Actually, there was already an issue for this: elucideye/drishti#373. I'll migrate it to the new repository.
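To illustrate the 8 bit channel limitation and the 4x8 -> 32 bit packing mentioned earlier, here is a minimal CPU side sketch. It is not the GLSL used in the repository; it assumes feature values are normalized to [0, 1], and all function names are hypothetical. It spreads one value across four 8 bit channels and compares that with storing the value in a single 8 bit channel.

```cpp
#include <array>
#include <cmath>
#include <cstdint>

// Clamps a value to [0, 1] (assumption: features are normalized to this range).
double clampUnit(float value)
{
    return std::fmin(std::fmax(static_cast<double>(value), 0.0), 1.0);
}

// Packs a value in [0, 1] into four 8 bit channels, most significant first.
std::array<std::uint8_t, 4> packUnitFloat(float value)
{
    const auto q = static_cast<std::uint32_t>(std::llround(clampUnit(value) * 4294967295.0));
    return {{ static_cast<std::uint8_t>(q >> 24),
              static_cast<std::uint8_t>(q >> 16),
              static_cast<std::uint8_t>(q >> 8),
              static_cast<std::uint8_t>(q) }};
}

// Reassembles the four channels into a value in [0, 1].
float unpackUnitFloat(const std::array<std::uint8_t, 4>& channels)
{
    const std::uint32_t q = (static_cast<std::uint32_t>(channels[0]) << 24)
                          | (static_cast<std::uint32_t>(channels[1]) << 16)
                          | (static_cast<std::uint32_t>(channels[2]) << 8)
                          | static_cast<std::uint32_t>(channels[3]);
    return static_cast<float>(q / 4294967295.0);
}

// Single channel storage for comparison: only 256 distinct levels.
std::uint8_t quantize8(float value)
{
    return static_cast<std::uint8_t>(std::llround(clampUnit(value) * 255.0));
}

float dequantize8(std::uint8_t level)
{
    return static_cast<float>(level / 255.0);
}
```

A single channel round trip can be off by up to about 1/510 of the value range, while the packed round trip is limited mainly by float precision. The trade-off is that one packed value occupies all four channels of an RGBA texel, so more texels (and more GPU->CPU transfer) are needed for the same number of features.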
As a temporary GPU benchmark, I've added a timer class that can be enabled with an option in the unit test. It is currently sitting in this PR: #16. That will print the GPGPU pyramid compute time (shaders, read, and "fill" to memory)
As well as the detection time
On my desktop these each take about 2 milliseconds (2+2=4 ms) with a GEFORCE GTX TITAN X. The detection time is comparable on my 2013 MacBook, but the This isn't a proper benchmark, but it can provide some info in the short term.
@SoonminHwang : I hope this answers your question. I'm going to close this for now. Since one of the strong advantages of this package is size + speed, it might make sense to add some targeted google benchmarks.
Thanks for the quick reply!
I tried switching to a compiler that supports std::regex, such as gcc-4.9 or clang-3.5 with libcxx. But polly.py does not seem to support gcc-4.9 (I cannot find gcc-4-9 in the list when I type polly.py --help). With the libcxx toolchain, the build failed with some error messages. Here is the log file.
Anyway, my first goal is to compare the running time to Piotr's MATLAB implementation.
I commented out the cxxopts parts in acf.cpp and measured the inference time using the gettimeofday function. Even though the inference time of the classifier depends heavily on the image content and the cascade threshold, something seems wrong.
It takes 54ms for lena512color.png using drishti_face_gray_80x80.cpb.
(As you know, Piotr's MATLAB code takes ~100ms for a 640x480 image.)
I expected <1ms with my GPU (Titan X Pascal).
I think I turned on the flag to use the GPU.
acf/CMakeLists.txt
Line 91 in 17c2689
How about the inference time on your machine?
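For reference, a measurement like the one described above could also be taken with a monotonic clock, since gettimeofday reports wall-clock time and can be affected by system clock adjustments. The helper below is only a sketch: `meanMilliseconds` is a hypothetical name, and `work` is assumed to wrap whatever detection call is being measured. It runs a few untimed warm-up calls first so that one-time setup costs are not counted, then averages over several timed runs.

```cpp
#include <chrono>
#include <functional>

// Returns the mean time in milliseconds of one call to `work`, averaged over
// `iterations` timed runs that follow `warmup` untimed runs.
// Returns 0.0 if `iterations` is not positive.
double meanMilliseconds(const std::function<void()>& work, int iterations, int warmup = 1)
{
    for (int i = 0; i < warmup; ++i)
    {
        work(); // untimed: absorbs one-time setup such as lazy allocations
    }

    if (iterations <= 0)
    {
        return 0.0;
    }

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
    {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

Averaging over many runs on the same image also makes it easier to compare numbers between machines, since a single run can be dominated by model loading or cache effects.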