GPU mode does not speed up, actually, even worse than CPU mode... #18

Closed
baileyqbb opened this issue Oct 25, 2017 · 5 comments

@baileyqbb

Platform: Rockchip RK3399.

I have set up an Ubuntu 16.04 environment, along with the other dependencies needed for CaffeOnACL, by following the installation instructions.

When I test classification with ./distribute/bin/classification_profiling_gpu.bin, the processing time is much longer than with ./distribute/bin/classification_profiling.bin.
e.g.
googlenet --> CPU: ~0.54s, GPU: ~6s
mobilenet --> CPU: ~0.42s, GPU: ~2.6s
squeezenet --> CPU: ~0.16s, GPU: ~2.9s

I have updated the Mali libraries from https://github.com/rockchip-linux/libmali and all the unit tests passed.

I also checked that /dev/mali0 exists.

Any idea why this happens?
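
To put the timings above on a common scale, the short sketch below computes the GPU-to-CPU slowdown ratio for each network. It uses only the approximate numbers reported in this post; the dictionary layout and the `slowdown` helper are illustrative names, not part of CaffeOnACL.

```python
# Approximate timings in seconds, copied from the report above.
timings = {
    "googlenet": {"cpu": 0.54, "gpu": 6.0},
    "mobilenet": {"cpu": 0.42, "gpu": 2.6},
    "squeezenet": {"cpu": 0.16, "gpu": 2.9},
}


def slowdown(cpu_s, gpu_s):
    """Return how many times slower the GPU run is than the CPU run."""
    return gpu_s / cpu_s


# Illustrative helper output: one ratio per network.
ratios = {name: slowdown(t["cpu"], t["gpu"]) for name, t in timings.items()}

for name, ratio in ratios.items():
    print(f"{name}: GPU is about {ratio:.1f}x slower than CPU")
```

With these inputs the ratios come out to roughly 11x, 6x, and 18x, so the gap is not uniform across networks.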

@anwesha94

Hi, I have also run on both the CPU and the GPU, and GPU performance was always worse than CPU performance.
See the performance report that CaffeOnACL provides: https://github.com/OAID/CaffeOnACL/blob/master/acl_openailab/performance_report.pdf

@baileyqbb
Author

@anwesha94 Thanks for the reminder! Indeed, GPU mode is slower than CPU mode. Am I right to understand that the GPU is unnecessary here, since in the mix mode only OpenBLAS and NEON are used to accelerate processing? Would performance be better on a processor with more cores, say 8? (The RK3399 has 6 cores in total.)

@daeinki

daeinki commented Nov 29, 2017

I also measured CaffeOnACL performance and compared the CPU and GPU results.
For the measurement, I used the image classification applications, classification_profiling_gpu.bin and classification_profiling.bin, with AlexNet and SqueezeNet.

The result: the GPU is about 3 times slower than the CPU in all cases!
Test env: Galaxy Note 4 (Exynos5433 with Mali-T760).

There are also two strange results.

  1. CaffeOnACL is about 2 times slower than the original Caffe!
  2. With SqueezeNet, GPU performance does not change!
  • In CPU mode, SqueezeNet is about 2 times faster than AlexNet,
    but in GPU mode there is no performance difference.
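
A GPU time that stays the same across networks of different size could mean that a fixed cost, rather than the network itself, dominates each measurement. This thread does not establish whether that is the case. One way to check, sketched below, is to time the first call separately from the following ones. `benchmark` and `fake_forward` are hypothetical names for illustration; `fake_forward` is a stand-in for one forward pass and does not call CaffeOnACL.

```python
import statistics
import time


def benchmark(run_once, runs=5, clock=time.perf_counter):
    """Call run_once() runs + 1 times and report the first call separately.

    Returns (first_call_seconds, median_of_later_calls_seconds).
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    durations = []
    for _ in range(runs + 1):
        start = clock()
        run_once()
        durations.append(clock() - start)
    return durations[0], statistics.median(durations[1:])


if __name__ == "__main__":
    # Hypothetical workload: pays a one-time setup cost on the first call only.
    state = {"initialised": False}

    def fake_forward():
        if not state["initialised"]:
            time.sleep(0.05)  # simulated one-time setup
            state["initialised"] = True
        time.sleep(0.005)  # simulated per-call work

    first, steady = benchmark(fake_forward)
    print(f"first call: {first:.3f}s, steady-state median: {steady:.3f}s")
```

If the first call is much slower than the later median, a single-run comparison between CPU and GPU mode mostly measures setup rather than inference.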

@baileyqbb
Author

My guess is that one reason GPU mode is much slower than CPU mode is that the Mali GPU's RAM is not big enough to hold the whole model. It would then need to copy parts of the model from the CPU to the GPU and back several times during a single forward pass, which would cost more time than running on a multi-core CPU.

@daeinki

daeinki commented Dec 20, 2017

Thanks for the reply. However, the Mali GPU has no internal memory of its own, so it uses system memory instead, and the Exynos5433 SoC has 3GB of system memory.
