
tflite only python package #15

Closed
PhilipXue opened this issue Sep 4, 2019 · 19 comments

@PhilipXue

Hi, I really appreciate your effort! There is a way to build a tflite-only Python package, which reduces the package size tremendously: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/pip_package
I tried it before, and I don't think that runtime can utilize multiple threads. Is there any chance you could check this and build a tflite package with multi-threading support?
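For reference, a minimal sketch of the usage pattern in question: one Interpreter per worker process, via the tflite_runtime API. The model path and worker count are placeholders, and this assumes a tflite-only wheel is installed.

```python
import multiprocessing as mp

import numpy as np
from tflite_runtime.interpreter import Interpreter


def worker(model_path):
    # Each process builds its own Interpreter; interpreters are not
    # meant to be shared across processes.
    interpreter = Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    # Zero-filled input of the right shape/dtype, just to exercise invoke().
    interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
    interpreter.invoke()


if __name__ == "__main__":
    procs = [mp.Process(target=worker, args=("model.tflite",)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```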

@PINTO0309 commented Sep 4, 2019

@PhilipXue
I have built it, so please try it. I have not verified the behavior yet, so please let me know whether it works.
https://github.com/PINTO0309/TensorflowLite-bin.git

@PhilipXue

@PINTO0309 Wow, that's amazing! I'll try it immediately!

@PhilipXue

@PINTO0309
Hi,
I tried the whl you provided today, and here are some findings:

  • The Interpreter class doesn't have a set_num_threads attribute.
  • However, the runtime seems to be using 4 threads:
    [screenshot: htop]
  • Running a MobileNet SSDLite model is slower than the full-sized TensorFlow runtime with 4 threads (650 ms/img vs 600 ms/img), and adding threads to the full-sized TensorFlow runtime improves the speed to 520 ms/img.

Does the tflite-runtime package include the improvements you made to the TensorFlow build?
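For context, the missing call presumably looks like the sketch below. set_num_threads is the method expected from this custom build (the stock tflite_runtime of that era did not expose it), and model.tflite is a placeholder path.

```python
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="model.tflite")  # placeholder path
# Expected from this custom build; the stock 1.14 tflite_runtime wheel
# raises AttributeError here.
interpreter.set_num_threads(4)
interpreter.allocate_tensors()
interpreter.invoke()
```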

@PINTO0309

@PhilipXue
I'm sorry, I noticed a mistake in my modification to the program. I have identified the cause and will fix it today; please wait a moment.

@PINTO0309

@PhilipXue
Fixed and recommitted.
tflite_runtime-1.14.0-cp35-cp35m-linux_armv7l.whl
tflite_runtime-1.14.0-cp37-cp37m-linux_armv7l.whl

https://github.com/PINTO0309/TensorflowLite-bin.git
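For anyone following along, installing one of these wheels is the usual pip invocation (Python 3.7 shown; download the file from the repository above first):

```bash
pip3 install tflite_runtime-1.14.0-cp37-cp37m-linux_armv7l.whl
```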

@PhilipXue

@PINTO0309
Thanks for your swift response!
I tried out the new whl file and it works like a charm (multi-threaded performance now beats the full-sized TF runtime)! Thanks a lot for your effort!
BTW, do you have any plans to build tflite-runtime whls for 64-bit Debian Buster (Python 3.7, aarch64)?

@PINTO0309

@PhilipXue
It is done; the work took only about 5 minutes. However, I don't have a device to test them on.
tflite_runtime-1.14.0-cp35-cp35m-linux_aarch64.whl
tflite_runtime-1.14.0-cp37-cp37m-linux_aarch64.whl

https://github.com/PINTO0309/TensorflowLite-bin/tree/master/1.14.0

@PhilipXue

No problem!
I'll test them and give you feedback!

@PhilipXue commented Sep 6, 2019

Hi,
I tested the aarch64 whl on the 64-bit Debian Buster preview image for the RPi3, and this time it behaves similarly to the 32-bit version before the fix:

> @PINTO0309
> Hi,
> I tried the whl you provided today, and here are some findings:
>
> • The Interpreter class doesn't have a set_num_threads attribute.
> • However, the runtime seems to be using 4 threads:
>   [screenshot: htop]
> • Running a MobileNet SSDLite model is slower than the full-sized TensorFlow runtime with 4 threads (650 ms/img vs 600 ms/img), and adding threads to the full-sized TensorFlow runtime improves the speed to 520 ms/img.
>
> Does the tflite-runtime package include the improvements you made to the TensorFlow build?

  • No set_num_threads attribute: AttributeError: type object 'Interpreter' has no attribute 'set_num_threads'
  • (4 threads) Performance is quite poor (1.3x slower compared to the 32-bit version before its fix). I'm not sure whether this is an OS problem, since the OS is a preview version.

Where do you think the problem is?
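One way to tell which build is installed without crashing is a quick attribute check; a minimal sketch, assuming a tflite_runtime wheel is installed and model.tflite is a placeholder path:

```python
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="model.tflite")  # placeholder path
if hasattr(interpreter, "set_num_threads"):
    interpreter.set_num_threads(4)
else:
    print("this build lacks set_num_threads; using the default thread pool")
interpreter.allocate_tensors()
```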

@PINTO0309

@PhilipXue
Ah ... sorry. Perhaps I have repeated the same mistake. I can't work on it until I get home, so please be patient.

@PhilipXue commented Sep 6, 2019

@PINTO0309
No need to be sorry.
I've heard that 64-bit systems are better at floating-point calculation, and I'm just curious to verify that. Please take your time.

@PINTO0309 commented Sep 6, 2019

@PhilipXue
I checked the contents of the wheel files. I had indeed made the same mistake, so I fixed the wheels:
https://github.com/PINTO0309/TensorflowLite-bin.git
tflite_runtime-1.14.0-cp35-cp35m-linux_aarch64.whl
tflite_runtime-1.14.0-cp37-cp37m-linux_aarch64.whl

By the way, there is also a 64-bit kernel OS image that I created independently.
https://github.com/PINTO0309/RaspberryPi-bin.git

The benchmark tool you are using looks cool. Would it be possible for you to share it?

@PhilipXue

@PINTO0309
Hi, I will check when I go back to work on Monday!

The benchmark is very simple: it measures the processing time for each image. Though simple, the code was created for internal projects. You can use the code provided in this article to do exactly the same thing. (If you are referring to the image I put in the comment, that's htop.)
I can share the results of measuring some publicly available models (MobileNet SSD/SSDLite), which should be enough to show the improvement made by your whl files.
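A minimal sketch of that kind of per-image timing loop (an illustration, not the internal code mentioned above; the model path and random input are placeholders):

```python
import time

import numpy as np
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="model.tflite")  # placeholder path
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

times = []
for _ in range(100):
    # Random data stands in for a real preprocessed image.
    data = np.random.random_sample(inp["shape"]).astype(inp["dtype"])
    interpreter.set_tensor(inp["index"], data)
    start = time.perf_counter()
    interpreter.invoke()
    times.append((time.perf_counter() - start) * 1000.0)

print("mean: %.1f ms/img" % (sum(times) / len(times)))
```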

@PINTO0309

@PhilipXue
Thank you! What I wanted to know about was htop.
It's so cool. 😃
[screenshot: htop]

@PhilipXue

I'm glad you enjoyed it! If you are interested, there is another, similar system-monitoring tool called glances that you may want to try out.

Have a nice day!

@PhilipXue

@PINTO0309
I've tested the new aarch64 tflite-runtime whl, and the result is not ideal: the same model runs far slower than on the 32-bit OS or with the full-sized 64-bit TensorFlow runtime. I verified this on another armv8 device and got the same result. Still, it is an improvement over the previous version before the fix.
I have no idea why this difference occurs; my best guess is that there are some optimizations TensorFlow doesn't include in the tflite-runtime build for aarch64.

@PINTO0309

It is a difficult problem. Multi-threading may not be working, but the official binary below may be a little faster:
https://dl.google.com/coral/python/tflite_runtime-1.14.0-cp37-cp37m-linux_aarch64.whl
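pip can install that wheel directly from the URL:

```bash
pip3 install https://dl.google.com/coral/python/tflite_runtime-1.14.0-cp37-cp37m-linux_aarch64.whl
```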

@PhilipXue

@PINTO0309 Hi, thanks for replying. I will try it out.
Since all the whls have now been built, I think we can close this issue. Thank you very much for all your swift responses.
We can discuss the speed problem in a separate thread in the TFLite runtime repo.

@krisnadn11 commented Apr 3, 2024

@PhilipXue @PINTO0309 Could you make a complete tutorial, from start to finish, for benchmarking a TensorFlow Lite model on a Raspberry Pi? Honestly, I've been stuck here for a month because my knowledge of this area is still limited. I would be very thankful and grateful if you were willing to help, or at least give advice, because I think I have already tried every source I could find. (A minimal sketch follows after the list below.)

What I have:

  1. detect.tflite
  2. labelmap.txt
  3. Raspberry Pi 4B (aarch64, Python 3.9)
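Not a full tutorial, but as a starting point, here is a minimal per-image benchmark sketch for the files listed above. It assumes a tflite_runtime wheel matching Python 3.9/aarch64 is installed; detect.tflite is the model from the list, the num_threads argument is supported by recent tflite_runtime releases, and random input stands in for real images (labelmap.txt only matters once you decode detections):

```python
import time

import numpy as np
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="detect.tflite", num_threads=4)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

for i in range(10):
    # Random input in place of a real camera frame; most detection models
    # take uint8 or float32 input.
    if inp["dtype"] == np.uint8:
        data = np.random.randint(0, 256, size=inp["shape"], dtype=np.uint8)
    else:
        data = np.random.random_sample(inp["shape"]).astype(inp["dtype"])
    interpreter.set_tensor(inp["index"], data)
    start = time.perf_counter()
    interpreter.invoke()
    print("invoke %d: %.1f ms" % (i, (time.perf_counter() - start) * 1000.0))
```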
