Replies: 3 comments
-
I thought about EdgeTPU for a while now, but I can't support HW that I don't have available for testing.

Regarding SOEdge, it sounds easy enough given that their Acuity toolkit is built on top of TF to start with, so conversion should be straightforward. But looking at the HW specs, I doubt the performance improvement would be 100x; more likely 5-10x at best.

Also, when combining it with an RPi, the issue can easily become an I/O bottleneck (IMO, this is due to the fact that the RPi platform does not have a dedicated I/O controller and instead uses the CPU for data management). The RPi platform is pretty bad on that side, so how do you keep the SOEdge saturated? E.g., if you need to pass off a tensor equivalent of a 640x480 image at 25 FPS to the SOEdge, that is ~25MB/sec, which is already enough to cause issues on an RPi3 (OK on an RPi4). And that is low resolution and far from high FPS.

For that reason, I'm more interested in tightly coupled accelerators like the nVidia Jetson Nano (although that one is pretty old nowadays and there hasn't been a refresh in a while). But then there is the question of which exact ML kernel ops are supported by the backend. Another issue is that a lot of low-end accelerators are notoriously bad with FP32 precision; their best acceleration comes in INT32 land.

All-in-all, I'm playing a wait-and-see game when it comes to SOC solutions - waiting for them to mature a bit.
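For reference, a quick back-of-the-envelope sketch of the raw frame bandwidth involved (assuming uncompressed 3-channel uint8 frames, no batching or compression):

```typescript
// Rough bandwidth estimate for streaming raw frames to an external accelerator.
// Assumes uncompressed 3-channel uint8 frames (1 byte per channel).
function frameBandwidthMBps(width: number, height: number, fps: number, channels = 3): number {
  const bytesPerFrame = width * height * channels;
  return (bytesPerFrame * fps) / (1024 * 1024);
}

console.log(frameBandwidthMBps(640, 480, 25).toFixed(1));   // ~22 MB/s, roughly the ~25MB/sec figure above
console.log(frameBandwidthMBps(1280, 720, 30).toFixed(1));  // ~79 MB/s at 720p30
console.log(frameBandwidthMBps(1920, 1080, 30).toFixed(1)); // ~178 MB/s at 1080p30
```

So anything above low resolution and modest FPS quickly outgrows what an RPi can comfortably shuttle over USB or SPI to a loosely coupled accelerator.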
-
Yes, I'm in the same situation, namely wait-and-see, but I admit I'm growing increasingly curious. For prototyping I can use an RPi3, RPi4, PinePhone or my desktop, but having a dedicated NPU in order to better understand the workflow could be nice. I keep seeing new setups arriving, like the SOEdge, so I'm wondering where the opportunity could be. I imagine mostly for privacy-sensitive use cases (so no iOS/Android mobile setups) with real-time requirements and a small footprint. I was also considering the Jetson Nano, so that would indeed be an easier start. I'll share some feedback then; hopefully I won't have to resort to shenanigans like https://enricopiccini.com/en/kb/HOW_TO_RUN_TensorflowJs_in_NodeJs_on_NVidia_Jetson_Nano_arm64_-658
-
FYI, building TF from scratch using Bazel is not that complicated, but it's extremely CPU and memory intensive and will run into many issues unless you edit the Bazel configuration to work in your environment. And then it's going to take a while (on an RPi, more than a day).

Btw, I used to use an RPi4 as my home server, but its slow I/O was a constant issue for me, plus the constant need to rebuild stuff due to lack of ARM64 support. In the end, I switched to the x86 architecture.

Which brings me to another topic: my nVidia GPU with 6GB VRAM frequently runs into OOM issues, and there is no chance I can run with batch sizes higher than 1 or with concurrent models (I must serialize everything). Which means that for my purposes, the smallest edge accelerator that would fit my needs is the nVidia Jetson TX2, and it's just easier to stay with my desktop.
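A minimal sketch of what I mean by serializing everything: a simple promise queue so only one inference runs on the GPU at a time (`runModelA` / `runModelB` are hypothetical async inference functions, not real library calls):

```typescript
// Serialize GPU work: each task starts only after the previous one has settled,
// which avoids two models competing for VRAM at the same time.
let queue: Promise<unknown> = Promise.resolve();

function serialize<T>(task: () => Promise<T>): Promise<T> {
  const result = queue.then(task);        // chain onto whatever ran before
  queue = result.catch(() => undefined);  // swallow errors so the queue keeps moving
  return result;
}

// Usage (hypothetical model runners): both calls execute strictly one after the other.
// serialize(() => runModelA(inputTensor));
// serialize(() => runModelB(inputTensor));
```

It doesn't make anything faster, but it keeps peak VRAM usage at a single model instead of the sum of all of them.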
-
I'm getting some limited performance on limited hardware (e.g. 1 FPS on an RPi3 for BlazePose), which is expected.
I'm starting to see more and more boards dedicated to AI/ML with not just a CPU or GPU but an NPU. For example, I'm considering the SOEdge by Pine64 https://wiki.pine64.org/wiki/SOEdge and noticed that specific models https://verisilicon.github.io/acuity-models/ are linked for this architecture.
Assuming that it would result in radically better performance, e.g. 100x, could it be interesting to support this workflow?
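For context, here is a rough sketch of how I time inference with @tensorflow/tfjs-node to get numbers like the 1 FPS above (the model path and input shape are placeholders, not the exact BlazePose setup):

```typescript
// Rough per-inference timing sketch using @tensorflow/tfjs-node.
import * as tf from '@tensorflow/tfjs-node';

async function benchmark(modelPath: string, inputShape: number[], runs = 50): Promise<void> {
  const model = await tf.loadGraphModel(`file://${modelPath}`);
  const input = tf.zeros(inputShape);

  // Warm-up run so one-time initialization is not counted.
  const warm = model.predict(input) as tf.Tensor;
  await warm.data();
  warm.dispose();

  const start = Date.now();
  for (let i = 0; i < runs; i++) {
    const out = model.predict(input) as tf.Tensor;
    await out.data(); // force execution to complete before timing the next run
    out.dispose();
  }
  const ms = (Date.now() - start) / runs;
  console.log(`avg ${ms.toFixed(1)} ms/inference (~${(1000 / ms).toFixed(1)} FPS)`);

  input.dispose();
}

// Hypothetical invocation: adjust the path and input shape to the model under test.
// benchmark('models/blazepose/model.json', [1, 256, 256, 3]);
```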