Skip to content

Commit ea593dd

Browse files
authored
Refactor benchmark (#148)
* use mean as default for benchmark metric; change result representation; add --all for benchmarking all configs at a time * fix comments * add --model_exclude * pretty print * improve benchmark result table header: from band-xpu to xpu-band * suppress print message * update benchmark results on CPU-RPI * add the new benchmark results on the new intel cpu * fix backend and target setting in benchmark; pre-modify the names of int8 quantized models * add results on jetson cpu * add cuda results * print target and backend when using --all * add results on Khadas VIM3 * pretty print results * true pretty print results * update results in new format * fix broken backend and target vars * fix broken backend and target vars * fix broken backend and target var * update benchmark results on many devices * add db results on Ascend-310 * update info on CPU-INTEL * update usage of the new benchmark script
1 parent bdb6f66 commit ea593dd

28 files changed

+554
-127
lines changed

README.md

Lines changed: 28 additions & 28 deletions
Original file line numberDiff line numberDiff line change
@@ -21,43 +21,43 @@ Guidelines:
2121

2222
## Models & Benchmark Results
2323

24-
| Model | Task | Input Size | INTEL-CPU (ms) | RPI-CPU (ms) | JETSON-GPU (ms) | KV3-NPU (ms) | Ascend-310 (ms) | D1-CPU (ms) |
25-
| ------------------------------------------------------- | ----------------------------- | ---------- | -------------- | ------------ | --------------- | ------------ | --------------- | ----------- |
26-
| [YuNet](./models/face_detection_yunet) | Face Detection | 160x120 | 1.45 | 6.22 | 12.18 | 4.04 | 1.73 | 86.69 |
27-
| [SFace](./models/face_recognition_sface) | Face Recognition | 112x112 | 8.65 | 99.20 | 24.88 | 46.25 | 23.17 | --- |
28-
| [FER](./models/facial_expression_recognition/) | Facial Expression Recognition | 112x112 | 4.43 | 49.86 | 31.07 | 29.80 | 10.12 | --- |
29-
| [LPD-YuNet](./models/license_plate_detection_yunet/) | License Plate Detection | 320x240 | --- | 168.03 | 56.12 | 29.53 | 8.70 | --- |
30-
| [YOLOX](./models/object_detection_yolox/) | Object Detection | 640x640 | 176.68 | 1496.70 | 388.95 | 420.98 | 29.10 | --- |
31-
| [NanoDet](./models/object_detection_nanodet/) | Object Detection | 416x416 | 157.91 | 220.36 | 64.94 | 116.64 | 35.97 | --- |
32-
| [DB-IC15](./models/text_detection_db) | Text Detection | 640x480 | 142.91 | 2835.91 | 208.41 | --- | 229.74 | --- |
33-
| [DB-TD500](./models/text_detection_db) | Text Detection | 640x480 | 142.91 | 2841.71 | 210.51 | --- | 247.29 | --- |
34-
| [CRNN-EN](./models/text_recognition_crnn) | Text Recognition | 100x32 | 50.21 | 234.32 | 196.15 | 125.30 | 101.03 | --- |
35-
| [CRNN-CN](./models/text_recognition_crnn) | Text Recognition | 100x32 | 73.52 | 322.16 | 239.76 | 166.79 | 136.41 | --- |
36-
| [PP-ResNet](./models/image_classification_ppresnet) | Image Classification | 224x224 | 56.05 | 602.58 | 98.64 | 75.45 | 6.99 | --- |
37-
| [MobileNet-V1](./models/image_classification_mobilenet) | Image Classification | 224x224 | 9.04 | 92.25 | 33.18 | 145.66\* | 5.25 | --- |
38-
| [MobileNet-V2](./models/image_classification_mobilenet) | Image Classification | 224x224 | 8.86 | 74.03 | 31.92 | 146.31\* | 5.82 | --- |
39-
| [PP-HumanSeg](./models/human_segmentation_pphumanseg) | Human Segmentation | 192x192 | 19.92 | 105.32 | 67.97 | 74.77 | 7.07 | --- |
40-
| [WeChatQRCode](./models/qrcode_wechatqrcode) | QR Code Detection and Parsing | 100x100 | 7.04 | 37.68 | --- | --- | --- | --- |
41-
| [DaSiamRPN](./models/object_tracking_dasiamrpn) | Object Tracking | 1280x720 | 36.15 | 705.48 | 76.82 | --- | --- | --- |
42-
| [YoutuReID](./models/person_reid_youtureid) | Person Re-Identification | 128x256 | 35.81 | 521.98 | 90.07 | 44.61 | 5.69 | --- |
43-
| [MP-PalmDet](./models/palm_detection_mediapipe) | Palm Detection | 192x192 | 11.09 | 63.79 | 83.20 | 33.81 | 21.59 | --- |
44-
| [MP-HandPose](./models/handpose_estimation_mediapipe) | Hand Pose Estimation | 224x224 | 4.28 | 36.19 | 40.10 | 19.47 | 6.02 | --- |
24+
| Model | Task | Input Size | CPU-INTEL (ms) | CPU-RPI (ms) | GPU-JETSON (ms) | NPU-KV3 (ms) | NPU-Ascend310 (ms) | CPU-D1 (ms) |
25+
| ------------------------------------------------------- | ----------------------------- | ---------- | -------------- | ------------ | --------------- | ------------ | ------------------ | ----------- |
26+
| [YuNet](./models/face_detection_yunet) | Face Detection | 160x120 | 0.72 | 5.43 | 12.18 | 4.04 | 2.24 | 86.69 |
27+
| [SFace](./models/face_recognition_sface) | Face Recognition | 112x112 | 6.04 | 78.83 | 24.88 | 46.25 | 2.66 | --- |
28+
| [FER](./models/facial_expression_recognition/) | Facial Expression Recognition | 112x112 | 3.16 | 32.53 | 31.07 | 29.80 | 2.19 | --- |
29+
| [LPD-YuNet](./models/license_plate_detection_yunet/) | License Plate Detection | 320x240 | 8.63 | 167.70 | 56.12 | 29.53 | 7.63 | --- |
30+
| [YOLOX](./models/object_detection_yolox/) | Object Detection | 640x640 | 141.20 | 1805.87 | 388.95 | 420.98 | 28.59 | --- |
31+
| [NanoDet](./models/object_detection_nanodet/) | Object Detection | 416x416 | 66.03 | 225.10 | 64.94 | 116.64 | 20.62 | --- |
32+
| [DB-IC15](./models/text_detection_db) (EN) | Text Detection | 640x480 | 71.03 | 1862.75 | 208.41 | --- | 17.15 | --- |
33+
| [DB-TD500](./models/text_detection_db) (EN&CN) | Text Detection | 640x480 | 72.31 | 1878.45 | 210.51 | --- | 17.95 | --- |
34+
| [CRNN-EN](./models/text_recognition_crnn) | Text Recognition | 100x32 | 20.16 | 278.11 | 196.15 | 125.30 | --- | --- |
35+
| [CRNN-CN](./models/text_recognition_crnn) | Text Recognition | 100x32 | 23.07 | 297.48 | 239.76 | 166.79 | --- | --- |
36+
| [PP-ResNet](./models/image_classification_ppresnet) | Image Classification | 224x224 | 34.71 | 463.93 | 98.64 | 75.45 | 6.99 | --- |
37+
| [MobileNet-V1](./models/image_classification_mobilenet) | Image Classification | 224x224 | 5.90 | 72.33 | 33.18 | 145.66\* | 5.15 | --- |
38+
| [MobileNet-V2](./models/image_classification_mobilenet) | Image Classification | 224x224 | 5.97 | 66.56 | 31.92 | 146.31\* | 5.41 | --- |
39+
| [PP-HumanSeg](./models/human_segmentation_pphumanseg) | Human Segmentation | 192x192 | 8.81 | 73.13 | 67.97 | 74.77 | 6.94 | --- |
40+
| [WeChatQRCode](./models/qrcode_wechatqrcode) | QR Code Detection and Parsing | 100x100 | 1.29 | 5.71 | --- | --- | --- | --- |
41+
| [DaSiamRPN](./models/object_tracking_dasiamrpn) | Object Tracking | 1280x720 | 29.05 | 712.94 | 76.82 | --- | --- | --- |
42+
| [YoutuReID](./models/person_reid_youtureid) | Person Re-Identification | 128x256 | 30.39 | 625.56 | 90.07 | 44.61 | 5.58 | --- |
43+
| [MP-PalmDet](./models/palm_detection_mediapipe) | Palm Detection | 192x192 | 6.29 | 86.83 | 83.20 | 33.81 | 5.17 | --- |
44+
| [MP-HandPose](./models/handpose_estimation_mediapipe) | Hand Pose Estimation | 224x224 | 4.68 | 43.57 | 40.10 | 19.47 | 6.27 | --- |
4545

4646
\*: Models are quantized in per-channel mode, which run slower than per-tensor quantized models on NPU.
4747

4848
Hardware Setup:
4949

50-
- `INTEL-CPU`: [Intel Core i7-5930K](https://www.intel.com/content/www/us/en/products/sku/82931/intel-core-i75930k-processor-15m-cache-up-to-3-70-ghz/specifications.html) @ 3.50GHz, 6 cores, 12 threads.
51-
- `RPI-CPU`: [Raspberry Pi 4B](https://www.raspberrypi.com/products/raspberry-pi-4-model-b/specifications/), Broadcom BCM2711, Quad core Cortex-A72 (ARM v8) 64-bit SoC @ 1.5GHz.
52-
- `JETSON-GPU`: [NVIDIA Jetson Nano B01](https://developer.nvidia.com/embedded/jetson-nano-developer-kit), 128-core NVIDIA Maxwell GPU.
53-
- `KV3-NPU`: [Khadas VIM3](https://www.khadas.com/vim3), 5TOPS Performance. Benchmarks are done using **quantized** models. You will need to compile OpenCV with TIM-VX following [this guide](https://github.com/opencv/opencv/wiki/TIM-VX-Backend-For-Running-OpenCV-On-NPU) to run benchmarks. The test results use the `per-tensor` quantization model by default.
54-
- `Ascend-310`: [Ascend 310](https://e.huawei.com/uk/products/cloud-computing-dc/atlas/ascend-310), 22 TOPS@INT8. Benchmarks are done on [Atlas 200 DK AI Developer Kit](https://e.huawei.com/in/products/cloud-computing-dc/atlas/atlas-200). Get the latest OpenCV source code and build following [this guide](https://github.com/opencv/opencv/wiki/Huawei-CANN-Backend) to enable CANN backend.
55-
- `D1-CPU`: [Allwinner D1](https://d1.docs.aw-ol.com/en), [Xuantie C906 CPU](https://www.t-head.cn/product/C906?spm=a2ouz.12986968.0.0.7bfc1384auGNPZ) (RISC-V, RVV 0.7.1) @ 1.0GHz, 1 core. YuNet is supported for now. Visit [here](https://github.com/fengyuentau/opencv_zoo_cpp) for more details.
50+
- `CPU-INTEL`: [Intel Core i7-12700K](https://www.intel.com/content/www/us/en/products/sku/134594/intel-core-i712700k-processor-25m-cache-up-to-5-00-ghz/specifications.html), 8 Performance-cores (3.60 GHz, turbo up to 4.90 GHz), 4 Efficient-cores (2.70 GHz, turbo up to 3.80 GHz), 20 threads.
51+
- `CPU-RPI`: [Raspberry Pi 4B](https://www.raspberrypi.com/products/raspberry-pi-4-model-b/specifications/), Broadcom BCM2711, Quad core Cortex-A72 (ARM v8) 64-bit SoC @ 1.5 GHz.
52+
- `GPU-JETSON`: [NVIDIA Jetson Nano B01](https://developer.nvidia.com/embedded/jetson-nano-developer-kit), 128-core NVIDIA Maxwell GPU.
53+
- `NPU-KV3`: [Khadas VIM3](https://www.khadas.com/vim3), 5TOPS Performance. Benchmarks are done using **quantized** models. You will need to compile OpenCV with TIM-VX following [this guide](https://github.com/opencv/opencv/wiki/TIM-VX-Backend-For-Running-OpenCV-On-NPU) to run benchmarks. The test results use the `per-tensor` quantization model by default.
54+
- `NPU-Ascend310`: [Ascend 310](https://e.huawei.com/uk/products/cloud-computing-dc/atlas/atlas-200), 22 TOPS @ INT8. Benchmarks are done on [Atlas 200 DK AI Developer Kit](https://e.huawei.com/in/products/cloud-computing-dc/atlas/atlas-200). Get the latest OpenCV source code and build following [this guide](https://github.com/opencv/opencv/wiki/Huawei-CANN-Backend) to enable CANN backend.
55+
- `CPU-D1`: [Allwinner D1](https://d1.docs.aw-ol.com/en), [Xuantie C906 CPU](https://www.t-head.cn/product/C906?spm=a2ouz.12986968.0.0.7bfc1384auGNPZ) (RISC-V, RVV 0.7.1) @ 1.0 GHz, 1 core. YuNet is supported for now. Visit [here](https://github.com/fengyuentau/opencv_zoo_cpp) for more details.
5656

5757
***Important Notes***:
5858

5959
- The data under each column of hardware setups on the above table represents the elapsed time of an inference (preprocess, forward and postprocess).
60-
- The time data is the median of 10 runs after some warmup runs. Different metrics may be applied to some specific models.
60+
- The time data is the mean of 10 runs after some warmup runs. Different metrics may be applied to some specific models.
6161
- Batch size is 1 for all benchmark results.
6262
- `---` represents the model is not availble to run on the device.
6363
- View [benchmark/config](./benchmark/config) for more details on benchmarking different models.

0 commit comments

Comments
 (0)