add Iluvatar retinanet case. (#173)

* add iluvatar retinanet case
* update README
* update iluvatar retinanet config and README

---------

Co-authored-by: uuup <55571217+upvenly@users.noreply.github.com>
FlagOpen · Aug 1, 2023 · edd64f1
1 parent d5c3a3a

Showing 8 changed files with 54 additions and 3 deletions.
diff --git a/training/benchmarks/retinanet/README.md b/training/benchmarks/retinanet/README.md
@@ -61,6 +61,6 @@ torchvision.models.resnet.__dict__['model_urls'][
 | ---------- | ------- |
 | Nvidia GPU | ✅       |
 | 昆仑芯 XPU | N/A     |
-| 天数智芯   | N/A     |
+| 天数智芯   |  ✅    |
 
 
diff --git a/training/benchmarks/retinanet/pytorch/config/_base.py b/training/benchmarks/retinanet/pytorch/config/_base.py
@@ -67,5 +67,7 @@
 sync_bn: bool = False
 gradient_accumulation_steps: int = 1
 
+cudnn_benchmark: bool = True
+cudnn_deterministic: bool = False
 
-pretrained_path = "resnet50-0676ba61.pth"
+pretrained_path = "resnet50-0676ba61.pth"
diff --git a/training/benchmarks/retinanet/pytorch/config/mutable_params.py b/training/benchmarks/retinanet/pytorch/config/mutable_params.py
@@ -4,5 +4,6 @@
     'do_train', 'fp16', 'distributed', 'warmup', 'dist_backend', 'num_workers',
     'device',
     'cudnn_benchmark',
-    'cudnn_deterministic'
+    'cudnn_deterministic',
+    'local_rank'
 ]
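The hunk above adds `local_rank` to the list in `mutable_params.py`. Assuming that list acts as an allow-list of the base-config parameters a vendor config may override (an assumption about FlagPerf's behaviour, not something this diff confirms), the merge could be sketched roughly as follows. The helper `apply_vendor_overrides` and the key `example_fixed_key` are hypothetical names used only for illustration.

```python
from typing import Any, Dict, Iterable


def apply_vendor_overrides(base: Dict[str, Any],
                           vendor: Dict[str, Any],
                           mutable: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of `base` in which only allow-listed keys take vendor values.

    Hypothetical helper, not FlagPerf's actual implementation.
    """
    allowed = set(mutable)
    merged = dict(base)
    for key, value in vendor.items():
        if key in allowed:
            merged[key] = value  # vendor may override this parameter
        # keys outside the allow-list keep their base value
    return merged


# Tail of the allow-list as it appears after this commit.
mutable = ["cudnn_benchmark", "cudnn_deterministic", "local_rank"]

# Illustrative values only; `example_fixed_key` is a made-up, non-mutable key.
base = {"cudnn_benchmark": True, "local_rank": -1, "example_fixed_key": 1}
vendor = {"local_rank": 0, "example_fixed_key": 4}

merged = apply_vendor_overrides(base, vendor, mutable)
print(merged)
```

Under this reading, a vendor-side `local_rank` value would have been ignored before this commit and is honoured after it.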
diff --git a/training/iluvatar/retinanet-pytorch/README.md b/training/iluvatar/retinanet-pytorch/README.md
@@ -0,0 +1,35 @@
+### Downloading the model backbone weights
+[Download the model backbone weights](https://download.pytorch.org/models/resnet50-0676ba61.pth)
+
+This URL is set in FlagPerf/training/benchmarks/retinanet/pytorch/model/\_\_init__.py:
+
+```python
+torchvision.models.resnet.__dict__['model_urls'][
+    'resnet50'] = 'https://download.pytorch.org/models/resnet50-0676ba61.pth'
+```
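+By default, this case automatically downloads the backbone weights from the official URL above (0676ba61). To specify the weights manually, download the file to a path that is mounted into the container and change the URL here to "file://"+download_path.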
+
+### Downloading the test dataset
+
+[Download the test dataset](https://cocodataset.org/)
+
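+### Iluvatar (天数智芯) BI-V100 GPU configuration and run reference
+#### Environment configuration
+- ##### Hardware environment
+    - Machine and accelerator model: Iluvatar BI-V100 32GB
+
+- ##### Software environment
+   - OS version: Ubuntu 20.04
+   - OS kernel version: 4.15.0-156-generic x86_64
+   - Accelerator driver version: 3.1.0
+   - Docker version: 20.10.8
+   - Training framework version: torch-1.13.1+corex.3.1.0
+   - Dependency versions: none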
+
+
+### Run results
+| Training resources | Config file | Run time (s) | Target accuracy | Converged accuracy | Steps | Performance (samples/s) |
+| ------------------ | ----------- | ------------ | --------------- | ------------------ | ----- | ----------------------- |
+| 1 node, 8 cards | config_BI-V100x1x8 |  | 0.35 | 0.348 |  |  |
+
+Training accuracy source: [torchvision.models — Torchvision 0.8.1 documentation (pytorch.org)](https://pytorch.org/vision/0.8/models.html?highlight=faster#torchvision.models.detection.retinanet_resnet50_fpn)
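The README added above says that manually downloaded backbone weights can be used by changing the URL to `"file://"+download_path`. A minimal sketch of building that URL is shown below. The helper name `local_backbone_url` and the mount point `/mnt/weights` are assumptions for illustration; only the `"file://"` prefix rule and the `model_urls` assignment come from the README.

```python
import posixpath


def local_backbone_url(download_path: str) -> str:
    """Build a file:// URL for a manually downloaded weight file.

    Hypothetical helper: it only applies the README's
    "file://" + download_path rule and checks the path is absolute.
    """
    if not posixpath.isabs(download_path):
        raise ValueError("download_path must be an absolute path inside the container")
    return "file://" + download_path


# Assumed mount point; replace it with the path actually mounted into the container.
url = local_backbone_url("/mnt/weights/resnet50-0676ba61.pth")
print(url)

# In model/__init__.py this value would then replace the default URL, e.g.:
# torchvision.models.resnet.__dict__['model_urls']['resnet50'] = url
```

Requiring an absolute path avoids a malformed URL such as `file://weights/...`, in which the first path component would be parsed as a host name.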
diff --git a/training/iluvatar/retinanet-pytorch/config/config_BI-V100x1x8.py b/training/iluvatar/retinanet-pytorch/config/config_BI-V100x1x8.py
@@ -0,0 +1,4 @@
+vendor: str = "iluvatar"
+train_batch_size = 16
+eval_batch_size = 16
+lr = 0.04
diff --git a/training/iluvatar/retinanet-pytorch/config/environment_variables.sh b/training/iluvatar/retinanet-pytorch/config/environment_variables.sh
@@ -0,0 +1,6 @@
+# =================================================
+# Export variables
+# =================================================
+
+
+export OMP_NUM_THREADS=1
diff --git a/training/iluvatar/retinanet-pytorch/config/requirements.txt b/training/iluvatar/retinanet-pytorch/config/requirements.txt
@@ -0,0 +1,3 @@
+opencv-python>=4.1.1
+opencv-python-headless==4.1.2.30
+pycocotools>=2.0
diff --git a/training/iluvatar/retinanet-pytorch/extern/.gitkeep b/training/iluvatar/retinanet-pytorch/extern/.gitkeep