update iluvatar retinaNet 1x1 2x8 config (#181)

* update iluvatar retinaNet 1x1 2x8 config
* fix retinaNet README info
* add mAP and mem info
FlagOpen · Aug 8, 2023 · 6e2d4be
1 parent d999763
commit 6e2d4be

Showing 3 changed files with 36 additions and 4 deletions.
diff --git a/training/iluvatar/retinanet-pytorch/README.md b/training/iluvatar/retinanet-pytorch/README.md
@@ -27,9 +27,33 @@ torchvision.models.resnet.__dict__['model_urls'][
    - Dependency software versions: none
 
 
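-### Run results
-| Training resources | Config file        | Run time (s) | Target accuracy | Converged accuracy | Steps | Performance (samples/s) |
-| ------------------ | ------------------ | ------------ | --------------- | ------------------ | ----- | ----------------------- |
-| 1 node, 8 cards    | config_BI-V100x1x8 |              | 0.35            | 0.348              |       |                         |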
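+* General metrics
+
+| Metric                     | Value                                      | Notes                                                                             |
+| -------------------------- | ------------------------------------------ | --------------------------------------------------------------------------------- |
+| Task category              | Object detection                           |                                                                                   |
+| Model                      | retinanet                                  |                                                                                   |
+| Dataset                    | COCO2017                                   |                                                                                   |
+| Data precision             | precision, see "Performance metrics"       | One of fp32/amp/fp16                                                              |
+| Per-card batch size        | bs, see "Performance metrics"              | i.e. local batch_size                                                             |
+| Hyperparameter changes     | fix_hp, see "Performance metrics"          | Special hyperparameters needed to saturate the hardware when measuring throughput |
+| Hardware short name        | BI-V100                                    |                                                                                   |
+| Device memory usage        | mem, see "Performance metrics"             | Commonly called "GPU memory"; unit is GiB                                         |
+| End-to-end time            | e2e_time, see "Performance metrics"        | Total time plus Perf initialization and similar overhead                          |
+| Overall throughput         | p_whole, see "Performance metrics"         | Number of images actually trained divided by total time (performance_whole)       |
+| Training throughput        | p_train, see "Performance metrics"         | Excludes the evaluation time at the end of each epoch                             |
+| **Compute throughput**     | **p_core, see "Performance metrics"**      | Excludes data I/O time (p3>p2>p1)                                                 |
+| Training result            | mAP, see "Performance metrics"             | Mean of the Average Precision over all classes                                    |
+| Additional modifications   | None                                       |                                                                                   |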
+
+
+* Performance metrics
+
+| Configuration                    | precision | fix_hp        | e2e_time | p_whole | p_train | p_core | mAP   | mem       |
+| -------------------------------- | --------- | ------------- | -------- | ------- | ------- | ------ | ----- | --------- |
+| BI-V100, 1 node x 8 cards (1x8)  | fp32      | bs=16,lr=0.04 |          |         |         |        | 0.349 | 30.9/32.0 |
+| BI-V100, 1 node x 1 card (1x1)   | fp32      | bs=8,lr=0.02  |          |         |         |        |       |           |
+| BI-V100, 2 nodes x 8 cards (2x8) | fp32      | bs=8,lr=0.02  |          |         |         |        |       |           |
+
 
 Source of training accuracy: [torchvision.models — Torchvision 0.8.1 documentation (pytorch.org)](https://pytorch.org/vision/0.8/models.html?highlight=faster#torchvision.models.detection.retinanet_resnet50_fpn)
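
The README's general-metrics table defines three throughput figures by what each one excludes from the timed window. The sketch below shows one way those definitions could be computed. It assumes, based on the table's `p3>p2>p1` note, that `p_core` excludes both evaluation and data I/O time. The `RunTimes` class, its field names, and the `throughputs` helper are hypothetical and do not come from the repository.

```python
from dataclasses import dataclass


@dataclass
class RunTimes:
    """Hypothetical timing record for one training run (not a repository type)."""
    samples: int       # images actually trained
    total_s: float     # total wall-clock time, including evaluation and data I/O
    eval_s: float      # time spent in the evaluation at the end of each epoch
    data_io_s: float   # time spent on data I/O during training


def throughputs(t: RunTimes):
    """Return (p_whole, p_train, p_core) in samples per second."""
    # p_whole: images trained divided by total time.
    p_whole = t.samples / t.total_s
    # p_train: same numerator, with per-epoch evaluation time removed.
    train_s = t.total_s - t.eval_s
    p_train = t.samples / train_s
    # p_core: additionally removes data I/O time (assumed reading of p3>p2>p1).
    p_core = t.samples / (train_s - t.data_io_s)
    return p_whole, p_train, p_core
```

Because each figure divides the same sample count by a shorter time window, the ordering `p_core > p_train > p_whole` holds whenever the evaluation and data I/O times are both positive.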
diff --git a/training/iluvatar/retinanet-pytorch/config/config_BI-V100x1x1.py b/training/iluvatar/retinanet-pytorch/config/config_BI-V100x1x1.py
@@ -0,0 +1,4 @@
+vendor: str = "iluvatar"
+train_batch_size = 8
+eval_batch_size = 8
+lr = 0.02
diff --git a/training/iluvatar/retinanet-pytorch/config/config_BI-V100x2x8.py b/training/iluvatar/retinanet-pytorch/config/config_BI-V100x2x8.py
@@ -0,0 +1,4 @@
+vendor: str = "iluvatar"
+train_batch_size = 8
+eval_batch_size = 8
+lr = 0.02
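
The two new config files, read together with the `fix_hp` column of the README table, imply a global batch size for each setup. Here is a minimal sketch, assuming the global batch size is nodes x cards per node x per-card `train_batch_size`. The `CONFIGS` dictionary and both helper functions are hypothetical and not part of the repository. The 1x8 values come from the README row, not from a config file shown in this diff.

```python
# Values copied from this commit: the two new config files and the README fix_hp column.
CONFIGS = {
    "BI-V100x1x8": {"nodes": 1, "cards_per_node": 8, "train_batch_size": 16, "lr": 0.04},
    "BI-V100x1x1": {"nodes": 1, "cards_per_node": 1, "train_batch_size": 8, "lr": 0.02},
    "BI-V100x2x8": {"nodes": 2, "cards_per_node": 8, "train_batch_size": 8, "lr": 0.02},
}


def global_batch_size(cfg):
    """Images per optimizer step across all cards (assumed data-parallel training)."""
    return cfg["nodes"] * cfg["cards_per_node"] * cfg["train_batch_size"]


def lr_per_local_sample(cfg):
    """Learning rate divided by the per-card batch size."""
    return cfg["lr"] / cfg["train_batch_size"]


if __name__ == "__main__":
    for name, cfg in CONFIGS.items():
        print(name, global_batch_size(cfg), lr_per_local_sample(cfg))
```

With these values, the 1x8 and 2x8 setups both reach a global batch of 128 images, and the 1x1 setup uses 8. The learning rate divided by the per-card batch size is 0.0025 in all three. This is an observation about the numbers, not a scaling rule stated by the commit.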