update iluvatar retinaNet 1x1 2x8 config (#181)
* update iluvatar retinaNet 1x1 2x8 config

* fix retinaNet README info

* add mAP and mem info
forestlee95 authored Aug 8, 2023
1 parent d999763 commit 6e2d4be
Showing 3 changed files with 36 additions and 4 deletions.
32 changes: 28 additions & 4 deletions training/iluvatar/retinanet-pytorch/README.md
@@ -27,9 +27,33 @@ torchvision.models.resnet.__dict__['model_urls'][
- Dependency software versions: none


### Run results
| Training resources | Config file | Run time (s) | Target accuracy | Converged accuracy | Steps | Performance (samples/s) |
| -------- | --------------- | ----------- | -------- | -------- | ------- | ---------------- |
| 1 node x 8 cards | config_BI-V100x1x8 | | 0.35 | 0.348 | | |
* General metrics

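| Metric | Value | Notes |
| -------------- | ----------------------- | ------------------------------------------- |
| Task category | Object detection | |
| Model | retinanet | |
| Dataset | COCO2017 | |
| Data precision | precision, see "Performance metrics" | fp32/amp/fp16 available |
| Per-card batch size | bs, see "Performance metrics" | i.e. local batch_size |
| Hyperparameter changes | fix_hp, see "Performance metrics" | Special hyperparameters needed to saturate the hardware when measuring throughput |
| Hardware device | BI-V100 | |
| Device memory usage | mem, see "Performance metrics" | Commonly called "video memory"; unit is GiB |
| End-to-end time | e2e_time, see "Performance metrics" | Total time plus Perf initialization and similar overhead |
| Overall throughput | p_whole, see "Performance metrics" | Number of images actually trained divided by total time (performance_whole) |
| Training throughput | p_train, see "Performance metrics" | Excludes the evaluation time at the end of each epoch |
| **Compute throughput** | **p_core, see "Performance metrics"** | Excludes data I/O time (p3 > p2 > p1, i.e. p_core > p_train > p_whole) |
| Training result | mAP, see "Performance metrics" | Mean of the Average Precision over all classes |
| Additional modifications | | |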
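
The relationship between the three throughput metrics defined in the table above can be illustrated with a minimal sketch. It assumes each metric is the number of trained samples divided by a progressively smaller time window, as the table's notes suggest. The `RunTimings` class, its field names, and the `throughputs` function are hypothetical and are not taken from the benchmark's code.

```python
from dataclasses import dataclass


@dataclass
class RunTimings:
    """Hypothetical timing breakdown of one training run (times in seconds)."""
    samples_trained: int   # images actually consumed by training
    total_time: float      # total wall-clock time of the run
    eval_time: float       # time spent in end-of-epoch evaluation
    data_io_time: float    # time spent on data I/O during training


def throughputs(t: RunTimings) -> dict:
    """Return p_whole, p_train and p_core in samples per second."""
    train_time = t.total_time - t.eval_time      # drop evaluation time
    core_time = train_time - t.data_io_time      # additionally drop data I/O
    if core_time <= 0:
        raise ValueError("evaluation and I/O time must be less than total time")
    return {
        "p_whole": t.samples_trained / t.total_time,
        "p_train": t.samples_trained / train_time,
        "p_core": t.samples_trained / core_time,
    }


if __name__ == "__main__":
    # Illustrative numbers only; they are not measurements from this commit.
    example = RunTimings(samples_trained=1000, total_time=100.0,
                         eval_time=20.0, data_io_time=30.0)
    print(throughputs(example))
```

Because each denominator excludes more time than the previous one, the sketch always yields p_core >= p_train >= p_whole, which matches the ordering noted in the table.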


* Performance metrics

| Configuration | precision | fix_hp | e2e_time | p_whole | p_train | p_core | mAP | mem |
| ------------------- | --------- | ------------- | -------- | ------- | ------- | ------ | ------ | --------- |
| BI-V100, 1 node x 8 cards (1x8) | fp32 | bs=16,lr=0.04 | | | | | 0.349 | 30.9/32.0 |
| BI-V100, 1 node x 1 card (1x1) | fp32 | bs=8,lr=0.02 | | | | | | |
| BI-V100, 2 nodes x 8 cards (2x8) | fp32 | bs=8,lr=0.02 | | | | | | |
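
The `fix_hp` values in the table above appear consistent with a learning rate proportional to the per-card batch size. This is an observation about the listed numbers, not a statement of how the author chose them. A small sketch that checks the ratio, with the `FIX_HP` dictionary and `lr_per_sample` helper being hypothetical names introduced here:

```python
import math

# (local batch size, learning rate) pairs copied from the fix_hp column.
FIX_HP = {
    "1x8": (16, 0.04),
    "1x1": (8, 0.02),
    "2x8": (8, 0.02),
}


def lr_per_sample(bs: int, lr: float) -> float:
    """Learning rate divided by the per-card batch size."""
    return lr / bs


ratios = {name: lr_per_sample(bs, lr) for name, (bs, lr) in FIX_HP.items()}
reference = ratios["1x8"]

# True when every configuration has the same lr-to-batch-size ratio.
consistent = all(math.isclose(r, reference) for r in ratios.values())

if __name__ == "__main__":
    print(ratios, consistent)
```

Note that the ratio is taken against the local batch size only; the 1x8 and 2x8 rows have the same global batch size (128) but different learning rates, so the listed values do not follow a global-batch scaling rule.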


Source of the training accuracy target: [torchvision.models — Torchvision 0.8.1 documentation (pytorch.org)](https://pytorch.org/vision/0.8/models.html?highlight=faster#torchvision.models.detection.retinanet_resnet50_fpn)
@@ -0,0 +1,4 @@
vendor: str = "iluvatar"
train_batch_size = 8
eval_batch_size = 8
lr = 0.02
@@ -0,0 +1,4 @@
vendor: str = "iluvatar"
train_batch_size = 8
eval_batch_size = 8
lr = 0.02