Lx add profile #8889

Merged
merged 30 commits into from
Aug 21, 2022

Contributor

@laoliu97 laoliu97 commented Aug 9, 2022

Add performance-test profiles for the operators in test_abs.py and test_activation.py
under oneflow/python/oneflow/test/modules.

@laoliu97 laoliu97 requested review from doombeaker and removed request for jackalcooper and BBuf August 9, 2022 09:48
@laoliu97 laoliu97 added test api and removed test labels Aug 10, 2022
@laoliu97 laoliu97 removed the api label Aug 10, 2022
python/oneflow/test/modules/test_activation.py: four outdated review threads (resolved)
@laoliu97
Contributor Author

Screenshots of the performance test results for test_activation.py:

image
image
image

@laoliu97
Contributor Author

Screenshot of the performance test results for test_abs.py:

image

result_list.append(table_line)

doc_test_ratio = cnt0 * 1.0 / len(api_list)
compatiable_completeness_test_ratio = cnt1 * 1.0 / len(api_list)
exception_test_ratio = cnt2 * 1.0 / len(api_list)
performance_test_ratio = cnt3 * 1.0 / len(api_list)
Contributor

In Python, / is already floating-point division, so the * 1.0 is unnecessary. If convenient, please change all four of these lines.
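
The reviewer's point can be checked with a small sketch. The list and count below are placeholder values chosen for illustration, not data from the script under review; only the behaviour of `/` matters here.

```python
# Placeholder values for illustration only (not taken from the PR).
api_list = ["abs", "relu", "gelu", "tanh"]
cnt0 = 3  # e.g. the number of APIs that have a doc test

# In Python 3, `/` is true division and returns a float even for two ints,
# so multiplying by 1.0 first changes nothing.
ratio_with_cast = cnt0 * 1.0 / len(api_list)
ratio_plain = cnt0 / len(api_list)

# `//` is the operator that floors the result.
floored = cnt0 // len(api_list)

print(ratio_with_cast, ratio_plain, floored)
```

Both ratios come out identical, which is why the four `* 1.0` factors can be dropped without changing the reported percentages.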

)

cnt0 = 0 # the number of doc_test
cnt1 = 0 # the number of compatiable_completeness_test
cnt2 = 0 # the number of exception_test

cnt3 = 0 # the number of profile_test
Contributor

There should be a blank line after this line.

result_profile_func_list = []
for file in files:
    if file!="log" and not os.path.isdir(file) and file.find("__pycache__") == -1:
        f = open(path + "/" + file)
Contributor

@daquexian daquexian Aug 12, 2022

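Please don't build paths by concatenating strings directly like this; it stops working on Windows (Windows uses "\" rather than "/"). Use os.path.join instead, and while you are at it, fix the other existing occurrence as well. If you are interested, you can also look into pathlib, which is more modern and more convenient.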
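
A minimal sketch of the two alternatives the reviewer mentions. The directory and file name are hypothetical values used only for illustration; they are not taken from the script under review.

```python
import os
from pathlib import Path

# Hypothetical directory and file name, used only for illustration.
path = os.path.join("python", "oneflow", "test", "modules")
file = "test_abs.py"

# os.path.join inserts the separator appropriate for the current OS.
joined = os.path.join(path, file)

# pathlib builds the same path with the `/` operator on Path objects.
joined_pathlib = Path(path) / file

print(joined)
print(joined_pathlib)
```

With pathlib, the file could then be opened with `joined_pathlib.open()` or read in one step with `joined_pathlib.read_text()`.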

@laoliu97
Contributor Author

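Update gen_ops_process.py

Description: update the API search script gen_ops_process.py to match the structure of the refactored rst files.
(Note: this also fixes some nonconforming rst documents.)

The header and footer of the generated md file preview are shown in the two images below.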

image
image

@github-actions
Contributor

Speed stats:

@github-actions
Contributor

Speed stats:
GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 128.5ms (= 12851.8ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 143.6ms (= 14358.8ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.12 (= 143.6ms / 128.5ms)

OneFlow resnet50 time: 75.3ms (= 7531.5ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 87.5ms (= 8747.1ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.16 (= 87.5ms / 75.3ms)

OneFlow resnet50 time: 48.8ms (= 9766.7ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 59.3ms (= 11853.2ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.21 (= 59.3ms / 48.8ms)

OneFlow resnet50 time: 36.4ms (= 7276.1ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 42.8ms (= 8553.9ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.18 (= 42.8ms / 36.4ms)

OneFlow resnet50 time: 28.2ms (= 5643.3ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 42.4ms (= 8483.9ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.50 (= 42.4ms / 28.2ms)

OneFlow swin dataloader time: 0.255s (= 51.070s / 200, num_workers=1)
PyTorch swin dataloader time: 0.151s (= 30.256s / 200, num_workers=1)
Relative speed: 0.592 (= 0.151s / 0.255s)

OneFlow swin dataloader time: 0.072s (= 14.340s / 200, num_workers=4)
PyTorch swin dataloader time: 0.042s (= 8.367s / 200, num_workers=4)
Relative speed: 0.583 (= 0.042s / 0.072s)

OneFlow swin dataloader time: 0.043s (= 8.580s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.472s / 200, num_workers=8)
Relative speed: 0.521 (= 0.022s / 0.043s)

❌ OneFlow resnet50 time: 136.7ms (= 13665.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 162.0ms (= 16201.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 162.0ms / 136.7ms)

OneFlow resnet50 time: 85.5ms (= 8550.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.6ms (= 10264.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 102.6ms / 85.5ms)

OneFlow resnet50 time: 58.0ms (= 11599.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.4ms (= 15686.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 78.4ms / 58.0ms)

OneFlow resnet50 time: 45.3ms (= 9050.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 81.2ms (= 16240.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.79 (= 81.2ms / 45.3ms)

OneFlow resnet50 time: 38.8ms (= 7757.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.2ms (= 13635.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.76 (= 68.2ms / 38.8ms)
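
The arithmetic behind each row above can be reproduced with a short sketch. The function names are hypothetical and this is not the CI bot's actual script; it also assumes the ratio is taken from the unrounded per-iteration times, which the report does not state.

```python
def per_iteration_ms(total_ms: float, iterations: int) -> float:
    """Average time of one iteration in milliseconds."""
    return total_ms / iterations


def relative_speed(pytorch_ms: float, oneflow_ms: float) -> float:
    """Values above 1 mean OneFlow took less time per iteration than PyTorch."""
    return pytorch_ms / oneflow_ms


# Figures from the first resnet50 row above (input_shape=[16, 3, 224, 224]).
oneflow_ms = per_iteration_ms(12851.8, 100)
pytorch_ms = per_iteration_ms(14358.8, 100)
speed = relative_speed(pytorch_ms, oneflow_ms)

print(f"OneFlow {oneflow_ms:.1f}ms, PyTorch {pytorch_ms:.1f}ms, relative speed {speed:.2f}")
```

Read this way, the swin dataloader rows with a relative speed below 1 are the cases where PyTorch was faster.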

@github-actions
Contributor

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8889/

@github-actions
Contributor

CI failed when running job: cuda-misc. PR label automerge has been removed

@github-actions
Contributor

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8889/

@github-actions
Contributor

Speed stats:
GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 128.5ms (= 12850.8ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 143.6ms (= 14358.4ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.12 (= 143.6ms / 128.5ms)

OneFlow resnet50 time: 75.3ms (= 7530.0ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 85.8ms (= 8583.5ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.14 (= 85.8ms / 75.3ms)

OneFlow resnet50 time: 48.9ms (= 9775.7ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 60.2ms (= 12041.0ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.23 (= 60.2ms / 48.9ms)

OneFlow resnet50 time: 36.4ms (= 7274.7ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 43.0ms (= 8598.8ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.18 (= 43.0ms / 36.4ms)

OneFlow resnet50 time: 28.3ms (= 5663.7ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 36.9ms (= 7386.0ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.30 (= 36.9ms / 28.3ms)

OneFlow swin dataloader time: 0.269s (= 53.752s / 200, num_workers=1)
PyTorch swin dataloader time: 0.149s (= 29.899s / 200, num_workers=1)
Relative speed: 0.556 (= 0.149s / 0.269s)

OneFlow swin dataloader time: 0.072s (= 14.492s / 200, num_workers=4)
PyTorch swin dataloader time: 0.040s (= 7.997s / 200, num_workers=4)
Relative speed: 0.552 (= 0.040s / 0.072s)

OneFlow swin dataloader time: 0.040s (= 7.912s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.469s / 200, num_workers=8)
Relative speed: 0.565 (= 0.022s / 0.040s)

❌ OneFlow resnet50 time: 136.7ms (= 13670.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 161.4ms (= 16143.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 161.4ms / 136.7ms)

OneFlow resnet50 time: 85.4ms (= 8541.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.8ms (= 10281.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 102.8ms / 85.4ms)

OneFlow resnet50 time: 58.8ms (= 11767.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.2ms (= 15635.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.33 (= 78.2ms / 58.8ms)

OneFlow resnet50 time: 45.4ms (= 9078.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 79.8ms (= 15965.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.76 (= 79.8ms / 45.4ms)

OneFlow resnet50 time: 39.3ms (= 7861.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 65.4ms (= 13071.1ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.66 (= 65.4ms / 39.3ms)

@github-actions
Contributor

CI failed when running job: cuda-misc. PR label automerge has been removed

@github-actions
Contributor

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8889/

@github-actions
Contributor

Speed stats:
GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 128.3ms (= 12832.2ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 143.3ms (= 14327.7ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.12 (= 143.3ms / 128.3ms)

OneFlow resnet50 time: 75.3ms (= 7528.7ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 81.9ms (= 8192.1ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.09 (= 81.9ms / 75.3ms)

OneFlow resnet50 time: 49.0ms (= 9797.6ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 61.1ms (= 12227.8ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.25 (= 61.1ms / 49.0ms)

OneFlow resnet50 time: 36.3ms (= 7254.2ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 46.3ms (= 9258.8ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.28 (= 46.3ms / 36.3ms)

OneFlow resnet50 time: 28.5ms (= 5692.9ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 35.3ms (= 7062.4ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.24 (= 35.3ms / 28.5ms)

OneFlow swin dataloader time: 0.267s (= 53.488s / 200, num_workers=1)
PyTorch swin dataloader time: 0.152s (= 30.342s / 200, num_workers=1)
Relative speed: 0.567 (= 0.152s / 0.267s)

OneFlow swin dataloader time: 0.070s (= 13.936s / 200, num_workers=4)
PyTorch swin dataloader time: 0.042s (= 8.420s / 200, num_workers=4)
Relative speed: 0.604 (= 0.042s / 0.070s)

OneFlow swin dataloader time: 0.040s (= 7.977s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.321s / 200, num_workers=8)
Relative speed: 0.542 (= 0.022s / 0.040s)

❌ OneFlow resnet50 time: 136.7ms (= 13669.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 162.6ms (= 16260.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 162.6ms / 136.7ms)

OneFlow resnet50 time: 84.9ms (= 8487.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 110.0ms (= 10996.6ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.30 (= 110.0ms / 84.9ms)

OneFlow resnet50 time: 58.6ms (= 11719.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.9ms (= 15783.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 78.9ms / 58.6ms)

OneFlow resnet50 time: 45.8ms (= 9161.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 76.9ms (= 15389.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.68 (= 76.9ms / 45.8ms)

OneFlow resnet50 time: 38.7ms (= 7747.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.1ms (= 13611.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.76 (= 68.1ms / 38.7ms)

@github-actions
Contributor

CI failed when running job: cuda-misc. PR label automerge has been removed

@github-actions
Contributor

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8889/

@github-actions
Contributor

Speed stats:
GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 128.5ms (= 12850.8ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 141.6ms (= 14164.0ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.10 (= 141.6ms / 128.5ms)

OneFlow resnet50 time: 75.5ms (= 7553.4ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 88.1ms (= 8808.6ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.17 (= 88.1ms / 75.5ms)

OneFlow resnet50 time: 49.3ms (= 9863.1ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 55.8ms (= 11167.0ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.13 (= 55.8ms / 49.3ms)

OneFlow resnet50 time: 36.6ms (= 7313.6ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 47.5ms (= 9496.7ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.30 (= 47.5ms / 36.6ms)

OneFlow resnet50 time: 28.6ms (= 5720.3ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 37.7ms (= 7544.8ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.32 (= 37.7ms / 28.6ms)

OneFlow swin dataloader time: 0.272s (= 54.329s / 200, num_workers=1)
PyTorch swin dataloader time: 0.150s (= 30.049s / 200, num_workers=1)
Relative speed: 0.553 (= 0.150s / 0.272s)

OneFlow swin dataloader time: 0.070s (= 13.959s / 200, num_workers=4)
PyTorch swin dataloader time: 0.041s (= 8.264s / 200, num_workers=4)
Relative speed: 0.592 (= 0.041s / 0.070s)

OneFlow swin dataloader time: 0.038s (= 7.646s / 200, num_workers=8)
PyTorch swin dataloader time: 0.023s (= 4.513s / 200, num_workers=8)
Relative speed: 0.590 (= 0.023s / 0.038s)

❌ OneFlow resnet50 time: 136.8ms (= 13681.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 164.1ms (= 16407.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 164.1ms / 136.8ms)

OneFlow resnet50 time: 85.5ms (= 8551.6ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 112.9ms (= 11286.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.32 (= 112.9ms / 85.5ms)

OneFlow resnet50 time: 58.7ms (= 11747.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 89.0ms (= 17790.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.51 (= 89.0ms / 58.7ms)

OneFlow resnet50 time: 45.8ms (= 9155.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.2ms (= 14030.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.53 (= 70.2ms / 45.8ms)

OneFlow resnet50 time: 39.3ms (= 7852.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 75.6ms (= 15110.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.92 (= 75.6ms / 39.3ms)

@oneflow-ci-bot oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot August 21, 2022 05:47
@mergify mergify bot merged commit e4f6666 into master Aug 21, 2022
@mergify mergify bot deleted the lx_add_profile branch August 21, 2022 06:25
4 participants