
[CodeGen] Support transform.oneflow.apply_patterns Op in MLIR #10255

Merged: 12 commits merged into master from support-transform-apply-patterns on May 12, 2023

Conversation

@howin98 (Contributor) commented May 11, 2023:

In IREE, the transform.iree.apply_patterns op provides a flexible way to apply a set of predefined patterns to a designated op during the transform dialect stage. For example, after tiling inside a transform.sequence, a few canonicalization steps are run:

  transform.structured.tile_to_forall_op %parallel_linalg_ops num_threads [1, 4, 32]
    ( mapping = [#gpu.thread<z>, #gpu.thread<y>, #gpu.thread<x>] )

  // Canonicalizations.
  transform.iree.apply_patterns %variant_op
    { canonicalization, tiling_canonicalization, licm, cse } : (!pdl.operation) -> ()

This PR introduces a similar op in the oneflow::transform_dialect namespace and implements support for canonicalization. For a concrete example, see: https://github.com/Oneflow-Inc/oneflow/pull/10255/files#diff-a36168c1a81d37cf7f56ff09c02af22e9e3bcc39902ccdb8c7f21805d309e72fR1

In addition, this PR trims down and refactors the transform dialect interpreter and other facilities ported over from the upstream LLVM repository.
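
Below is a minimal usage sketch of the new op (not taken from the PR itself; the transform.sequence wrapper and the %variant_op handle name are illustrative assumptions, and only the canonicalization pattern set is supported per the description above):

  // Hypothetical example: the handle name and the sequence wrapper are illustrative.
  transform.sequence failures(propagate) {
  ^bb0(%variant_op: !pdl.operation):
    // Only { canonicalization } is supported by this PR.
    transform.oneflow.apply_patterns %variant_op
      { canonicalization } : (!pdl.operation) -> ()
  }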

@howin98 marked this pull request as ready for review May 11, 2023 07:58
@howin98 requested a review from oneflow-ci-bot May 11, 2023 07:58
@github-actions (Contributor) commented:

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

@github-actions (Contributor) commented:

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/10255/

@github-actions (Contributor) commented:

Speed stats:
GPU Name: NVIDIA GeForce RTX 3090 

❌ OneFlow resnet50 time: 43.0ms (= 4298.9ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 57.2ms (= 5716.4ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.33 (= 57.2ms / 43.0ms)

OneFlow resnet50 time: 26.2ms (= 2619.7ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 37.1ms (= 3709.7ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.42 (= 37.1ms / 26.2ms)

OneFlow resnet50 time: 19.7ms (= 3931.8ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 35.6ms (= 7128.1ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.81 (= 35.6ms / 19.7ms)

OneFlow resnet50 time: 18.1ms (= 3620.5ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 31.2ms (= 6241.2ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.72 (= 31.2ms / 18.1ms)

OneFlow resnet50 time: 17.9ms (= 3580.0ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 29.1ms (= 5828.9ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.63 (= 29.1ms / 17.9ms)

OneFlow swin dataloader time: 0.202s (= 40.411s / 200, num_workers=1)
PyTorch swin dataloader time: 0.129s (= 25.758s / 200, num_workers=1)
Relative speed: 0.637 (= 0.129s / 0.202s)

OneFlow swin dataloader time: 0.057s (= 11.331s / 200, num_workers=4)
PyTorch swin dataloader time: 0.033s (= 6.590s / 200, num_workers=4)
Relative speed: 0.582 (= 0.033s / 0.057s)

OneFlow swin dataloader time: 0.033s (= 6.542s / 200, num_workers=8)
PyTorch swin dataloader time: 0.017s (= 3.347s / 200, num_workers=8)
Relative speed: 0.512 (= 0.017s / 0.033s)

❌ OneFlow resnet50 time: 48.8ms (= 4879.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 64.3ms (= 6432.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.32 (= 64.3ms / 48.8ms)

OneFlow resnet50 time: 37.1ms (= 3711.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 45.3ms (= 4528.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.22 (= 45.3ms / 37.1ms)

OneFlow resnet50 time: 28.6ms (= 5726.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 39.4ms (= 7873.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.37 (= 39.4ms / 28.6ms)

OneFlow resnet50 time: 25.6ms (= 5120.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 38.7ms (= 7739.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.51 (= 38.7ms / 25.6ms)

OneFlow resnet50 time: 24.1ms (= 4822.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 36.4ms (= 7282.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.51 (= 36.4ms / 24.1ms)

@github-actions (Contributor) commented:

CI failed when running job: cuda-module. The automerge label has been removed from this PR.

@github-actions (Contributor) commented:

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/10255/

@github-actions (Contributor) commented:

Speed stats:
GPU Name: NVIDIA GeForce RTX 3090 

❌ OneFlow resnet50 time: 43.0ms (= 4302.6ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 57.2ms (= 5723.6ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.33 (= 57.2ms / 43.0ms)

OneFlow resnet50 time: 26.2ms (= 2621.4ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 37.4ms (= 3739.3ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.43 (= 37.4ms / 26.2ms)

OneFlow resnet50 time: 18.7ms (= 3748.8ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 35.2ms (= 7040.9ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.88 (= 35.2ms / 18.7ms)

OneFlow resnet50 time: 17.7ms (= 3548.4ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 32.1ms (= 6425.3ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.81 (= 32.1ms / 17.7ms)

OneFlow resnet50 time: 16.7ms (= 3344.0ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 29.3ms (= 5863.3ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.75 (= 29.3ms / 16.7ms)

OneFlow swin dataloader time: 0.199s (= 39.724s / 200, num_workers=1)
PyTorch swin dataloader time: 0.129s (= 25.891s / 200, num_workers=1)
Relative speed: 0.652 (= 0.129s / 0.199s)

OneFlow swin dataloader time: 0.054s (= 10.806s / 200, num_workers=4)
PyTorch swin dataloader time: 0.033s (= 6.576s / 200, num_workers=4)
Relative speed: 0.609 (= 0.033s / 0.054s)

OneFlow swin dataloader time: 0.031s (= 6.284s / 200, num_workers=8)
PyTorch swin dataloader time: 0.017s (= 3.376s / 200, num_workers=8)
Relative speed: 0.537 (= 0.017s / 0.031s)

❌ OneFlow resnet50 time: 48.6ms (= 4859.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 66.2ms (= 6617.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.36 (= 66.2ms / 48.6ms)

OneFlow resnet50 time: 36.4ms (= 3644.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 45.5ms (= 4546.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.25 (= 45.5ms / 36.4ms)

OneFlow resnet50 time: 28.2ms (= 5644.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 38.8ms (= 7753.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.37 (= 38.8ms / 28.2ms)

OneFlow resnet50 time: 25.9ms (= 5182.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 38.6ms (= 7728.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.49 (= 38.6ms / 25.9ms)

OneFlow resnet50 time: 24.7ms (= 4930.1ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 36.0ms (= 7198.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.46 (= 36.0ms / 24.7ms)

@howin98 enabled auto-merge (squash) May 11, 2023 12:20
@github-actions (Contributor) commented:

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/10255/

@github-actions (Contributor) commented:

Speed stats:
GPU Name: NVIDIA GeForce RTX 3090 

❌ OneFlow resnet50 time: 43.2ms (= 4323.1ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 64.8ms (= 6483.9ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.50 (= 64.8ms / 43.2ms)

OneFlow resnet50 time: 26.3ms (= 2627.2ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 44.3ms (= 4431.9ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.69 (= 44.3ms / 26.3ms)

OneFlow resnet50 time: 18.4ms (= 3688.4ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 38.6ms (= 7720.5ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 2.09 (= 38.6ms / 18.4ms)

OneFlow resnet50 time: 16.9ms (= 3375.5ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 34.7ms (= 6932.6ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 2.05 (= 34.7ms / 16.9ms)

OneFlow resnet50 time: 15.6ms (= 3128.3ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 28.3ms (= 5657.0ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.81 (= 28.3ms / 15.6ms)

OneFlow swin dataloader time: 0.202s (= 40.344s / 200, num_workers=1)
PyTorch swin dataloader time: 0.131s (= 26.125s / 200, num_workers=1)
Relative speed: 0.648 (= 0.131s / 0.202s)

OneFlow swin dataloader time: 0.056s (= 11.199s / 200, num_workers=4)
PyTorch swin dataloader time: 0.033s (= 6.557s / 200, num_workers=4)
Relative speed: 0.585 (= 0.033s / 0.056s)

OneFlow swin dataloader time: 0.032s (= 6.374s / 200, num_workers=8)
PyTorch swin dataloader time: 0.016s (= 3.294s / 200, num_workers=8)
Relative speed: 0.517 (= 0.016s / 0.032s)

❌ OneFlow resnet50 time: 48.6ms (= 4862.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.3ms (= 6828.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.40 (= 68.3ms / 48.6ms)

OneFlow resnet50 time: 37.5ms (= 3749.6ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 44.8ms (= 4476.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 44.8ms / 37.5ms)

OneFlow resnet50 time: 28.9ms (= 5771.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 38.5ms (= 7699.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.33 (= 38.5ms / 28.9ms)

OneFlow resnet50 time: 25.5ms (= 5090.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 38.6ms (= 7715.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.52 (= 38.6ms / 25.5ms)

OneFlow resnet50 time: 23.7ms (= 4742.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 35.9ms (= 7177.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.51 (= 35.9ms / 23.7ms)

@github-actions (Contributor) commented:

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/10255/

@github-actions (Contributor) commented:

Speed stats:
GPU Name: NVIDIA GeForce RTX 3080 Ti 

❌ OneFlow resnet50 time: 43.7ms (= 4370.4ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 61.6ms (= 6160.3ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.41 (= 61.6ms / 43.7ms)

OneFlow resnet50 time: 26.2ms (= 2622.6ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 37.8ms (= 3775.9ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.44 (= 37.8ms / 26.2ms)

OneFlow resnet50 time: 18.9ms (= 3786.9ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 35.0ms (= 7009.8ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.85 (= 35.0ms / 18.9ms)

OneFlow resnet50 time: 18.9ms (= 3773.7ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 33.9ms (= 6777.1ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.80 (= 33.9ms / 18.9ms)

OneFlow resnet50 time: 17.4ms (= 3484.1ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 28.2ms (= 5645.8ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.62 (= 28.2ms / 17.4ms)

OneFlow swin dataloader time: 0.203s (= 40.578s / 200, num_workers=1)
PyTorch swin dataloader time: 0.129s (= 25.798s / 200, num_workers=1)
Relative speed: 0.636 (= 0.129s / 0.203s)

OneFlow swin dataloader time: 0.054s (= 10.770s / 200, num_workers=4)
PyTorch swin dataloader time: 0.033s (= 6.572s / 200, num_workers=4)
Relative speed: 0.610 (= 0.033s / 0.054s)

OneFlow swin dataloader time: 0.030s (= 6.067s / 200, num_workers=8)
PyTorch swin dataloader time: 0.017s (= 3.341s / 200, num_workers=8)
Relative speed: 0.551 (= 0.017s / 0.030s)

❌ OneFlow resnet50 time: 47.6ms (= 4762.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 65.2ms (= 6518.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.37 (= 65.2ms / 47.6ms)

OneFlow resnet50 time: 30.9ms (= 3087.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 43.8ms (= 4383.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.42 (= 43.8ms / 30.9ms)

OneFlow resnet50 time: 24.2ms (= 4830.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 40.5ms (= 8093.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.68 (= 40.5ms / 24.2ms)

OneFlow resnet50 time: 22.2ms (= 4437.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 36.8ms (= 7355.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.66 (= 36.8ms / 22.2ms)

OneFlow resnet50 time: 21.1ms (= 4228.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 34.1ms (= 6825.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.61 (= 34.1ms / 21.1ms)

@howin98 merged commit 1af9915 into master May 12, 2023
@howin98 deleted the support-transform-apply-patterns branch May 12, 2023 03:08