[Fix](bangc-ops): replace __bang_atomic_add with __bang_atomic_reduce… #854

Open
wants to merge 1 commit into
base: master

Unireverse
Collaborator

…_add for better perf.

Thanks for your contribution and we appreciate it a lot. 🚀🚀

1. Motivation

Replace __bang_atomic_add with __bang_atomic_reduce_add for better performance.

2. Modification

modified: bangc-ops/kernels/carafe/carafe_block.mlu
modified: bangc-ops/kernels/deform_roi_pool/deform_roi_pool_union1.mlu
modified: bangc-ops/kernels/psroipool/psroipool_block.mlu
modified: bangc-ops/kernels/roi_align_rotated/roi_align_rotated_block.mlu
modified: bangc-ops/kernels/roi_crop/roi_crop_block.mlu
modified: bangc-ops/kernels/rotated_feature_align/rotated_feature_align_block.mlu

3. Test Report

3.1 Modification Details

3.1.1 Accuracy Acceptance Standard

For static threshold standard details, see: MLU-OPS Accuracy Acceptance Standard.

  • static threshold
    • diff1
      • float32 mlu diff1 <= 1e-5
      • float32 mlu diff1 <= 3e-3
      • float16 mlu diff1 <= 3e-3
    • diff2
      • float32 mlu diff2 <= 1e-5
      • float32 mlu diff2 <= 3e-3
      • float16 mlu diff2 <= 3e-3
    • diff3
      • mlu diff3 == 0
      • mlu diff3_1 == 0
      • mlu diff3_2 == 0
  • dynamic threshold
    • diff1: mlu diff1 <= max(baseline diff1 * 10, static threshold)
    • diff2: mlu diff2 <= max(baseline diff2 * 10, static threshold)
    • diff3: mlu diff3 <= max(baseline diff3 * 10, static threshold)
      • float32, threshold = 1e-5
      • float16, threshold = 1e-3
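The dynamic threshold rule above can be sketched as a small check. This is only an illustration, not the MLU-OPS test framework's code: the function name `passes_dynamic_threshold`, the dictionary `DIFF3_STATIC_THRESHOLD`, and the sample diff values are hypothetical; only the formula and the two diff3 thresholds come from the list above.

```python
# Hypothetical helper illustrating the rule:
#   mlu diff <= max(baseline diff * 10, static threshold)
def passes_dynamic_threshold(mlu_diff, baseline_diff, static_threshold):
    """Return True when the MLU diff is within the dynamic threshold."""
    return mlu_diff <= max(baseline_diff * 10, static_threshold)


# Static thresholds listed under the dynamic diff3 rule above.
DIFF3_STATIC_THRESHOLD = {"float32": 1e-5, "float16": 1e-3}

if __name__ == "__main__":
    # Made-up diff values, for illustration only.
    threshold = DIFF3_STATIC_THRESHOLD["float32"]
    print(passes_dynamic_threshold(5e-5, 1e-5, threshold))  # within 10x baseline
    print(passes_dynamic_threshold(2e-4, 1e-5, threshold))  # exceeds 10x baseline
```

Taking the maximum of the two bounds means a result is never rejected for exceeding a very small baseline diff as long as it still meets the static threshold.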

3.1.2 Operator Scheme checklist

  • Supported hardware
    • MLU370
    • MLU590
  • Job types
    • BLOCK
    • UNION1
    • UNION2
    • UNION4
    • The operator will dynamically select the most suitable task type, for example, UNION8

3.2 Accuracy Test

3.2.1 Accuracy Test

All regression tests passed.

[ OK ] copy/TestSuite.mluOp/3 (3 ms)
[----------] 4 tests from copy/TestSuite (14 ms total)

[----------] Global test environment tear-down
[ SUMMARY ] Total 104 cases of 2 op(s).
ALL PASSED.
[==========] 104 test cases from 2 test suites ran. (165101 ms total)
[ PASSED ] 104 test cases.

3.2.2 Parameter Check

No update.

3.3 Performance Test

3.4 Summary Analysis

Replace __bang_atomic_add with __bang_atomic_reduce_add for better performance. The accuracy tests show that accuracy is unchanged.
