[Fix](bangc-ops): replace __bang_atomic_add with __bang_atomic_reduce… #854
…_add for better perf.
Thanks for your contribution and we appreciate it a lot. 🚀🚀
1. Motivation
Replace __bang_atomic_add with __bang_atomic_reduce_add for better performance.
2. Modification
modified: bangc-ops/kernels/carafe/carafe_block.mlu
modified: bangc-ops/kernels/deform_roi_pool/deform_roi_pool_union1.mlu
modified: bangc-ops/kernels/psroipool/psroipool_block.mlu
modified: bangc-ops/kernels/roi_align_rotated/roi_align_rotated_block.mlu
modified: bangc-ops/kernels/roi_crop/roi_crop_block.mlu
modified: bangc-ops/kernels/rotated_feature_align/rotated_feature_align_block.mlu
3. Test Report
3.1 Modification Details
3.1.1 Accuracy Acceptance Standard
For static threshold standard details, see: MLU-OPS Accuracy Acceptance Standard.
3.1.2 Operator Scheme checklist
3.2 Accuracy Test
3.2.1 Accuracy Test
All regression tests passed.
[ OK ] copy/TestSuite.mluOp/3 (3 ms)
[----------] 4 tests from copy/TestSuite (14 ms total)
[----------] Global test environment tear-down
[ SUMMARY ] Total 104 cases of 2 op(s).
ALL PASSED.
[==========] 104 test cases from 2 test suites ran. (165101 ms total)
[ PASSED ] 104 test cases.
3.2.2 Parameter Check
No update.
3.3 Performance Test
3.4 Summary Analysis
__bang_atomic_add is replaced with __bang_atomic_reduce_add in the six kernels listed in section 2 for better performance. According to the accuracy test, accuracy is unchanged.
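As a rough illustration of why such a swap would leave results unchanged, here is a minimal host-side C++ sketch. It is not BANG C and does not use the real builtins. It assumes (this PR does not say so) that the reduce-style atomic performs the same accumulation as the plain atomic add and differs only in not handing the previous value back to the caller. The names `fetch_add_model`, `reduce_add_model`, and `accumulate` are hypothetical and exist only for this example. Integers are used so the comparison is exact.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an atomic add that returns the previous value.
int fetch_add_model(std::atomic<int>& dst, int value) {
    return dst.fetch_add(value, std::memory_order_relaxed);
}

// Hypothetical stand-in for a reduce-style atomic add:
// the same update to dst, but nothing is returned to the caller.
void reduce_add_model(std::atomic<int>& dst, int value) {
    dst.fetch_add(value, std::memory_order_relaxed);
}

// Scatter-accumulate val[i] into bin idx[i], using one of the two models.
std::vector<int> accumulate(const std::vector<std::size_t>& idx,
                            const std::vector<int>& val,
                            std::size_t bins,
                            bool use_reduce) {
    std::vector<std::atomic<int>> out(bins);
    for (auto& o : out) {
        o.store(0);  // explicit zero-init; portable before C++20
    }
    for (std::size_t i = 0; i < idx.size(); ++i) {
        if (use_reduce) {
            reduce_add_model(out[idx[i]], val[i]);
        } else {
            // The previous value is fetched but never used, as in a
            // kernel that only accumulates into its output buffer.
            (void)fetch_add_model(out[idx[i]], val[i]);
        }
    }
    std::vector<int> result(bins);
    for (std::size_t b = 0; b < bins; ++b) {
        result[b] = out[b].load();
    }
    return result;
}
```

When the previous value is never read, both variants produce the same accumulated buffer. This matches the unchanged accuracy reported above, under the stated assumption about the two builtins.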