Multi_sum_sq review, AtomicAdd removal #17002
Merged
Conversation
MXNet uses CamelCase for functions and snake_case for variables. Would you mind updating the variable names below? Thanks. Otherwise this looks good to me.
Thanks! I think I corrected all of them.
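For illustration, a tiny sketch of that convention (the function and variable names here are made up for this example, not taken from the PR):

```cuda
// CamelCase for the function name, snake_case for parameters and locals.
__device__ float ChunkSumSq(const float* chunk_data, int chunk_size) {
  float sum_sq = 0.f;
  for (int i = 0; i < chunk_size; ++i)
    sum_sq += chunk_data[i] * chunk_data[i];
  return sum_sq;
}
```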
eric-haibin-lin approved these changes on Dec 10, 2019
ptrendx pushed a commit to ptrendx/mxnet that referenced this pull request on Dec 20, 2019
* Update multi_sum_sq to avoid AtomicAdd
* Add specific test for multi_sum_sq
* Add a determinism test and fix lint issues
* Better test for checking that the op is deterministic
* Follow MXNet letter case format
* Reduce dimensions of tensors in the test
ptrendx added a commit that referenced this pull request on Dec 20, 2019
* Improve the speed of the pointwise fusion graph pass (#17114)
* Debug the long startup time
* Optimize backward fusion
* Figure out why the fusion pass is called twice
* Cleaning
* Small optimization
* [BUGFIX] Fix trainer param order (#17068)
* fix trainer param order
* Update trainer.py
* Update trainer.py
* Update trainer.py
* [reproducibility] multi_sum_sq review, AtomicAdd removal (#17002)
* Update multi_sum_sq to avoid AtomicAdd
* Add specific test for multi_sum_sq
* Add a determinism test and fix lint issues
* Better test for checking that the op is deterministic
* Follow MXNet letter case format
* Reduce dimensions of tensors in the test

Co-authored-by: Haibin Lin <linhaibin.eric@gmail.com>
Co-authored-by: MoisesHer <50716238+MoisesHer@users.noreply.github.com>
shuo-ouyang reviewed on Aug 11, 2021
```cuda
if (threadIdx.x == 0) {
  // Thread 0 publishes this block's reduced value (`final`) into the
  // per-chunk slot of the block_reductions output buffer.
  block_reductions[(start_tensor_id + tensor_loc) * param.max_chunks_per_tensor +
                   param.block_to_chunk[blockIdx.x]] = final;
}
```
Maybe we should change the variable name here? In C++, `final` specifies that a virtual function cannot be overridden in a derived class, so using it as a variable name is confusing even though it is only a contextual keyword and the code is legal.
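For reference, a minimal sketch of why the name compiles but misleads (the names below are illustrative, not from the PR): `final` is a contextual keyword, so the compiler accepts it as an identifier, yet it also carries the override-blocking meaning noted above.

```cuda
struct Base {
  // Here `final` has its language meaning: derived classes cannot override.
  virtual float Reduce() final { return 0.f; }
};

__device__ float ChunkValue(const float* vals) {
  // Here `final` is just an identifier: legal, because it is only a
  // contextual keyword, but easy to misread as the C++ specifier.
  float final = vals[0];
  return final;
}
```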
Description
Modified the multi_sum_sq operator to avoid nondeterministic behavior, which was potentially caused by the AtomicAdd operation in the GPU kernel.
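For context, here is a minimal sketch of the deterministic pattern such a change typically adopts (the kernel names, the fixed block size, and the flat single-tensor layout below are illustrative assumptions, not the actual multi_sum_sq implementation): each block writes its partial sum of squares to its own slot instead of calling atomicAdd on a shared accumulator, and a second kernel combines the slots in a fixed order, so the floating-point additions always happen in the same sequence.

```cuda
#include <cuda_runtime.h>

// Stage 1: each block reduces its portion of the data and writes one partial
// sum to its own slot. No atomics, so the result does not depend on the order
// in which blocks happen to be scheduled. Assumes blockDim.x == 256.
__global__ void PartialSumSq(const float* data, int n, float* block_reductions) {
  __shared__ float smem[256];
  float acc = 0.f;
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
       i += gridDim.x * blockDim.x) {
    acc += data[i] * data[i];
  }
  smem[threadIdx.x] = acc;
  __syncthreads();
  // Tree reduction within the block: a fixed order for a fixed block size.
  for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
    if (threadIdx.x < stride) smem[threadIdx.x] += smem[threadIdx.x + stride];
    __syncthreads();
  }
  if (threadIdx.x == 0) block_reductions[blockIdx.x] = smem[0];
}

// Stage 2: a single thread combines the per-block partials in a fixed order,
// which is what makes the final float sum reproducible across runs.
__global__ void FinalSumSq(const float* block_reductions, int num_blocks,
                           float* out) {
  if (threadIdx.x == 0 && blockIdx.x == 0) {
    float total = 0.f;
    for (int i = 0; i < num_blocks; ++i) total += block_reductions[i];
    *out = total;
  }
}
```

The tradeoff is an extra kernel launch and a temporary buffer of one float per block; the benefit is bitwise-reproducible results, since atomicAdd lets block scheduling decide the order of non-associative float additions.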
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes