Fix bug in Transpose CUDA kernel #7329

Merged
hariharans29 merged 11 commits into master from hari/transpose_debug
May 27, 2021

Conversation

hariharans29
Member

Description: Fixes a corner-case bug. The block-size calculation did not account for the fact that the block size can become 0 for some input shapes. This change accounts for that case.
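To illustrate the kind of failure described above, here is a minimal host-side sketch. It is not the ONNX Runtime code: the helper names `RowsPerBlockUnsafe` and `RowsPerBlockSafe`, the thread budget, and the division-based formula are all assumptions made for illustration. The point is only that an integer division used to size a block can round down to 0 for some shapes, and that clamping to at least 1 avoids asking CUDA to launch with a zero-sized block dimension, which is rejected with `cudaErrorInvalidConfiguration`.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helpers for illustration only (not taken from this PR).
// Both assume row_width > 0.

// Picks how many rows one thread block covers, given a per-block thread
// budget. Integer division truncates, so the result is 0 whenever
// row_width exceeds max_threads_per_block.
inline int64_t RowsPerBlockUnsafe(int64_t max_threads_per_block, int64_t row_width) {
  return max_threads_per_block / row_width;
}

// Same calculation, clamped so the block dimension is never 0.
inline int64_t RowsPerBlockSafe(int64_t max_threads_per_block, int64_t row_width) {
  return std::max<int64_t>(1, max_threads_per_block / row_width);
}
```

With an assumed budget of 1024 threads, a row width of 2048 makes the unsafe version return 0, while the clamped version returns 1. How the actual fix handles this case is not shown in this conversation.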

Motivation and Context
Resolve #7316

@hariharans29 hariharans29 requested a review from a team as a code owner April 13, 2021 13:50
@hariharans29 hariharans29 merged commit 7380219 into master May 27, 2021
@hariharans29 hariharans29 deleted the hari/transpose_debug branch May 27, 2021 21:01
xzhu1900 pushed a commit that referenced this pull request May 28, 2021
xzhu1900 added a commit that referenced this pull request May 28, 2021
* Fix bug in Transpose CUDA kernel (#7329)

* Fix permission error for ORTModule lock file (#7814)

* fix topo sort in quant tool (#7833)

* fix topo sort in quant tool

* add unit test and make the topo sort stable

* Relax tol for Conv1D fp16 test (#7844)

* Relax tol for Conv1D fp16 test

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Resolve issue with wrapped ORTModule load_state_dict (#7847)

* Encapsulate children modules inside a ModuleAccessor object to prevent erroneous iteration over children while loading the state dictionary

* Add named_models, models, apply methods, change ModuleAccessor to ModuleMetadata and modify unit tests

* Change ModuleMetadata module getter logic, raise NotImplementedError for add_modules

* Add comment explaining why overriding _load_from_state_dict method is needed

* fixed bugs in packed mode and enable pack mode tests in ci (#7848)

* fixed bugs in packed mode and enable pack mode tests in ci

* removed unnecessary space

* pr comments

* pr comments

* disable an average pool test

* try disabling another avg pool

* disable more avg pool tests

* disable maxpool tests

* add environment variable to control default training package's local version (#7849)

* [js] update documents (#7852)

* [js] update documents

* escape double quotes

* update operators.md

* resolve comments

* Support bool type for Pad CPU (#7856)

* Initial commit

* update

* nit

* Include ORT C/C++ API headers in the ORT Mobile AAR package (#7858)

* Add header files of ort c/c++ api to aar package

* Move header file selection to cmake based on EP choice

* fix duplicated node name (#7865)

* Clean up CPU kernel definition for opset 13 Pad (#7867)

Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
Co-authored-by: Sherlock <baihan.huang@gmail.com>
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: baijumeswani <bmeswani@microsoft.com>
Co-authored-by: Tixxx <tix@microsoft.com>
Co-authored-by: liqunfu <liqfu@microsoft.com>
Co-authored-by: Yulong Wang <yulongw@microsoft.com>
Co-authored-by: Guoyu Wang <62914304+gwang-msft@users.noreply.github.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Development

Successfully merging this pull request may close these issues.

cudaErrorInvalidConfiguration in FusedBatchNormV3
3 participants