
Fix memset size #9840

Merged: 1 commit merged into master on Nov 23, 2021
Conversation

yuslepukhin (Member)

Description:
The cudaMemsetAsync byte count should include sizeof(int).

Motivation and Context
Make sure the buffer is fully zeroed.

```diff
@@ -150,7 +150,7 @@ Status Inverse::ComputeInternal(OpKernelContext* ctx) const {
   }

   IAllocatorUniquePtr<int> info = GetScratchBuffer<int>(num_batches);
-  CUDA_RETURN_IF_ERROR(cudaMemsetAsync(info.get(), 0, num_batches, Stream()));
+  CUDA_RETURN_IF_ERROR(cudaMemsetAsync(info.get(), 0, num_batches * sizeof(int), Stream()));
```
Member:
Just curious, when will this async op be synced ?

Member @hariharans29 Nov 23, 2021:
It doesn't need explicit syncing, right? When operations are launched on a single stream (the per-session compute stream, Stream()), all CUDA operations in that stream are serialized and execute in the order in which they are queued. Anything queued after this on the same stream will execute only after this operation completes.

Member Author:

Yes, Hari is correct.

@yuslepukhin yuslepukhin merged commit d012d9f into master Nov 23, 2021
@yuslepukhin yuslepukhin deleted the yuslepukhin/inverse_memset branch November 23, 2021 17:19
jingyanwangms pushed a commit that referenced this pull request Nov 30, 2021
(cherry picked from commit d012d9f)
jingyanwangms added a commit that referenced this pull request Nov 30, 2021
* Fix memset size (#9840)

(cherry picked from commit d012d9f)

* [js/web] do not use nodejs type 'Buffer' in web (#9839)

* [js/web] do not use nodejs type 'Buffer' in web

* resolve comments and validate tests

* remove 'Buffer' in test

(cherry picked from commit a3ebc5e)

* Fix potential data race with OrtValue usage in Python (#9841)

(cherry picked from commit 18fd2cf)

* [OpenVINO-EP] V3.4 Release with OpenVINO 2021.4.2 LTS Release (#9848)

* Changes to ensure openvino build go through in Windows

* Modified Hetero plugin Logic

*Modified Hetero feature logic. In HETERO mode, for an operator to be marked true in getcapability(), it should be supported by at least one of the devices specified with HETERO in device_type.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* OV updated to 2021.4.2 version

* OV updated to 2021.4.2 version

* Updated OV to 2021.4.2 version, mono download link and dotnet version

* Copying Managed nugets in openvino c# docker file

*Copying Managed nugets to the nugets artifacts directory

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

Co-authored-by: saharfraza <sfatima.3001@gmail.com>
Co-authored-by: mayavijx <mayax.vijayan@intel.com>
Co-authored-by: Aravind Gunda <aravindx.gunda@intel.com>
(cherry picked from commit 0ae0f29)

* no fallback when enforcing explicit EP registration. (#9863)

* no fallback when enforcing explicit EP registration.

* add explicit ep registrations for python.

(cherry picked from commit 1e9e57d)

* layernorm throw error if input has no data (#9837)

(cherry picked from commit bf716e6)

* [js/node] npm audit fix (#9861)

(cherry picked from commit 27e337e)

* [python manylinux package] emit warning if missing CUDA/TensorRT dependency causes ld_preload to fail and user tries to register either CUDA/TensorRT EP (#9872)

* add warning if ld_preload fails for CUDA or TRT when trying to register either provider

* refactor

* change wording from register to create

(cherry picked from commit ec9b0ed)

* QDQ tool modification part2 (#9720)

* Add finetuned qdq options

* Add description

* Add unit tests

* Modify for channel axis

* Remove too specific feature. Move this implementation to e2e example

* Add OpTypesSupportPerChannelQuantization

* fix bug for unit test

* Keep flags OpTypesSupportPerChannelQuantization and QDQChannelAxis for internal use

Will have a follow-up PR to fine tune the code

* remove unnecessary warning

Co-authored-by: stevenlix <38092805+stevenlix@users.noreply.github.com>
Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
(cherry picked from commit 0baf687)

* Cancel transpose optimizer for resize (#9870)

* cancel transpose optimizer for resize

* add UT

* addressing comments

* fix build err

(cherry picked from commit 16bfd3c)

* Add build option to enable cuda profiling (#9875)

(cherry picked from commit 9345894)

Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: Maajid khan <n.maajidkhan@gmail.com>
Co-authored-by: George Wu <jywu@microsoft.com>
Co-authored-by: Ye Wang <52801275+wangyems@users.noreply.github.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Co-authored-by: RandySheriffH <48490400+RandySheriffH@users.noreply.github.com>