
Fix memset size #9840

Merged: 1 commit merged into master on Nov 23, 2021
Conversation

yuslepukhin (Member)

Description:
The cudaMemsetAsync byte count should include sizeof(int).

Motivation and Context
Make sure the buffer is fully zeroed.

```diff
@@ -150,7 +150,7 @@ Status Inverse::ComputeInternal(OpKernelContext* ctx) const {
   }

   IAllocatorUniquePtr<int> info = GetScratchBuffer<int>(num_batches);
-  CUDA_RETURN_IF_ERROR(cudaMemsetAsync(info.get(), 0, num_batches, Stream()));
+  CUDA_RETURN_IF_ERROR(cudaMemsetAsync(info.get(), 0, num_batches * sizeof(int), Stream()));
```
Member:
Just curious, when will this async op be synced ?

Member @hariharans29 Nov 23, 2021:
It doesn't need explicit syncing, right? When operations are launched on a single stream (the per-session compute stream, Stream()), all CUDA operations in that stream are serialized and execute in the order in which they are queued. Anything queued after this on the same stream will execute only after this operation completes.

Member Author:

Yes, Hari is correct.

@yuslepukhin yuslepukhin merged commit d012d9f into master Nov 23, 2021
@yuslepukhin yuslepukhin deleted the yuslepukhin/inverse_memset branch November 23, 2021 17:19
jingyanwangms pushed a commit that referenced this pull request Nov 30, 2021
(cherry picked from commit d012d9f)
jingyanwangms added a commit that referenced this pull request Nov 30, 2021
* Fix memset size (#9840)

(cherry picked from commit d012d9f)

* [js/web] do not use nodejs type 'Buffer' in web (#9839)

* [js/web] do not use nodejs type 'Buffer' in web

* resolve comments and validate tests

* remove 'Buffer' in test

(cherry picked from commit a3ebc5e)

* Fix potential data race with OrtValue usage in Python (#9841)

(cherry picked from commit 18fd2cf)

* [OpenVINO-EP] V3.4 Release with OpenVINO 2021.4.2 LTS Release (#9848)

* Changes to ensure openvino build go through in Windows

* Modified Hetero plugin Logic

*Modified Hetero feature logic. In HETERO mode, for an operator to be marked true in getcapability(), it should be supported by at least one of the devices specified with HETERO in device_type.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* OV updated to 2021.4.2 version

* OV updated to 2021.4.2 version

* Updated OV to 2021.4.2 version, mono download link and dotnet version

* Copying Managed nugets in openvino c# docker file

*Copying Managed nugets to the nugets artifacts directory

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

Co-authored-by: saharfraza <sfatima.3001@gmail.com>
Co-authored-by: mayavijx <mayax.vijayan@intel.com>
Co-authored-by: Aravind Gunda <aravindx.gunda@intel.com>
(cherry picked from commit 0ae0f29)

* no fallback when enforcing explicit EP registration. (#9863)

* no fallback when enforcing explicit EP registration.

* add explicit ep registrations for python.

(cherry picked from commit 1e9e57d)

* layernorm throw error if input has no data (#9837)

(cherry picked from commit bf716e6)

* [js/node] npm audit fix (#9861)

(cherry picked from commit 27e337e)

* [python manylinux package] emit warning if missing CUDA/TensorRT dependency causes ld_preload to fail and user tries to register either CUDA/TensorRT EP (#9872)

* add warning if ld_preload fails for CUDA or TRT when trying to register either provider

* refactor

* change wording from register to create

(cherry picked from commit ec9b0ed)

* QDQ tool modification part2 (#9720)

* Add finetuned qdq options

* Add description

* Add unit tests

* Modify for channel axis

* Remove too specific feature. Move this implementation to e2e example

* Add OpTypesSupportPerChannelQuantization

* fix bug for unit test

* Keep flags OpTypesSupportPerChannelQuantization and QDQChannelAxis for internal use

Will have a follow-up PR to fine tune the code

* remove unnecessary warning

Co-authored-by: stevenlix <38092805+stevenlix@users.noreply.github.com>
Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
(cherry picked from commit 0baf687)

* Cancel transpose optimizer for resize (#9870)

* cancel transpose optimizer for resize

* add UT

* addressing comments

* fix build err

(cherry picked from commit 16bfd3c)

* Add build option to enable cuda profiling (#9875)

(cherry picked from commit 9345894)

Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: Maajid khan <n.maajidkhan@gmail.com>
Co-authored-by: George Wu <jywu@microsoft.com>
Co-authored-by: Ye Wang <52801275+wangyems@users.noreply.github.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Co-authored-by: RandySheriffH <48490400+RandySheriffH@users.noreply.github.com>