

Synchronize at the beginning of all CUDA functions #2661

Merged 1 commit into deepmodeling:devel on Jul 9, 2023

Conversation

@njzjz (Member) commented on Jul 8, 2023

Fix #2660.

TensorFlow made streams non-blocking in tensorflow/tensorflow@9d12620. Our own CUDA functions use the default stream, which is different from TensorFlow's, so we need to synchronize at the beginning of all CUDA functions.

In the future, it might be worth using the same stream as TensorFlow's to improve performance.
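
As an illustration (a minimal sketch, not the actual patch: the kernel, launcher name, and error handling here are assumptions), the change amounts to calling `cudaDeviceSynchronize()` at the start of each CUDA launcher so that any work TensorFlow queued on its own non-blocking streams has finished before the default stream touches the same buffers:

```cuda
// sketch.cu -- illustrative only; function and kernel names are hypothetical
#include <cuda_runtime.h>

__global__ void scale_kernel(float* out, const float* in, int n, float alpha) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = alpha * in[i];
}

// A launcher in the style of this PR: synchronize first, because TensorFlow's
// work on `in` may still be in flight on one of its non-blocking streams.
void scale_gpu_cuda(float* out, const float* in, int n, float alpha) {
  cudaDeviceSynchronize();  // wait for TensorFlow's streams before we start
  scale_kernel<<<(n + 255) / 256, 256>>>(out, in, n, alpha);
  cudaDeviceSynchronize();  // surface any launch error before returning
}
```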

@njzjz requested review from denghuilu and wanghan-iapcm on Jul 8, 2023 01:24
@njzjz linked an issue on Jul 8, 2023 that may be closed by this pull request
codecov bot commented on Jul 8, 2023

Codecov Report

Patch coverage has no change and project coverage changes by -0.01% ⚠️

Comparison is base (37fd8d1) 78.35% compared to head (92fc01c) 78.35%.

Additional details and impacted files
@@            Coverage Diff             @@
##            devel    #2661      +/-   ##
==========================================
- Coverage   78.35%   78.35%   -0.01%     
==========================================
  Files         235      235              
  Lines       24473    24473              
  Branches     1469     1469              
==========================================
- Hits        19177    19176       -1     
- Misses       4906     4907       +1     
  Partials      390      390              

see 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.

@wanghan-iapcm merged commit d3d3c18 into deepmodeling:devel on Jul 9, 2023
@njzjz mentioned this pull request on Sep 19, 2023
wanghan-iapcm pushed a commit that referenced this pull request on Sep 22, 2023
Merge `source/lib/src/cuda` and `source/lib/src/rocm` into
`source/lib/src/gpu`.

- Define macros `gpuGetLastError`, `gpuDeviceSynchronize`, `gpuMemcpy`, `gpuMemcpyDeviceToHost`, `gpuMemcpyHostToDevice`, and `gpuMemset` so that they are available for both CUDA and ROCm (a sketch of such a mapping follows after this commit message).
- Use the `<<< >>>` syntax for both CUDA and ROCm. Per ROCm/HIP@cf78d85, it has been supported in HIP since 2018.
- Fix several integer constants that should be double or float.
- For tabulate:
  - Fix `WARP_SIZE` for ROCm. Per pytorch/pytorch#64302, `WARP_SIZE` can be 32 or 64, so it should not be hardcoded to 64.
  - Add `GpuShuffleSync`. Per ROCm/HIP#1491, `__shfl_sync` is not supported by HIP.
  - After merging the code, #1274 should also work for ROCm.
- Use the same `ii` for #830 and #2357. Although both of them work, `ii` had different meanings in these two PRs; now it is the same.
- However, `ii` in `tabulate_fusion_se_a_fifth_order_polynomial` (ROCm), added by #2532, is wrong. After merging the code, it should be corrected.
  - Optimization in #830 was not applied to ROCm.
  - `__syncwarp` is not supported by ROCm.
- After merging the code, #2661 will also be applied to ROCm. Although the TF ROCm stream is still blocking
(https://github.com/tensorflow/tensorflow/blob/9d1262082e761cd85d6726bcbdfdef331d6d72c6/tensorflow/compiler/xla/stream_executor/rocm/rocm_driver.cc#L566),
we don't know whether it will change to non-blocking.
- There are several other differences between CUDA and ROCm.

---------

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
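
For reference, here is a minimal sketch of what the CUDA/ROCm aliases and the `GpuShuffleSync` wrapper described in the commit message above might look like. Only the alias names and `GpuShuffleSync` come from the commit message; the guard macro (`__HIPCC__`), header layout, and exact signature are assumptions, and the real deepmd-kit header may differ:

```cuda
// gpu_compat.h -- illustrative sketch only
#ifdef __HIPCC__                 // building with the HIP compiler (ROCm)
#include <hip/hip_runtime.h>
#define gpuGetLastError          hipGetLastError
#define gpuDeviceSynchronize     hipDeviceSynchronize
#define gpuMemcpy                hipMemcpy
#define gpuMemcpyDeviceToHost    hipMemcpyDeviceToHost
#define gpuMemcpyHostToDevice    hipMemcpyHostToDevice
#define gpuMemset                hipMemset
#else                            // building with nvcc (CUDA)
#include <cuda_runtime.h>
#define gpuGetLastError          cudaGetLastError
#define gpuDeviceSynchronize     cudaDeviceSynchronize
#define gpuMemcpy                cudaMemcpy
#define gpuMemcpyDeviceToHost    cudaMemcpyDeviceToHost
#define gpuMemcpyHostToDevice    cudaMemcpyHostToDevice
#define gpuMemset                cudaMemset
#endif

// Warp shuffle wrapper: HIP has no __shfl_sync, and the warp size is 64 on
// most AMD GPUs, so both differences are hidden behind a single name.
template <typename T>
__device__ __forceinline__ T GpuShuffleSync(unsigned mask, T val, int src_lane) {
#ifdef __HIPCC__
  (void)mask;                    // HIP's __shfl takes no mask argument
  return __shfl(val, src_lane);
#else
  return __shfl_sync(mask, val, src_lane);
#endif
}
```

With aliases like these, a single launcher file under `source/lib/src/gpu` can call `gpuMemcpy(...)` or `gpuDeviceSynchronize()` and compile for either backend.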

Successfully merging this pull request may close these issues.

[BUG] [critical] TF v2.13.0 calculates wrong GPU results