Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add paddle backend #318

Merged

Conversation

HydrogenSulfate
Copy link
Contributor

@HydrogenSulfate HydrogenSulfate commented Sep 28, 2024

Category

  • New feature
  • Bugfix
  • Breaking change
  • Refactoring
  • Documentation
  • Other (please explain)

Description

As talked in #285, we add PaddlePaddle framework as one of backends of warp.

Thanks for reply from @shi-eric(#313 (comment)), I create this PR as contained only one commit.

This PR additionally adds support for the bool type to warp’s dlpack, based on this discussion: dmlc/dlpack#75 and the corresponding PR: dmlc/dlpack#76. The boolean type representation seems uses 8 bits instead of the current 1 bit finally. I am not sure if there were any additional considerations regarding this in warp’s development. If there are specific explanations, please let me know. I will adjust the support for the boolean type of paddle, according to the suggestions, thanks :).

run test:

# 1. install paddlepaddle
# cu123
$ python -m pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu123/
# cu118
# python -m pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu118/

# 2. install warp
$ cd warp/
$ python build.py
$ pip install -e .

# 3. run test related to paddle framework
$ CUDA_VISIBLE_DEVICES=0 python warp/tests/test_dlpack.py -k paddle
Warp 1.3.3 initialized:
   CUDA Toolkit 11.6, Driver 12.0
   Devices:
     "cpu"      : "x86_64"
     "cuda:0"   : "Tesla V100-PCIE-16GB" (16 GiB, sm_70, mempool enabled)
   Kernel cache:
     /root/.cache/warp/1.3.3
Skipping Jax DLPack tests due to exception: No module named 'jax'
W0928 17:58:46.172823 11252 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.6
W0928 17:58:46.174096 11252 gpu_resources.cc:164] device: 0, cuDNN Version: 8.4.
Skipping Paddle DLPack tests due to exception: name 'test_dlpack_paddle_to_warp_v2' is not defined
test_dlpack_paddle_to_warp_cuda_0 (__main__.TestDLPack) ... Module __main__ cac4f71 load on device 'cuda:0' took 373.07 ms  (compiled)
ok
test_dlpack_warp_to_paddle_cuda_0 (__main__.TestDLPack) ... ok
test_dlpack_warp_to_paddle_v2_cuda_0 (__main__.TestDLPack) ... ok

----------------------------------------------------------------------
Ran 3 tests in 0.498s

OK

$ CUDA_VISIBLE_DEVICES=0 python warp/tests/test_dlpack.py -k paddle
Warp 1.3.3 initialized:
   CUDA Toolkit 11.6, Driver 12.0
   Devices:
     "cpu"      : "x86_64"
     "cuda:0"   : "Tesla V100-PCIE-16GB" (16 GiB, sm_70, mempool enabled)
   Kernel cache:
     /root/.cache/warp/1.3.3
Skipping Jax DLPack tests due to exception: No module named 'jax'
W0928 13:15:33.230201 29239 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.6
W0928 13:15:33.231549 29239 gpu_resources.cc:164] device: 0, cuDNN Version: 8.4.
test_dlpack_paddle_to_warp_cuda_0 (__main__.TestDLPack) ... Module __main__ cac4f71 load on device 'cuda:0' took 387.06 ms  (compiled)
ok
test_dlpack_paddle_to_warp_v2_cuda_0 (__main__.TestDLPack) ... ok
test_dlpack_warp_to_paddle_cuda_0 (__main__.TestDLPack) ... ok
test_dlpack_warp_to_paddle_v2_cuda_0 (__main__.TestDLPack) ... ok

----------------------------------------------------------------------
Ran 4 tests in 0.567s

OK

todolist:

Changelog

  • Add specific line-by-line info of high level changes in this PR.

Before your PR is "Ready for review"

  • Do you agree to the terms under which contributions are accepted as described in Section 9 the Warp License?
  • Have you read the Contributor Guidelines?
  • Have you written any new necessary tests?
  • Have you added or updated any necessary documentation?
  • Have you added any files modified by compiling Warp and building the documentation to this PR (.e.g. stubs.py, functions.rst)?
  • Does your code pass ruff check and ruff format --check?

@HydrogenSulfate HydrogenSulfate mentioned this pull request Sep 28, 2024
14 tasks
@HydrogenSulfate HydrogenSulfate force-pushed the HydrogenSulfate/add_paddle_backend_v2 branch 3 times, most recently from 3ce4189 to 58769e8 Compare September 28, 2024 10:21
@shi-eric
Copy link
Contributor

Did you forget to include paddle.py?

docs/installation.rst Outdated Show resolved Hide resolved
@HydrogenSulfate HydrogenSulfate force-pushed the HydrogenSulfate/add_paddle_backend_v2 branch from 58769e8 to 142538d Compare September 29, 2024 02:15
@HydrogenSulfate HydrogenSulfate force-pushed the HydrogenSulfate/add_paddle_backend_v2 branch from 142538d to 7f5acd7 Compare September 29, 2024 07:23
@mmacklin
Copy link
Collaborator

Thank you for the very nice PR! This is really well done, we are aiming to merge this with our 1.4.0 release this week.

shi-eric added a commit that referenced this pull request Sep 30, 2024
Add paddle backend to warp

Closes GH-318

See merge request omniverse/warp!762
@shi-eric shi-eric merged commit 7f5acd7 into NVIDIA:main Sep 30, 2024
11 of 12 checks passed
@HydrogenSulfate
Copy link
Contributor Author

Thank you for the very nice PR! This is really well done, we are aiming to merge this with our 1.4.0 release this week.

Thanks a lot to @mmacklin and @shi-eric for your review. Note that there seems still a link error in build-and-test / pull-request-docs CI, which might because
the reason @shi-eric mentioned(#318 (comment)), I am afraid of warp's document CI will be affected by this.
image
Please check if https://www.paddlepaddle.org.cn/ be accessed in US, or please help to replace it with https://github.com/PaddlePaddle/Paddle

@shi-eric
Copy link
Contributor

Thank you for the very nice PR! This is really well done, we are aiming to merge this with our 1.4.0 release this week.

Thanks a lot to @mmacklin and @shi-eric for your review. Note that there seems still a link error in build-and-test / pull-request-docs CI, which might because the reason @shi-eric mentioned(#318 (comment)), I am afraid of warp's document CI will be affected by this. image Please check if https://www.paddlepaddle.org.cn/ be accessed in US, or please help to replace it with https://github.com/PaddlePaddle/Paddle

Yeah, I fixed that with ab147a0 so we're all good. I did notice that when I tried to install Paddle (CUDA 12.3) and run test_paddle.py on my (multi-GPU) system, I got a bunch of CUDA-related errors. I'll let you know if I have a chance to look into it...

@HydrogenSulfate
Copy link
Contributor Author

HydrogenSulfate commented Sep 30, 2024

Thank you for the very nice PR! This is really well done, we are aiming to merge this with our 1.4.0 release this week.

Thanks a lot to @mmacklin and @shi-eric for your review. Note that there seems still a link error in build-and-test / pull-request-docs CI, which might because the reason @shi-eric mentioned(#318 (comment)), I am afraid of warp's document CI will be affected by this. image Please check if https://www.paddlepaddle.org.cn/ be accessed in US, or please help to replace it with https://github.com/PaddlePaddle/Paddle

Yeah, I fixed that with ab147a0 so we're all good. I did notice that when I tried to install Paddle (CUDA 12.3) and run test_paddle.py on my (multi-GPU) system, I got a bunch of CUDA-related errors. I'll let you know if I have a chance to look into it...

Thanks for fixing that, I found it may because not setting device_id correctly in FromDLpack of paddle
framework, so when a tensor from gpu other than the first device(id=0), paddle still setting the default id(0) for converted tensor from warp(it should be src->dl_tensor.device.device_id, https://github.com/PaddlePaddle/Paddle/blob/e549437ad8bda6ff55ed22039eb09bafa74f78ba/paddle/fluid/framework/tensor_util.cc#L910-L911). In that case, if CUDA_VISIBLE_DEVICES is not a single but a List of ids, device id consistency check will failed intest-paddle.py. I will verify and fix it soonly.

@HydrogenSulfate HydrogenSulfate deleted the HydrogenSulfate/add_paddle_backend_v2 branch October 10, 2024 04:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants