ARROW-3451: [C++/Python] pyarrow and numba CUDA interop #2732

Closed
wants to merge 35 commits

Conversation

@pearu (Contributor) commented Oct 9, 2018

This PR implements the following features (a usage sketch follows the list):

  1. A pyarrow.cuda.Context.handle property that returns the CUcontext value.
  2. A pyarrow.cuda.Context.from_numba method to create a Context that shares the numba context handle.
  3. A pyarrow.cuda.Context.to_numba method to create a numba context instance that shares the Context handle.
  4. A pyarrow.cuda.foreign_buffer method to create a CudaBuffer of device memory defined by an address and size. Resolves ARROW-3451.
  5. A pyarrow.cuda.CudaBuffer.to_numba method to create a numba MemoryPointer instance, so that CudaBuffer memory can be used in arguments to numba-jitted functions.
  6. A pyarrow.cuda.CudaBuffer.from_numba method to create a CudaBuffer view of device data represented by a numba MemoryPointer.
  7. CudaDeviceManager::CreateSharedContext, used in item 2. Resolves ARROW-1423.
  8. CudaContext::View, used in item 4.
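A minimal usage sketch of the Python-side pieces listed above, assuming numba with CUDA support and at least one CUDA device; the method names follow the list, but the exact signatures are illustrative and may differ from the merged API (for instance, foreign_buffer is shown here as a Context method):

import numpy as np
from numba import cuda as nb_cuda
import pyarrow.cuda as pa_cuda

# Share numba's current CUDA context with pyarrow (item 2).
nb_ctx = nb_cuda.current_context()
ctx = pa_cuda.Context.from_numba(nb_ctx)

# Device memory allocated by numba, viewed as a pyarrow CudaBuffer (item 6).
host = np.arange(10, dtype=np.float32)
darr = nb_cuda.to_device(host)
cbuf = pa_cuda.CudaBuffer.from_numba(darr.gpu_data)

# The same buffer exposed back to numba as a MemoryPointer (item 5),
# ready to be wrapped in a device array and passed to jitted kernels.
mem = cbuf.to_numba()

# Wrap raw device memory given an address and size (item 4).
fbuf = ctx.foreign_buffer(darr.device_ctypes_pointer.value, host.nbytes)

# Raw CUcontext handle and a numba context sharing it (items 1 and 3).
handle = ctx.handle
numba_ctx = ctx.to_numba()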

kou and others added 10 commits October 4, 2018 15:00
Author: Kouhei Sutou <kou@clear-code.com>

Closes apache#2701 from kou/add-workaround-missing-gemfile and squashes the following commits:

fc80d6e <Kouhei Sutou>  Add workaround to verify 0.11.0
It's helpful to setup test environment.

Author: Kouhei Sutou <kou@clear-code.com>

Closes apache#2702 from kou/c-glib-add-missing-gemfile-to-archive and squashes the following commits:

5c97dad <Kouhei Sutou>  Include Gemfile to archive

Author: Kouhei Sutou <kou@clear-code.com>

Closes apache#2703 from kou/expand-variables-in-commit-message and squashes the following commits:

da4f3bc <Kouhei Sutou>  Expand variables in commit message

Author: Pindikura Ravindra <ravindra@dremio.com>

Closes apache#2695 from pravindra/re2 and squashes the following commits:

4408b85 <Pindikura Ravindra> ARROW-3331:  Add re2 to toolchain

Author: Kouhei Sutou <kou@clear-code.com>

Closes apache#2706 from kou/fix-too-much-escape-changelog and squashes the following commits:

a635b86 <Kouhei Sutou>  Fix too much Markdown escape in CHANGELOG

@wesm (Member) commented Oct 9, 2018

I'm going to have to squash this branch to clean things up. Please avoid using git merge -- we do all of our branch and patch maintenance using rebase and a linear commit history.

@wesm (Member) left a comment

LGTM so far, let me know when you want me to review again

@wesm (Member) commented Oct 10, 2018

Sorry, I had force-pushed your branch.

@pearu changed the title from "ARROW-3451 [WIP] [C++/Python] pyarrow and numba CUDA interop" to "ARROW-3451 [C++/Python] pyarrow and numba CUDA interop" on Oct 10, 2018
@pearu (Contributor, author) commented Oct 10, 2018

@wesm, please review.

My git skills are not advanced, so excuse my clumsiness with creating PRs :)

@wesm (Member) commented Oct 10, 2018

@pearu thanks -- I'm quite backlogged this week since I'm at a conference, but I will endeavor to review in the next 48h.

@pitrou (Member) left a comment

Thank you, this looks great. Here are a few minor comments.

Review comments were left on python/pyarrow/_cuda.pyx and python/pyarrow/tests/test_cuda_numba_interop.py (now outdated and resolved).
@pearu (Contributor, author) commented Oct 14, 2018

@wesm @pitrou , please review.

@codecov-io commented Oct 14, 2018

Codecov Report

Merging #2732 into master will increase coverage by 0.86%.
The diff coverage is 5.26%.


@@            Coverage Diff             @@
##           master    #2732      +/-   ##
==========================================
+ Coverage   87.57%   88.44%   +0.86%     
==========================================
  Files         402      343      -59     
  Lines       61454    58554    -2900     
==========================================
- Hits        53821    51787    -2034     
+ Misses       7561     6767     -794     
+ Partials       72        0      -72
Impacted Files Coverage Δ
python/pyarrow/tests/test_cuda.py 1.55% <0%> (-0.01%) ⬇️
python/pyarrow/tests/test_cuda_numba_interop.py 5.31% <5.31%> (ø)
cpp/src/arrow/util/compression_snappy.cc 66.66% <0%> (-24.25%) ⬇️
cpp/src/arrow/util/compression_lz4.cc 73.91% <0%> (-9.43%) ⬇️
cpp/src/arrow/csv/column-builder.cc 94.92% <0%> (-2.18%) ⬇️
cpp/src/arrow/util/compression_brotli.cc 84.21% <0%> (-1.51%) ⬇️
cpp/src/arrow/util/compression-test.cc 99.07% <0%> (-0.93%) ⬇️
cpp/src/arrow/util/compression_zstd.cc 82.97% <0%> (-0.36%) ⬇️
python/pyarrow/tests/test_io.py 99.15% <0%> (-0.01%) ⬇️
cpp/src/parquet/schema-test.cc 99.48% <0%> (-0.01%) ⬇️
... and 98 more

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f4f6269...3e99126.

@pitrou (Member) left a comment

Thank you! A few more comments still.

Review comments were left on python/pyarrow/tests/test_cuda_numba_interop.py (now outdated and resolved).
@pitrou (Member) commented Oct 15, 2018

By the way, I'm getting the following warning when building:

/home/antoine/arrow/python/build/temp.linux-x86_64-3.7/_cuda.cpp: In function ‘int __pyx_pf_7pyarrow_5_cuda_7Context___cinit__(__pyx_obj_7pyarrow_5_cuda_Context*, int, int)’:
/home/antoine/arrow/python/build/temp.linux-x86_64-3.7/_cuda.cpp:4040:121: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
     __pyx_t_1 = __pyx_f_7pyarrow_3lib_check_status(__pyx_v_manager->CreateSharedContext(__pyx_v_device_number, ((void *)__pyx_v_handle), (&__pyx_v_self->context))); if (unlikely(__pyx_t_1 == ((int)-1))) __PYX_ERR(0, 51, __pyx_L1_error)
                                                                                                                         ^~~~~~~~~~~~~~

@pitrou (Member) commented Oct 15, 2018

Also, all the tests fail here:

Traceback (most recent call last):
  File "/home/antoine/arrow/python/pyarrow/tests/test_cuda_numba_interop.py", line 44, in setup_module
    module.ctx = cuda.Context.from_numba(nb_ctx)
  File "pyarrow/_cuda.pyx", line 74, in pyarrow._cuda.Context.from_numba
    return Context(device_number=context.device.id,
  File "pyarrow/_cuda.pyx", line 29, in pyarrow._cuda.Context.__cinit__
    def __cinit__(self, int device_number=0, int handle=0):
OverflowError: value too large to convert to int

The following patch fixes the issue:

diff --git a/python/pyarrow/_cuda.pyx b/python/pyarrow/_cuda.pyx
index 1887d54a..84845889 100644
--- a/python/pyarrow/_cuda.pyx
+++ b/python/pyarrow/_cuda.pyx
@@ -26,7 +26,7 @@ cdef class Context:
     """ CUDA driver context.
     """
 
-    def __cinit__(self, int device_number=0, int handle=0):
+    def __cinit__(self, int device_number=0, intptr_t handle=0):
         """Construct the shared CUDA driver context for a particular device.
 
         Parameters

@pitrou (Member) commented Oct 18, 2018

I'd favour approach 2. The following patch gets the Numba interop to work. But it also fails the other tests, since we need to expose pushing / popping in some way (and/or some kind of guard like ContextSaver). Some global APIs like cuMemAllocHost seem to require a current context to be available.

Note the CUDA docs explicitly say about the primary context API:

The primary context is unique per device and shared with the CUDA runtime API. These functions allow integration with other libraries using CUDA.

diff --git a/cpp/src/arrow/gpu/cuda_context.cc b/cpp/src/arrow/gpu/cuda_context.cc
index d07b8b00..6952bf73 100644
--- a/cpp/src/arrow/gpu/cuda_context.cc
+++ b/cpp/src/arrow/gpu/cuda_context.cc
@@ -40,16 +40,13 @@ struct CudaDevice {
 
 class ContextSaver {
  public:
-  explicit ContextSaver(CUcontext new_context) : context_(NULL) {
-    cuCtxGetCurrent(&context_);
-    cuCtxSetCurrent(new_context);
+  explicit ContextSaver(CUcontext new_context) {
+    cuCtxPushCurrent(new_context);
   }
   ~ContextSaver() {
-    if (context_ != NULL) cuCtxSetCurrent(context_);
+    CUcontext unused;
+    cuCtxPopCurrent(&unused);
   }
-
- private:
-  CUcontext context_;
 };
 
 class CudaContext::CudaContextImpl {
@@ -59,7 +56,7 @@ class CudaContext::CudaContextImpl {
   Status Init(const CudaDevice& device) {
     device_ = device;
     own_context_ = true;
-    CU_RETURN_NOT_OK(cuCtxCreate(&context_, 0, device_.handle));
+    CU_RETURN_NOT_OK(cuDevicePrimaryCtxRetain(&context_, device_.handle));
     is_open_ = true;
     return Status::OK();
   }
@@ -74,7 +71,7 @@ class CudaContext::CudaContextImpl {
 
   Status Close() {
     if (is_open_ && own_context_) {
-      CU_RETURN_NOT_OK(cuCtxDestroy(context_));
+      CU_RETURN_NOT_OK(cuDevicePrimaryCtxRelease(device_.handle));
     }
     is_open_ = false;
     return Status::OK();

@pearu (Contributor, author) commented Oct 18, 2018

Defining

  ContextSaver() {
    CUcontext current;
    cuCtxGetCurrent(&current);
    cuCtxPushCurrent(current);
  }

and using ContextSaver set_temporary(); for setting the current context in AllocateHost, I get

pyarrow.lib.ArrowIOError: Cuda Driver API call in /home/pearu/git/arrow/cpp/src/arrow/gpu/cuda_context.cc at line 190 failed with code 201: cuMemHostAlloc(reinterpret_cast<void**>(out), static_cast<size_t>(nbytes), CU_MEMHOSTALLOC_PORTABLE)

OTOH, memory created by AllocateHost should be available to all contexts, so I think something else is needed here.

@pitrou (Member) commented Oct 18, 2018

cuCtxGetCurrent can give you a nullptr if no current context exists. Actually, if you already have a current context, it doesn't make sense to push it again.

What is needed is to initialize a context for the device, with cuDevicePrimaryCtxRetain.

@pearu (Contributor, author) commented Oct 18, 2018

Yeah, I am now using assert(current != NULL); to catch this case.

@pearu (Contributor, author) commented Oct 18, 2018

Yes, the current context was actually NULL.
I should have used ContextSaver set_temporary; instead of ContextSaver set_temporary(); (the latter declares a function rather than constructing a guard object).

@pearu (Contributor, author) commented Oct 18, 2018

I got more tests to work using cuDevicePrimaryCtxRetain in place of cuCtxGetCurrent.
Only the IPC tests fail now.

@pearu (Contributor, author) commented Oct 18, 2018

OK, I got all tests passing with the push-pop context management pattern. I will clean up and commit.

@pearu (Contributor, author) commented Oct 18, 2018

@pitrou, we are now using the push-pop pattern for setting the context. I am curious: will the tests pass on your machine when using cbuf.context.synchronize?

Now that pyarrow and numba use the primary context, which is unique per device, it looks like there is no need to store context handles anymore - one can always retrieve them with cuDevicePrimaryCtxRetain.
Does anyone know if this function is expensive enough that caching its result would be worthwhile?
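For reference, a sketch of the kind of test pattern this refers to, assuming a numba kernel writes into a pyarrow CudaBuffer exposed via to_numba and the reader synchronizes on cbuf.context before copying back; the kernel name and the DeviceNDArray wrapping are illustrative, and Context.new_buffer / CudaBuffer.copy_to_host are assumed to behave as in the existing pyarrow.cuda API:

import numpy as np
from numba import cuda as nb_cuda
from numba.cuda.cudadrv import devicearray
import pyarrow.cuda as pa_cuda

@nb_cuda.jit
def fill(x, value):
    # One thread per element; guard against excess threads.
    i = nb_cuda.grid(1)
    if i < x.size:
        x[i] = value

ctx = pa_cuda.Context()
cbuf = ctx.new_buffer(100 * 4)  # room for 100 float32 values
darr = devicearray.DeviceNDArray(shape=(100,), strides=(4,),
                                 dtype=np.float32, gpu_data=cbuf.to_numba())

fill[1, 128](darr, 7.0)          # kernel launched through numba
cbuf.context.synchronize()       # wait for the kernel before reading back
host = np.frombuffer(cbuf.copy_to_host(), dtype=np.float32)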

@pearu (Contributor, author) commented Oct 18, 2018

Currently, AllocateHost uses the primary context of device=0.
I plan to add a device argument to AllocateHost.
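From the Python side, the planned change might look roughly like this; new_host_buffer is assumed to be the existing host-allocation wrapper around AllocateHost, and the device keyword is the hypothetical addition described above:

import pyarrow.cuda as pa_cuda

# Hypothetical: pin host memory against device 1's primary context once a
# device argument exists; currently the allocation uses device 0.
buf = pa_cuda.new_host_buffer(1024, device=1)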

@pitrou (Member) commented Oct 18, 2018

There is one failure remaining in test_cuda.py. Do you test in debug mode?

__________________________________________________________________________ test_IPC __________________________________________________________________________
Traceback (most recent call last):
  File "/home/antoine/arrow/python/pyarrow/tests/test_cuda.py", line 579, in test_IPC
    assert p.exitcode == 0
AssertionError: assert -6 == 0
 +  where -6 = <SpawnProcess(SpawnProcess-1, stopped[SIGABRT])>.exitcode
-------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------
/home/antoine/miniconda3/envs/pyarrow/bin/../lib/libarrow.so.12(_ZN5arrow4util7CerrLog14PrintBackTraceEv+0x35)[0x7f06f6cf65c9]
/home/antoine/miniconda3/envs/pyarrow/bin/../lib/libarrow.so.12(_ZN5arrow4util7CerrLogD1Ev+0x5b)[0x7f06f6cf654b]
/home/antoine/miniconda3/envs/pyarrow/bin/../lib/libarrow.so.12(_ZN5arrow4util7CerrLogD0Ev+0x18)[0x7f06f6cf656c]
/home/antoine/miniconda3/envs/pyarrow/bin/../lib/libarrow.so.12(_ZN5arrow4util8ArrowLogD1Ev+0x57)[0x7f06f6cf6399]
/home/antoine/miniconda3/envs/pyarrow/lib/libarrow_gpu.so.12(_ZN5arrow3gpu10CudaBufferD1Ev+0xbb)[0x7f06f5bf8d05]
/home/antoine/miniconda3/envs/pyarrow/lib/libarrow_gpu.so.12(_ZN9__gnu_cxx13new_allocatorIN5arrow3gpu10CudaBufferEE7destroyIS3_EEvPT_+0x23)[0x7f06f5bf85bb]
/home/antoine/miniconda3/envs/pyarrow/lib/libarrow_gpu.so.12(_ZNSt16allocator_traitsISaIN5arrow3gpu10CudaBufferEEE7destroyIS2_EEvRS3_PT_+0x23)[0x7f06f5bf8547]
/home/antoine/miniconda3/envs/pyarrow/lib/libarrow_gpu.so.12(_ZNSt23_Sp_counted_ptr_inplaceIN5arrow3gpu10CudaBufferESaIS2_ELN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv+0x37)[0x7f06f5bf81ed]
/home/antoine/arrow/python/pyarrow/lib.cpython-37m-x86_64-linux-gnu.so(_ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_releaseEv+0x42)[0x7f06f760d71e]
/home/antoine/arrow/python/pyarrow/lib.cpython-37m-x86_64-linux-gnu.so(_ZNSt14__shared_countILN9__gnu_cxx12_Lock_policyE2EED2Ev+0x27)[0x7f06f7608f35]
/home/antoine/arrow/python/pyarrow/lib.cpython-37m-x86_64-linux-gnu.so(_ZNSt12__shared_ptrIN5arrow6BufferELN9__gnu_cxx12_Lock_policyE2EED2Ev+0x1c)[0x7f06f7605d48]
/home/antoine/arrow/python/pyarrow/lib.cpython-37m-x86_64-linux-gnu.so(_ZNSt10shared_ptrIN5arrow6BufferEED2Ev+0x18)[0x7f06f7605d64]
/home/antoine/arrow/python/pyarrow/lib.cpython-37m-x86_64-linux-gnu.so(_Z21__Pyx_call_destructorISt10shared_ptrIN5arrow6BufferEEEvRT_+0x18)[0x7f06f760d0da]
/home/antoine/arrow/python/pyarrow/lib.cpython-37m-x86_64-linux-gnu.so(+0x2cae6a)[0x7f06f75a1e6a]
/home/antoine/arrow/python/pyarrow/_cuda.cpython-37m-x86_64-linux-gnu.so(+0x40a3a)[0x7f06f1fd6a3a]
/home/antoine/miniconda3/envs/pyarrow/bin/python(+0xfadc8)[0x55fb8bb60dc8]
/home/antoine/miniconda3/envs/pyarrow/bin/python(_PyFunction_FastCallDict+0x142)[0x55fb8bb79522]
/home/antoine/miniconda3/envs/pyarrow/bin/python(_PyEval_EvalFrameDefault+0x1ded)[0x55fb8bc3be7d]
/home/antoine/miniconda3/envs/pyarrow/bin/python(_PyFunction_FastCallKeywords+0xfb)[0x55fb8bbc941b]
/home/antoine/miniconda3/envs/pyarrow/bin/python(_PyEval_EvalFrameDefault+0x520)[0x55fb8bc3a5b0]
/home/antoine/miniconda3/envs/pyarrow/bin/python(_PyFunction_FastCallKeywords+0xfb)[0x55fb8bbc941b]
/home/antoine/miniconda3/envs/pyarrow/bin/python(_PyEval_EvalFrameDefault+0x520)[0x55fb8bc3a5b0]
/home/antoine/miniconda3/envs/pyarrow/bin/python(_PyFunction_FastCallKeywords+0xfb)[0x55fb8bbc941b]
/home/antoine/miniconda3/envs/pyarrow/bin/python(_PyEval_EvalFrameDefault+0x6d6)[0x55fb8bc3a766]
/home/antoine/miniconda3/envs/pyarrow/bin/python(_PyEval_EvalCodeWithName+0x2e8)[0x55fb8bb78528]
/home/antoine/miniconda3/envs/pyarrow/bin/python(_PyFunction_FastCallKeywords+0x387)[0x55fb8bbc96a7]
/home/antoine/miniconda3/envs/pyarrow/bin/python(_PyEval_EvalFrameDefault+0x149a)[0x55fb8bc3b52a]
/home/antoine/miniconda3/envs/pyarrow/bin/python(_PyEval_EvalCodeWithName+0x2e8)[0x55fb8bb78528]
/home/antoine/miniconda3/envs/pyarrow/bin/python(PyEval_EvalCodeEx+0x44)[0x55fb8bb793a4]
/home/antoine/miniconda3/envs/pyarrow/bin/python(PyEval_EvalCode+0x1c)[0x55fb8bb793cc]
/home/antoine/miniconda3/envs/pyarrow/bin/python(+0x22d304)[0x55fb8bc93304]
/home/antoine/miniconda3/envs/pyarrow/bin/python(PyRun_StringFlags+0x7d)[0x55fb8bc9c3ed]
/home/antoine/miniconda3/envs/pyarrow/bin/python(PyRun_SimpleStringFlags+0x3f)[0x55fb8bc9c44f]
/home/antoine/miniconda3/envs/pyarrow/bin/python(+0x236cd8)[0x55fb8bc9ccd8]
/home/antoine/miniconda3/envs/pyarrow/bin/python(_Py_UnixMain+0x80)[0x55fb8bc9d3f0]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f070fd6eb97]
/home/antoine/miniconda3/envs/pyarrow/bin/python(+0x1e3e32)[0x55fb8bc49e32]
-------------------------------------------------------------------- Captured stderr call --------------------------------------------------------------------
../src/arrow/gpu/cuda_memory.cc:83:  Check failed: Close().ok() 

@pearu (Contributor, author) commented Oct 18, 2018

No, I am not testing in debug mode.

@pitrou (Member) commented Oct 18, 2018

You should :-)

@pearu (Contributor, author) commented Oct 18, 2018

Re the IPC test failure: I added a synchronize call that might fix it (although exit code -6 is worrying).

I'll use debug mode.

In debug mode, I can now reproduce the IPC test failure.

The likely solution is to provide a context when calling cuIpcCloseMemHandle.

@pearu (Contributor, author) commented Oct 18, 2018

@pitrou, I believe the last commit fixes the IPC test; please review.

@pitrou (Member) commented Oct 23, 2018

@pearu Yes, the tests pass now! I'll start a review.

@pitrou (Member) left a comment

Looks good to me except some very minor comments. Thanks for being persistent!

Review comments were left on cpp/src/arrow/gpu/cuda_context.cc (now resolved; most outdated).
@pearu (Contributor, author) commented Oct 23, 2018

@pitrou @wesm please review.

@pitrou (Member) left a comment

Just one nit. Thank you!

@@ -36,15 +36,24 @@ class ARROW_EXPORT CudaDeviceManager {
public:
static Status GetInstance(CudaDeviceManager** manager);

-/// \brief Get the shared CUDA driver context for a particular device
+/// \brief Get the cached CUDA driver context for a particular device
Review comment (Member):
Need removing: "cached"

@pitrou (Member) commented Oct 24, 2018

PR failed because of unrelated flake8 issues :-(

@pitrou (Member) commented Oct 24, 2018

I'm gonna merge anyway. Don't want to mess up this PR by trying to rebase and push to it.
