Test that RAPIDS_NO_INITIALIZE means no cuInit #12361
New C++ file (+118 lines): a preload shim that intercepts cuInit.
/*
 * Copyright (c) 2022, NVIDIA CORPORATION.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <cuda.h>
#include <dlfcn.h>
#include <iostream>
#include <string>  // std::string is used by the wrappers below

#if defined(__GLIBC__) && __GLIBC__ >= 2 && defined(__GLIBC_MINOR__) && __GLIBC_MINOR__ >= 1
namespace {
static int cuInitCount{0};
using init_t  = CUresult (*)(unsigned int);
using proc_t  = CUresult (*)(const char*,
                             void**,
                             int,
                             cuuint64_t
#if CUDA_VERSION >= 12000
                             ,
                             CUdriverProcAddressQueryResult*
#endif
);
using dlsym_t = void* (*)(void*, const char*);
static init_t original_cuInit{nullptr};
static proc_t original_cuGetProcAddress{nullptr};
static dlsym_t original_dlsym{nullptr};

static __attribute__((constructor)) void init_cuInit_hack()
{
  // Hack hack hack, only for glibc; this magic number can be found in
  // glibc's sysdeps/unix/sysv/linux/x86_64/64/libc.abilist (glibc >= 2.34)
  // or libdl.abilist (glibc < 2.34).
  original_dlsym = (dlsym_t)dlvsym(RTLD_NEXT, "dlsym", "GLIBC_2.2.5");
  if (original_dlsym) {
    original_cuGetProcAddress = (proc_t)original_dlsym(RTLD_NEXT, "cuGetProcAddress");
  }
}
Review thread:

For driver calls, there are two ways Python libraries resolve them:

1. dlopen("libcuda.so") followed by dlsym lookups of the individual symbols; or
2. a dlsym lookup of cuGetProcAddress, through which all subsequent symbols are resolved.

So unfortunately, it's not sufficient to just define cuInit; all of these resolution routes have to be intercepted (see the sketch below). As for the dlvsym magic version, I guess I could spin over a bunch of versions until I find the right one. Any other suggestions gratefully received.
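A rough sketch of those two routes from the consumer side (not part of this PR; pre-CUDA-12 cuGetProcAddress signature for brevity). Under LD_PRELOAD of the shim, every path below lands in one of the wrappers:

// Hypothetical consumer illustrating both resolution routes; when run
// with the shim preloaded, each lookup hits a wrapper above.
#include <cuda.h>
#include <dlfcn.h>

int main()
{
  void* lib = dlopen("libcuda.so.1", RTLD_NOW);
  if (lib == nullptr) { return 1; }
  // Route 1: resolve cuInit by name -> caught by the dlsym wrapper.
  auto direct = reinterpret_cast<CUresult (*)(unsigned int)>(dlsym(lib, "cuInit"));
  // Route 2: resolve via the driver's own lookup entry point -> the dlsym
  // wrapper hands back the shim's cuGetProcAddress, which then swaps in
  // the shim's cuInit.
  using proc_t = CUresult (*)(const char*, void**, int, cuuint64_t);
  auto getproc = reinterpret_cast<proc_t>(dlsym(lib, "cuGetProcAddress"));
  void* indirect = nullptr;
  if (getproc != nullptr) {
    getproc("cuInit", &indirect, CUDA_VERSION, CU_GET_PROC_ADDRESS_DEFAULT);
  }
  return (direct != nullptr) ? static_cast<int>(direct(0)) : 1;
}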
Reply: Versioning is not quite as bad as this: 2.2.5 is a magic number, but it will be stable forever, due to glibc's forward-compatibility guarantee.
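For reference, a minimal sketch of the "spin over versions" idea, assuming a small table of known symbol versions (2.2.5 is the x86_64 baseline; 2.17 and 2.34 are assumptions to be verified against the per-architecture abilist files):

// Hypothetical fallback: try candidate glibc symbol versions until one
// resolves the real dlsym; the constructor's null check handles failure.
static dlsym_t find_real_dlsym()
{
  for (const char* version : {"GLIBC_2.2.5", "GLIBC_2.17", "GLIBC_2.34"}) {
    if (void* sym = dlvsym(RTLD_NEXT, "dlsym", version)) {
      return reinterpret_cast<dlsym_t>(sym);
    }
  }
  return nullptr;
}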
Reply: Could you patch into the Python libraries instead?

Reply: I can patch into numba, because its cuda interface is implemented in Python, but I can't do that for either cuda-python or cupy, because their cuda interfaces are implemented in Cython (so compiled), and hence monkey-patching won't work. I also want to avoid a situation where some further third-party dependency is pulled in that also brings up a cuda context (perhaps directly via the C API). Since eventually everyone actually calls into the driver API, this seems like the best place to hook in.

Reply: There are some Linux options that I think are more reliable and do not require patching.

Reply: Will that tell me if cuInit is called? I think not.

Reply: I guess we can run the process, inspect it with nvml, and try to match things up that way (see the sketch below).

Reply: This SO post seems to settle on basically the same thing that you do (funnily enough, there's another post about how Citrix copy-pasted this solution, disregarding the issues, and broke some users). Due to the extensive dlopening/dlsyming happening, I am not sure that strace or ltrace (or anything like them) will be sufficient to detect the calls, which would have been the easier route here, as David suggests. If all functions were called by name, then I think ltrace would have been sufficient; but as it is, you'll only see the dlopen of libcuda.so and then dlsym calls handing back arbitrary memory addresses. You could hope that the dlsym calls always use a name that includes cuInit; I think that would show up? It would probably only catch a subset of cases, though.
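To flesh out the nvml idea, a rough sketch assuming the NVML C API. Note the inherent race (a short-lived context can vanish before we look), which is one reason the interposition approach is preferable:

// Hypothetical check, not part of this PR: does a given PID currently
// hold a CUDA compute context on device 0?
#include <nvml.h>

bool pid_has_cuda_context(unsigned int pid)
{
  if (nvmlInit_v2() != NVML_SUCCESS) { return false; }
  bool found = false;
  nvmlDevice_t device{};
  unsigned int count = 128;
  nvmlProcessInfo_t infos[128];
  if (nvmlDeviceGetHandleByIndex_v2(0, &device) == NVML_SUCCESS &&
      nvmlDeviceGetComputeRunningProcesses(device, &count, infos) == NVML_SUCCESS) {
    for (unsigned int i = 0; i < count; ++i) {
      found = found || (infos[i].pid == pid);
    }
  }
  nvmlShutdown();
  return found;
}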

extern "C" {
CUresult cuInit(unsigned int flags)
{
  if (!original_cuInit) {
    void* ptr{nullptr};
    CUresult err = original_cuGetProcAddress("cuInit",
                                             &ptr,
                                             CUDA_VERSION,
                                             CU_GET_PROC_ADDRESS_DEFAULT
#if CUDA_VERSION >= 12000
                                             // Review note: the ABI of
                                             // cuGetProcAddress changed in
                                             // CUDA 12 (new out-parameter).
                                             ,
                                             nullptr
#endif
    );
    if (err != CUDA_SUCCESS) { return err; }
    if (ptr) { original_cuInit = (init_t)(ptr); }
  }
  std::cerr << "cuInit has been called " << ++cuInitCount << " times" << std::endl;
  if (original_cuInit) {
    return original_cuInit(flags);
  } else {
    return CUDA_ERROR_NOT_INITIALIZED;
  }
}

CUresult cuGetProcAddress(const char* symbol,
                          void** pfn,
                          int cudaVersion,
                          cuuint64_t flags
#if CUDA_VERSION >= 12000
                          ,
                          CUdriverProcAddressQueryResult* symbolStatus
#endif
)
{
  if (!original_cuGetProcAddress) { return CUDA_ERROR_NOT_SUPPORTED; }
  CUresult err = original_cuGetProcAddress(symbol,
                                           pfn,
                                           cudaVersion,
                                           flags
#if CUDA_VERSION >= 12000
                                           ,
                                           symbolStatus
#endif
  );
  if (std::string{symbol} == "cuInit") {
    original_cuInit = (init_t)(*pfn);
    *pfn            = (void*)cuInit;
  }
  return err;
}

void* dlsym(void* handle, const char* name_)
{
  // TODO (from review): error handling in all these wrappers in case the
  // resolution of the original functions failed (at which point we can
  // only abort).
  std::string name{name_};
  if (name == "cuInit") {
    return (void*)cuInit;
  } else if (name == "cuGetProcAddress") {
    return (void*)cuGetProcAddress;
  } else {
    return original_dlsym(handle, name_);
  }
}
}  // extern "C"
}  // namespace
#endif
Review comment: This same stuff could easily be extended to address @jrhemstad's request in #11546 that one test that RMM is the only allocator of memory.
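For instance, a hypothetical sketch of that extension (not in this PR): interpose the driver's allocation entry point the same way cuInit is interposed above. The lookup via original_dlsym is a simplification; real code would also hook resolution through cuGetProcAddress, exactly as the cuInit wrapper does:

// Hypothetical: count driver-level allocations that bypass RMM.
extern "C" CUresult cuMemAlloc(CUdeviceptr* dptr, size_t bytesize)
{
  using alloc_t = CUresult (*)(CUdeviceptr*, size_t);
  // Simplified resolution of the real function; see cuInit above for
  // the cuGetProcAddress-based variant.
  static alloc_t original_cuMemAlloc =
    reinterpret_cast<alloc_t>(original_dlsym(RTLD_NEXT, "cuMemAlloc"));
  static int rawAllocCount{0};
  std::cerr << "raw cuMemAlloc called " << ++rawAllocCount << " times" << std::endl;
  return original_cuMemAlloc != nullptr ? original_cuMemAlloc(dptr, bytesize)
                                        : CUDA_ERROR_NOT_INITIALIZED;
}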
New Python file (+28 lines): a test that imports cudf under the interception library.
# Copyright (c) 2022, NVIDIA CORPORATION.

import os
import subprocess
import sys
from pathlib import Path

import pytest

location = Path(__file__)
cpp_build_dir = location / ".." / ".." / ".." / ".." / ".." / "cpp" / "build"
libintercept = (cpp_build_dir / "libcudfcuinit_intercept.so").resolve()
Review thread (on the three lines locating the library):

What's the right way to reference this? Right now I'm assuming the build directory exists (because I didn't manage to wrangle cmake into installing the library). Equally, however, I'm not sure we really want to install this library?

Reply: Oh boy, this is fun. I don't think there is a perfect solution here. FWIW, my approach to this in #11875 was to move building the preload lib out of the main libcudf build, build it separately as part of CI, and then just launch tests with the preload library directly from the CLI in CI. That functionality was disabled as part of the Jenkins->GHA migration. Given that you're working on this, it may be time to investigate how to re-enable that functionality within GHA. @robertmaynard, do you think that preload libraries like this or the stream verification lib should be built within the main CMakeLists.txt for the library, or shipped along with the conda packages? I had avoided that mostly because in the end we need the paths to the library anyway in order to preload it, so it's not a great fit, but I know others had expressed different opinions. Depending on what direction we take with that, we will need to adapt the solution in this pytest for how the library is discovered, I think.

Reply: I presume that the stream verification lib is also a single library. My first thought had been to just compile to the …


@pytest.mark.skipif(
    not libintercept.exists(),
    reason="libcudfcuinit_intercept.so not built, can't check for cuInit",
)
def test_import_no_cuinit():
    env = os.environ.copy()
    env["RAPIDS_NO_INITIALIZE"] = "1"
    env["LD_PRELOAD"] = str(libintercept)
    output = subprocess.check_output(
        [sys.executable, "-c", "import cudf"],
        env=env,
        stderr=subprocess.STDOUT,
    )
    assert "cuInit has been called" not in output.decode()
Review thread (on the build changes):

Need some help here, I'm completely flying blind, and this is wrong AFAICT. Basically, I have a single file that I want to compile into a shared library and link against libdl and libcuda.

Reply: Why do you need to link to libdl? I think linking to libc is sufficient for your purposes (dlfcn). The linking to CUDA seems reasonable here (although if you care specifically about whether it's dynamically or statically linked, you will want to set the CUDA_RUNTIME_LIBRARY property).

Reply: I think it is only glibc 2.34 and later where you don't need to link libdl to get access to dlsym and friends (see https://sourceware.org/pipermail/libc-alpha/2021-August/129718.html and bminor/glibc@77f876c), unless I am misunderstanding something. In any case, I would be very happy for someone who knows what they are doing to help rewrite this part of the patch completely.

Reply: I actually don't need cudart at all, only -lcuda, which I think I should get with CUDA::cuda_driver?