
Test that cuInit is not called when RAPIDS_NO_INITIALIZE is set #12545

Merged

Conversation

wence-
Contributor

@wence- wence- commented Jan 13, 2023

Description

As an alternate approach to the one tried in #12361, here we script GDB and check whether we hit a breakpoint in cuInit. When RAPIDS_NO_INITIALIZE is set in the environment, merely importing cudf should not call into the CUDA runtime/driver (i.e., cuInit should not be called). Conversely, to check that we are scripting GDB properly, when we create a cudf object we definitely should hit cuInit.
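A minimal sketch of the idea (illustrative only, not the PR's actual test code; the helper name, the exact GDB flags, and the output parsing are assumptions):

```python
import subprocess

# GDB command script: set a breakpoint on cuInit, run the program, quit.
# If the breakpoint fires, gdb prints a "Breakpoint 1, ..." line.
GDB_COMMANDS = "break cuInit\nrun\nquit\n"

def calls_cuinit(python_code, env=None):
    """Return True if running `python -c python_code` under gdb
    stops at the cuInit breakpoint."""
    proc = subprocess.run(
        ["gdb", "--batch", "-x", "/dev/stdin", "--args",
         "python", "-c", python_code],
        input=GDB_COMMANDS, capture_output=True, text=True, env=env,
    )
    return "Breakpoint 1" in proc.stdout
```

The test would then assert that `calls_cuinit("import cudf")` is False with RAPIDS_NO_INITIALIZE in the environment, and True when a cudf object is actually constructed.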

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@wence- wence- requested review from a team as code owners January 13, 2023 10:56
@wence- wence- requested review from shwina and charlesbluca January 13, 2023 10:56
@github-actions github-actions bot added the 'Python' (Affects Python cuDF API) label Jan 13, 2023
@wence- wence- added the 'tech debt' and 'non-breaking' (Non-breaking change) labels Jan 13, 2023
@wence- wence- force-pushed the wence/feature/test-no-cuInit-via-gdb branch from c37b961 to de5bfce on January 13, 2023 10:58
@wence-
Contributor Author

wence- commented Jan 13, 2023

This is an alternate approach to checking that we don't call cuInit, one that doesn't need to hook into dlsym (unlike #12361). I use GDB to set a breakpoint rather than running nvprof, because nvprof is not supported on devices with compute capability 8 and greater. On those devices one is supposed to use nsys instead, but I can't make that work correctly: both with and without RAPIDS_NO_INITIALIZE in the environment, I always see cuInit in the profiling stack (even though under GDB I do not hit the breakpoint when RAPIDS_NO_INITIALIZE is set):

$ nsys profile --stats=true --trace=cuda python -c "import cudf;"
Generating '/tmp/nsys-report-fedd.qdstrm'
[1/6] [========================100%] report1.nsys-rep
[2/6] [========================100%] report1.sqlite
[3/6] Executing 'cudaapisum' stats report

 Time (%)  Total Time (ns)  Num Calls  Avg (ns)  Med (ns)  Min (ns)  Max (ns)  StdDev (ns)           Name         
 --------  ---------------  ---------  --------  --------  --------  --------  -----------  ----------------------
     97.9           158717        768     206.7     169.0       114      2233        126.8  cuGetProcAddress      
      1.9             3151          2    1575.5    1575.5      1528      1623         67.2  cuInit                
      0.2              327          2     163.5     163.5       145       182         26.2  cuModuleGetLoadingMode

versus:

$ RAPIDS_NO_INITIALIZE=1 nsys profile --stats=true --trace=cuda python -c "import cudf;"
Generating '/tmp/nsys-report-0f63.qdstrm'
[1/6] [========================100%] report2.nsys-rep
[2/6] [========================100%] report2.sqlite
[3/6] Executing 'cudaapisum' stats report

 Time (%)  Total Time (ns)  Num Calls  Avg (ns)  Med (ns)  Min (ns)  Max (ns)  StdDev (ns)           Name         
 --------  ---------------  ---------  --------  --------  --------  --------  -----------  ----------------------
     98.3            75237        384     195.9     157.0       107      1771        116.5  cuGetProcAddress      
      1.5             1138          1    1138.0    1138.0      1138      1138          0.0  cuInit                
      0.2              181          1     181.0     181.0       181       181          0.0  cuModuleGetLoadingMode

@wence- wence- force-pushed the wence/feature/test-no-cuInit-via-gdb branch from de5bfce to ff4f5f5 on January 13, 2023 11:04
@github-actions github-actions bot added the conda label Jan 13, 2023
@wence- wence- force-pushed the wence/feature/test-no-cuInit-via-gdb branch from ff4f5f5 to 9da854e on January 13, 2023 11:06
@wence- wence- added the 'improvement' (Improvement / enhancement to an existing function) label Jan 13, 2023
@codecov

codecov bot commented Jan 13, 2023

Codecov Report

Base: 86.58% // Head: 85.70% // Project coverage decreases by 0.88% ⚠️

Coverage data is based on head (34a6b3a) compared to base (b6dccb3).
Patch has no changes to coverable lines.

Additional details and impacted files
@@               Coverage Diff                @@
##           branch-23.02   #12545      +/-   ##
================================================
- Coverage         86.58%   85.70%   -0.88%     
================================================
  Files               155      155              
  Lines             24368    24865     +497     
================================================
+ Hits              21098    21311     +213     
- Misses             3270     3554     +284     
Impacted Files Coverage Δ
python/cudf/cudf/_version.py 1.41% <0.00%> (-98.59%) ⬇️
python/cudf/cudf/core/buffer/spill_manager.py 72.50% <0.00%> (-7.50%) ⬇️
python/cudf/cudf/core/buffer/spillable_buffer.py 91.07% <0.00%> (-1.78%) ⬇️
python/cudf/cudf/utils/dtypes.py 77.85% <0.00%> (-1.61%) ⬇️
python/cudf/cudf/options.py 86.11% <0.00%> (-1.59%) ⬇️
python/cudf/cudf/core/single_column_frame.py 94.30% <0.00%> (-1.27%) ⬇️
...ython/custreamz/custreamz/tests/test_dataframes.py 98.38% <0.00%> (-1.01%) ⬇️
python/dask_cudf/dask_cudf/io/csv.py 96.34% <0.00%> (-1.00%) ⬇️
python/dask_cudf/dask_cudf/io/parquet.py 91.81% <0.00%> (-0.59%) ⬇️
python/cudf/cudf/core/multiindex.py 91.66% <0.00%> (-0.51%) ⬇️
... and 43 more


Contributor

@shwina shwina left a comment


I'm +1 to this approach

@davidwendt
Contributor

I like this approach too.
I wonder if you could use cuda-gdb instead of gdb, only because it may already be installed with the CUDA toolkit and therefore would not need to be added to the yamls.

Contributor

@vyasr vyasr left a comment


This is much better!

Minor nit: would it be simpler to write out the gdb command into a bash script (could even write to a tempfile within the Python script itself if we really wanted to be fancy, but I'm mostly just thinking of saving a bash script in the repo to be executed by the test)? It could accept a single argument, the Python command to execute, and then return an appropriate exit code. The subprocess lines aren't terribly readable.

@github-actions github-actions bot removed the conda label Jan 17, 2023
@wence-
Contributor Author

wence- commented Jan 17, 2023

I like this approach too. I wonder if you could use cuda-gdb instead of gdb, only because it may already be installed with the CUDA toolkit and therefore would not need to be added to the yamls.

Thanks, done!

@wence-
Contributor Author

wence- commented Jan 17, 2023

This is much better!

Minor nit: would it be simpler to write out the gdb command into a bash script (could even write to a tempfile within the Python script itself if we really wanted to be fancy, but I'm mostly just thinking of saving a bash script in the repo to be executed by the test)? It could accept a single argument, the Python command to execute, and then return an appropriate exit code. The subprocess lines aren't terribly readable.

gdb -x accepts commands from stdin (via /dev/stdin), so I've just passed them as input to the subprocess call instead of creating a temporary file.
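For illustration, the stdin trick amounts to something like this (a sketch assuming a standard gdb/cuda-gdb command line; the helper name is hypothetical, not the PR's exact code):

```python
def gdb_invocation(python_code, gdb="cuda-gdb"):
    # "-x /dev/stdin" tells gdb to read its command script from stdin,
    # so the breakpoint commands can go through subprocess's `input=`
    # keyword with no temporary file.
    argv = [gdb, "--batch", "-x", "/dev/stdin", "--args",
            "python", "-c", python_code]
    commands = "break cuInit\nrun\nquit\n"
    return argv, commands

# The caller would then do:
#   argv, commands = gdb_invocation("import cudf")
#   subprocess.run(argv, input=commands, text=True, capture_output=True)
```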

@wence-
Contributor Author

wence- commented Jan 17, 2023

Skipping test on arm64 for now, since cuda-gdb is non-functional.
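The skip condition can be sketched as follows (the helper name is illustrative; the real test presumably wires this into a pytest skip marker):

```python
import platform

def skip_gdb_test():
    # cuda-gdb is non-functional on arm64, so the cuInit test is
    # skipped there (e.g. via pytest.mark.skipif(skip_gdb_test(), ...)).
    return platform.machine() == "aarch64"
```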

@ajschmidt8 ajschmidt8 removed the request for review from a team January 18, 2023 22:09
@ajschmidt8
Member

Removing ops-codeowners from the required reviews since it doesn't seem there are any file changes that we're responsible for. Feel free to add us back if necessary.

@wence-
Contributor Author

wence- commented Jan 19, 2023

Removing ops-codeowners from the required reviews since it doesn't seem there are any file changes that we're responsible for. Feel free to add us back if necessary.

Originally I modified dependencies.yaml, but it turned out not to be necessary.

@wence-
Contributor Author

wence- commented Jan 19, 2023

/merge

@rapids-bot rapids-bot bot merged commit 7b71c3d into rapidsai:branch-23.02 Jan 20, 2023
@wence- wence- deleted the wence/feature/test-no-cuInit-via-gdb branch January 23, 2023 11:07
wence- added a commit to wence-/cudf that referenced this pull request Jan 30, 2023
Since rapidsai#12545, we use cuda-gdb for scripting (since that is already
installed in the nvidia/cuda:-devel docker images) to check that
RAPIDS_NO_INITIALIZE ensures that cuInit is not called on import.

Unfortunately, the official cuda-gdb-11-2 package for Debian-based
systems does not correctly advertise all its dependencies (we need to
manually install libtinfo5 and libncursesw5). Consequently cuda-gdb
does not work if the base image rapids builds against is for CUDA
11.2.

To work around this, just ask conda to install gdb for us, since we
don't actually need any cuda-gdb features.
wence- added a commit to wence-/cudf that referenced this pull request Jan 30, 2023
Since rapidsai#12545, we use cuda-gdb for scripting (since that is already
installed in the nvidia/cuda:-devel docker images) to check that
RAPIDS_NO_INITIALIZE ensures that cuInit is not called on import.

Unfortunately, the official cuda-gdb-11-2 package for Debian-based
systems does not correctly advertise all its dependencies (we need to
manually install libtinfo5 and libncursesw5). Consequently cuda-gdb
does not work if the base image rapids builds against is for CUDA
11.2.

To work around this, build the cuda-gdb command as a fixture that is
appropriately marked as skipped in the cases where cuda-gdb is either
not installed or not working due to broken dependencies.
rapids-bot bot pushed a commit that referenced this pull request Jan 31, 2023

Since #12545, we use cuda-gdb for scripting (since that is already installed in the nvidia/cuda:-devel docker images) to check that RAPIDS_NO_INITIALIZE ensures that cuInit is not called on import.

Unfortunately, the official cuda-gdb-11-2 package for Debian-based systems does not correctly advertise all its dependencies (we need to manually install libtinfo5 and libncursesw5). Consequently cuda-gdb does not work if the base image rapids builds against is for CUDA 11.2.

To work around this, build the cuda-gdb command as a fixture that is appropriately marked as skipped in the cases where it is either not installed, or else not working due to broken dependencies.
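The probing logic behind that fixture might look roughly like this (a hedged sketch; the function name and the `--version` probe are assumptions, not the merged code — in the test suite this would back a pytest fixture that calls pytest.skip() on failure):

```python
import shutil
import subprocess

def find_working_gdb():
    """Return the path to a working debugger, or None."""
    gdb = shutil.which("cuda-gdb") or shutil.which("gdb")
    if gdb is None:
        return None
    try:
        # cuda-gdb from the CUDA 11.2 Debian packages can be installed
        # yet broken (missing libtinfo5/libncursesw5), so probe that
        # the binary actually starts before using it.
        subprocess.run([gdb, "--version"], check=True, capture_output=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return gdb
```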

Authors:
  - Lawrence Mitchell (https://github.com/wence-)
  - Ashwin Srinath (https://github.com/shwina)

Approvers:
  - Ashwin Srinath (https://github.com/shwina)

URL: #12644