[BUG] Large memory requirements for SimpleImputer strategy median #4794

Closed
erikrene opened this issue Jul 1, 2022 · 3 comments
Labels
? - Needs Triage Need team to review and classify bug Something isn't working inactive-30d

Comments

Contributor

erikrene commented Jul 1, 2022

Describe the bug
Calling fit_transform on a cuML SimpleImputer exhausts GPU memory and raises a MemoryError. This only occurs when the imputation strategy is median.

Steps/Code to reproduce bug

import cudf
import cupy as cp
from cuml.preprocessing import SimpleImputer

data = cp.zeros([270000000, 1], dtype=float)

imputer = SimpleImputer(missing_values=cp.nan, strategy='median')
data = imputer.fit_transform(data)

Output:

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
File cupy/cuda/memory.pyx:698, in cupy.cuda.memory.alloc()

File ~/miniconda3/envs/triton_example_2204/lib/python3.8/site-packages/rmm/rmm.py:212, in rmm_cupy_allocator(nbytes)
    209     raise ModuleNotFoundError("No module named 'cupy'")
    211 stream = Stream(obj=cupy.cuda.get_current_stream())
--> 212 buf = librmm.device_buffer.DeviceBuffer(size=nbytes, stream=stream)
    213 dev_id = -1 if buf.ptr else cupy.cuda.device.get_device_id()
    214 mem = cupy.cuda.UnownedMemory(
    215     ptr=buf.ptr, size=buf.size, owner=buf, device_id=dev_id
    216 )

File device_buffer.pyx:88, in rmm._lib.device_buffer.DeviceBuffer.__cinit__()

MemoryError: std::bad_alloc: out_of_memory: CUDA error at: /home/nfs/enarvades/miniconda3/envs/triton_example_2204/include/rmm/mr/device/cuda_memory_resource.hpp

Expected behavior
This operation should not take as much memory as it does. Using a smaller array does not result in the error. According to nvidia-smi, GPU memory usage increases about 5x after running the code above.
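The 5x figure above was read from nvidia-smi. A comparable measurement can be sketched on the CPU. This sketch assumes NumPy as a stand-in for CuPy and uses Python's tracemalloc to record peak allocations; the helper name `peak_extra_bytes` is hypothetical, and the numbers it prints describe host memory, not GPU memory. It contrasts a sort that returns a copy with a sort that works in place, which is the kind of difference that can multiply the footprint of a median computation.

```python
import tracemalloc

import numpy as np


def peak_extra_bytes(func, *args):
    """Return the peak number of bytes allocated while func(*args) runs."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


# 1,000,000 float64 values = 8 MB; far smaller than the reproducer on purpose.
data = np.random.default_rng(0).normal(size=1_000_000)

copy_peak = peak_extra_bytes(np.sort, data)             # returns a sorted copy
inplace_peak = peak_extra_bytes(np.ndarray.sort, data)  # sorts data itself

print(f"array size:    {data.nbytes / 1e6:.1f} MB")
print(f"np.sort peak:  {copy_peak / 1e6:.1f} MB")
print(f"in-place peak: {inplace_peak / 1e6:.1f} MB")
```

The copying sort needs at least one extra array of the same size, while the in-place sort should need far less. On the GPU the same idea applies, but the measurement would have to come from a GPU-side tool such as nvidia-smi.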

Environment details (please complete the following information):

  • Environment location: Bare-metal
  • Linux Distro/Architecture: Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-110-generic x86_64)
  • GPU Model/Driver: Tesla T4 / 495.29.05
  • CUDA: 11.5
  • Method of cuDF & cuML install: conda
  Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main
_openmp_mutex             4.5                       1_gnu
argon2-cffi               21.3.0             pyhd3eb1b0_0
argon2-cffi-bindings      21.2.0           py38h7f8727e_0
asttokens                 2.0.5              pyhd3eb1b0_0
attrs                     21.4.0             pyhd3eb1b0_0
backcall                  0.2.0              pyhd3eb1b0_0
beautifulsoup4            4.11.1           py38h06a4308_0
bleach                    4.1.0              pyhd3eb1b0_0
brotlipy                  0.7.0           py38h27cfd23_1003
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.18.1               h7f98852_0    conda-forge
ca-certificates           2022.4.26            h06a4308_0
certifi                   2022.6.15        py38h06a4308_0
cffi                      1.15.0           py38hd667e15_1
charset-normalizer        2.0.4              pyhd3eb1b0_0
colorama                  0.4.4              pyhd3eb1b0_0
conda                     4.12.0           py38h06a4308_0
conda-content-trust       0.1.1              pyhd3eb1b0_0
conda-package-handling    1.8.1            py38h7f8727e_0
cryptography              36.0.0           py38h9ce1e76_0
debugpy                   1.5.1            py38h295c915_0
decorator                 5.1.1              pyhd3eb1b0_0
defusedxml                0.7.1              pyhd3eb1b0_0
entrypoints               0.4              py38h06a4308_0
executing                 0.8.3              pyhd3eb1b0_0
icu                       58.2              hf484d3e_1000    conda-forge
idna                      3.3                pyhd3eb1b0_0
importlib_resources       5.2.0              pyhd3eb1b0_1
ipykernel                 6.9.1            py38h06a4308_0
ipython                   8.3.0            py38h06a4308_0
ipython_genutils          0.2.0              pyhd3eb1b0_1
jedi                      0.18.1           py38h06a4308_1
jinja2                    3.0.3              pyhd3eb1b0_0
jsonschema                4.4.0            py38h06a4308_0
jupyter_client            7.2.2            py38h06a4308_0
jupyter_core              4.10.0           py38h06a4308_0
jupyterlab_pygments       0.1.2                      py_0
keyutils                  1.6.1                h166bdaf_0    conda-forge
krb5                      1.19.3               h3790be6_0    conda-forge
ld_impl_linux-64          2.35.1               h7274673_9
libarchive                3.5.2                hccf745f_1    conda-forge
libcurl                   7.82.0               h0b77cf5_0
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libffi                    3.3                  he6710b0_2
libgcc-ng                 11.2.0               h1234567_1
libgomp                   11.2.0               h1234567_1
libiconv                  1.16                 h516909a_0    conda-forge
libnghttp2                1.46.0               hce63b2e_0
libsodium                 1.0.18               h7b6447c_0
libsolv                   0.7.20               h4ff587b_0
libssh2                   1.10.0               ha56f1ee_2    conda-forge
libstdcxx-ng              12.1.0              ha89aaad_16    conda-forge
libxml2                   2.9.14               h74e7548_0
lz4-c                     1.9.3                h9c3ff4c_1    conda-forge
lzo                       2.10              h516909a_1000    conda-forge
mamba                     0.15.3           py38h2aa5da1_0    conda-forge
markupsafe                2.1.1            py38h7f8727e_0
matplotlib-inline         0.1.2              pyhd3eb1b0_2
mistune                   0.8.4           py38h7b6447c_1000
nb_conda_kernels          2.3.1            py38h06a4308_0
nbclient                  0.5.13           py38h06a4308_0
nbconvert                 6.4.4            py38h06a4308_0
nbformat                  5.3.0            py38h06a4308_0
ncurses                   6.3                  h7f8727e_2
nest-asyncio              1.5.5            py38h06a4308_0
notebook                  6.4.11           py38h06a4308_0
openssl                   1.1.1o               h7f8727e_0
packaging                 21.3               pyhd3eb1b0_0
pandocfilters             1.5.0              pyhd3eb1b0_0
parso                     0.8.3              pyhd3eb1b0_0
pexpect                   4.8.0              pyhd3eb1b0_3
pickleshare               0.7.5           pyhd3eb1b0_1003
pip                       21.2.4           py38h06a4308_0
prometheus_client         0.13.1             pyhd3eb1b0_0
prompt-toolkit            3.0.20             pyhd3eb1b0_0
ptyprocess                0.7.0              pyhd3eb1b0_2
pure_eval                 0.2.2              pyhd3eb1b0_0
pycosat                   0.6.3            py38h7b6447c_1
pycparser                 2.21               pyhd3eb1b0_0
pygments                  2.11.2             pyhd3eb1b0_0
pyopenssl                 22.0.0             pyhd3eb1b0_0
pyparsing                 3.0.4              pyhd3eb1b0_0
pyrsistent                0.18.0           py38heee7806_0
pysocks                   1.7.1            py38h06a4308_0
python                    3.8.13               h12debd9_0
python-dateutil           2.8.2              pyhd3eb1b0_0
python-fastjsonschema     2.15.1             pyhd3eb1b0_0
python_abi                3.8                      2_cp38    conda-forge
pyzmq                     22.3.0           py38h295c915_2
readline                  8.1.2                h7f8727e_1
reproc                    14.2.3               h7f98852_0    conda-forge
reproc-cpp                14.2.3               h9c3ff4c_0    conda-forge
requests                  2.27.1             pyhd3eb1b0_0
ruamel_yaml               0.15.100         py38h27cfd23_0
send2trash                1.8.0              pyhd3eb1b0_1
setuptools                61.2.0           py38h06a4308_0
six                       1.16.0             pyhd3eb1b0_1
soupsieve                 2.3.1              pyhd3eb1b0_0
sqlite                    3.38.2               hc218d9a_0
stack_data                0.2.0              pyhd3eb1b0_0
terminado                 0.13.1           py38h06a4308_0
testpath                  0.6.0            py38h06a4308_0
tk                        8.6.11               h1ccaba5_0
tornado                   6.1              py38h27cfd23_0
tqdm                      4.63.0             pyhd3eb1b0_0
traitlets                 5.1.1              pyhd3eb1b0_0
typing-extensions         4.1.1                hd3eb1b0_0
typing_extensions         4.1.1              pyh06a4308_0
urllib3                   1.26.8             pyhd3eb1b0_0
wcwidth                   0.2.5              pyhd3eb1b0_0
webencodings              0.5.1                    py38_1
wheel                     0.37.1             pyhd3eb1b0_0
xz                        5.2.5                h7b6447c_0
yaml                      0.2.5                h7b6447c_0
zeromq                    4.3.4                h2531618_0
zipp                      3.8.0            py38h06a4308_0
zlib                      1.2.12               h7f8727e_1
zstd                      1.5.2                ha4553b6_0
@erikrene erikrene added ? - Needs Triage Need team to review and classify bug Something isn't working labels Jul 1, 2022
@beckernick beckernick changed the title [BUG] [BUG] Large memory requirements for SimpleImputer strategy median Jul 11, 2022
Member

beckernick commented Jul 11, 2022

Thanks for filing an issue. I observe the example above (~2.16GB array) spiking memory to ~18GB on my machine in 22.06.
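The ~2.16GB figure follows directly from the reproducer's shape and dtype. As a quick check, the arithmetic can be written out as below; the 18 GB value is the approximate peak quoted in this comment, not something the snippet measures.

```python
# Back-of-the-envelope size check for the reproducer's input array.
N_ROWS = 270_000_000
BYTES_PER_FLOAT64 = 8  # dtype=float maps to float64 in NumPy and CuPy

array_bytes = N_ROWS * BYTES_PER_FLOAT64
array_gb = array_bytes / 1e9

observed_peak_gb = 18  # approximate peak reported in the comment, not measured here
blowup = observed_peak_gb / array_gb

print(f"input array: {array_gb:.2f} GB")
print(f"peak / input: ~{blowup:.1f}x")
```

So the observed peak is roughly eight times the size of the input, which is what makes the median strategy fail on arrays that otherwise fit comfortably in GPU memory.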

As a note, you can edit your issues if they are accidentally filed without a title or for any other reason. I've updated this issue to better reflect the behavior, which helps us evaluate.

@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

rapids-bot bot pushed a commit that referenced this issue Aug 11, 2022
… (#4817)

I have implemented a fix for [BUG] Large memory requirements for SimpleImputer strategy median #4794. I narrowed the issue down to _masked_column_median. As expected, the extra memory comes from an unnecessary copy of the array in the case where NaN is the masked value. In the other case, where NaN isn't the masked value, the copy is necessary. To fix this, I used in-place sorting. However, in both cases memory usage still goes from 3000 MiB (the size of the original array) to 13000 MiB. From my understanding, sorting should only take up an additional 3000 MiB. Is it possible to reduce memory usage further? Even so, this fix reduces the memory used by over 5000 MiB.

Authors:
  - https://github.com/erikrene

Approvers:
  - William Hicks (https://github.com/wphicks)

URL: #4817
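
The idea described in the commit message can be illustrated with a minimal sketch. This is not the code from #4817 and not cuML's _masked_column_median: it uses NumPy in place of CuPy, the function name `masked_column_median_sketch` is hypothetical, and it assumes a 1-D float column that contains no NaN other than the missing-value marker. When NaN is the masked value, NaNs sort to the end, so an in-place sort leaves the valid values in the leading slots and no full-size copy is needed. For any other masked value the entries have to be filtered out first, which makes a copy.

```python
import numpy as np


def masked_column_median_sketch(column, missing_value=np.nan):
    """Median of a 1-D float column, ignoring entries equal to missing_value.

    Hypothetical helper for illustration only. The NaN branch sorts
    `column` in place, so the caller's data is reordered.
    """
    if np.isnan(missing_value):
        # NaNs sort to the end, so the valid values occupy the first
        # n_valid slots after an in-place sort. The isnan mask is a
        # temporary boolean array, one byte per element.
        column.sort()
        n_valid = column.size - int(np.isnan(column).sum())
        valid = column[:n_valid]  # a view, not a copy
    else:
        # The masked value can land anywhere in sort order, so it must be
        # filtered out first; boolean indexing allocates the copy.
        valid = column[column != missing_value]
        valid.sort()
        n_valid = valid.size

    if n_valid == 0:
        return float("nan")
    mid = n_valid // 2
    if n_valid % 2:
        return float(valid[mid])
    return float((valid[mid - 1] + valid[mid]) / 2)


nan_masked = np.array([3.0, np.nan, 1.0, 2.0, np.nan])
print(masked_column_median_sketch(nan_masked))  # 2.0

value_masked = np.array([4.0, -1.0, 1.0, 3.0, 2.0])
print(masked_column_median_sketch(value_masked, missing_value=-1.0))  # 2.5
```

The trade-off in the NaN branch is that the input is modified, which is acceptable only when the caller no longer needs the original ordering.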
jakirkham pushed a commit to jakirkham/cuml that referenced this issue Feb 27, 2023
…idsai#4794 (rapidsai#4817)

Member

beckernick commented Apr 14, 2023

This was resolved by #4817. In the benchmark below, cuML's peak GPU memory is lower than the peak memory of the CPU scikit-learn version. Closing.

%load_ext gpu_memory_profiler
%load_ext memory_profiler

from sklearn.impute import SimpleImputer as skl_SimpleImputer
from cuml.preprocessing import SimpleImputer as cu_SimpleImputer
from sklearn.datasets import make_classification
import numpy as np
import cupy as cp
import gc

NROWS = [
    64e6,
    128e6,
    256e6,
]
NROWS = [int(x) for x in NROWS]

NULL_PCT = [
    0.1,
]


for N in NROWS:
    for NP in NULL_PCT:
        # Create some data and randomly set some elements as null
        X = np.random.normal(0, 10, size=(N, 1))
        mask = np.random.choice([True, False], size=X.shape, p=[NP, 1-NP])
        X[mask] = np.nan  # mark the sampled entries as missing
        
        # Compare peak memory usage on GPU and CPU
        print(f"{N:,} rows, {NP} null percent, X size: {X.nbytes/1e9} GB")
        imputer = cu_SimpleImputer(strategy='median')
        %gpu_memit imputer.fit(X)
        imputer = skl_SimpleImputer(strategy='median')
        %memit imputer.fit(X)
        print()
        
        del X
        gc.collect()

Output:

64,000,000 rows, 0.1 null percent, X size: 0.512 GB
Peak GPU memory: 4225.00 MiB
peak memory: 6078.40 MiB, increment: 2136.23 MiB

128,000,000 rows, 0.1 null percent, X size: 1.024 GB
Peak GPU memory: 7219.00 MiB
peak memory: 9787.75 MiB, increment: 4272.26 MiB

256,000,000 rows, 0.1 null percent, X size: 2.048 GB
Peak GPU memory: 13203.00 MiB
peak memory: 17206.98 MiB, increment: 8544.85 MiB
