Description
Describe the bug
While testing Triton Inference Server 19.10, I observed that GPU memory usage increases when the following two functions are called:
- cuCtxGetCurrent
- cuModuleGetFunction
It seems that when a CUDA module is loaded, some data is transferred into GPU memory without any of the allocation functions described in the driver API's Memory Management section being called.
Although any subsequent cuMemAlloc call will be rejected once this untracked GPU memory allocation has already exceeded the user-configured limit, it still seems like a flaw that the actual GPU memory usage can exceed that limit.
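A minimal sketch of how this can be observed, assuming a standalone driver-API program rather than the exact Triton code path (the module file name and kernel name below are placeholders): query free memory with cuMemGetInfo before and after loading a module, with no explicit cuMemAlloc in between.

```cuda
#include <cuda.h>
#include <stdio.h>

/* Returns the free GPU memory in bytes via the driver API. */
static size_t freeMem(void) {
    size_t free_b = 0, total_b = 0;
    cuMemGetInfo(&free_b, &total_b);
    return free_b;
}

int main(void) {
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);
    CUcontext ctx;
    cuCtxCreate(&ctx, 0, dev);

    size_t before = freeMem();

    /* "module.cubin" and "kernel" are hypothetical placeholders for
       whatever module/function the server actually loads. */
    CUmodule mod;
    cuModuleLoad(&mod, "module.cubin");
    CUfunction fn;
    cuModuleGetFunction(&fn, mod, "kernel");

    size_t after = freeMem();

    /* Any nonzero difference here is memory consumed by module loading
       alone, invisible to cuMemAlloc-based accounting. */
    printf("module load consumed %zu bytes of GPU memory\n", before - after);

    cuCtxDestroy(ctx);
    return 0;
}
```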
Environment
OS: Linux kube-node-zw 3.10.0-1062.18.1.el7.x86_64 #1 SMP Tue Mar 17 23:49:17 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
GPU Info: NVIDIA-SMI 440.64 Driver Version: 440.64 CUDA Version: 10.2