Description
Describe the bug
While testing Triton Inference Server 19.10, I observed that GPU memory usage increases when the following two functions are called:
- cuCtxGetCurrent
- cuModuleGetFunction
It seems that when a CUDA module is loaded, some data is transferred into GPU memory without any of the allocation functions described in the driver API's Memory Management section being called.
Although any subsequent cuMemAlloc call will be rejected once this untracked GPU memory allocation has already exceeded the user-configured limit, it still seems like a flaw that the actual GPU memory usage can exceed that limit.
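A minimal sketch of how this can be observed, assuming a standalone driver-API program rather than the exact Triton code path (the module file name and kernel name below are placeholders): query free memory with cuMemGetInfo before and after loading a module, with no explicit cuMemAlloc in between.

```cuda
#include <cuda.h>
#include <stdio.h>

/* Returns the free GPU memory in bytes via the driver API. */
static size_t freeMem(void) {
    size_t free_b = 0, total_b = 0;
    cuMemGetInfo(&free_b, &total_b);
    return free_b;
}

int main(void) {
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);
    CUcontext ctx;
    cuCtxCreate(&ctx, 0, dev);

    size_t before = freeMem();

    /* "module.cubin" and "kernel" are hypothetical placeholders for
       whatever module/function the server actually loads. */
    CUmodule mod;
    cuModuleLoad(&mod, "module.cubin");
    CUfunction fn;
    cuModuleGetFunction(&fn, mod, "kernel");

    size_t after = freeMem();

    /* Any nonzero difference here is memory consumed by module loading
       alone, invisible to cuMemAlloc-based accounting. */
    printf("module load consumed %zu bytes of GPU memory\n", before - after);

    cuCtxDestroy(ctx);
    return 0;
}
```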
Environment
OS: Linux kube-node-zw 3.10.0-1062.18.1.el7.x86_64 #1 SMP Tue Mar 17 23:49:17 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
GPU Info: NVIDIA-SMI 440.64 Driver Version: 440.64 CUDA Version: 10.2