
untraceable GPU memory allocation #5

Open
@zw0610

Description

Describe the bug

While testing Triton Inference Server 19.10, I noticed that GPU memory usage increases when the following two functions are called:

  1. cuCtxGetCurrent
  2. cuModuleGetFunction

It seems that when a CUDA module is loaded, some data is transferred into GPU memory without any of the calls described in the Memory Management section of the CUDA Driver API.
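
A minimal driver-API sketch of what I mean is below (the PTX file name `kernel.ptx` and the kernel name `my_kernel` are placeholders, not anything from Triton): `cuMemGetInfo` reports less free memory after `cuModuleLoad` / `cuModuleGetFunction`, even though no Memory Management call was made.

```c
// repro.c -- observe GPU memory usage growing on module load alone.
// Build (assumed): gcc repro.c -o repro -lcuda
#include <stdio.h>
#include <cuda.h>

static void report(const char *tag) {
    size_t free_b = 0, total_b = 0;
    cuMemGetInfo(&free_b, &total_b);            // a query only, no allocation
    printf("%-28s free: %zu MiB\n", tag, free_b >> 20);
}

int main(void) {
    CUdevice   dev;
    CUcontext  ctx;
    CUmodule   mod;
    CUfunction fn;

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);
    report("after cuCtxCreate");

    cuCtxGetCurrent(&ctx);                      // function 1 from the list above
    report("after cuCtxGetCurrent");

    cuModuleLoad(&mod, "kernel.ptx");           // module data ends up on the GPU
    cuModuleGetFunction(&fn, mod, "my_kernel"); // function 2 from the list above
    report("after cuModuleGetFunction");

    // Free memory drops between the reports, yet cuMemAlloc was never called,
    // so accounting based only on intercepted Memory Management calls misses it.
    cuModuleUnload(mod);
    cuCtxDestroy(ctx);
    return 0;
}
```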

Although any subsequent cuMemAlloc call will be rejected once this untraceable GPU memory allocation has already pushed usage past the limit set by the user, it still seems a flaw that actual GPU memory usage can exceed the limit.
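
To illustrate the flaw, here is a rough sketch of how such a limit check might look (this is a hypothetical interceptor written for illustration, not this project's actual code): the module-load bytes never pass through the hook, so real usage can already sit above `limit` before the first allocation is rejected.

```c
// Hypothetical cuMemAlloc interceptor -- illustration only.
#include <cuda.h>

static size_t limit = 0;   /* per-user limit                                  */
static size_t used  = 0;   /* bytes tracked through intercepted driver calls  */

CUresult hooked_cuMemAlloc(CUdeviceptr *dptr, size_t bytesize) {
    size_t free_b = 0, total_b = 0;
    cuMemGetInfo(&free_b, &total_b);
    /* Memory in use that the hook never saw, e.g. context and module data. */
    size_t untracked = (total_b - free_b) - used;

    /* Later allocations are rejected once tracked + untracked usage exceeds
     * the limit, but the untracked portion itself was never blocked, so the
     * limit may already be exceeded by the time this check runs. */
    if (used + untracked + bytesize > limit)
        return CUDA_ERROR_OUT_OF_MEMORY;

    CUresult res = cuMemAlloc(dptr, bytesize);
    if (res == CUDA_SUCCESS)
        used += bytesize;
    return res;
}
```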

Environment
OS: Linux kube-node-zw 3.10.0-1062.18.1.el7.x86_64 #1 SMP Tue Mar 17 23:49:17 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

GPU Info: NVIDIA-SMI 440.64 Driver Version: 440.64 CUDA Version: 10.2
