nvproxy: Support GPU capability segmentation #10856
Labels
type: enhancement
New feature or request
Comments
copybara-service bot pushed a commit that referenced this issue on Sep 4, 2024:

This wraps all GPU tests' command line with the nvproxy ioctl sniffer. This serves multiple functions:
- Verifying that the application does not call ioctls unsupported by nvproxy. This is controlled by an `AllowIncompatibleIoctl` option, which is initially set to `true` in all tests to mirror current behavior, but should be flipped as we verify that they do not call unsupported ioctls.
- Verifying that the sniffer itself works transparently for a wide range of applications.
- Later down the line, enforcing that the application only calls ioctls that are part of the GPU capabilities it actually needs. This is controlled by a capability string which is currently only used to set the `NVIDIA_DRIVER_CAPABILITIES` environment variable.

Updates issue #10856
PiperOrigin-RevId: 670751227
copybara-service bot pushed a commit that referenced this issue on Sep 6, 2024:

runsc attempts to emulate nvidia-container-runtime-hook, but it was always passing "--compute --utility" as driver capability flags to the `nvidia-container-cli configure` command. Fix runsc to emulate nvidia-container-runtime-hook correctly by parsing NVIDIA_DRIVER_CAPABILITIES and converting that comma-separated list to flags. This is in preparation for adding support for non-compute GPU workloads in nvproxy.

Updates #9452
Updates #10856
PiperOrigin-RevId: 671644915
Description
Currently, gVisor's NVIDIA GPU support feature (`nvproxy`) only supports CUDA-related commands (`ioctl`s, allocation classes, etc.). There have been multiple requests to expand this set to support non-CUDA GPU workloads, such as video transcoding (NVENC, NVDEC) in #9452. Vulkan has also come up.

One aspect of `nvproxy`'s design is that it inherently limits the exposed NVIDIA kernel driver ABI to the set of commands that `nvproxy` understands. Like all attack-surface-reduction measures, this offers some security benefits.

If we continue to add commands to `nvproxy` under the same big bag of commands it currently knows about, this benefit will weaken over time. This has been fine so far because the only workloads `nvproxy` has aimed to support were all of the same type (compute/CUDA workloads), and thus can reasonably be expected to require largely-overlapping sets of commands. However, if support for e.g. video transcoding workloads were added to this existing set, video-transcoding ABI commands would be exposed to CUDA workloads that do not need them. This feature request is about avoiding that.

Is this feature related to a specific bug?

#9452 and other discussions.
Do you have a specific solution in mind?

This feature request is about implementing a capability segmentation scheme for `nvproxy` commands. This way, all commands that are not required by CUDA workloads would not be exposed unless explicitly requested.

NVIDIA has the concept of "driver capabilities", which map to shared libraries (`.so` files) that roughly correspond to the set of high-level functions that users of each capability would need. They include, for example, `compute` (CUDA and OpenCL), `utility` (`nvidia-smi` and NVML), `video` (encoding/decoding), and `graphics`.

NVIDIA exposes the choice of these GPU capabilities through the `NVIDIA_DRIVER_CAPABILITIES` environment variable, similar to the `NVIDIA_VISIBLE_DEVICES` environment variable. We can reuse this scheme, as it is already out there and fairly easy to understand (i.e. easy for users to specify), while still providing significant ability to keep large amounts of the kernel driver ABI unexposed.