Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: support nvidia-container 2.0 to enable GPU access #2029

Merged
merged 1 commit into from
Aug 10, 2018

Conversation

CodeJuan
Copy link
Contributor

@CodeJuan CodeJuan commented Jul 31, 2018

Ⅰ. Describe what this PR did

Make pouch natively support GPU devices

Ⅱ. Does this pull request fix one issue?

fixes #824 #2033

Ⅲ. Describe how you did it

Ⅳ. Describe how to verify it

  1. Prerequisites:
  • An instance with a NVIDIA GPU
  • The appropriate GPU driver is installed
  1. Create container with NvidiaConfig.
curl --unix-socket /run/pouchd.sock -X POST http:/containers/create -H "Content-Type: application/json" -d '{"Image": "centos:7", "HostConfig":{"Devices": [],"NvidiaConfig":{"NvidiaDriverCapabilities":"all","NvidiaVisibleDevices":"0"}},"Cmd": ["ping","127.0.0.1"]}'
  1. Start container.
  2. Exec nvidia-smi in container.
  3. If nvidia-smi return error NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system, then add a link to libnvidia-ml.so
ln -s  /usr/lib64/libnvidia-cfg.so.1 /usr/lib64/libnvidia-cfg.so
ln -s /usr/lib64/libnvidia-cfg.so.387.26 /usr/lib64/libnvidia-cfg.so.1
ln -s  /usr/lib64/libnvidia-ml.so.1 /usr/lib64/libnvidia-ml.so
ln -s /usr/lib64/libnvidia-ml.so.387.26 /usr/lib64/libnvidia-ml.so.1

Ⅴ. Special notes for reviews

@allencloud allencloud changed the title support nvidia-container 2.0 feature: support nvidia-container 2.0 Aug 1, 2018
@codecov-io
Copy link

codecov-io commented Aug 1, 2018

Codecov Report

Merging #2029 into master will increase coverage by 5.27%.
The diff coverage is 62.9%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2029      +/-   ##
==========================================
+ Coverage   58.64%   63.91%   +5.27%     
==========================================
  Files         206      206              
  Lines       15840    15902      +62     
==========================================
+ Hits         9289    10164     +875     
+ Misses       5413     4468     -945     
- Partials     1138     1270     +132
Flag Coverage Δ
#criv1alpha1test 33.17% <17.74%> (-0.03%) ⬇️
#criv1alpha2test 33.77% <17.74%> (?)
#integrationtest 37.93% <35.48%> (-0.01%) ⬇️
#unittest 23.84% <48.38%> (+0.11%) ⬆️
Impacted Files Coverage Δ
daemon/mgr/spec_hook.go 24.56% <26.66%> (+0.75%) ⬆️
daemon/mgr/spec_process.go 65.65% <44.44%> (-2.13%) ⬇️
daemon/mgr/container_validation.go 43.13% <81.57%> (+12.7%) ⬆️
daemon/logger/jsonfile/utils.go 71.54% <0%> (-1.63%) ⬇️
ctrd/image.go 77.29% <0%> (+2.18%) ⬆️
cri/stream/runtime.go 68.23% <0%> (+5.88%) ⬆️
cri/stream/remotecommand/httpstream.go 44.04% <0%> (+9.32%) ⬆️
cri/criservice.go 64.7% <0%> (+24.99%) ⬆️
cri/v1alpha2/cri_utils.go 82.75% <0%> (+25.94%) ⬆️
cri/v1alpha2/cri_wrapper.go 53.87% <0%> (+53.87%) ⬆️
... and 5 more

return nil
}

if err := vlidateNvidiaDriver(r); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/validate/vlidate?

return nil
}

func stringInSlice(a string, list []string) bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You do not need to add this function, this has been covered in utils.StringInSlice.

Path: path,
Args: append(args, "prestart"),
}
spec.s.Hooks.Prestart = append(spec.s.Hooks.Prestart, nvidiaPrestart)
Copy link
Collaborator

@allencloud allencloud Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we already have a prestart hook, will the nvidia prestart take effect here? Just double check this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, prestart commands are executed after the container namespaces are created.

@allencloud
Copy link
Collaborator

allencloud commented Aug 1, 2018

How to use this? Do we need to support this in pouch CLI?
I only see this is supported in API daemon side. @CodeJuan

@CodeJuan CodeJuan force-pushed the nvidia2 branch 2 times, most recently from d7d928c to 36d9682 Compare August 2, 2018 02:20
@CodeJuan
Copy link
Contributor Author

CodeJuan commented Aug 2, 2018

@allencloud
#2033
I will do it by next Monday.

@@ -5,14 +5,14 @@ import (
"strconv"
"strings"

"github.com/opencontainers/selinux/go-selinux/label"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move the third party to next to the pouch group.


"github.com/alibaba/pouch/apis/types"
"github.com/alibaba/pouch/daemon/logger/syslog"
"github.com/alibaba/pouch/pkg/system"

"github.com/sirupsen/logrus"
"github.com/alibaba/pouch/pkg/utils"
Copy link
Contributor

@fuweid fuweid Aug 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in pouch, we use three groups to manage the packages like:

	"os"
	"strconv"
	"strings"

 	"github.com/alibaba/pouch/apis/types"
	"github.com/alibaba/pouch/daemon/logger/syslog"
	"github.com/alibaba/pouch/pkg/system"
	"github.com/alibaba/pouch/pkg/utils"

 	"github.com/sirupsen/logrus"

Copy link
Contributor

@fuweid fuweid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add UT for the validation part.

return nil
}

supportedDrivers := []string{"compute", "compat32", "graphics", "utility", "video", "display"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we use map global var to hold the supportedDrivers, like

var supportedNvidiaDrivers = map[string]struct{}{
        "compute": struct{},
        ....
}

therefore, we don't need to allocate new memory in the utils.StringInSlice.

@CodeJuan CodeJuan force-pushed the nvidia2 branch 2 times, most recently from 2b3ef3f to 05ebcbe Compare August 8, 2018 11:43
Copy link
Contributor

@fuweid fuweid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I will merge after rebase the master

@@ -69,6 +74,24 @@ function build_pouch()
popd
}

# install nvidia-container-runtime
function build_nvidia_runtime(){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

next time, I will rewrite the package folder

Signed-off-by: codejuan <xh@decbug.com>
@allencloud
Copy link
Collaborator

Thanks for your great work. I will merge this. @CodeJuan

@allencloud allencloud changed the title feature: support nvidia-container 2.0 feature: support nvidia-container 2.0 to enable GPU access Aug 10, 2018
@allencloud allencloud merged commit a77e0da into AliyunContainerService:master Aug 10, 2018
@allencloud
Copy link
Collaborator

Do we have a document to help users use GPU with pouchcontainer? @CodeJuan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[feature request] nvidia GPU native support in runc of Pouch
5 participants