Skip to content
This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Docker GPU environment #4618

Merged
merged 9 commits into from
Aug 29, 2019
Merged

Docker GPU environment #4618

merged 9 commits into from
Aug 29, 2019

Conversation

mfranciszkiewicz
Copy link
Contributor

@mfranciszkiewicz mfranciszkiewicz commented Aug 16, 2019

New version of DockerGPUEnvironment, focusing on nvidia-docker-runtime support

@mfranciszkiewicz mfranciszkiewicz changed the base branch from develop to docker_cpu_env_refactor August 19, 2019 13:11
@mfranciszkiewicz mfranciszkiewicz requested review from Wiezzel and maaktweluit and removed request for Wiezzel August 19, 2019 15:14
@mfranciszkiewicz mfranciszkiewicz marked this pull request as ready for review August 19, 2019 15:16
Copy link

@Wiezzel Wiezzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first glance it looks fine, but I will have to get a second or even third look since this is a lot of code.

One potential issue is a hidden dependency between CPU and GPU environments. If you prepare both and the clean up just one of them, the other would be unusable (because of the VM being disabled) while still having 'enabled' status. However, this might possibly be resolved using the auto-setup wrapper (#4592) by sharing a single usage counter between two environments.

golem/task/taskcomputer.py Outdated Show resolved Hide resolved
@mfranciszkiewicz
Copy link
Contributor Author

mfranciszkiewicz commented Aug 20, 2019

There is one more general issue to address with the new environments - a "partial" support mode. I.e. requestors may not need the specialized hardware to request tasks and verify subtask results. It's not only for GPUs, TEEs are affected as well.

EDIT:
"Partial" support shouldn't be resolved on the Environment level but e.g. by the application specifying separate environments for requesting and computing in the manifest file.

Copy link
Contributor

@maaktweluit maaktweluit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Feels a bit strange for the CPU_config to be superclass of GPU_config was thinking they had a shared base.

Will approve on reply, no changes required

golem/envs/docker/gpu.py Outdated Show resolved Hide resolved
golem/envs/docker/vendor/nvidia.py Outdated Show resolved Hide resolved
tests/golem/envs/docker/gpu/test_config.py Outdated Show resolved Hide resolved
from golem.envs.docker.vendor import nvidia


# FIXME: update the test after nvgpu logic is moved out of the apps folder
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there an issue for this fixme?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is none - I can create one. This should be resolved when apps are migrated to the Task API.

@mfranciszkiewicz
Copy link
Contributor Author

@maaktweluit DockerGPUConfig really is a superset of DockerCPUConfig - the GPU environment also needs CPU, RAM and working dirs to be configured. I see the CPU environment as a base for other environments. Both configs may share a common base, but in case of the CPU config that would only be a name change.

@maaktweluit
Copy link
Contributor

Ok, clear :D thanks for the fixes, approving!

@codecov
Copy link

codecov bot commented Aug 20, 2019

Codecov Report

Merging #4618 into develop will increase coverage by 1.12%.
The diff coverage is 90.14%.

@@             Coverage Diff             @@
##           develop    #4618      +/-   ##
===========================================
+ Coverage    89.15%   90.27%   +1.12%     
===========================================
  Files          224      226       +2     
  Lines        20208    20344     +136     
===========================================
+ Hits         18016    18366     +350     
+ Misses        2192     1978     -214

@mfranciszkiewicz mfranciszkiewicz changed the base branch from docker_cpu_env_refactor to develop August 22, 2019 10:08
golem/envs/docker/gpu.py Outdated Show resolved Hide resolved
golem/task/taskcomputer.py Outdated Show resolved Hide resolved
golem/task/taskcomputer.py Outdated Show resolved Hide resolved
golem/task/taskcomputer.py Outdated Show resolved Hide resolved
@mfranciszkiewicz mfranciszkiewicz merged commit 3440ea5 into develop Aug 29, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants