Python can't set the device / phase for net initialization #1700

Closed
shelhamer opened this issue Jan 9, 2015 · 2 comments · Fixed by #1728

Comments

@shelhamer
Member

cuDNN handles and related resources are acquired at net initialization, when the cuDNN layers that require them are constructed. Because the Python interface exposes set_device() only as a method of Net rather than as a module-level function, the device can only be set after the net already exists, which is too late for cuDNN computation. As a result, all cuDNN computation from Python runs on GPU 0, and attempts to set another device fail with

status == CUDNN_STATUS_SUCCESS (8 vs. 0)  CUDNN_STATUS_EXECUTION_FAILED

due to the disagreement between initialization and execution.
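
To make the ordering problem concrete, here is a toy model in plain Python. It is only an illustration: `ToyLayer`, `set_current_device`, and the two helper functions are hypothetical names, not part of Caffe or cuDNN, and the real failure surfaces as the cuDNN status above rather than a Python exception.

```python
# Toy model only: none of these names come from Caffe or cuDNN.
_current_device = 0  # stands in for the process-wide "current GPU"


def set_current_device(device_id):
    """Change the process-wide current device."""
    global _current_device
    _current_device = device_id


class ToyLayer:
    def __init__(self):
        # The handle is tied to whichever device is current at construction.
        self.handle_device = _current_device

    def forward(self):
        # Executing on a different device than the handle's is an error.
        if self.handle_device != _current_device:
            raise RuntimeError(
                "handle created on device %d, executed on device %d"
                % (self.handle_device, _current_device)
            )
        return "ok"


def build_then_switch(device_id):
    layer = ToyLayer()             # handle acquired on the current device
    set_current_device(device_id)  # too late: the handle already exists
    return layer.forward()


def switch_then_build(device_id):
    set_current_device(device_id)  # device chosen first
    layer = ToyLayer()             # handle acquired on that device
    return layer.forward()
```

With the current device at 0, `build_then_switch(1)` raises, while `switch_then_build(1)` succeeds, which mirrors why the setter has to be callable before the net is made.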

@longjon has the workaround for now: use the environment variable CUDA_VISIBLE_DEVICES instead of using set_device.
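
A minimal sketch of that workaround from inside a Python script, assuming the variable is set before anything in the process initializes CUDA. The helper name `restrict_visible_gpus` is hypothetical, and the commented-out Caffe lines use placeholder arguments.

```python
import os


def restrict_visible_gpus(device_id, env=os.environ):
    """Expose only `device_id` to CUDA; in-process it then appears as device 0."""
    env["CUDA_VISIBLE_DEVICES"] = str(device_id)
    return env["CUDA_VISIBLE_DEVICES"]


# Must run before the net is built (safest: before importing caffe at all).
restrict_visible_gpus(1)

# import caffe                          # import only after the variable is set
# net = caffe.Net(model_def, weights)   # placeholder arguments, not real paths
```

The same effect can be had from the shell by prefixing the command, e.g. `CUDA_VISIBLE_DEVICES=1 python script.py`, where `script.py` is a placeholder.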

The simplest fix is to expose set_device() and set_phase() from the caffe module itself as functions. This is only a band-aid, and it changes the interface.
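
A rough sketch of how calling code might look with module-level setters. The spellings `set_mode_gpu` and `set_phase_test` are assumptions about what the module would expose (this issue only names `set_device()` and a phase setter), and `build_net_on_device` is a hypothetical helper. The module is passed in as an argument so the ordering can be exercised without Caffe installed.

```python
def build_net_on_device(caffe_module, device_id, model_def, weights):
    """Configure mode, device, and phase first, then construct the net.

    `caffe_module` is expected to be the imported caffe module; the setter
    names used here are assumptions, not a confirmed API.
    """
    caffe_module.set_mode_gpu()          # assumed module-level mode setter
    caffe_module.set_device(device_id)   # device chosen before any layer exists
    caffe_module.set_phase_test()        # assumed module-level phase setter
    # Layers are constructed here, so they see the configuration set above.
    return caffe_module.Net(model_def, weights)
```

The point is only the order: every setter runs before `Net(...)` is called, so layers are initialized with the intended device and phase.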

Of course, The Right Idea is to make Net responsible for device and phase, set them at initialization, and never switch (#1500)... but that involves a few details.

@shelhamer changed the title from "Python can't set the device for cuDNN layers" to "Python can't set the device / phase for net initialization" on Jan 15, 2015
shelhamer added a commit to shelhamer/caffe that referenced this issue Jan 15, 2015
Attach mode, phase, and device setters to caffe module itself
so that these can be set before making nets. This is needed to properly
initialize layers with the right device and phase configuration.
shelhamer added a commit that referenced this issue Jan 28, 2015
Attach mode, phase, and device setters to caffe module itself
so that these can be set before making nets. This is needed to properly
initialize layers with the right device and phase configuration.

Update examples to new usage.
@alfredox10

How do I make this modification for DIGITS? I am getting that same error there after the first epoch.

@lukeyeager
Contributor

@alfredox10, can you open a new issue against DIGITS? I bet your issue is unrelated to this one.
https://github.com/NVIDIA/DIGITS/issues/new
