This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Commit

fix 'check_for_gpu' (#5522)
epwalsh authored Dec 24, 2021
1 parent 06ec7f9 commit 71f2d79
Showing 2 changed files with 4 additions and 14 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -20,6 +20,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- `FBetaMultiLabelMeasure` now works with multiple dimensions
- Support for inferior operating systems when making hardlinks
- Use `,` as a separator for filenames in the `evaluate` command, thus allowing for URLs (eg. `gs://...`) as input files.
+- Removed a spurious error message "'torch.cuda' has no attribute '_check_driver'" that would appear in the logs
+  when a `ConfigurationError` for a missing GPU was raised.
- Load model on CPU post training to save GPU memory.

### Removed
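For context, here is a minimal sketch of how the spurious text could reach the logs, assuming the private helper `_check_driver` is missing from the installed torch build. The names `fake_cuda` and `build_message` are hypothetical stand-ins so the snippet runs without torch; they are not part of AllenNLP.

```python
import types

# Hypothetical stand-in for `torch.cuda` in a build that lacks the private
# `_check_driver` helper (an assumption for illustration; torch is not imported).
fake_cuda = types.ModuleType("torch.cuda")

BASE_MESSAGE = "Experiment specified a GPU but none is available;"


def build_message(cuda_module) -> str:
    """Mimic the pre-fix logic: append whatever the driver check raises."""
    torch_gpu_error = ""
    try:
        cuda_module._check_driver()
    except Exception as e:  # also catches AttributeError for a missing helper
        torch_gpu_error = "\n{0}".format(e)
    return BASE_MESSAGE + torch_gpu_error


message = build_message(fake_cuda)
print(message)
```

Because the broad `except Exception` also catches the `AttributeError`, the attribute lookup failure is appended to the user-facing message instead of a driver diagnostic.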
16 changes: 2 additions & 14 deletions allennlp/common/checks.py
@@ -5,7 +5,7 @@
import logging
import re
import subprocess
-from typing import List, Union, Tuple, Any
+from typing import Any, List, Tuple, Union

import torch
from torch import cuda
@@ -114,22 +114,10 @@ def check_for_gpu(device: Union[int, torch.device, List[Union[int, torch.device]
         if device != torch.device("cpu"):
             num_devices_available = cuda.device_count()
             if num_devices_available == 0:
-                # Torch will give a more informative exception than ours, so we want to include
-                # that context as well if it's available. For example, if you try to run torch 1.5
-                # on a machine with CUDA10.1 you'll get the following:
-                #
-                # The NVIDIA driver on your system is too old (found version 10010).
-                #
-                torch_gpu_error = ""
-                try:
-                    cuda._check_driver()
-                except Exception as e:
-                    torch_gpu_error = "\n{0}".format(e)
-
                 raise ConfigurationError(
                     "Experiment specified a GPU but none is available;"
                     " if you want to run on CPU use the override"
-                    " 'trainer.cuda_device=-1' in the json config file." + torch_gpu_error
+                    " 'trainer.cuda_device=-1' in the json config file."
                 )
             elif device.index >= num_devices_available:
                 raise ConfigurationError(
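The simplified check after this change can be sketched without torch as follows. This is an illustrative approximation, not the AllenNLP implementation: `check_gpu_index` is a hypothetical function, the `ConfigurationError` class is a local stand-in, the device count is passed in instead of being read from `cuda.device_count()`, and the wording of the out-of-range message is invented because the diff does not show it.

```python
from typing import Optional


class ConfigurationError(Exception):
    """Local stand-in for AllenNLP's ConfigurationError (assumption)."""


def check_gpu_index(index: Optional[int], num_devices_available: int) -> None:
    """Hypothetical torch-free version of the post-fix check.

    `index` is None for CPU; otherwise it is the requested CUDA device index.
    """
    if index is None:
        return
    if num_devices_available == 0:
        # Message text taken from the diff; no driver details are appended.
        raise ConfigurationError(
            "Experiment specified a GPU but none is available;"
            " if you want to run on CPU use the override"
            " 'trainer.cuda_device=-1' in the json config file."
        )
    if index >= num_devices_available:
        # Illustrative wording only; the real message is not shown in the diff.
        raise ConfigurationError(
            f"Requested GPU {index} but only {num_devices_available} available."
        )


try:
    check_gpu_index(0, num_devices_available=0)
except ConfigurationError as err:
    print(err)
```

Dropping the driver probe means the error no longer depends on a private torch function that may not exist in a given release.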
