I am using the deepquestai/deepstack:gpu-2022.01.1 container for custom training. It ships with torch built for CUDA 11.3, but train.py fails shortly after startup, during model creation (see the error below). The failure goes away when I downgrade to the torch build for CUDA 11.0 (pip install torch==1.7.0+cu110 torchvision==0.8.1+cu110 torchaudio===0.7.0 -f https://download.pytorch.org/whl/torch_stable.html, as per the Colab notebook).
docker run --gpus all -it --rm -v /home/eouser/deepstack:/deepstack/code -w /deepstack/code/deepstack-trainer deepquestai/deepstack_updated:gpu python3 train.py --dataset-path /deepstack/code/data
Traceback (most recent call last):
File "train.py", line 530, in <module>
train(hyp, opt, device, tb_writer, wandb)
File "train.py", line 90, in train
model = Model(opt.cfg or ckpt['model'].yaml, ch=3, nc=nc).to(device) # create
File "/deepstack/code/deepstack-trainer/models/yolo.py", line 96, in __init__
self._initialize_biases() # only run once
File "/deepstack/code/deepstack-trainer/models/yolo.py", line 151, in _initialize_biases
b[:, 4] += math.log(8 / (640 / s) ** 2) # obj (8 objects per 640 image)
RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.
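For anyone who would rather stay on the bundled torch than downgrade, here is a minimal sketch of the failing pattern and one possible workaround. It is not the trainer's code: the `conv` layer, the `stride` value, and both helper functions are hypothetical stand-ins, and the workaround (doing the bias update under `torch.no_grad()`) is an assumption about what would help in yolo.py, untested against the container itself.

```python
import math

import torch
import torch.nn as nn

# Hypothetical stand-in for one detection-head conv: 3 anchors x (5 + 80 classes).
conv = nn.Conv2d(16, 3 * 85, kernel_size=1)
stride = 8.0  # assumed stride, for illustration only


def init_obj_bias_inplace(layer, s):
    """Mirrors the failing pattern: in-place edit of a view of a leaf parameter."""
    b = layer.bias.view(3, -1)
    b[:, 4] += math.log(8 / (640 / s) ** 2)


def init_obj_bias_no_grad(layer, s):
    """Same update, but with autograd recording switched off."""
    with torch.no_grad():
        b = layer.bias.view(3, -1)
        b[:, 4] += math.log(8 / (640 / s) ** 2)


# On recent torch builds this raises the RuntimeError from the traceback;
# older builds such as 1.7.0 may accept it silently.
try:
    init_obj_bias_inplace(conv, stride)
except RuntimeError as err:
    print("in-place update rejected:", err)

# The no_grad variant modifies the parameter's storage through the view.
before = conv.bias.detach().clone()
init_obj_bias_no_grad(conv, stride)
delta = (conv.bias.detach() - before).view(3, -1)
print("change in objectness bias per anchor:", delta[:, 4])
```

The view shares storage with `conv.bias`, so the update lands in the parameter itself and the parameter still requires grad afterwards.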
Btw, I first need to downgrade setuptools inside the container, because otherwise train.py throws:
Traceback (most recent call last):
File "train.py", line 21, in <module>
from torch.utils.tensorboard import SummaryWriter
File "/usr/local/lib/python3.7/dist-packages/torch/utils/tensorboard/__init__.py", line 4, in <module>
LooseVersion = distutils.version.LooseVersion
AttributeError: module 'setuptools._distutils' has no attribute 'version'
(resolved with: pip install setuptools==59.5.0)
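A small sketch of how one might flag a setuptools version that is newer than the 59.5.0 pin used above, for example in a container build check. The helper names are hypothetical, and the check only compares against the version reported to work here. It does not claim that every newer release triggers the AttributeError.

```python
import re

# Last setuptools release reported to work in this post.
KNOWN_GOOD = (59, 5, 0)


def parse_release(version):
    """Turn '59.5.0' or '60.0.0.post1' into a tuple of its first three integers."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)  # pad short versions such as '65' to three components
    return tuple(parts)


def is_newer_than_known_good(version):
    """True if the given version string sorts after the pinned release."""
    return parse_release(version) > KNOWN_GOOD


for candidate in ("58.1.0", "59.5.0", "59.6.0"):
    print(candidate, is_newer_than_known_good(candidate))
```

Inside the container, the installed version string is available as `setuptools.__version__` and can be passed to `is_newer_than_known_good` directly.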
I am now happily training with the revised setup, so nothing too urgent, but maybe worth checking out.
Thx for this wonderful framework!
Guido