Usually this error is caused by an incompatibility between the CUDA version used to compile PyTorch and your GPU. The A6000 may not be supported by torch built with cu10.1; try installing a torch build with cu11.x.
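For reference, a quick check (a minimal sketch, independent of this repo) can confirm the mismatch: the A6000 is compute capability 8.6 (sm_86), and wheels built against CUDA 10.1 do not ship sm_86 kernels, which is exactly what the "no kernel image" error means.

import torch

# CUDA version the installed wheel was compiled against (e.g. '10.1' or '11.1')
print("torch", torch.__version__, "built with CUDA", torch.version.cuda)

# Compute capability of the GPU; the A6000 reports (8, 6), i.e. sm_86
major, minor = torch.cuda.get_device_capability(0)
print("GPU capability: sm_%d%d" % (major, minor))

# Architectures the wheel ships kernels for; 'sm_86' must appear in this list
# (cu10.1 builds top out at sm_75, hence the "no kernel image" error)
print("compiled for:", torch.cuda.get_arch_list())

If sm_86 is missing from the last list, installing a cu11.x build (for example the official torch 1.9.0+cu111 / torchvision 0.10.0+cu111 wheels from https://download.pytorch.org/whl/torch_stable.html) and reinstalling detectron2 to match should fix it; the exact versions here are only a suggestion and depend on what this repo expects.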
I ran into an error while reproducing the code on a machine with the following spec:
Ubuntu: 20.04
GPU: Nvidia A6000
python version: 3.8.0
pip list:
nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Mon_Oct_24_19:12:58_PDT_2022
Cuda compilation tools, release 12.0, V12.0.76
Build cuda_12.0.r12.0/compiler.31968024_0
I didn't change much of the config for Cityscapes to Foggy Cityscapes.
Error message:
[06/13 22:42:17 d2.engine.defaults]: Model:
DAobjTwoStagePseudoLabGeneralizedRCNN(
(backbone): vgg_backbone(
(vgg0): Sequential(
(0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(vgg1): Sequential(
(0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(vgg2): Sequential(
(0): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(7): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(8): ReLU(inplace=True)
(9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(vgg3): Sequential(
(0): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(7): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(8): ReLU(inplace=True)
(9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(vgg4): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(7): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(8): ReLU(inplace=True)
(9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
)
(proposal_generator): PseudoLabRPN(
(rpn_head): StandardRPNHead(
(conv): Conv2d(
512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
(activation): ReLU()
)
(objectness_logits): Conv2d(512, 15, kernel_size=(1, 1), stride=(1, 1))
(anchor_deltas): Conv2d(512, 60, kernel_size=(1, 1), stride=(1, 1))
)
(anchor_generator): DefaultAnchorGenerator(
(cell_anchors): BufferList()
)
)
(roi_heads): StandardROIHeadsPseudoLab(
(box_pooler): ROIPooler(
(level_poolers): ModuleList(
(0): ROIAlign(output_size=(7, 7), spatial_scale=0.03125, sampling_ratio=0, aligned=True)
)
)
(box_head): FastRCNNConvFCHead(
(flatten): Flatten(start_dim=1, end_dim=-1)
(fc1): Linear(in_features=25088, out_features=1024, bias=True)
(fc_relu1): ReLU()
(fc2): Linear(in_features=1024, out_features=1024, bias=True)
(fc_relu2): ReLU()
)
(box_predictor): FastRCNNOutputLayers(
(cls_score): Linear(in_features=1024, out_features=9, bias=True)
(bbox_pred): Linear(in_features=1024, out_features=32, bias=True)
)
)
(D_img): FCDiscriminator_img(
(conv1): Conv2d(512, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(conv2): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(conv3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(classifier): Conv2d(128, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(leaky_relu): LeakyReLU(negative_slope=0.2, inplace=True)
)
)
[06/13 22:42:17 fvcore.common.checkpoint]: No checkpoint found. Initializing model from scratch
Exception during training:
Traceback (most recent call last):
File "/four_tb/manjunath/adaptive_teacher/adapteacher/engine/trainer.py", line 404, in train_loop
self.run_step_full_semisup()
File "/four_tb/manjunath/adaptive_teacher/adapteacher/engine/trainer.py", line 512, in run_step_full_semisup
record_dict, _, _, _ = self.model(
File "/home/user/anaconda3/envs/fbadapt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/four_tb/manjunath/adaptive_teacher/adapteacher/modeling/meta_arch/rcnn.py", line 207, in forward
images = self.preprocess_image(batched_inputs)
File "/home/user/anaconda3/envs/fbadapt/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 225, in preprocess_image
images = [(x - self.pixel_mean) / self.pixel_std for x in images]
File "/home/user/anaconda3/envs/fbadapt/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 225, in
images = [(x - self.pixel_mean) / self.pixel_std for x in images]
RuntimeError: CUDA error: no kernel image is available for execution on the device
[06/13 22:42:18 d2.engine.hooks]: Total training time: 0:00:00 (0:00:00 on hooks)
[06/13 22:42:18 d2.utils.events]: iter: 0 lr: N/A max_mem: 368M
Traceback (most recent call last):
File "train_net.py", line 73, in
launch(
File "/home/user/anaconda3/envs/fbadapt/lib/python3.8/site-packages/detectron2/engine/launch.py", line 82, in launch
main_func(*args)
File "train_net.py", line 66, in main
return trainer.train()
File "/four_tb/manjunath/adaptive_teacher/adapteacher/engine/trainer.py", line 386, in train
self.train_loop(self.start_iter, self.max_iter)
File "/four_tb/manjunath/adaptive_teacher/adapteacher/engine/trainer.py", line 404, in train_loop
self.run_step_full_semisup()
File "/four_tb/manjunath/adaptive_teacher/adapteacher/engine/trainer.py", line 512, in run_step_full_semisup
record_dict, _, _, _ = self.model(
File "/home/user/anaconda3/envs/fbadapt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/four_tb/manjunath/adaptive_teacher/adapteacher/modeling/meta_arch/rcnn.py", line 207, in forward
images = self.preprocess_image(batched_inputs)
File "/home/user/anaconda3/envs/fbadapt/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 225, in preprocess_image
images = [(x - self.pixel_mean) / self.pixel_std for x in images]
File "/home/user/anaconda3/envs/fbadapt/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 225, in
images = [(x - self.pixel_mean) / self.pixel_std for x in images]
RuntimeError: CUDA error: no kernel image is available for execution on the device
I thought it was a CUDA version issue, but running the same code in a Docker container (CUDA version 10.1) on the same machine gave the same error.
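For what it's worth, the container's system CUDA toolkit shouldn't change anything here: prebuilt PyTorch wheels ship their own kernels, so what matters is the CUDA version the wheel was compiled with (torch.version.cuda) and whether it includes the GPU's architecture, not the nvcc inside the container. A minimal sketch (independent of this repo, values are illustrative) that reproduces the failure at the first kernel launch, mirroring the (x - self.pixel_mean) / self.pixel_std line in preprocess_image:

import torch

# Copying a CPU tensor to the GPU needs no compute kernel, so this line succeeds
x = torch.arange(6, dtype=torch.float32).reshape(2, 3).to("cuda")

# mean(), std(), and the elementwise subtraction/division each launch CUDA kernels;
# on a wheel without kernels for this GPU (e.g. cu10.1 on an sm_86 A6000) this raises:
# RuntimeError: CUDA error: no kernel image is available for execution on the device
y = (x - x.mean()) / x.std()
print(y)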