Hi sir,
While searching for an interesting image-retrieval idea, I came across your project. It is wonderful; I tested your model and it works like a charm!
However, a problem came up when I tried to train on the Sketchy dataset. Following the instructions in the README, I ran:
>>> python3 train.py --dataset Sketchy --dim-out 64 --semantic-models word2vec-google-news --epochs 1 --early-stop 10 --lr 0.0001
Training fails with the message below:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [288, 64]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient
I tried to fix it myself, but it didn't work. Can you help me with this problem? Thank you so much!
My workspace is Colab, with PyTorch 1.10.0+cu111.
The detailed error message (with torch.autograd.set_detect_anomaly(True)):
Parameters: Namespace(batch_size=128, dataset='Sketchy', dim_out=64, early_stop=10, epoch_size=100, epochs=1, filter_sketch=False, gamma=0.1, gzs_sbir=False, im_sz=224, lambda_disc_im=0.5, lambda_disc_se=0.25, lambda_disc_sk=0.5, lambda_gen_adv=1.0, lambda_gen_cls=1.0, lambda_gen_cyc=1.0, lambda_gen_reg=0.1, lambda_im=10.0, lambda_regular=0.001, lambda_se=10.0, lambda_sk=10.0, log_interval=1, lr=0.0001, milestones=[], momentum=0.9, ngpu=1, num_workers=4, number_qualit_results=200, save_best_results=False, save_image_results=False, semantic_models=['word2vec-google-news'], sk_sz=224, split_eccv_2018=False, test=False)
Checkpoint path: /content/drive/MyDrive/sem-pcyc/auxs/CheckPoints/Sketchy/word2vec-google-news/64
Logger path: /content/drive/MyDrive/sem-pcyc/auxs/LogFiles/Sketchy/word2vec-google-news/64
Result path: /content/drive/MyDrive/sem-pcyc/auxs/Results/Sketchy/word2vec-google-news/64
Loading data...Done
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:481: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
cpuset_checked))
Initializing model variables...Done
Initializing trainable models...Done
Defining optimizers...Done
Defining losses...Done
Initializing variables...Done
Setting logger...Done
Checking cuda...*Cuda exists*...Done
***Train***
/usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
[W python_anomaly_mode.cpp:104] Warning: Error detected in AddmmBackward0. Traceback of forward call that caused the error:
File "src/train.py", line 358, in <module>
main()
File "src/train.py", line 230, in main
losses = train(train_loader, sem_pcyc_model, epoch, args)
File "src/train.py", line 323, in train
loss = sem_pcyc_model.optimize_params(sk, im, cl)
File "/content/drive/My Drive/sem-pcyc/src/models.py", line 368, in optimize_params
self.forward(sk, im, se)
File "/content/drive/My Drive/sem-pcyc/src/models.py", line 259, in forward
self.sk2se_em = self.gen_sk2se(self.sk_fe)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/content/drive/My Drive/sem-pcyc/src/models.py", line 64, in forward
return self.gen(x)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 141, in forward
input = module(input)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/linear.py", line 103, in forward
return F.linear(input, self.weight, self.bias)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 1848, in linear
return torch._C._nn.linear(input, weight, bias)
(function _print_stack)
Traceback (most recent call last):
File "src/train.py", line 358, in <module>
main()
File "src/train.py", line 230, in main
losses = train(train_loader, sem_pcyc_model, epoch, args)
File "src/train.py", line 323, in train
loss = sem_pcyc_model.optimize_params(sk, im, cl)
File "/content/drive/My Drive/sem-pcyc/src/models.py", line 371, in optimize_params
loss = self.backward(se, num_cls)
File "/content/drive/My Drive/sem-pcyc/src/models.py", line 325, in backward
loss_disc_se.backward(retain_graph=True)
File "/usr/local/lib/python3.7/dist-packages/torch/_tensor.py", line 307, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 156, in backward
allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [288, 64]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
If anyone is still struggling with this issue, the cause seems to be the optimizer: optimizer.step() performs in-place operations, so calling it for one loss before .backward() has been called for another loss that shares some of the models triggers this error (see pytorch/pytorch#39141). The fix is to defer every step() until .backward() has been called for each loss. You should also move all the zero_grad() calls to the beginning, so that a call in the middle does not zero the gradients already accumulated in the re-used models. That being said, setting inplace=True in ReLU and LeakyReLU actually seems to be fine.
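For illustration, here is a minimal sketch of the failure and of the reordering that fixes it. This is not the repository's code: gen, head_a, head_b and the two SGD optimizers are hypothetical stand-ins for two losses that both backpropagate through a shared sub-network.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins (not sem-pcyc's real modules): two losses that
# both backpropagate through a shared generator.
gen = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8))
head_a = nn.Linear(8, 1)  # e.g. a discriminator-style head
head_b = nn.Linear(8, 1)  # e.g. a second head sharing gen's features

opt_a = torch.optim.SGD(list(gen.parameters()) + list(head_a.parameters()), lr=0.1)
opt_b = torch.optim.SGD(head_b.parameters(), lr=0.1)

x = torch.randn(4, 8)

# BROKEN interleaving -- the pattern behind the reported RuntimeError:
#   h = gen(x)
#   loss_a = head_a(h).mean()
#   loss_b = head_b(h).mean()
#   opt_a.zero_grad(); loss_a.backward(retain_graph=True); opt_a.step()
#   loss_b.backward()  # fails: backward through gen needs gen's weights,
#                      # but opt_a.step() just modified them in place.

# FIXED ordering: zero first, run every backward, then take every step.
opt_a.zero_grad()
opt_b.zero_grad()

h = gen(x)
loss_a = head_a(h).mean()
loss_b = head_b(h).mean()

loss_a.backward(retain_graph=True)  # keep the graph for the second backward
loss_b.backward()                   # gen's weights are still unmodified here

opt_a.step()  # in-place parameter updates are safe now:
opt_b.step()  # no remaining backward pass depends on the old values
```

Applying the same reordering inside optimize_params/backward in src/models.py (all backward passes first, every optimizer step last, with the zero_grad() calls hoisted to the top) should clear the error; per the linked issue, this interleaved pattern reportedly only started failing once PyTorch 1.5 tightened the in-place version checks around optimizer updates.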