RuntimeError: one of the variables needed for gradient computation has been modified #6
Comments
Emmm, I have also encountered this problem. Maybe you can check whether your code is the latest? Besides, I will look into the problem tomorrow. You can also use an older version by checking the commit history. The problem may be caused by my recent changes to the scheduler.
Can you share your environment?
I think you can debug step by step to understand the specific problem.
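One concrete way to do that step-by-step debugging is PyTorch's anomaly detection, which makes a failing backward() print the forward-pass operation whose saved tensor was later modified in place. A minimal sketch, using a stand-in linear model rather than the repository's training code:

import torch

# Anomaly mode is slow, so enable it only while hunting the bug.
torch.autograd.set_detect_anomaly(True)

net = torch.nn.Linear(4, 1)                      # stand-in for the real model
opt = torch.optim.SGD(net.parameters(), lr=0.1)

x = torch.randn(8, 4)
loss = net(x).mean()
loss.backward()   # if a saved tensor had been modified in place, the error now includes a second traceback pointing at the offending forward op
opt.step()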
I have used the latest code on two RTX 3090s to test multi-GPU training and on one RTX 3090 to test single-GPU training, and I do not encounter this bug. Here is my setup.
the log : 2023-06-02 12:48:15,326: INFO: [train_multi_gpu.py: 119]: {'common': {'save_interval': 5, 'test_interval': 2, 'max_epoch': 100, 'seed': 3401, 'amp': False}, 'datasets': {'train_csv_path': '/mnt/lustre/sjtu/home/zkn02/train_encodec/datasets/libritts_train100h.csv', 'test_csv_path': '/mnt/lustre/sjtu/home/zkn02/train_encodec/datasets/LibriTTS_dev-other.csv', 'batch_size': 6, 'tensor_cut': 100000, 'num_workers': 0, 'fixed_length': 1000, 'pin_memory': True}, 'checkpoint': {'resume': False, 'checkpoint_path': '', 'disc_checkpoint_path': '', 'save_folder': './checkpoints/', 'save_location': '${checkpoint.save_folder}batch${datasets.batch_size}_cut${datasets.tensor_cut}_length${datasets.fixed_length}_'}, 'optimization': {'lr': 1e-05, 'disc_lr': 1e-05}, 'lr_scheduler': {'warmup_epoch': 2}, 'model': {'target_bandwidths': [1.5, 3.0, 6.0, 12.0, 24.0], 'sample_rate': 24000, 'channels': 1, 'train_discriminator': True, 'audio_normalize': True, 'filters': 32}, 'distributed': {'data_parallel': True, 'world_size': 2, 'find_unused_parameters': True, 'torch_distributed_debug': False}}
2023-06-02 12:48:15,331: INFO: [train_multi_gpu.py: 120]: Encodec Model Parameters: 14855843
2023-06-02 12:48:15,331: INFO: [train_multi_gpu.py: 121]: Disc Model Parameters: 283398
2023-06-02 12:48:15,331: INFO: [train_multi_gpu.py: 122]: model train mode :True | quantizer train mode :True
2023-06-02 12:48:15,593: INFO: [train_multi_gpu.py: 119]: {'common': {'save_interval': 5, 'test_interval': 2, 'max_epoch': 100, 'seed': 3401, 'amp': False}, 'datasets': {'train_csv_path': '/mnt/lustre/sjtu/home/zkn02/train_encodec/datasets/libritts_train100h.csv', 'test_csv_path': '/mnt/lustre/sjtu/home/zkn02/train_encodec/datasets/LibriTTS_dev-other.csv', 'batch_size': 6, 'tensor_cut': 100000, 'num_workers': 0, 'fixed_length': 1000, 'pin_memory': True}, 'checkpoint': {'resume': False, 'checkpoint_path': '', 'disc_checkpoint_path': '', 'save_folder': './checkpoints/', 'save_location': '${checkpoint.save_folder}batch${datasets.batch_size}_cut${datasets.tensor_cut}_length${datasets.fixed_length}_'}, 'optimization': {'lr': 1e-05, 'disc_lr': 1e-05}, 'lr_scheduler': {'warmup_epoch': 2}, 'model': {'target_bandwidths': [1.5, 3.0, 6.0, 12.0, 24.0], 'sample_rate': 24000, 'channels': 1, 'train_discriminator': True, 'audio_normalize': True, 'filters': 32}, 'distributed': {'data_parallel': True, 'world_size': 2, 'find_unused_parameters': True, 'torch_distributed_debug': False}}
2023-06-02 12:48:15,594: INFO: [train_multi_gpu.py: 120]: Encodec Model Parameters: 14855843
2023-06-02 12:48:15,595: INFO: [train_multi_gpu.py: 121]: Disc Model Parameters: 283398
2023-06-02 12:48:15,595: INFO: [train_multi_gpu.py: 122]: model train mode :True | quantizer train mode :True
2023-06-02 12:48:15,708: INFO: [distributed_c10d.py: 432]: Added key: store_based_barrier_key:1 to store for rank: 1
2023-06-02 12:48:15,716: INFO: [distributed_c10d.py: 432]: Added key: store_based_barrier_key:1 to store for rank: 0
2023-06-02 12:48:15,717: INFO: [distributed_c10d.py: 466]: Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2023-06-02 12:48:15,733: INFO: [distributed_c10d.py: 466]: Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2023-06-02 12:49:27,759: INFO: [train_multi_gpu.py: 65]: | epoch: 1 | loss: 33.31180191040039 | loss_g: 31.2178897857666 | loss_w: 2.093912363052368 | lr: 1.0000000000000001e-07 | disc_lr: 1.0000000000000001e-07
2023-06-02 12:50:30,795: INFO: [train_multi_gpu.py: 65]: | epoch: 2 | loss: 18.22700309753418 | loss_g: 17.53032684326172 | loss_w: 0.6966761350631714 | lr: 9.990754267376514e-06 | disc_lr: 9.990754267376514e-06
2023-06-02 12:51:12,410: INFO: [train_multi_gpu.py: 83]: | TEST | epoch: 2 | loss_g: 16.217060089111328 | loss_disc: 1.999979853630066
2023-06-02 12:52:35,755: INFO: [train_multi_gpu.py: 65]: | epoch: 3 | loss: 11.52556324005127 | loss_g: 11.375588417053223 | loss_w: 0.1499750018119812 | lr: 9.979206008271393e-06 | disc_lr: 9.979206008271393e-06
2023-06-02 12:52:35,756: INFO: [train_multi_gpu.py: 67]: | loss_disc: 1.9994720220565796
2023-06-02 12:53:58,440: INFO: [train_multi_gpu.py: 65]: | epoch: 4 | loss: 12.130687713623047 | loss_g: 12.015471458435059 | loss_w: 0.11521672457456589 | lr: 9.963055062204609e-06 | disc_lr: 9.963055062204609e-06
2023-06-02 12:53:58,440: INFO: [train_multi_gpu.py: 67]: | loss_disc: 1.9973113536834717
2023-06-02 12:54:26,933: INFO: [train_multi_gpu.py: 83]: | TEST | epoch: 4 | loss_g: 10.545166969299316 | loss_disc: 1.9967026710510254
2023-06-02 12:55:49,203: INFO: [train_multi_gpu.py: 65]: | epoch: 5 | loss: 10.983535766601562 | loss_g: 10.894118309020996 | loss_w: 0.08941741287708282 | lr: 9.942318025365027e-06 | disc_lr: 9.942318025365027e-06
2023-06-02 12:55:49,204: INFO: [train_multi_gpu.py: 67]: | loss_disc: 1.9918556213378906
2023-06-02 12:57:13,010: INFO: [train_multi_gpu.py: 65]: | epoch: 6 | loss: 10.523397445678711 | loss_g: 10.419259071350098 | loss_w: 0.10413823276758194 | lr: 9.917016206459796e-06 | disc_lr: 9.917016206459796e-06
2023-06-02 12:57:13,011: INFO: [train_multi_gpu.py: 67]: | loss_disc: 1.9791228771209717
2023-06-02 12:57:40,469: INFO: [train_multi_gpu.py: 83]: | TEST | epoch: 6 | loss_g: 9.559894561767578 | loss_disc: 1.9766931533813477
2023-06-02 12:59:04,421: INFO: [train_multi_gpu.py: 65]: | epoch: 7 | loss: 9.345637321472168 | loss_g: 9.30517292022705 | loss_w: 0.040464136749506 | lr: 9.887175604818207e-06 | disc_lr: 9.887175604818207e-06
2023-06-02 12:59:04,422: INFO: [train_multi_gpu.py: 67]: | loss_disc: 1.9636516571044922
2023-06-02 13:00:27,272: INFO: [train_multi_gpu.py: 65]: | epoch: 8 | loss: 9.28451919555664 | loss_g: 9.215991973876953 | loss_w: 0.06852763891220093 | lr: 9.852826883675634e-06 | disc_lr: 9.852826883675634e-06
2023-06-02 13:00:27,273: INFO: [train_multi_gpu.py: 67]: | loss_disc: 1.9527044296264648
2023-06-02 13:00:54,698: INFO: [train_multi_gpu.py: 83]: | TEST | epoch: 8 | loss_g: 8.622124671936035 | loss_disc: 1.9357608556747437
2023-06-02 13:02:17,948: INFO: [train_multi_gpu.py: 65]: | epoch: 9 | loss: 6.146459579467773 | loss_g: 6.11637544631958 | loss_w: 0.03008396551012993 | lr: 9.814005338664973e-06 | disc_lr: 9.814005338664973e-06
2023-06-02 13:02:17,949: INFO: [train_multi_gpu.py: 67]: | loss_disc: 1.9932515621185303
2023-06-02 13:03:42,270: INFO: [train_multi_gpu.py: 65]: | epoch: 10 | loss: 6.795133113861084 | loss_g: 6.749742031097412 | loss_w: 0.04539122059941292 | lr: 9.77075086154801e-06 | disc_lr: 9.77075086154801e-06
2023-06-02 13:03:42,271: INFO: [train_multi_gpu.py: 67]: | loss_disc: 1.9943803548812866
2023-06-02 13:04:09,624: INFO: [train_multi_gpu.py: 83]: | TEST | epoch: 10 | loss_g: 6.215439796447754 | loss_disc: 1.9836242198944092
the log: 2023-06-02 13:08:02,786: INFO: [train_multi_gpu.py: 119]: {'common': {'save_interval': 5, 'test_interval': 2, 'max_epoch': 100, 'seed': 3401, 'amp': False}, 'datasets': {'train_csv_path': '/mnt/lustre/sjtu/home/zkn02/train_encodec/datasets/libritts_train100h.csv', 'test_csv_path': '/mnt/lustre/sjtu/home/zkn02/train_encodec/datasets/LibriTTS_dev-other.csv', 'batch_size': 6, 'tensor_cut': 100000, 'num_workers': 0, 'fixed_length': 1000, 'pin_memory': True}, 'checkpoint': {'resume': False, 'checkpoint_path': '', 'disc_checkpoint_path': '', 'save_folder': './checkpoints/', 'save_location': '${checkpoint.save_folder}batch${datasets.batch_size}_cut${datasets.tensor_cut}_length${datasets.fixed_length}_'}, 'optimization': {'lr': 1e-05, 'disc_lr': 1e-05}, 'lr_scheduler': {'warmup_epoch': 2}, 'model': {'target_bandwidths': [1.5, 3.0, 6.0, 12.0, 24.0], 'sample_rate': 24000, 'channels': 1, 'train_discriminator': True, 'audio_normalize': True, 'filters': 32}, 'distributed': {'data_parallel': False, 'world_size': 4, 'find_unused_parameters': True, 'torch_distributed_debug': False}}
2023-06-02 13:08:02,791: INFO: [train_multi_gpu.py: 120]: Encodec Model Parameters: 14855843
2023-06-02 13:08:02,792: INFO: [train_multi_gpu.py: 121]: Disc Model Parameters: 283398
2023-06-02 13:08:02,792: INFO: [train_multi_gpu.py: 122]: model train mode :True | quantizer train mode :True
2023-06-02 13:10:11,058: INFO: [train_multi_gpu.py: 65]: | epoch: 1 | loss: 35.07368850708008 | loss_g: 33.48434066772461 | loss_w: 1.5893492698669434 | lr: 1.0000000000000001e-07 | disc_lr: 1.0000000000000001e-07
2023-06-02 13:12:14,356: INFO: [train_multi_gpu.py: 65]: | epoch: 2 | loss: 8.310249328613281 | loss_g: 8.114068984985352 | loss_w: 0.1961808204650879 | lr: 9.990754267376514e-06 | disc_lr: 9.990754267376514e-06
2023-06-02 13:13:08,171: INFO: [train_multi_gpu.py: 83]: | TEST | epoch: 2 | loss_g: 12.444842338562012 | loss_disc: 1.999990463256836
2023-06-02 13:15:45,271: INFO: [train_multi_gpu.py: 65]: | epoch: 3 | loss: 13.03136157989502 | loss_g: 12.901001930236816 | loss_w: 0.13035933673381805 | lr: 9.979206008271393e-06 | disc_lr: 9.979206008271393e-06
2023-06-02 13:15:45,272: INFO: [train_multi_gpu.py: 67]: | loss_disc: 1.9959542751312256
2023-06-02 13:18:24,182: INFO: [train_multi_gpu.py: 65]: | epoch: 4 | loss: 10.319283485412598 | loss_g: 10.254348754882812 | loss_w: 0.06493447721004486 | lr: 9.963055062204609e-06 | disc_lr: 9.963055062204609e-06
2023-06-02 13:18:24,182: INFO: [train_multi_gpu.py: 67]: | loss_disc: 1.9799656867980957
2023-06-02 13:19:17,341: INFO: [train_multi_gpu.py: 83]: | TEST | epoch: 4 | loss_g: 8.972946166992188 | loss_disc: 1.9816977977752686
2023-06-02 13:21:53,971: INFO: [train_multi_gpu.py: 65]: | epoch: 5 | loss: 7.428076267242432 | loss_g: 7.343043804168701 | loss_w: 0.08503223955631256 | lr: 9.942318025365027e-06 | disc_lr: 9.942318025365027e-06
2023-06-02 13:21:53,973: INFO: [train_multi_gpu.py: 67]: | loss_disc: 1.9636757373809814
My best suggestion is to make sure your torch version is the same as mine and to use either the latest code or an older commit from before I added the WarmupScheduler. Maybe there are some changes between torch 1.x and torch 2.x. Good luck, and I hope this helps you @sjjbsj
If you have another question, please contact me; otherwise I will close this issue.
Thank you for your reply. I tried not using the newest WarmupScheduler, but I still get this error.
Maybe you can add more info about your environment; it would help me test the code.
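For reference, a quick way to gather the kind of environment info that helps here (standard PyTorch attributes, not project code; python -m torch.utils.collect_env prints a fuller report):

import torch

print(torch.__version__)                    # PyTorch version, e.g. 1.13.x or 2.x
print(torch.version.cuda)                   # CUDA version PyTorch was built against
print(torch.cuda.is_available())            # whether a GPU is visible
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))    # GPU model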
DDP initialization log:
[I logger.cpp:213] [Rank 1]: DDP Initialized with:
[I reducer.cpp:126] Reducer initialized with bucket_bytes_cap: 26214400 first_bucket_bytes_cap: 1048576
[I logger.cpp:213] [Rank 0]: DDP Initialized with:
When the epoch is greater than lr_scheduler.warmup_epoch, the above error is thrown. Please let me know if I am missing any information.
Emmm, I will construct the same environment as yours to test the code. In the meantime, maybe you can use the commit
Besides, I think you can debug step by step to understand the specific problem. It would help us solve it @sjjbsj
Hello @sjjbsj! I tested the code with torch 1.13.0 and I see the same bug as you. I guess you can change the code as follows:

model.train()
disc_model.train()
for input_wav in tqdm(trainloader):
    # warmup learning rate; warmup_epoch is defined in the config file, default is 5
    input_wav = input_wav.cuda()  # [B, 1, T]: e.g. [2, 1, 203760]
    optimizer.zero_grad()
    optimizer_disc.zero_grad()
    output, loss_w, _ = model(input_wav)  # output: [B, 1, T]: e.g. [2, 1, 203760] | loss_w: [1]
    logits_real, fmap_real = disc_model(input_wav)
    logits_fake, fmap_fake = disc_model(output)
    loss_g = total_loss(fmap_real, logits_fake, fmap_fake, input_wav, output)
    loss = loss_g + loss_w
    loss.backward()
    optimizer.step()
    scheduler.step()
    # train the discriminator only when epoch > warmup_epoch and train_discriminator is True
    if config.model.train_discriminator and epoch > config.lr_scheduler.warmup_epoch:
        # detach output and logits_real so the discriminator update does not
        # backpropagate into the generator's graph
        logits_fake, _ = disc_model(output.detach())
        loss_disc = disc_loss([logit_real.detach() for logit_real in logits_real], logits_fake)  # compute discriminator loss
        loss_disc.backward()
        optimizer_disc.step()
        disc_scheduler.step()
You can also find this problem discussed in NVlabs/FUNIT#23
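The issues linked above describe a common GAN-training pitfall: when the generator loss and the discriminator loss share one autograd graph, whichever optimizer steps first modifies weights in place that the other backward pass still needs. The hypothetical sketch below (not this repository's code) reproduces the same RuntimeError with two tiny linear layers standing in for the generator and the discriminator; detaching as in the fix above, or running every backward() before any step(), avoids it.

import torch

gen = torch.nn.Linear(4, 4)    # stand-in for the generator
disc = torch.nn.Linear(4, 1)   # stand-in for the discriminator
opt_disc = torch.optim.SGD(disc.parameters(), lr=0.1)

x = torch.randn(8, 4)
fake = gen(x)
score = disc(fake)             # one shared graph: x -> gen -> disc, no detach

loss_disc = score.mean()
loss_disc.backward(retain_graph=True)
opt_disc.step()                # updates disc.weight in place

loss_gen = (1 - score).mean()
loss_gen.backward()            # needs the old disc.weight to reach gen's parameters -> RuntimeError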
I will test the code's results; if it performs well, I will merge the issue6 branch into main.
Thanks, the problem has been successfully resolved!
Hi,
I am writing to seek your advice on an issue I am experiencing during backpropagation of my model. Specifically, I am encountering an error in the loss function after the warmup stage and am unsure how to proceed. It seems to happen once training enters
if config.model.train_discriminator and epoch > config.lr_scheduler.warmup_epoch:
I would greatly appreciate any guidance or suggestions you may have to help me address this problem.
log:
Error executing job with overrides: ['distributed.torch_distributed_debug=False', 'distributed.find_unused_parameters=True', 'distributed.world_size=2', 'common.max_epoch=15', 'datasets.tensor_cut=8000', 'datasets.batch_size=40', 'datasets.train_csv_path=/home/anna.peng/PycharmProjects/encodec-pytorch-main/librispeech_train100h_anna.csv', 'lr_scheduler.warmup_epoch=2', 'optimization.lr=1e-4', 'optimization.disc_lr=1e-4']
Traceback (most recent call last):
File "train_multi_gpu.py", line 258, in main
join=True
File "/home/anna.peng/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/anna.peng/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
while not context.join():
File "/home/anna.peng/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home/anna.peng/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/home/anna.peng/PycharmProjects/encodec-pytorch-main/train_multi_gpu.py", line 209, in train
scheduler,disc_scheduler)
File "/home/anna.peng/PycharmProjects/encodec-pytorch-main/train_multi_gpu.py", line 59, in train_one_step
loss.backward()
File "/home/anna.peng/.local/lib/python3.7/site-packages/torch/_tensor.py", line 489, in backward
self, gradient, retain_graph, create_graph, inputs=inputs
File "/home/anna.peng/.local/lib/python3.7/site-packages/torch/autograd/init.py", line 199, in backward
allow_unreachable=True, accumulate_grad=True) # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 32, 3, 3]] is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!