TypeError: loss = recons_loss + kld_weight * kld_loss #95

Open
folasefo opened this issue May 20, 2024 · 14 comments

@folasefo commented May 20, 2024

Hi,
When I use my own dataset (3, 192, 192) and change some parameters, I get the error below.

According to the debugger, these are the values just before `loss = recons_loss + kld_weight * kld_loss`:

```
kld_loss:    tensor(0.4086, device='cuda:1', grad_fn=)
kld_weight:  0.00025d
recons_loss: tensor(0.1564, device='cuda:1', grad_fn=)
```

```
/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/pytorch_lightning/plugins/training_type/ddp.py:20: LightningDeprecationWarning: The `pl.plugins.training_type.ddp.DDPPlugin` is deprecated in v1.6 and will be removed in v1.8. Use `pl.strategies.ddp.DDPStrategy` instead.
  rank_zero_deprecation(
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
======= Training VanillaVAE =======
Global seed set to 1265
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3]

  | Name  | Type       | Params
-------------------------------------
0 | model | VanillaVAE | 3.2 M 
-------------------------------------
3.2 M     Trainable params
0         Non-trainable params
3.2 M     Total params
12.678    Total estimated model params size (MB)
Epoch 0:   0%|                                                                                                                                    | 0/4 [00:00<?, ?it/s][rank0]: Traceback (most recent call last):
[rank0]:   File "/home/lulu/PyTorch-VAE/run.py", line 64, in <module>
[rank0]:     runner.fit(experiment, datamodule=data)
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py", line 771, in fit
[rank0]:     self._call_and_handle_interrupt(
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py", line 722, in _call_and_handle_interrupt
[rank0]:     return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
[rank0]:     return function(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py", line 812, in _fit_impl
[rank0]:     results = self._run(model, ckpt_path=self.ckpt_path)
[rank0]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py", line 1237, in _run
[rank0]:     results = self._run_stage()
[rank0]:               ^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py", line 1324, in _run_stage
[rank0]:     return self._run_train()
[rank0]:            ^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py", line 1354, in _run_train
[rank0]:     self.fit_loop.run()
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/pytorch_lightning/loops/base.py", line 204, in run
[rank0]:     self.advance(*args, **kwargs)
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/pytorch_lightning/loops/fit_loop.py", line 269, in advance
[rank0]:     self._outputs = self.epoch_loop.run(self._data_fetcher)
[rank0]:                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/pytorch_lightning/loops/base.py", line 204, in run
[rank0]:     self.advance(*args, **kwargs)
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 208, in advance
[rank0]:     batch_output = self.batch_loop.run(batch, batch_idx)
[rank0]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/pytorch_lightning/loops/base.py", line 204, in run
[rank0]:     self.advance(*args, **kwargs)
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 88, in advance
[rank0]:     outputs = self.optimizer_loop.run(split_batch, optimizers, batch_idx)
[rank0]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/pytorch_lightning/loops/base.py", line 204, in run
[rank0]:     self.advance(*args, **kwargs)
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 203, in advance
[rank0]:     result = self._run_optimization(
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 256, in _run_optimization
[rank0]:     self._optimizer_step(optimizer, opt_idx, batch_idx, closure)
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 369, in _optimizer_step
[rank0]:     self.trainer._call_lightning_module_hook(
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py", line 1596, in _call_lightning_module_hook
[rank0]:     output = fn(*args, **kwargs)
[rank0]:              ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/pytorch_lightning/core/lightning.py", line 1625, in optimizer_step
[rank0]:     optimizer.step(closure=optimizer_closure)
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/pytorch_lightning/core/optimizer.py", line 168, in step
[rank0]:     step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
[rank0]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/pytorch_lightning/strategies/ddp.py", line 278, in optimizer_step
[rank0]:     optimizer_output = super().optimizer_step(optimizer, opt_idx, closure, model, **kwargs)
[rank0]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/pytorch_lightning/strategies/strategy.py", line 193, in optimizer_step
[rank0]:     return self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 155, in optimizer_step
[rank0]:     return optimizer.step(closure=closure, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/optim/lr_scheduler.py", line 75, in wrapper
[rank0]:     return wrapped(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/optim/optimizer.py", line 391, in wrapper
[rank0]:     out = func(*args, **kwargs)
[rank0]:           ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/optim/optimizer.py", line 76, in _use_grad
[rank0]:     ret = func(self, *args, **kwargs)
[rank0]:           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/optim/adam.py", line 148, in step
[rank0]:     loss = closure()
[rank0]:            ^^^^^^^^^
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 140, in _wrap_closure
[rank0]:     closure_result = closure()
[rank0]:                      ^^^^^^^^^
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 148, in __call__
[rank0]:     self._result = self.closure(*args, **kwargs)
[rank0]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 134, in closure
[rank0]:     step_output = self._step_fn()
[rank0]:                   ^^^^^^^^^^^^^^^
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 427, in _training_step
[rank0]:     training_step_output = self.trainer._call_strategy_hook("training_step", *step_kwargs.values())
[rank0]:                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py", line 1766, in _call_strategy_hook
[rank0]:     output = fn(*args, **kwargs)
[rank0]:              ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/pytorch_lightning/strategies/ddp.py", line 344, in training_step
[rank0]:     return self.model(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1593, in forward
[rank0]:     else self._run_ddp_forward(*inputs, **kwargs)
[rank0]:          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1411, in _run_ddp_forward
[rank0]:     return self.module(*inputs, **kwargs)  # type: ignore[index]
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lulu/anaconda3/envs/pytorch/lib/python3.11/site-packages/pytorch_lightning/overrides/base.py", line 82, in forward
[rank0]:     output = self.module.training_step(*inputs, **kwargs)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lulu/PyTorch-VAE/experiment.py", line 39, in training_step
[rank0]:     train_loss = self.model.loss_function(*results,
[rank0]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lulu/PyTorch-VAE/models/vanilla_vae.py", line 146, in loss_function
[rank0]:     loss = recons_loss + kld_weight * kld_loss
[rank0]:                          ~~~~~~~~~~~^~~~~~~~~~
[rank0]: TypeError: only integer tensors of a single element can be converted to an index
```

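For context, the failing line lives in `loss_function` in models/vanilla_vae.py, which computes roughly the following (my sketch from reading the traceback; the exact code may differ):

```python
import torch
import torch.nn.functional as F

def loss_function(recons, input, mu, log_var, kld_weight):
    # Reconstruction term: how closely the decoder output matches the input
    recons_loss = F.mse_loss(recons, input)
    # KL term: distance of the approximate posterior N(mu, sigma^2) from N(0, I)
    kld_loss = torch.mean(
        -0.5 * torch.sum(1 + log_var - mu ** 2 - log_var.exp(), dim=1), dim=0
    )
    # kld_weight balances reconstruction fidelity against latent regularity
    return recons_loss + kld_weight * kld_loss
```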
If I set kld_weight to 1, it runs, but the results are not good:
Sample:
(attached image)

Reconstruction:
(attached image)

One of my dataset images:
(attached image)

Is kld_weight=1 okay? How can I make the reconstructions and samples better?
Thanks!

@MisterBourbaki commented

Hi there,
I have a couple of questions for you, in order to better understand the issue: which version of Python are you using? And which parameters did you change?

@folasefo (Author) commented

> Hi there, I have a couple of questions for you, in order to better understand the issue: which version of Python are you using? And which parameters did you change?

First question:
Python 3.11.8

Second question:
In dataset.py:

  • changed the batch_size from 144 to 64

  • commented out transforms.CenterCrop(148)

  • following class OxfordPets(Dataset), I wrote my own dataset class (code below; a quick smoke test of it follows this list):

```python
from pathlib import Path
from typing import Callable

from torch.utils.data import Dataset
from torchvision.datasets.folder import default_loader


class MyDataset(Dataset):
    def __init__(self,
                 data_path: str,
                 split: str,
                 transform: Callable,
                 **kwargs):
        # images live in the "bgs_image" subfolder of data_path
        self.data_dir = Path(data_path) / "bgs_image"
        self.transforms = transform
        imgs = sorted([f for f in self.data_dir.iterdir() if f.suffix == '.jpg'])

        # 75/25 train/val split
        self.imgs = imgs[:int(len(imgs) * 0.75)] if split == "train" else imgs[int(len(imgs) * 0.75):]

    def __len__(self):
        return len(self.imgs)

    def __getitem__(self, idx):
        img = default_loader(self.imgs[idx])

        if self.transforms is not None:
            img = self.transforms(img)

        return img, 0.0  # dummy label to prevent breaking
```
  • changed patch_size: Union[int, Sequence[int]] from 256 to 64, i.e. = (64, 64)
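
As mentioned above, a quick smoke test of the dataset class (the transform pipeline here is a hypothetical sketch, not from my config):

```python
from torchvision import transforms

# Hypothetical transform pipeline mirroring patch_size: 64
tf = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
])

ds = MyDataset(data_path="/home/lulu/PyTorch-VAE/Data/", split="train", transform=tf)
img, _ = ds[0]
print(len(ds), img.shape)  # expected: <num train images> torch.Size([3, 64, 64])
```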

In vae.yaml:

```yaml
model_params:
  name: 'VanillaVAE'
  in_channels: 3
  latent_dim: 3

data_params:
  data_path: "/home/lulu/PyTorch-VAE/Data/"
  train_batch_size: 64
  val_batch_size: 64
  patch_size: 64
  num_workers: 4

exp_params:
  LR: 0.005
  weight_decay: 0.0
  scheduler_gamma: 0.95
  kld_weight: 0.00025d
  manual_seed: 1265

trainer_params:
  gpus: [1]
  max_epochs: 20

logging_params:
  save_dir: "logs/"
  name: "VanillaVAE"
```
Thank you!

@MisterBourbaki commented

I think the issue is simply that there is an additional 'd' on the kld_weight line in the YAML file, making it a string, hence difficult to handle :)
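
A minimal sketch of why that stray 'd' produces this exact TypeError (assuming the config is loaded with PyYAML):

```python
import torch
import yaml

# '0.00025d' is not a valid YAML float, so it parses as a string
params = yaml.safe_load("kld_weight: 0.00025d")
print(type(params["kld_weight"]))  # <class 'str'>

kld_loss = torch.tensor(0.4086, requires_grad=True)

# str * tensor invokes string repetition, which tries to use the tensor as
# an integer index, raising:
# TypeError: only integer tensors of a single element can be converted to an index
loss = params["kld_weight"] * kld_loss
```

Dropping the stray character (kld_weight: 0.00025) makes it a float again.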

@folasefo (Author) commented

> I think the issue is simply that there is an additional 'd' on the kld_weight line in the YAML file, making it a string, hence difficult to handle :)

You are wonderful! This problem is solved.
Thank you!

@folasefo (Author) commented

> I think the issue is simply that there is an additional 'd' on the kld_weight line in the YAML file, making it a string, hence difficult to handle :)

And if I want to make the reconstructions and samples better, can you give me some advice?
My dataset is pictures of galaxies, and I want to extract their geometry.
Thanks again!

@MisterBourbaki commented

This is the question of all Machine Learning :D Be sure to test a few different hyperparameters, check that the validation loss decreases, and so on. Once both the train and val losses no longer seem able to decrease, the model has achieved its best (with the chosen hyperparameters). Otherwise, give the training a few more rounds :)
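
One way to automate that stopping rule is Lightning's EarlyStopping callback (a sketch; it assumes your experiment logs a metric named val_loss, so check the actual key your LightningModule logs):

```python
from pytorch_lightning.callbacks import EarlyStopping

# Stop training once the validation loss has not improved for 5 epochs
early_stop = EarlyStopping(monitor="val_loss", mode="min", patience=5)

# then pass callbacks=[early_stop] to the Trainer built in run.py
```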

@folasefo (Author) commented

> This is the question of all Machine Learning :D Be sure to test a few different hyperparameters, check that the validation loss decreases, and so on. Once both the train and val losses no longer seem able to decrease, the model has achieved its best (with the chosen hyperparameters). Otherwise, give the training a few more rounds :)

Okay, I will try it :)
Thanks for your patience and help, have a nice day ;)

@sunny12345-bit commented

Were you able to train it to good results? I can't get good results with my own dataset.

@sunny12345-bit commented

>> I think the issue is simply that there is an additional 'd' on the kld_weight line in the YAML file, making it a string, hence difficult to handle :)
>
> And if I want to make the reconstructions and samples better, can you give me some advice? My dataset is pictures of galaxies, and I want to extract their geometry. Thanks again!

Were you able to train it to good results? I can't get good results with my own dataset.

@MisterBourbaki commented

Hi @sunny12345-bit, if you have a particular issue with the code, it would be best to open a dedicated issue :) That will make it easier to help you!

@folasefo (Author) commented

> Were you able to train it to good results? I can't get good results with my own dataset.

Hmm, I have been busy with other assignments and haven't started revising it yet :-(
I think my parameters are not suitable; the next step is to look at the loss function and adjust the parameters.
Does the loss decrease with each epoch on your data?

@folasefo (Author) commented Jun 5, 2024

>>> I think the issue is simply that there is an additional 'd' on the kld_weight line in the YAML file, making it a string, hence difficult to handle :)
>>
>> And if I want to make the reconstructions and samples better, can you give me some advice? My dataset is pictures of galaxies, and I want to extract their geometry. Thanks again!
>
> Were you able to train it to good results? I can't get good results with my own dataset.

Hi, my reconstruction results have become better. Did you get good results?

@sunny12345-bit commented

> Hi, my reconstruction results have become better. Did you get good results?

May I ask which parameters you changed to achieve better results?

@folasefo (Author) commented Jun 9, 2024

>> Hi, my reconstruction results have become better. Did you get good results?
>
> May I ask which parameters you changed to achieve better results?

I just changed all the parameters back to their original values, and then changed the image-size parameters to match my data.
Also make sure your dataset and latent dim are big enough, or the model can't capture the features.
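
To make that last point concrete (a hypothetical comparison, not from this thread; it assumes the repo's models/vanilla_vae.py layout and constructor signature):

```python
# The config quoted earlier used latent_dim: 3, which squeezes every image
# through just 3 numbers; the repo's stock vae.yaml uses a far larger space.
from models.vanilla_vae import VanillaVAE

tiny  = VanillaVAE(in_channels=3, latent_dim=3)    # setting from the config above
roomy = VanillaVAE(in_channels=3, latent_dim=128)  # more room to capture galaxy geometry
```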
