About training grayscale images #635

Open
rememberBr opened this issue Feb 2, 2024 · 1 comment


Describe the bug
RuntimeError: output with shape [64, 1, 1, 1] doesn't match the broadcast shape [64, 3, 1, 1]

To Reproduce
Steps to reproduce the behavior:

  1. In 'stylegan3' directory, run command 'python train.py --resume=https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/stylegan3-r-ffhqu-256x256.pkl ....'
  2. See error
    Training options:
    {
    "G_kwargs": {
    "class_name": "training.networks_stylegan3.Generator",
    "z_dim": 512,
    "w_dim": 512,
    "mapping_kwargs": {
    "num_layers": 2
    },
    "channel_base": 32768,
    "channel_max": 1024,
    "magnitude_ema_beta": 0.9977843871238888,
    "conv_kernel": 1,
    "use_radial_filters": true
    },
    "D_kwargs": {
    "class_name": "training.networks_stylegan2.Discriminator",
    "block_kwargs": {
    "freeze_layers": 0
    },
    "mapping_kwargs": {},
    "epilogue_kwargs": {
    "mbstd_group_size": 4
    },
    "channel_base": 16384,
    "channel_max": 512
    },
    "G_opt_kwargs": {
    "class_name": "torch.optim.Adam",
    "betas": [
    0,
    0.99
    ],
    "eps": 1e-08,
    "lr": 0.0025
    },
    "D_opt_kwargs": {
    "class_name": "torch.optim.Adam",
    "betas": [
    0,
    0.99
    ],
    "eps": 1e-08,
    "lr": 0.002
    },
    "loss_kwargs": {
    "class_name": "training.loss.StyleGAN2Loss",
    "r1_gamma": 2.0,
    "blur_init_sigma": 0,
    "blur_fade_kimg": 400.0
    },
    "data_loader_kwargs": {
    "pin_memory": true,
    "prefetch_factor": 2,
    "num_workers": 3
    },
    "training_set_kwargs": {
    "class_name": "training.dataset.ImageFolderDataset",
    "path": "../Gan/data/data/256L",
    "use_labels": false,
    "max_size": 4813,
    "xflip": true,
    "resolution": 256,
    "random_seed": 0
    },
    "num_gpus": 1,
    "batch_size": 64,
    "batch_gpu": 64,
    "metrics": [],
    "total_kimg": 25000,
    "kimg_per_tick": 4,
    "image_snapshot_ticks": 5,
    "network_snapshot_ticks": 5,
    "random_seed": 0,
    "ema_kimg": 20.0,
    "augment_kwargs": {
    "class_name": "training.augment.AugmentPipe",
    "xflip": 1,
    "rotate90": 1,
    "xint": 1,
    "scale": 1,
    "rotate": 1,
    "aniso": 1,
    "xfrac": 1,
    "brightness": 1,
    "contrast": 1,
    "lumaflip": 1,
    "hue": 1,
    "saturation": 1
    },
    "ada_target": 0.6,
    "resume_pkl": "https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/stylegan3-r-ffhqu-256x256.pkl",
    "ada_kimg": 100,
    "ema_rampup": null,
    "run_dir": "~/training-runs-rr/00000-stylegan3-r-256L-gpus1-batch64-gamma2"
    }

Output directory: ~/training-runs-rr/00000-stylegan3-r-256L-gpus1-batch64-gamma2
Number of GPUs: 1
Batch size: 64 images
Training duration: 25000 kimg
Dataset path: ../Gan/data/data/256L
Dataset size: 4813 images
Dataset resolution: 256
Dataset labels: False
Dataset x-flips: True

Creating output directory...
Launching processes...
Loading training set...

Num images: 9626
Image shape: [1, 256, 256]
Label shape: [0]

Constructing networks...
Resuming from "https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/stylegan3-r-ffhqu-256x256.pkl"
Traceback (most recent call last):
File "/home/bairu/workspace/DatasetGEN/styleGan3/train.py", line 288, in <module>
main() # pylint: disable=no-value-for-parameter
File "/home/bairu/miniconda3/envs/stylegan3/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/home/bairu/miniconda3/envs/stylegan3/lib/python3.9/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/home/bairu/miniconda3/envs/stylegan3/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/bairu/miniconda3/envs/stylegan3/lib/python3.9/site-packages/click/core.py", line 783, in invoke
return callback(*args, **kwargs)
File "/home/bairu/workspace/DatasetGEN/styleGan3/train.py", line 283, in main
launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
File "/home/bairu/workspace/DatasetGEN/styleGan3/train.py", line 98, in launch_training
subprocess_fn(rank=0, c=c, temp_dir=temp_dir)
File "/home/bairu/workspace/DatasetGEN/styleGan3/train.py", line 49, in subprocess_fn
training_loop.training_loop(rank=rank, **c)
File "/home/bairu/workspace/DatasetGEN/styleGan3/training/training_loop.py", line 164, in training_loop
misc.copy_params_and_buffers(resume_data[name], module, require_all=False)
File "/home/bairu/workspace/DatasetGEN/styleGan3/torch_utils/misc.py", line 162, in copy_params_and_buffers
tensor.copy_(src_tensors[name].detach()).requires_grad_(tensor.requires_grad)
RuntimeError: output with shape [64, 1, 1, 1] doesn't match the broadcast shape [64, 3, 1, 1]

Expected behavior
The target data I want to generate are single-channel grayscale images. When I train on grayscale images, this error is raised.

Desktop (please complete the following information):

  • OS: Linux Ubuntu 20.04
  • pytorch 1.9.0
  • CUDA toolkit version CUDA 11.1

Additional context
Training works if no pre-trained model is used. This seems to be because the pre-trained model expects three channels? What should I do if I want to use a pre-trained model to train on single-channel images?
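
To see which tensors differ between the checkpoint and the newly built network before anything is copied, the shapes can be compared name by name. This is only a minimal sketch: `find_shape_mismatches` and the parameter names `layer_a.weight` / `layer_b.weight` are hypothetical and are not part of the stylegan3 code. With PyTorch, the shape dictionaries could be built with something like `{k: tuple(v.shape) for k, v in module.state_dict().items()}`.

```python
def find_shape_mismatches(src_shapes, dst_shapes):
    """Return (name, src_shape, dst_shape) for every name that exists in
    both dicts but has a different shape."""
    mismatches = []
    for name, dst_shape in dst_shapes.items():
        src_shape = src_shapes.get(name)
        if src_shape is not None and tuple(src_shape) != tuple(dst_shape):
            mismatches.append((name, tuple(src_shape), tuple(dst_shape)))
    return mismatches


# Hypothetical names; the shapes mirror the ones in the error message.
checkpoint_shapes = {
    "layer_a.weight": (64, 3, 1, 1),    # pre-trained on RGB
    "layer_b.weight": (128, 64, 3, 3),
}
network_shapes = {
    "layer_a.weight": (64, 1, 1, 1),    # built for grayscale data
    "layer_b.weight": (128, 64, 3, 3),
}

for name, src, dst in find_shape_mismatches(checkpoint_shapes, network_shapes):
    print(f"{name}: checkpoint {src} vs network {dst}")
```

Only the tensors reported here need special handling; everything else can be copied from the checkpoint unchanged.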

Neilstid commented Apr 2, 2024

This cannot work because you are loading a model that was trained to generate RGB images (3 channels).
In my opinion there are three solutions:

  • Modify training_loop.py so that, after loading, the Generator outputs only one channel (either R, G or B). Not the easiest, but I think it may work.
  • Train your network from scratch (if your data has 1 channel, the generated images will have 1 channel too).
  • Modify the input images to have 3 channels (the same channel repeated 3 times after loading the image in dataset.py). You can then select one of the 3 channels as your grayscale image.
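
As a rough illustration of the channel-repeat idea, here is a NumPy sketch. The helper names `gray_to_rgb`, `rgb_to_gray` and `collapse_input_channels` are made up for this example and do not exist in the repository, and where exactly such code would go in dataset.py or training_loop.py is not shown. The last helper assumes the `[64, 3, 1, 1]` tensor from the error is a convolution weight laid out as `[out, in, kH, kW]`. Under that assumption, summing it over the input-channel axis gives a `[64, 1, 1, 1]` weight that produces the same output on a grayscale image as the original weight does on the repeated 3-channel image.

```python
import numpy as np


def gray_to_rgb(image_chw):
    """Repeat a [1, H, W] grayscale image into [3, H, W]."""
    assert image_chw.ndim == 3 and image_chw.shape[0] == 1
    return np.repeat(image_chw, 3, axis=0)


def rgb_to_gray(image_chw, channel=0):
    """Pick one channel of a [3, H, W] image, returned as [1, H, W]."""
    return image_chw[channel:channel + 1]


def collapse_input_channels(weight):
    """Sum an [O, 3, kH, kW] conv weight over its input channels -> [O, 1, kH, kW]."""
    return weight.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
gray = rng.integers(0, 256, size=(1, 256, 256), dtype=np.uint8)
rgb = gray_to_rgb(gray)                    # shape (3, 256, 256)

w_rgb = rng.standard_normal((64, 3, 1, 1))  # stand-in for a pre-trained weight
w_gray = collapse_input_channels(w_rgb)     # shape (64, 1, 1, 1)

# A 1x1 convolution is a contraction over the channel axis, so both
# variants can be compared directly.
x = gray.astype(np.float64)
out_rgb = np.einsum("oc,chw->ohw", w_rgb[:, :, 0, 0], gray_to_rgb(x))
out_gray = np.einsum("oc,chw->ohw", w_gray[:, :, 0, 0], x)
print(rgb.shape, w_gray.shape, np.allclose(out_rgb, out_gray))
```

Repeating the channel keeps the pre-trained weights untouched at the cost of training on redundant data; collapsing the weight keeps the data single-channel but means editing the checkpoint before it is copied.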

I hope it will help you :)
