A confusing question in transformer training #15

forever-rz opened this issue Nov 9, 2022 · 12 comments

@forever-rz

Thanks for your contribution, but there is a problem when I train it on FFHQ. Once the mask ratio gets larger, it seems that only part of the completed result is repaired; the part that is not repaired stays black, and no new content seems to be generated. Is this normal?
Example 1 (epoch 12): the first and second images from the left retain some strange black regions.
[attached images: mask_input, completed, reconstruction]

@forever-rz
Author

This bad visual effect became increasingly apparent later in training.
Example 2 (epoch 19):
[attached images: mask_input, completed, reconstruction, input]

@liuqk3
Owner

liuqk3 commented Nov 9, 2022

Hi @forever-rz, thanks for your interest.
It seems that the second codebook is used for quantization while training the transformer. For FFHQ, it should not take this long to get reasonable inpainting results.
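For context, P-VQVAE keeps the two codebooks in one embedding table (in the config posted below, n_e: 1024 with masked_embed_start: 512): as I read the masked_embed_start split, the first part is meant for visible patches and the second part for masked patches. Below is a minimal sketch of that idea, assuming Euclidean nearest-neighbour lookup as in the config (distance_type: euclidean); the function and its arguments are illustrative, not the repo's exact API.

```python
# Minimal sketch (not the repo's code): quantization with a split embedding table,
# where entries [0, masked_embed_start) are intended for visible patches and
# entries [masked_embed_start, n_e) for masked patches. While training the
# transformer, visible-region features should only be matched against the
# first part of the table.
import torch

def quantize(features, codebook, masked_embed_start=512, for_masked_region=False):
    """features: (N, e_dim); codebook: (n_e, e_dim). Returns token ids of shape (N,)."""
    if for_masked_region:
        sub_book, offset = codebook[masked_embed_start:], masked_embed_start
    else:
        sub_book, offset = codebook[:masked_embed_start], 0
    distances = torch.cdist(features, sub_book)  # Euclidean distance to each entry
    return distances.argmin(dim=1) + offset
```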

@forever-rz
Author

forever-rz commented Nov 9, 2022

@liuqk3 Thanks for your help! Can you tell me how I should use just one codebook?
I only modified three parts of the transformer training configuration (batch_size, sample_iterations, save_epochs); the rest follows the pre-trained model configuration. This is really confusing to me.

@forever-rz
Author

forever-rz commented Nov 9, 2022

My configuration is shown below:
dataloader:
  batch_size: 16
  data_root: data
  num_workers: 1
  train_datasets:
  - params:
      im_preprocessor_config:
        params:
          horizon_flip: true
          random_crop: true
          size:
          - 256
          - 256
          smallest_max_size: 256
        target: image_synthesis.data.utils.image_preprocessor.SimplePreprocessor
      image_list_file: data/ffhqtrain_69k.txt
      mask: 1.0
      mask_low_size:
      - 32
      - 32
      mask_low_to_high: 0.1
      multi_image_mask: false
      name: ffhq
      provided_mask_name: irregular-mask/testing_mask_dataset
      return_data_keys:
      - image
      - mask
      stroken_mask_params:
        keep_ratio:
        - 0.3
        - 0.6
        maxBrushWidth: 30
        maxLength: 100
        maxVertex: 10
        minBrushWidth: 10
        minVertex: 5
        min_area: 64
      use_provided_mask: 0.8
      use_provided_mask_ratio:
      - 0.3333333
      - 1.0
    target: image_synthesis.data.image_list_dataset.ImageListDataset
  validation_datasets:
  - params:
      im_preprocessor_config:
        params:
          size:
          - 256
          - 256
          smallest_max_size: 256
        target: image_synthesis.data.utils.image_preprocessor.SimplePreprocessor
      image_list_file: data/ffhqvalidation_1k.txt
      mask: 1.0
      mask_low_size:
      - 32
      - 32
      mask_low_to_high: 0.0
      multi_image_mask: false
      name: ffhq
      provided_mask_name: irregular-mask/testing_mask_dataset
      return_data_keys:
      - image
      - mask
      - relative_path
      stroken_mask_params:
        keep_ratio:
        - 0.3
        - 0.6
        maxBrushWidth: 30
        maxLength: 100
        maxVertex: 10
        minBrushWidth: 10
        minVertex: 5
        min_area: 64
      use_provided_mask: 1.0 # TODO 0.8
      use_provided_mask_ratio:
      - 0.4
      - 1.0
    target: image_synthesis.data.image_list_dataset.ImageListDataset

model:
  target: image_synthesis.modeling.models.masked_image_inpainting_transformer_in_feature.MaskedImageInpaintingTransformer
  params:
    n_layer: 30
    content_seq_len: 1024
    n_embd: 512
    n_head: 8
    num_token: 512
    embd_pdrop: 0.0
    attn_pdrop: 0.0
    resid_pdrop: 0.0
    attn_content_with_mask: False
    mlp_hidden_times: 4
    block_activate: GELU2
    random_quantize: 0.3
    weight_decay: 0.01
    content_codec_config:
      target: image_synthesis.modeling.codecs.image_codec.patch_vqgan.PatchVQGAN
      params:
        ckpt_path: OUTPUT/pvqvae_ffhq/checkpoint/last.pth
        trainable: False
        token_shape: [32, 32]
        combine_rec_and_gt: True
        quantizer_config:
          target: image_synthesis.modeling.codecs.image_codec.patch_vqgan.VectorQuantizer
          params:
            n_e: 1024
            e_dim: 256
            masked_embed_start: 512
            embed_ema: True
            get_embed_type: retrive
            distance_type: euclidean
        encoder_config:
          target: image_synthesis.modeling.codecs.image_codec.patch_vqgan.PatchEncoder2
          params:
            in_ch: 3
            res_ch: 256
            out_ch: 256
            num_res_block: 8
            res_block_bottleneck: 2
            stride: 8
        decoder_config:
          target: image_synthesis.modeling.codecs.image_codec.patch_vqgan.PatchConvDecoder2
          params:
            in_ch: 256
            out_ch: 3
            res_ch: 256
            num_res_block: 8
            res_block_bottleneck: 2
            stride: 8
            up_layer_with_image: true
            encoder_downsample_layer: conv

solver:
  adjust_lr: none
  base_lr: 0.0
  find_unused_parameters: false
  max_epochs: 250
  optimizers_and_schedulers:
  - name: transformer
    optimizer:
      params:
        betas: !!python/tuple
        - 0.9
        - 0.95
      target: torch.optim.AdamW
    scheduler:
      params:
        min_lr: 1.0e-05
        warmup: 2000
        warmup_lr: 0.0003
      step_iteration: 1
      target: image_synthesis.engine.lr_scheduler.CosineAnnealingLRWithWarmup
  sample_iterations: 400
  save_epochs: 1
  validation_epochs: 1

@forever-rz
Author

The strange thing is that when the mask ratio is small, as in the green circle, there are no such problems and only one codebook seems to be used. So why might two codebooks be used when the mask ratio is large (in the red circle)? What's wrong with my settings?
Example (epoch 8):
[attached images: mask_input, completed]
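One way to check this directly is to encode a batch with the trained P-VQVAE and measure how often visible-region positions on the 32x32 token grid receive ids from the second codebook (ids >= masked_embed_start, i.e. >= 512 in the config above). The sketch below is a rough diagnostic, not part of the repo; the tensors passed in are assumed to come from your own encoding loop.

```python
# Hypothetical diagnostic: fraction of visible-region tokens whose ids fall in the
# second (masked) codebook. A ratio that grows with the mask size would match the
# black-region artifact described above.
import torch
import torch.nn.functional as F

MASKED_EMBED_START = 512   # quantizer_config.masked_embed_start in the config above
TOKEN_SHAPE = (32, 32)     # token_shape in the config (256 x 256 images, stride 8)

def second_codebook_ratio(token_ids, pixel_mask):
    """token_ids: (B, 32, 32) long tensor; pixel_mask: (B, 1, 256, 256) with 1 = visible."""
    token_mask = F.interpolate(pixel_mask.float(), size=TOKEN_SHAPE, mode="nearest")
    visible = token_mask.squeeze(1) > 0.5
    in_second_book = (token_ids >= MASKED_EMBED_START) & visible
    return in_second_book.sum().item() / max(visible.sum().item(), 1)
```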

@forever-rz
Author

forever-rz commented Nov 11, 2022

@liuqk3 Thanks for the reply, but I've carefully compared the released training process with my own and really don't notice any difference, so I was wondering if it could be the parameters. Could the fact that keep_ratio is [0.0, 0.5] in the P-VQVAE training phase but [0.3, 0.6] in the transformer training phase cause this?

@liuqk3
Owner

liuqk3 commented Nov 15, 2022

Hi, @forever-rz. Sorry for the delayed reply.

keep_ratio only affects the number of remaining pixels in an image; it should not cause such artifacts. After having a look at your configs, I do not find anything wrong. Here are my questions and suggestions:

  1. Did you use our provided P-VQVAE or train it by yourself?
  2. Have you checked the reconstruction capability of the P-VQVAE you used? (A quick check is sketched after this list.)
  3. Can you provide the cross-entropy loss curves of the transformer?
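For the second point, a simple way to sanity-check reconstruction quality is to compare P-VQVAE reconstructions against the inputs with PSNR. This is a generic PyTorch sketch, not the repo's evaluation code, and it assumes image batches scaled to [0, 1]:

```python
import torch

def psnr(recon: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Per-image PSNR in dB for batches of shape (B, C, H, W) in [0, 1]."""
    mse = torch.mean((recon - target) ** 2, dim=(1, 2, 3)).clamp_min(1e-10)
    return 10.0 * torch.log10(1.0 / mse)
```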

@forever-rz
Author

forever-rz commented Dec 13, 2022

Hi, @liuqk3. I'm sorry that I temporarily set the experiment aside because I couldn't figure out the cause of this problem. Today I carefully re-checked the previous experiment; the data relevant to the three questions you raised are as follows.
1. Instead of using the provided P-VQVAE, I trained a new P-VQVAE model myself and added some attention blocks.
2. The reconstruction results of my P-VQVAE model are as follows:
FFHQ: [attached images: (a) input, (b) mask, (c) reference_input, (d) reconstruction]

Places2: [attached images: (a) input, (b) mask, (c) reference_input, (d) reconstruction]
ImageNet: [attached images: (a) input, (b) mask, (c) reference_input, (d) reconstruction]

3. The cross-entropy loss of the transformer trained on my P-VQVAE is as follows (ImageNet was skipped because the dataset is too large):
[attached images: FFHQ and Places2 loss curves]

@forever-rz
Author

It seems to me that the reconstruction results are not bad, so I don't understand why the transformer's training results are so wrong. Although I made some changes to the P-VQVAE, the transformer has not been changed at all.

@liuqk3
Owner

liuqk3 commented Dec 14, 2022

@forever-rz
I do not know how many epochs you have trained on FFHQ and Places2. You can try visualizing the inpainting results of the trained model.
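If it helps, here is a simple way to dump such visualizations outside the training loop (a generic torchvision sketch, not the repo's own visualization tool; tensors are assumed to be RGB batches in [0, 1]):

```python
import torch
from torchvision.utils import save_image

def save_inpainting_grid(masked_input, completed, reconstruction, path="epoch_vis.png"):
    """Each tensor: (B, 3, H, W). One row per stage, so black-region artifacts are easy to compare."""
    grid = torch.cat([masked_input, completed, reconstruction], dim=0)
    save_image(grid, path, nrow=masked_input.shape[0])
```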

@myhansu

myhansu commented Aug 25, 2023

Hi, do you have any updates on this issue? I have also encountered the same problem with a custom dataset: the reconstruction results are much better than the completed results.

@UESTC-Med424-JYX

I also encountered good reconstruction but poor generation in a similar task, and I see that our loss curves are basically the same. Did you solve this problem in the end?
