A confusing question in transformer training #15

forever-rz opened this issue Nov 9, 2022 · 12 comments

@forever-rz

Thanks for your contribution, but there is a problem when I train it on FFHQ. Once the mask ratio gets larger, it seems that only part of the completed result is repaired; the part that is not repaired stays black, and no new content seems to be generated. Is this normal?
Example 1 (epoch 12): the first and second images from the left retain some strange black regions.
[attached images: mask_input, completed, reconstruction]

@forever-rz
Author

This bad visual effect became increasingly apparent later in training.
Example 2 (epoch 19):
[attached images: mask_input, completed, reconstruction, input]

@liuqk3
Owner

liuqk3 commented Nov 9, 2022

Hi @forever-rz, thanks for your interest.
It seems that the second codebook is used for quantization while training the transformer. For FFHQ, it should not take this long to get reasonable inpainting results.
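For context, P-VQVAE keeps the two codebooks in one embedding table (in the config posted below, n_e: 1024 with masked_embed_start: 512): as I read the masked_embed_start split, the first part is meant for visible patches and the second part for masked patches. Below is a minimal sketch of that idea, assuming Euclidean nearest-neighbour lookup as in the config (distance_type: euclidean); the function and its arguments are illustrative, not the repo's exact API.

```python
# Minimal sketch (not the repo's code): quantization with a split embedding table,
# where entries [0, masked_embed_start) are intended for visible patches and
# entries [masked_embed_start, n_e) for masked patches. While training the
# transformer, visible-region features should only be matched against the
# first part of the table.
import torch

def quantize(features, codebook, masked_embed_start=512, for_masked_region=False):
    """features: (N, e_dim); codebook: (n_e, e_dim). Returns token ids of shape (N,)."""
    if for_masked_region:
        sub_book, offset = codebook[masked_embed_start:], masked_embed_start
    else:
        sub_book, offset = codebook[:masked_embed_start], 0
    distances = torch.cdist(features, sub_book)  # Euclidean distance to each entry
    return distances.argmin(dim=1) + offset
```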

@forever-rz
Author

forever-rz commented Nov 9, 2022

@liuqk3 Thanks for your help! Can you tell me how I should use just one codebook?
I only modified three parts of the transformer training configuration (batch_size, sample_iterations, save_epochs); the rest follows the pre-trained model configuration. This is really confusing to me.

@forever-rz
Author

forever-rz commented Nov 9, 2022

My configuration is shown below:
dataloader:
  batch_size: 16
  data_root: data
  num_workers: 1
  train_datasets:
  - params:
      im_preprocessor_config:
        params:
          horizon_flip: true
          random_crop: true
          size:
          - 256
          - 256
          smallest_max_size: 256
        target: image_synthesis.data.utils.image_preprocessor.SimplePreprocessor
      image_list_file: data/ffhqtrain_69k.txt
      mask: 1.0
      mask_low_size:
      - 32
      - 32
      mask_low_to_high: 0.1
      multi_image_mask: false
      name: ffhq
      provided_mask_name: irregular-mask/testing_mask_dataset
      return_data_keys:
      - image
      - mask
      stroken_mask_params:
        keep_ratio:
        - 0.3
        - 0.6
        maxBrushWidth: 30
        maxLength: 100
        maxVertex: 10
        minBrushWidth: 10
        minVertex: 5
        min_area: 64
      use_provided_mask: 0.8
      use_provided_mask_ratio:
      - 0.3333333
      - 1.0
    target: image_synthesis.data.image_list_dataset.ImageListDataset
  validation_datasets:
  - params:
      im_preprocessor_config:
        params:
          size:
          - 256
          - 256
          smallest_max_size: 256
        target: image_synthesis.data.utils.image_preprocessor.SimplePreprocessor
      image_list_file: data/ffhqvalidation_1k.txt
      mask: 1.0
      mask_low_size:
      - 32
      - 32
      mask_low_to_high: 0.0
      multi_image_mask: false
      name: ffhq
      provided_mask_name: irregular-mask/testing_mask_dataset
      return_data_keys:
      - image
      - mask
      - relative_path
      stroken_mask_params:
        keep_ratio:
        - 0.3
        - 0.6
        maxBrushWidth: 30
        maxLength: 100
        maxVertex: 10
        minBrushWidth: 10
        minVertex: 5
        min_area: 64
      use_provided_mask: 1.0 # TODO 0.8
      use_provided_mask_ratio:
      - 0.4
      - 1.0
    target: image_synthesis.data.image_list_dataset.ImageListDataset

model:
  target: image_synthesis.modeling.models.masked_image_inpainting_transformer_in_feature.MaskedImageInpaintingTransformer
  params:
    n_layer: 30
    content_seq_len: 1024
    n_embd: 512
    n_head: 8
    num_token: 512
    embd_pdrop: 0.0
    attn_pdrop: 0.0
    resid_pdrop: 0.0
    attn_content_with_mask: False
    mlp_hidden_times: 4
    block_activate: GELU2
    random_quantize: 0.3
    weight_decay: 0.01
    content_codec_config:
      target: image_synthesis.modeling.codecs.image_codec.patch_vqgan.PatchVQGAN
      params:
        ckpt_path: OUTPUT/pvqvae_ffhq/checkpoint/last.pth
        trainable: False
        token_shape: [32, 32]
        combine_rec_and_gt: True
        quantizer_config:
          target: image_synthesis.modeling.codecs.image_codec.patch_vqgan.VectorQuantizer
          params:
            n_e: 1024
            e_dim: 256
            masked_embed_start: 512
            embed_ema: True
            get_embed_type: retrive
            distance_type: euclidean
        encoder_config:
          target: image_synthesis.modeling.codecs.image_codec.patch_vqgan.PatchEncoder2
          params:
            in_ch: 3
            res_ch: 256
            out_ch: 256
            num_res_block: 8
            res_block_bottleneck: 2
            stride: 8
        decoder_config:
          target: image_synthesis.modeling.codecs.image_codec.patch_vqgan.PatchConvDecoder2
          params:
            in_ch: 256
            out_ch: 3
            res_ch: 256
            num_res_block: 8
            res_block_bottleneck: 2
            stride: 8
            up_layer_with_image: true
            encoder_downsample_layer: conv

solver:
  adjust_lr: none
  base_lr: 0.0
  find_unused_parameters: false
  max_epochs: 250
  optimizers_and_schedulers:
  - name: transformer
    optimizer:
      params:
        betas: !!python/tuple
        - 0.9
        - 0.95
      target: torch.optim.AdamW
    scheduler:
      params:
        min_lr: 1.0e-05
        warmup: 2000
        warmup_lr: 0.0003
      step_iteration: 1
      target: image_synthesis.engine.lr_scheduler.CosineAnnealingLRWithWarmup
  sample_iterations: 400
  save_epochs: 1
  validation_epochs: 1

@forever-rz
Author

The strange thing is that when the mask ratio is small, as in the green circle, there are no such problems and only one codebook seems to be used. So why might two codebooks be used when the mask ratio is large (in the red circle)? What's wrong with my settings?
Example (epoch 8):
[attached images: mask_input, completed]
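One way to check this directly is to encode a batch with the trained P-VQVAE and measure how often visible-region positions on the 32x32 token grid receive ids from the second codebook (ids >= masked_embed_start, i.e. >= 512 in the config above). The sketch below is a rough diagnostic, not part of the repo; the tensors passed in are assumed to come from your own encoding loop.

```python
# Hypothetical diagnostic: fraction of visible-region tokens whose ids fall in the
# second (masked) codebook. A ratio that grows with the mask size would match the
# black-region artifact described above.
import torch
import torch.nn.functional as F

MASKED_EMBED_START = 512   # quantizer_config.masked_embed_start in the config above
TOKEN_SHAPE = (32, 32)     # token_shape in the config (256 x 256 images, stride 8)

def second_codebook_ratio(token_ids, pixel_mask):
    """token_ids: (B, 32, 32) long tensor; pixel_mask: (B, 1, 256, 256) with 1 = visible."""
    token_mask = F.interpolate(pixel_mask.float(), size=TOKEN_SHAPE, mode="nearest")
    visible = token_mask.squeeze(1) > 0.5
    in_second_book = (token_ids >= MASKED_EMBED_START) & visible
    return in_second_book.sum().item() / max(visible.sum().item(), 1)
```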

@forever-rz
Author

forever-rz commented Nov 11, 2022

@liuqk3 Thanks for the reply, but I've carefully compared the released training process with my own and really don't notice any difference, so I was wondering if it could be the parameters. Could the fact that keep_ratio is [0.0, 0.5] in the P-VQVAE training phase but [0.3, 0.6] in the transformer training phase cause this?

@liuqk3
Owner

liuqk3 commented Nov 15, 2022

Hi, @forever-rz. Sorry for the delayed reply.

keep_ratio only affects the number of remaining pixels in an image; it should not cause such artifacts. After having a look at your configs, I do not find anything wrong. Here are my questions and suggestions:

  1. Did you use our provided P-VQVAE or train it by yourself?
  2. Have you checked the reconstruction capability of the P-VQVAE you used? (A quick check is sketched after this list.)
  3. Can you provide the cross-entropy loss curves of the transformer?
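For the second point, a simple way to sanity-check reconstruction quality is to compare P-VQVAE reconstructions against the inputs with PSNR. This is a generic PyTorch sketch, not the repo's evaluation code, and it assumes image batches scaled to [0, 1]:

```python
import torch

def psnr(recon: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Per-image PSNR in dB for batches of shape (B, C, H, W) in [0, 1]."""
    mse = torch.mean((recon - target) ** 2, dim=(1, 2, 3)).clamp_min(1e-10)
    return 10.0 * torch.log10(1.0 / mse)
```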

@forever-rz
Author

forever-rz commented Dec 13, 2022

Hi, @liuqk3. I'm sorry that I temporarily set the experiment aside because I couldn't figure out the cause of this problem. Today I carefully re-checked the previous experiment; the data relevant to the three questions you raised are as follows.
1. Instead of using the provided P-VQVAE, I trained a new P-VQVAE model myself and added some attention blocks.
2. The reconstruction results of my P-VQVAE model are as follows:
FFHQ: [attached images: (a) input, (b) mask, (c) reference_input, (d) reconstruction]

Places2: [attached images: (a) input, (b) mask, (c) reference_input, (d) reconstruction]
ImageNet: [attached images: (a) input, (b) mask, (c) reference_input, (d) reconstruction]

3. The cross-entropy loss of the transformer trained on my P-VQVAE is as follows (ImageNet was skipped because the dataset is too large):
[attached images: FFHQ and Places2 loss curves]

@forever-rz
Author

It seems to me that the reconstruction results are not bad, so I don't understand why the transformer's training results are so wrong. Although I made some changes to the P-VQVAE, the transformer has not been changed at all.

@liuqk3
Owner

liuqk3 commented Dec 14, 2022

@forever-rz
I do not know how many epochs you have trained on FFHQ and Places2. You can try visualizing the inpainting results of the trained model.
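If it helps, here is a simple way to dump such visualizations outside the training loop (a generic torchvision sketch, not the repo's own visualization tool; tensors are assumed to be RGB batches in [0, 1]):

```python
import torch
from torchvision.utils import save_image

def save_inpainting_grid(masked_input, completed, reconstruction, path="epoch_vis.png"):
    """Each tensor: (B, 3, H, W). One row per stage, so black-region artifacts are easy to compare."""
    grid = torch.cat([masked_input, completed, reconstruction], dim=0)
    save_image(grid, path, nrow=masked_input.shape[0])
```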

@myhansu

myhansu commented Aug 25, 2023

Hi, do you have any updates on this issue? I have also encountered the same problem with a custom dataset: the reconstruction results are much better than the completed results.

@UESTC-Med424-JYX

I also encountered good reconstruction but poor generation in a similar task, and I see that our loss curves are basically the same. Did you solve this problem in the end?
