
RuntimeError: Error(s) in loading state_dict for Mamba2DModel: size mismatch for additional_embed: copying a param with shape torch.Size([1, 1026, 1536]) from checkpoint, the shape in current model is torch.Size([1, 258, 1536]). #9

Closed
lihao-doc opened this issue Jun 27, 2024 · 19 comments

Comments

@lihao-doc

[2024-06-27 08:23:37,448] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] NVIDIA Inference is only supported on Ampere and newer architectures
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1
[WARNING] using untested triton version (2.1.0), only 1.0.0 is known to be compatible
I0627 08:23:38.489170 136925989832512 eval_ldm_discrete.py:140] Process 0 using device: cuda
Counting ImageNet files from assets/datasets/ImageNet
Finish counting ImageNet files
Missing train samples: 1280444 < 1281167
1000 classes
cnt[:10]: tensor([1300., 1300., 1300., 1300., 1300., 1300., 1300., 1300., 1300., 1300.])
frac[:10]: [tensor(0.0010), tensor(0.0010), tensor(0.0010), tensor(0.0010), tensor(0.0010), tensor(0.0010), tensor(0.0010), tensor(0.0010), tensor(0.0010), tensor(0.0010)]
prepare the dataset for classifier free guidance with p_uncond=0.1
2024-06-27 08:23:41,511 - _cpp_lib.py - WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.3.0+cu121 with CUDA 1201 (you have 2.1.1)
Python 3.9.19 (you have 3.9.19)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
2024-06-27 08:23:56,201 - eval_ldm_discrete.py - load nnet from workdir/imagenet256_H_DiM/default/ckpts/425000.ckpt/nnet_ema.pth
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/lihao/DiM-DiffusionMamba/./eval_ldm_discrete.py:341 in <module> │
│ │
│ 338 │
│ 339 │
│ 340 if __name__ == "__main__": │
│ ❱ 341 │ app.run(main) │
│ 342 │
│ │
│ /home/lihao/anaconda3/envs/mamba-attn/lib/python3.9/site-packages/absl/app.py:308 in run │
│ │
│ 305 │ callback = _init_callbacks.popleft() │
│ 306 │ callback() │
│ 307 │ try: │
│ ❱ 308 │ _run_main(main, args) │
│ 309 │ except UsageError as error: │
│ 310 │ usage(shorthelp=True, detailed_error=error, exitcode=error.exitcode) │
│ 311 │ except: │
│ │
│ /home/lihao/anaconda3/envs/mamba-attn/lib/python3.9/site-packages/absl/app.py:254 in _run_main │
│ │
│ 251 │ atexit.register(profiler.print_stats) │
│ 252 │ sys.exit(profiler.runcall(main, argv)) │
│ 253 else: │
│ ❱ 254 │ sys.exit(main(argv)) │
│ 255 │
│ 256 │
│ 257 def call_exception_handlers(exception): │
│ │
│ /home/lihao/DiM-DiffusionMamba/./eval_ldm_discrete.py:337 in main │
│ │
│ 334 │ config = FLAGS.config │
│ 335 │ config.nnet_path = FLAGS.nnet_path │
│ 336 │ config.output_path = FLAGS.output_path │
│ ❱ 337 │ evaluate(config) │
│ 338 │
│ 339 │
│ 340 if __name__ == "__main__": │
│ │
│ /home/lihao/DiM-DiffusionMamba/./eval_ldm_discrete.py:156 in evaluate │
│ │
│ 153 │ nnet = accelerator.prepare(nnet) │
│ 154 │ logging.info(f'load nnet from {config.nnet_path}') │
│ 155 │ if (config.nnet_path is not None) and (config.sample.algorithm != 'dpm_solver_upsamp │
│ ❱ 156 │ │ accelerator.unwrap_model(nnet).load_state_dict(torch.load(config.nnet_path, map │
│ 157 │ else: │
│ 158 │ │ accelerator.unwrap_model(nnet) │
│ 159 │
│ │
│ /home/lihao/anaconda3/envs/mamba-attn/lib/python3.9/site-packages/torch/nn/modules/module.py:215 │
│ 2 in load_state_dict │
│ │
│ 2149 │ │ │ │ │ │ ', '.join(f'"{k}"' for k in missing_keys))) │
│ 2150 │ │ │
│ 2151 │ │ if len(error_msgs) > 0: │
│ ❱ 2152 │ │ │ raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format( │
│ 2153 │ │ │ │ │ │ │ self.__class__.__name__, "\n\t".join(error_msgs))) │
│ 2154 │ │ return _IncompatibleKeys(missing_keys, unexpected_keys) │
│ 2155 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Error(s) in loading state_dict for Mamba2DModel:
size mismatch for additional_embed: copying a param with shape torch.Size([1, 1026, 1536]) from checkpoint, the shape in current model is torch.Size([1, 258, 1536]).

@tyshiwo1
Owner

tyshiwo1 commented Jun 27, 2024

It seems that you are loading the weights of a model trained at $256 \times 256$ with the $512 \times 512$ config. Can you share the config you used?

Since you get the message load nnet from workdir/imagenet256_H_DiM/default/ckpts/425000.ckpt/nnet_ema.pth, have you downloaded the checkpoint with the correct resolution ($256 \times 256$)?
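
For reference, the two shapes in the error line up exactly with the token counts at the two resolutions. Here is a minimal sketch of the arithmetic, assuming an 8x VAE downsampling and 2 extra tokens (my reading of the error message, not code from this repo):

def expected_additional_embed_len(image_size, patch_size=2, vae_downsample=8, num_extra_tokens=2):
    # 256 -> 32x32 latent -> (32 // 2) ** 2 = 256 patch tokens
    # 512 -> 64x64 latent -> (64 // 2) ** 2 = 1024 patch tokens
    latent_size = image_size // vae_downsample
    num_patches = (latent_size // patch_size) ** 2
    return num_patches + num_extra_tokens

print(expected_additional_embed_len(256))  # 258, the model built from the 256 config
print(expected_additional_embed_len(512))  # 1026, the checkpoint being loaded

So the checkpoint on disk holds $512 \times 512$ weights while the config builds a $256 \times 256$ model.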

@lihao-doc
Author

lihao-doc commented Jun 27, 2024

imagenet256_H_DiM.py
import ml_collections


def d(**kwargs):
    """Helper for creating a config dict."""
    return ml_collections.ConfigDict(initial_dictionary=kwargs)


def get_config():
    config = ml_collections.ConfigDict()

    config.seed = 1234
    config.pred = 'noise_pred'
    config.z_shape = (4, 32, 32)

    config.autoencoder = d(
        pretrained_path='assets/stable-diffusion/autoencoder_kl_ema.pth'
    )

    # config.gradient_accumulation_steps = 2  # 1
    config.max_grad_norm = 1.0

    config.train = d(
        n_steps=750000,  # 300000
        batch_size=768,
        mode='cond',
        log_interval=10,
        eval_interval=5000,
        save_interval=25000,  # 50000
    )

    config.optimizer = d(
        name='adamw',
        lr=0.0002,
        weight_decay=0.03,
        betas=(0.99, 0.99),
        eps=1e-15,
    )

    config.lr_scheduler = d(
        name='customized',
        warmup_steps=5000,
    )

    learned_sigma = False
    latent_size = 32
    in_channels = 4  # 3
    config.nnet = d(
        name='Mamba_DiT_H_2',
        attention_head_dim=1536 // 1, num_attention_heads=1, num_layers=49,
        in_channels=in_channels,
        num_embeds_ada_norm=1000,
        sample_size=latent_size,
        activation_fn="gelu-approximate",
        attention_bias=True,
        norm_elementwise_affine=False,
        norm_type="ada_norm_single",  # "layer_norm"
        out_channels=in_channels * 2 if learned_sigma else in_channels,
        patch_size=2,
        mamba_d_state=16,
        mamba_d_conv=3,
        mamba_expand=2,
        use_bidirectional_rnn=False,
        mamba_type='enc',
        nested_order=0,
        is_uconnect=True,
        no_ff=True,
        use_conv1d=True,
        is_extra_tokens=True,
        rms=True,
        use_pad_token=True,
        use_a4m_adapter=True,
        drop_path_rate=0.0,
        encoder_start_blk_id=1,
        kv_as_one_token_idx=-1,
        num_2d_enc_dec_layers=6,
        pad_token_schedules=['dec_split', 'lateral'],
        is_absorb=False,
        use_adapter_modules=True,
        sequence_schedule='dilated',
        sub_sequence_schedule=['reverse_single', 'layerwise_cross'],
        pos_encoding_type='learnable',
        scan_pattern_len=4 - 1,
        is_align_exchange_q_kv=False,
        is_random_patterns=False,
    )
    config.gradient_checkpointing = False

    config.dataset = d(
        name='imagenet',
        path='assets/datasets/ImageNet',
        resolution=256,
        cfg=True,
        p_uncond=0.1,
    )

    config.sample = d(
        sample_steps=50,
        n_samples=50000,
        mini_batch_size=25,  # the decoder is large
        algorithm='dpm_solver',
        cfg=True,
        scale=0.4,
        path=''
    )

    return config

I downloaded the checkpoint from: https://drive.google.com/drive/folders/1TTEXKKhnJcEV9jeZbZYlXjiPyV87ZhE0?usp=sharing

@lihao-doc
Author

ImageNet 64x64: Put the standard ImageNet dataset (which contains the train and val directory) to assets/datasets/ImageNet.
ImageNet 256x256 and ImageNet 512x512: Extract ImageNet features according to scripts/extract_imagenet_feature.py.

Currently, I have downloaded the ImageNet dataset and placed it according to the prescribed path, but I have not processed it yet. Is it necessary to preprocess the dataset into a 256x256 format? Or does the program automatically handle the dataset formatting?

@tyshiwo1
Owner

It is not necessary to preprocess datasets whose images are smaller than $256 \times 256$. Although this adds some training time and GPU memory, it should not be too much.
For images larger than $512 \times 512$, you can preprocess the dataset like this, which saves a lot of training cost.

@tyshiwo1
Owner

tyshiwo1 commented Jun 28, 2024

The paths of the image samples in our ImageNet dataset look like assets/datasets/ImageNet/train/n07747607/n07747607_61484.JPEG

@tyshiwo1
Owner

tyshiwo1 commented Jun 28, 2024

I'm sorry, I accidentally hit the edit button on your reply with the uploaded config. 😂

After reading it, I think the config you provided is correct. However, your checkpoint should not contain additional_embed with shape torch.Size([1, 1026, 1536]). Have you really downloaded the correct checkpoint? You may try loading the nnet.pth to check whether the evaluation can be performed successfully (using this checkpoint for evaluation would give a worse FID).

Have you loaded the checkpoint correctly? Since others have succeeded, it may not be a problem on my side. #8 (comment)
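
A quick way to check which resolution a downloaded .pth corresponds to, without building the model (a sketch; the path is the one from your log):

import torch

sd = torch.load('workdir/imagenet256_H_DiM/default/ckpts/425000.ckpt/nnet_ema.pth', map_location='cpu')
print(sd['additional_embed'].shape)
# torch.Size([1, 258, 1536])  -> a 256x256 checkpoint
# torch.Size([1, 1026, 1536]) -> a 512x512 checkpoint (what your error reports)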

@lihao-doc
Author

CUDA_VISIBLE_DEVICES="0" python ./eval_ldm_discrete.py --config=configs/imagenet256_H_DiM.py --nnet_path='workdir/imagenet256_H_DiM/default/ckpts/425000.ckpt/nnet.pth'
[2024-06-28 10:25:06,226] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] NVIDIA Inference is only supported on Ampere and newer architectures
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1
[WARNING] using untested triton version (2.1.0), only 1.0.0 is known to be compatible
I0628 10:25:07.460301 130068879759168 eval_ldm_discrete.py:140] Process 0 using device: cuda
Counting ImageNet files from assets/datasets/ImageNet
Finish counting ImageNet files
1000 classes
cnt[:10]: tensor([1300., 1300., 1300., 1300., 1300., 1300., 1300., 1300., 1300., 1300.])
frac[:10]: [tensor(0.0010), tensor(0.0010), tensor(0.0010), tensor(0.0010), tensor(0.0010), tensor(0.0010), tensor(0.0010), tensor(0.0010), tensor(0.0010), tensor(0.0010)]
prepare the dataset for classifier free guidance with p_uncond=0.1
2024-06-28 10:25:10,717 - _cpp_lib.py - WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.3.0+cu121 with CUDA 1201 (you have 2.1.1)
Python 3.9.19 (you have 3.9.19)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
2024-06-28 10:25:26,171 - eval_ldm_discrete.py - load nnet from workdir/imagenet256_H_DiM/default/ckpts/425000.ckpt/nnet.pth
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/lihao/DiM-DiffusionMamba/./eval_ldm_discrete.py:341 in <module> │
│ │
│ 338 │
│ 339 │
│ 340 if __name__ == "__main__": │
│ ❱ 341 │ app.run(main) │
│ 342 │
│ │
│ /home/lihao/anaconda3/envs/mamba-attn/lib/python3.9/site-packages/absl/app.py:308 in run │
│ │
│ 305 │ callback = _init_callbacks.popleft() │
│ 306 │ callback() │
│ 307 │ try: │
│ ❱ 308 │ _run_main(main, args) │
│ 309 │ except UsageError as error: │
│ 310 │ usage(shorthelp=True, detailed_error=error, exitcode=error.exitcode) │
│ 311 │ except: │
│ │
│ /home/lihao/anaconda3/envs/mamba-attn/lib/python3.9/site-packages/absl/app.py:254 in _run_main │
│ │
│ 251 │ atexit.register(profiler.print_stats) │
│ 252 │ sys.exit(profiler.runcall(main, argv)) │
│ 253 else: │
│ ❱ 254 │ sys.exit(main(argv)) │
│ 255 │
│ 256 │
│ 257 def call_exception_handlers(exception): │
│ │
│ /home/lihao/DiM-DiffusionMamba/./eval_ldm_discrete.py:337 in main │
│ │
│ 334 │ config = FLAGS.config │
│ 335 │ config.nnet_path = FLAGS.nnet_path │
│ 336 │ config.output_path = FLAGS.output_path │
│ ❱ 337 │ evaluate(config) │
│ 338 │
│ 339 │
│ 340 if __name__ == "__main__": │
│ │
│ /home/lihao/DiM-DiffusionMamba/./eval_ldm_discrete.py:156 in evaluate │
│ │
│ 153 │ nnet = accelerator.prepare(nnet) │
│ 154 │ logging.info(f'load nnet from {config.nnet_path}') │
│ 155 │ if (config.nnet_path is not None) and (config.sample.algorithm != 'dpm_solver_upsamp │
│ ❱ 156 │ │ accelerator.unwrap_model(nnet).load_state_dict(torch.load(config.nnet_path, map │
│ 157 │ else: │
│ 158 │ │ accelerator.unwrap_model(nnet) │
│ 159 │
│ │
│ /home/lihao/anaconda3/envs/mamba-attn/lib/python3.9/site-packages/torch/nn/modules/module.py:215 │
│ 2 in load_state_dict │
│ │
│ 2149 │ │ │ │ │ │ ', '.join(f'"{k}"' for k in missing_keys))) │
│ 2150 │ │ │
│ 2151 │ │ if len(error_msgs) > 0: │
│ ❱ 2152 │ │ │ raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format( │
│ 2153 │ │ │ │ │ │ │ self.__class__.__name__, "\n\t".join(error_msgs))) │
│ 2154 │ │ return _IncompatibleKeys(missing_keys, unexpected_keys) │
│ 2155 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Error(s) in loading state_dict for Mamba2DModel:
size mismatch for additional_embed: copying a param with shape torch.Size([1, 1026, 1536]) from checkpoint, the shape in current model is torch.Size([1, 258, 1536]).

Loading the nnet.pth still fails. Are you sure the model you uploaded is correct? I've noticed that the filenames for the 256-resolution and 512-resolution models are identical. The configuration provided by the other individual suggests they might have been using a model they trained themselves. Currently, I need to load the model that you trained.

@lihao-doc
Author

Could you please send me the trained model for 256 resolution?

@tyshiwo1
Owner

OK, I will upload my best 256 model later

@lihao-doc
Author

Once you have uploaded it, could you please provide me with a link, or privately send a copy to my email address HaiLi086@163.com? I am highly interested in your work and would greatly appreciate it!

@tyshiwo1
Owner

Thank you for your appreciation!

I will upload it to this repo and update this row:

ImageNet 256x256 (Huge/2) | 2.21 | 625K | 768

@lihao-doc
Author

I previously downloaded it from here: ImageNet 256x256 (Huge/2) 2.40 425K 768

@tyshiwo1
Owner

If you have not prepared your dataset well, you can modify this line of your config to

config.dataset = d(
    name='imagenet256_features',
    path='assets/datasets/imagenet256_features',
    cfg=True,
    p_uncond=0.1,
)

This setting requires NO prepared dataset for evaluation.

@tyshiwo1
Owner

I previously downloaded it from here: ImageNet 256x256 (Huge/2) 2.40 425K 768

I know. I will give you a new link.

@lihao-doc
Author

How do I prepare the dataset? I'm unable to properly run the script file scripts/extract_imagenet_feature.py.

python scripts/extract_imagenet_feature.py
usage: extract_imagenet_feature.py [-h] path
extract_imagenet_feature.py: error: the following arguments are required: path

My dataset path is: /home/lihao/DiM-DiffusionMamba/assets/datasets/ImageNet/train/n01440764/n01440764_18.JPEG. The images in my dataset have been downloaded but not processed further. How come there is an imagenet256_features folder?

@tyshiwo1
Owner

Here is the best 256 model: https://drive.google.com/drive/folders/1ETllUm8Dpd8-vDHefQEXEWF9whdbyhL5?usp=sharing

You can place the new checkpoint at the path ./workdir/imagenet256_H_mambaenc_pad_cross_conv_skip1_2scan_vaeema_ada_4scan/default/ckpts/625000.ckpt/.

Then, execute this (I just tested it, and it works well):

accelerate launch --multi_gpu --gpu_ids 0,1 --main_process_port 20039 --num_processes 2 --mixed_precision bf16 ./eval_ldm_discrete.py --config=configs/imagenet256_H_DiM.py --nnet_path='workdir/imagenet256_H_mambaenc_pad_cross_conv_skip1_2scan_vaeema_ada_4scan/default/ckpts/625000.ckpt/nnet_ema_256_625k.pth'
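
If you only have a single GPU, a single-process variant of the same command should also work (an untested assumption on my part, using the same flags as above minus the multi-GPU ones):

accelerate launch --num_processes 1 --mixed_precision bf16 ./eval_ldm_discrete.py --config=configs/imagenet256_H_DiM.py --nnet_path='workdir/imagenet256_H_mambaenc_pad_cross_conv_skip1_2scan_vaeema_ada_4scan/default/ckpts/625000.ckpt/nnet_ema_256_625k.pth'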

@tyshiwo1
Owner

How do I prepare the dataset? I'm unable to properly run the script file scripts/extract_imagenet_feature.py.

python scripts/extract_imagenet_feature.py
usage: extract_imagenet_feature.py [-h] path
extract_imagenet_feature.py: error: the following arguments are required: path

My dataset path is: /home/lihao/DiM-DiffusionMamba/assets/datasets/ImageNet/train/n01440764/n01440764_18.JPEG. The images in my dataset have been downloaded but not processed further. How come there is an imagenet256_features folder?

First, I do not use latent extraction for $256 \times 256$ features in the configs of this open-source code.
Second, extract_imagenet_feature.py: error: the following arguments are required: path means you need to pass a path, e.g. python scripts/extract_imagenet_feature.py /home/lihao/DiM-DiffusionMamba/assets/datasets/ImageNet
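
In case it helps, here is a minimal sketch of what latent pre-extraction typically looks like. This is NOT the repo's scripts/extract_imagenet_feature.py; it assumes the diffusers AutoencoderKL API and an ImageFolder-style layout, and the exact output format the training code expects may differ, so treat the repo script as authoritative:

import os
import numpy as np
import torch
from diffusers.models import AutoencoderKL
from torchvision import datasets, transforms

device = 'cuda'
vae = AutoencoderKL.from_pretrained('stabilityai/sd-vae-ft-ema').to(device).eval()

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),  # scale images to [-1, 1]
])
dataset = datasets.ImageFolder('assets/datasets/ImageNet/train', transform=transform)

out_dir = 'assets/datasets/imagenet256_features'
os.makedirs(out_dir, exist_ok=True)
with torch.no_grad():
    for i, (img, label) in enumerate(dataset):
        # Encode one image to a 4x32x32 latent and save it with its class label.
        latent = vae.encode(img[None].to(device)).latent_dist.sample()[0]
        np.save(os.path.join(out_dir, f'{i}.npy'), latent.cpu().numpy())
        np.save(os.path.join(out_dir, f'{i}_label.npy'), np.array(label))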

@lihao-doc
Author

Thank you for your meticulous guidance; I have resolved all of my issues.

@tyshiwo1
Owner

Thank you for your meticulous guidance; I have resolved all of my issues.

OK, I will close the issue.
