Can this package support the one-gpu machine #206

momo1986 · 2023-05-31T15:35:09Z

Hi, Tutel team.

I ran the script with a few small modifications:
python -u main_moe.py --cfg configs/swinmoe/swin_moe_small_patch4_window12_192_32expert_32gpu_22k.yaml --data-path /data/user1/junyan/datasets/ImageNet/ImageNet_Val --batch-size 128 --resume checkpoints/swin_moe_small_patch4_window12_192_32expert_32gpu_22k/swin_moe_small_patch4_window12_192_32expert_32gpu_22k.pth

However, I have received the error message:

File "main_moe.py", line 374, in <module>
main(config)
File "main_moe.py", line 141, in main
max_accuracy = load_checkpoint(config, model_without_ddp, optimizer, lr_scheduler, loss_scaler, logger)
File "/data/user1/junyan/adv_training/Swin-Transformer/utils_moe.py", line 45, in load_checkpoint
msg = model.load_state_dict(checkpoint['model'], strict=False)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1039, in load_state_dict
load(self)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1037, in load
load(child, prefix + name + '.')
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1037, in load
load(child, prefix + name + '.')
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1037, in load
load(child, prefix + name + '.')
[Previous line repeated 3 more times]
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1034, in load
state_dict, prefix, local_metadata, True, missing_keys, unexpected_keys, error_msgs)
File "/root/.local/lib/python3.6/site-packages/tutel/impls/moe_layer.py", line 54, in _load_from_state_dict
assert buff_name in state_dict, "Could not find parameter %s in state_dict." % buff_name
AssertionError: Could not find parameter layers.2.blocks.1.mlp._moe_layer.experts.batched_fc2_bias in state_dict.

I have only one GPU, and I am not sure whether multiple GPUs are required for this task. Is it possible to run it on a single GPU? Also, how can I resolve this error?

I am looking forward to your response.

Thanks a lot.

Best Regards!

momo1986 · 2023-06-01T01:26:01Z

Thanks for your kind comment.

ghostplant · 2023-06-01T03:40:57Z

One GPU per machine? How many machines would you like to run it on? Or do you just want to run it using 1 GPU on 1 machine?

momo1986 · 2023-06-01T08:51:13Z

Hi, @ghostplant.

I have several one-GPU machines. To save compute resources, running the program on a single one-GPU machine would be the most economical option for me. I am mainly studying some specific properties of MoE, so, as you mentioned, I just want to run it using 1 GPU on 1 machine.

ghostplant · 2023-06-01T11:22:27Z

If you run it on a one-GPU machine, you need to make sure the GPU has enough memory to hold the parameters of all 32 experts. To convert swin_moe_small_patch4_window12_192_32expert_32gpu_22k.pth for single-GPU use, follow the utility here: its second example merges 32 separate checkpoint files into a single checkpoint file that a single GPU can load.
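
To make the merge step concrete, here is a rough sketch of what combining per-rank checkpoints might look like. It is not Tutel's actual utility. The function name merge_rank_state_dicts is hypothetical, and the sketch assumes that each rank's file holds its local slice of every expert parameter along the leading (expert) dimension, while all other parameters are replicated across ranks. Plain nested lists stand in for tensors so it runs without PyTorch; with real checkpoints the concatenation would be done on tensors instead.

```python
# Hypothetical sketch: merge per-rank MoE state dicts into one state dict.
# Nested lists stand in for tensors; this is NOT Tutel's real conversion code.

# Substring used to spot expert parameters, taken from the key in the error
# message above (layers.2.blocks.1.mlp._moe_layer.experts.batched_fc2_bias).
EXPERT_MARKER = "._moe_layer.experts."


def merge_rank_state_dicts(rank_states):
    """Merge the state dicts saved by each rank into a single state dict.

    Assumption: expert parameters are sharded along the leading expert
    dimension, and every other parameter is identical on all ranks.
    """
    if not rank_states:
        raise ValueError("need at least one per-rank state dict")
    merged = {}
    for key, value in rank_states[0].items():
        if EXPERT_MARKER in key:
            combined = []
            for state in rank_states:
                # Append this rank's local experts in rank order.
                combined.extend(state[key])
            merged[key] = combined
        else:
            # Replicated parameter: take the copy from rank 0.
            merged[key] = value
    return merged


# Toy input: two ranks, one local expert each, bias of width 2.
key = "layers.2.blocks.1.mlp._moe_layer.experts.batched_fc2_bias"
rank0 = {key: [[0.1, 0.2]], "head.weight": [[1.0]]}
rank1 = {key: [[0.3, 0.4]], "head.weight": [[1.0]]}
merged = merge_rank_state_dicts([rank0, rank1])
```

After merging, the expert entry holds both experts' biases in rank order, which is the shape a single-GPU model with all experts local would expect under the assumptions above.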

momo1986 · 2023-06-02T02:56:45Z

Hi, @ghostplant. Thanks for your guidance. Does this package support running the ImageNet test on a single-GPU machine? Does the user need to implement this manually, or is there a relevant demo?
Thanks & Regards!
Momo
