System Info
transformers version: 4.45.0.dev0
Using distributed or parallel set-up in script?: distributed, yes, but I test with two custom yml files (see below)
Using GPU in script?: yes
GPU type: NVIDIA A100-SXM4-80GB
Both accelerate and transformers are recent, installed fresh from GitHub.
Who can help?
@ArthurZucker @muellerz, since it seems to have something to do with the combination of FSDP and the instantiation of the tokenizer classes
Information
The official example scripts
My own modified scripts
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
I seem to be getting a weird issue when loading certain models from transformers under multi-GPU. In the toy task below I simply load some tokenizers and text encoders from a pretrained model, and yet, when I run this script under multi-GPU + FSDP, I get NaNs in the text encoder weights.
For instance, with this script:
from accelerate import Accelerator
from transformers import CLIPTokenizer, T5EncoderModel, T5TokenizerFast, CLIPTextModel
from diffusers.utils import (
    check_min_version
)
import torch

def has_nan(tensor):
    if not isinstance(tensor, torch.Tensor):
        return f"not a tensor, but a {type(tensor)}"
    return torch.isnan(tensor).any().item()

def check_nan_weights(model, mod_name):
    nan_params = []
    for name, param in model.named_parameters():
        if torch.isnan(param.data).any():
            nan_params.append(name)
    if nan_params:
        print(f"[{torch.cuda.current_device()}, {mod_name}]: NaN weights detected in the following parameters:")
        for param_name in nan_params:
            print(f" - {param_name}")
        return True
    return False

from logging import getLogger
logger = getLogger(__name__)

def load_pipeline(accelerator,
                  pretrained_model_name_or_path: str,
                  load_tokenizers: bool = True,
                  revision: str = None,
                  variant: str = None):
    #with accelerator.main_process_first():
    if load_tokenizers:
        # Load the tokenizers
        tokenizer_one = CLIPTokenizer.from_pretrained(
            pretrained_model_name_or_path,
            subfolder="tokenizer",
            revision=revision,
        )
        tokenizer_two = T5TokenizerFast.from_pretrained(
            pretrained_model_name_or_path,
            subfolder="tokenizer_2",
            revision=revision,
        )
    #accelerator.wait_for_everyone()
    text_encoder_one = CLIPTextModel.from_pretrained(
        pretrained_model_name_or_path, subfolder="text_encoder",
        revision=revision, variant=variant
    )
    text_encoder_two = T5EncoderModel.from_pretrained(
        pretrained_model_name_or_path, subfolder="text_encoder_2",
        revision=revision, variant=variant,
    )
    logger.info("check nan weights...")
    check_nan_weights(text_encoder_one, 'te')
    check_nan_weights(text_encoder_two, 'te2')

def main():
    accelerator = Accelerator()
    pipeline = load_pipeline(
        accelerator,
        "black-forest-labs/FLUX.1-dev",
        load_tokenizers=True
    )

if __name__ == "__main__":
    #from torch.multiprocessing import Pool, Process, set_start_method
    #set_start_method('spawn')
    main()
If we run this with 1 GPU via accelerate launch --config_file 1gpu.yml test.py, we get no errors. However, with 2 GPUs via accelerate launch --config_file 2gpu.yml test.py, we get:
[1, te]: NaN weights detected in the following parameters:
- text_model.encoder.layers.0.self_attn.k_proj.weight
- text_model.encoder.layers.0.self_attn.k_proj.bias
- text_model.encoder.layers.0.self_attn.v_proj.weight
- text_model.encoder.layers.2.self_attn.out_proj.bias
- text_model.encoder.layers.2.layer_norm1.weight
- text_model.encoder.layers.2.layer_norm1.bias
...
...
Note that if we set load_tokenizers=False in load_pipeline, there are no issues, so it seems to be something to do with the tokenizers. I thought this might be a race-condition-related issue, but when I tried to isolate that behaviour, e.g. with accelerator.wait_for_everyone(), I still got the same NaNs.
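For reference, the synchronization variant I tried corresponds to the commented-out lines in the script above. A minimal sketch (same model subfolders as in the repro; the helper name load_pipeline_synced is just for illustration):

from accelerate import Accelerator
from transformers import CLIPTokenizer, T5TokenizerFast, CLIPTextModel, T5EncoderModel

def load_pipeline_synced(accelerator: Accelerator, pretrained_model_name_or_path: str):
    # Let the main process populate the cache first, so the other ranks
    # only read already-materialized files.
    with accelerator.main_process_first():
        tokenizer_one = CLIPTokenizer.from_pretrained(
            pretrained_model_name_or_path, subfolder="tokenizer"
        )
        tokenizer_two = T5TokenizerFast.from_pretrained(
            pretrained_model_name_or_path, subfolder="tokenizer_2"
        )
    # Explicit barrier across all ranks before touching the text encoders.
    accelerator.wait_for_everyone()
    text_encoder_one = CLIPTextModel.from_pretrained(
        pretrained_model_name_or_path, subfolder="text_encoder"
    )
    text_encoder_two = T5EncoderModel.from_pretrained(
        pretrained_model_name_or_path, subfolder="text_encoder_2"
    )
    return tokenizer_one, tokenizer_two, text_encoder_one, text_encoder_two

Even with this ordering the NaNs still appear on rank 1.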
Furthermore, if I just run the script with accelerate launch test.py with a default config (as vanilla as can be: no FSDP, just multi-GPU enabled), there are no errors at all. So this seems to be an issue specifically at the intersection of FSDP and the tokenizer classes.
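For comparison, the kind of vanilla config I mean looks roughly like this (a sketch of a plain multi-GPU accelerate config with no FSDP, not a verbatim copy of my default_config.yaml):

compute_environment: LOCAL_MACHINE
debug: false
distributed_type: MULTI_GPU
downcast_bf16: 'no'
enable_cpu_affinity: false
gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 2
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false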
My accelerate config files are as follows for 1 GPU and 2 GPUs (for 2 GPUs, just set num_processes: 2 of course).
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
enable_cpu_affinity: false
fsdp_config:
  fsdp_activation_checkpointing: false
  fsdp_auto_wrap_policy: SIZE_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: true
  fsdp_forward_prefetch: false
  fsdp_min_num_params: 100000000
  fsdp_offload_params: false
  # SHARD_GRAD_OP was the previous strat
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_state_dict_type: FULL_STATE_DICT
  # SHARDED_STATE_DICT was the old value for above
  fsdp_sync_module_states: true
  fsdp_use_orig_params: true
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
Expected behavior
No NaNs.