Required prerequisites
What version of align-anything are you using?
0.1.0-dev
System information
transformers version: 4.43.1
Problem description
Llama 3.1 is naturally supported by the training, evaluation, and deployment modules of Align-Anything. However, according to our tests, an issue in the current transformers release (4.43.x) temporarily prevents DeepSpeed ZeRO-3 training. Our developers have reported the issue to the transformers community, received a clear response, and will continue to follow up.
This bug may also affect the training of other model types. If you need a stable setup for training right now, you can temporarily pin transformers to version 4.41.2.
If you want to fine-tune Llama 3.1 on the latest transformers (4.43.0), we have verified that ZeRO-2 training runs without errors; a minimal config sketch follows below.
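For reference, the general shape of a ZeRO-2 DeepSpeed configuration is sketched below as a Python dict, so it can be passed directly to HfDeepSpeedConfig in place of the JSON file used in the reproducer. The batch sizes and other values are illustrative placeholders, not the settings shipped with Align-Anything.

# Minimal ZeRO-2 sketch (illustrative values, not Align-Anything's shipped config).
# Note: train_batch_size must equal micro_batch * grad_accum_steps * world_size.
ds_cfgs = {
    'train_batch_size': 16,
    'train_micro_batch_size_per_gpu': 2,
    'gradient_accumulation_steps': 8,
    'gradient_clipping': 1.0,
    'bf16': {'enabled': True},          # same bf16 setting the reproducer toggles on
    'zero_optimization': {
        'stage': 2,                     # ZeRO-2: partitions optimizer states and gradients only
        'overlap_comm': True,
        'contiguous_gradients': True,
    },
}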
Reproducible example code
import contextlib
import json

import deepspeed
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
)
from transformers.integrations.deepspeed import (
    HfDeepSpeedConfig,
    is_deepspeed_zero3_enabled,
)

DEFAULT_BOS_TOKEN: str = '<s>'
DEFAULT_EOS_TOKEN: str = '</s>'
DEFAULT_PAD_TOKEN: str = '<pad>'
DEFAULT_UNK_TOKEN: str = '<unk>'

model_name_or_path = 'PATHTO/Llama-3.1'
ds_cfgs_path = 'PATH'

deepspeed.init_distributed()

with open(ds_cfgs_path) as f:
    ds_cfgs = json.load(f)
ds_cfgs['bf16']['enabled'] = True
dstchf = HfDeepSpeedConfig(ds_cfgs)  # must be created before from_pretrained so ZeRO-3 init is used

tokenizer = AutoTokenizer.from_pretrained(
    model_name_or_path,
    model_max_length=2048,
    padding_side='right',
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)


# Reference: https://github.com/tatsu-lab/stanford_alpaca/blob/main/train.py
def resize_tokenizer_embedding(tokenizer, model) -> None:
    """Resize tokenizer and embedding.

    Note: This is the unoptimized version that may make your embedding size
    not be divisible by 64.
    """

    def init_new_embeddings(
        embeddings,
        new_num_embeddings: int,
        num_new_embeddings: int,
    ) -> None:
        if embeddings is None:
            return
        params = [embeddings.weight]
        print(hasattr(embeddings.weight, 'ds_id'))
        # True for transformers 4.43.1, False for transformers 4.41.2
        exit()  # stops here after the ds_id check; remove this line to reach the assert below
        context = (
            deepspeed.zero.GatheredParameters(params, modifier_rank=0)
            if is_deepspeed_zero3_enabled()
            else contextlib.nullcontext()
        )
        with context:
            for param in params:
                if param is None:
                    continue
                # bug here: param.size(0) is 32000 while new_num_embeddings is 32001
                assert param.size(0) == new_num_embeddings, f'{param.size(0)}, {new_num_embeddings}'
                param_data = param.data
                param_mean = param_data[:-num_new_embeddings].mean(dim=0, keepdim=True)
                param_data[-num_new_embeddings:] = param_mean

    special_tokens_dict = {}
    if tokenizer.pad_token is None:
        special_tokens_dict['pad_token'] = DEFAULT_PAD_TOKEN
    if tokenizer.eos_token is None:
        special_tokens_dict['eos_token'] = DEFAULT_EOS_TOKEN
    if tokenizer.bos_token is None:
        special_tokens_dict['bos_token'] = DEFAULT_BOS_TOKEN
    if tokenizer.unk_token is None:
        special_tokens_dict['unk_token'] = DEFAULT_UNK_TOKEN

    num_new_tokens = tokenizer.add_special_tokens(special_tokens_dict)
    new_num_embeddings = len(tokenizer)

    model.config.bos_token_id = tokenizer.bos_token_id
    model.config.eos_token_id = tokenizer.eos_token_id
    model.config.pad_token_id = tokenizer.pad_token_id

    if num_new_tokens > 0:
        hf_device_map = getattr(model, 'hf_device_map', {})
        devices = {
            torch.device(device)
            for device in hf_device_map.values()
            if device not in {'cpu', 'disk'}
        }
        is_model_parallel = len(devices) > 1

        if not is_model_parallel:
            model.resize_token_embeddings(new_num_embeddings)
            init_new_embeddings(
                model.get_input_embeddings(),
                new_num_embeddings=new_num_embeddings,
                num_new_embeddings=num_new_tokens,
            )
            init_new_embeddings(
                model.get_output_embeddings(),
                new_num_embeddings=new_num_embeddings,
                num_new_embeddings=num_new_tokens,
            )


resize_tokenizer_embedding(tokenizer=tokenizer, model=model)
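As a small standalone diagnostic (a sketch of ours, not part of the original reproducer; the helper name report_embedding_rows is made up), the snippet below mirrors the check inside init_new_embeddings: it gathers the embedding weight when it is a ZeRO-3 partitioned parameter and compares its row count against len(tokenizer).

def report_embedding_rows(model, tokenizer) -> None:
    # Hypothetical helper: under ZeRO-3 the embedding weight is partitioned (it
    # carries a ds_id attribute and an empty local shape), so the full shape is
    # only visible inside a GatheredParameters context.
    weight = model.get_input_embeddings().weight
    if hasattr(weight, 'ds_id'):
        with deepspeed.zero.GatheredParameters([weight], modifier_rank=None):
            rows = weight.size(0)
    else:
        rows = weight.size(0)
    # In the failing case reported above, rows stays at the old vocabulary size
    # (32000) while len(tokenizer) is already 32001 after add_special_tokens.
    print(f'embedding rows: {rows}, len(tokenizer): {len(tokenizer)}')

report_embedding_rows(model=model, tokenizer=tokenizer)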
Traceback
No response
Expected behavior
No response
Additional context
No response