
newest commit (untrained tokens llama 3.1 base) creates bfloat16 issue with Mistral Nemo training #930

Closed
Nazzaroth2 opened this issue Aug 17, 2024 · 2 comments
Labels
fixed - pending confirmation Fixed, waiting for confirmation from poster

Comments

@Nazzaroth2

I just tried to start a Mistral Nemo 12B Base training run on an RTX 4090 cloud service.
I am running pytorch_2.3.0-cuda12.1-cudnn8-devel with the newest versions of Unsloth and these packages from the Colab notebooks:
!pip install --no-deps "xformers<0.0.27" trl peft accelerate bitsandbytes

When the trainer starts, this error appears:

TypeError                                 Traceback (most recent call last)
Cell In[10], line 1
----> 1 trainer_stats = trainer.train()

File <string>:46, in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)

File /opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/unsloth/tokenizer_utils.py:837, in fix_untrained_tokens(model, tokenizer, train_dataset, eps)
    835 lm_head_where = torch.where(indicator_untrained1)[0]
    836 lm_head_bad = lm_head_matrix[lm_head_where]
--> 837 lm_head_bad = lm_head_bad.cpu().numpy().round(3)
    838 from collections import Counter
    839 counter = Counter()

TypeError: Got unsupported ScalarType BFloat16

I'm not quite sure what went wrong, but I read in another GitHub repo that NumPy does not play nicely with bfloat16.
That could be outdated though; it was from Sep. 2023.
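
For what it's worth, NumPy has no native bfloat16 dtype, so the failure is easy to reproduce outside of Unsloth. A minimal standalone sketch (plain PyTorch, not Unsloth code):

import torch

# .numpy() on a bfloat16 tensor fails because NumPy has no bfloat16 dtype
x = torch.randn(4, 8, dtype=torch.bfloat16)

try:
    x.cpu().numpy()
except TypeError as e:
    print(e)  # prints: Got unsupported ScalarType BFloat16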

@Animadversio

I got the same issue.
I made an ad hoc hotfix that works for me.
Change line 837 to

lm_head_bad = lm_head_bad.cpu().float().numpy().round(3)

lm_head_bad is a local variable used for hashing, so this won't affect anything outside the function.
This works perfectly for me.
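
In context (the surrounding lines are taken from the traceback above), the patched section of fix_untrained_tokens would look roughly like this; the only change is the added .float() upcast:

# unsloth/tokenizer_utils.py, inside fix_untrained_tokens (context from the traceback above)
lm_head_where = torch.where(indicator_untrained1)[0]
lm_head_bad = lm_head_matrix[lm_head_where]
# Upcast to float32 before converting, since NumPy cannot handle bfloat16
lm_head_bad = lm_head_bad.cpu().float().numpy().round(3)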

danielhanchen added a commit that referenced this issue Aug 17, 2024
* untrained tokens llama 3.1 base

* Update tokenizer_utils.py

* Update tokenizer_utils.py
@danielhanchen
Contributor

Apologies, just fixed! Weird that NumPy doesn't like it - yep, the hotfix is correct!

@danielhanchen danielhanchen added the fixed - pending confirmation Fixed, waiting for confirmation from poster label Aug 17, 2024