
newest commit (untrained tokens llama 3.1 base) creates bfloat16 issue with Mistral Nemo training #930

Closed
Nazzaroth2 opened this issue Aug 17, 2024 · 2 comments
Labels
fixed - pending confirmation Fixed, waiting for confirmation from poster

Comments

@Nazzaroth2

I just tried to start a Mistral Nemo 12B Base training run on an RTX 4090 cloud service.
I am running pytorch_2.3.0-cuda12.1-cudnn8-devel with the newest versions of Unsloth and these packages from the Colab notebooks:
!pip install --no-deps "xformers<0.0.27" trl peft accelerate bitsandbytes

When the trainer starts, this error appears:

TypeError                                 Traceback (most recent call last)
Cell In[10], line 1
----> 1 trainer_stats = trainer.train()

File <string>:46, in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)

File /opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/unsloth/tokenizer_utils.py:837, in fix_untrained_tokens(model, tokenizer, train_dataset, eps)
    835 lm_head_where = torch.where(indicator_untrained1)[0]
    836 lm_head_bad = lm_head_matrix[lm_head_where]
--> 837 lm_head_bad = lm_head_bad.cpu().numpy().round(3)
    838 from collections import Counter
    839 counter = Counter()

TypeError: Got unsupported ScalarType BFloat16

I'm not quite sure what went wrong, but I read in another GitHub repo that NumPy does not play nicely with bfloat16.
That could be outdated though; it was from Sep. 2023.
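
For what it's worth, NumPy has no native bfloat16 dtype, so the failure is easy to reproduce outside of Unsloth. A minimal standalone sketch (plain PyTorch, not Unsloth code):

import torch

# .numpy() on a bfloat16 tensor fails because NumPy has no bfloat16 dtype
x = torch.randn(4, 8, dtype=torch.bfloat16)

try:
    x.cpu().numpy()
except TypeError as e:
    print(e)  # prints: Got unsupported ScalarType BFloat16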

@Animadversio

I got the same issue.
I made an ad hoc hotfix that works for me.
Change line 837 to

lm_head_bad = lm_head_bad.cpu().float().numpy().round(3)

lm_head_bad is a local variable used for hashing, so this won't affect anything outside the function.
This works perfectly for me.
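
In context (the surrounding lines are taken from the traceback above), the patched section of fix_untrained_tokens would look roughly like this; the only change is the added .float() upcast:

# unsloth/tokenizer_utils.py, inside fix_untrained_tokens (context from the traceback above)
lm_head_where = torch.where(indicator_untrained1)[0]
lm_head_bad = lm_head_matrix[lm_head_where]
# Upcast to float32 before converting, since NumPy cannot handle bfloat16
lm_head_bad = lm_head_bad.cpu().float().numpy().round(3)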

danielhanchen added a commit that referenced this issue Aug 17, 2024
* untrained tokens llama 3.1 base

* Update tokenizer_utils.py

* Update tokenizer_utils.py
@danielhanchen
Contributor

Apologies, just fixed! Weird that NumPy doesn't like it - yep, the hotfix is correct!

@danielhanchen danielhanchen added the fixed - pending confirmation Fixed, waiting for confirmation from poster label Aug 17, 2024