meta-llama/Llama-3.1-8B-Instruct generate output shows unexpected padding #1378

Closed · aslanxie opened this issue Sep 29, 2024 · 3 comments · Fixed by #1444
Labels: bug (Something isn't working)

@aslanxie (Contributor)

System Info

optimum-habana: v1.13.2
habanalabs-dkms/jammy 1.17.1-40
DOCKER_IMAGE=vault.habana.ai/gaudi-docker/1.17.1/ubuntu22.04/habanalabs/pytorch-installer-2.3.1:latest

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Clone and install optimum-habana:

     git clone https://github.com/huggingface/optimum-habana
     cd optimum-habana && git checkout v1.13.2
     pip install .

  2. Move to examples/text-generation and run:

     python3 run_generation.py --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --use_hpu_graphs --limit_hpu_graph --use_kv_cache --reuse_cache --trim_logits --attn_softmax_bf16 --max_input_tokens 512 --max_new_tokens 2048 --bf16 --batch_size 1 --warmup 0 --n_iterations 3

  3. The output looks like the following; the '!' characters are unexpected padding:

Input/outputs:
input 1: ('DeepSpeed is a machine learning framework',)
output 1: ('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!DeepSpeed is a machine learning framework that provides a set of tools and libraries for scaling up deep learning models and training them on large datasets. It is designed to be highly efficient and scalable, allowing users to train large models on a single machine or distribute the training process across multiple machines.\n\nHere are some key features of DeepSpeed:\n\n1.  **Efficient Training**: DeepSpeed provides a set of techniques to optimize the training process, including gradient accumulation, mixed precision training, and model parallelism. These techniques can significantly reduce the training time and memory usage.\n2.  **Distributed Training** ...

Expected behavior

The expected output should be:

Input/outputs:
input 1: ('DeepSpeed is a machine learning framework',)
output 1: ('DeepSpeed is a machine learning framework that provides a set of tools and libraries for scaling up deep learning models and training them on large datasets. It is designed to be highly efficient and scalable ...
aslanxie added the bug label on Sep 29, 2024
@aslanxie (Contributor, Author)

Starting with Llama 3, the bos/eos token ids have changed. For example, Llama-3.1-8B-Instruct defines:

 "bos_token_id": 128000,
  "eos_token_id": [
    128001,
    128008,
    128009
  ],

The text-generation example forces model.generation_config.pad_token_id = 0, and token id 0 represents '!' in the meta-llama/Llama-3.1-8B-Instruct tokenizer vocabulary. So the runs of '!' look like a pad token id mismatch.
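
The mismatch is easy to confirm from the tokenizer alone. Below is a minimal sketch (assuming access to the gated meta-llama/Llama-3.1-8B-Instruct repo on the Hugging Face Hub); it only illustrates the decoding behavior, not the fix that eventually landed:

    # Show that token id 0 decodes to '!' in the Llama 3.1 vocabulary,
    # so padding with pad_token_id = 0 surfaces as runs of '!'.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

    print(tokenizer.convert_ids_to_tokens(0))  # '!'
    print(tokenizer.decode([0] * 8))           # '!!!!!!!!'

    # A pad id taken from the model's own eos ids would decode to a special
    # token that can be stripped, instead of a printable character.
    print(tokenizer.eos_token_id)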

@regisss (Collaborator) commented Oct 20, 2024

@aslanxie This should have been fixed by #1444, which I just merged into main.
Can you try again on the main branch and let me know if that works on your side too?

@aslanxie (Contributor, Author)

@regisss It's working on v1.14.0 now.
