Bug: Decoding special tokens in T5 #8938
Comments
If you upload your transformers model somewhere I can take a look.
@fairydreaming Sure, thanks. Base model: https://huggingface.co/repetitio/flan-t5-small, LoRA: https://huggingface.co/repetitio/distilled-simplifier. Any of the GGUF variants gives the same result. The input is just any Wikipedia-like paragraph.
The problem is that the current T5 model implementation ignores LoRA in the attention matrices. But fixing this is easy, try this patch:
Basically you have to replace every multiplication by an attention weight matrix with a variant that also applies the LoRA adapter.
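For illustration, here is a minimal, self-contained sketch of what "applying LoRA to a matrix multiplication" means: the plain projection y = W*x is replaced by y = W*x + scale * B*(A*x), where A (r x n) and B (m x r) are the adapter matrices and scale = alpha/r. The Mat/Vec types and function names below are illustrative only, not llama.cpp's:

```cpp
// Sketch only: plain projection vs. LoRA-aware projection.
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<float>>; // dense row-major matrix
using Vec = std::vector<float>;

static Vec matvec(const Mat & M, const Vec & x) {
    Vec y(M.size(), 0.0f);
    for (size_t i = 0; i < M.size(); ++i)
        for (size_t j = 0; j < x.size(); ++j)
            y[i] += M[i][j] * x[j];
    return y;
}

// Plain projection: the adapter is ignored (the behaviour described above for T5 attention).
static Vec mul_mat(const Mat & W, const Vec & x) {
    return matvec(W, x);
}

// LoRA-aware projection: base result plus the scaled low-rank correction.
static Vec mul_mat_lora(const Mat & W, const Mat & A, const Mat & B,
                        float scale, const Vec & x) {
    Vec y   = matvec(W, x);   // base:            W*x
    Vec ax  = matvec(A, x);   // down-projection: A*x   (rank r)
    Vec bax = matvec(B, ax);  // up-projection:   B*(A*x)
    for (size_t i = 0; i < y.size(); ++i) {
        y[i] += scale * bax[i];
    }
    return y;
}
```

In the T5 graph builder itself, the equivalent change is presumably to route the Q/K/V/output projections through llama.cpp's LoRA-aware matrix-multiply helper (llm_build_lora_mm at the time) rather than a bare ggml_mul_mat, so that the adapter tensors are actually used; the exact call sites depend on the patch referenced above.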
@fairydreaming Sorry, could you share the command you used, please? I'm not able to reproduce your results. I'm running:
Sure thing:
I reconverted both ggufs from safetensors:
With these I get the same layer output values as in transformers. The output is:
Awesome, thanks!
What happened?
I have a T5/LoRA model trained to output some text separated by the <extra_id_0> special token (the tokenizer works properly after following the instructions in #8872). When running the model using Hugging Face's transformers/peft, it generates the expected output. However, when I use llama-cli, the moment the first such token is reached it is decoded into an EOG token instead of the extra token and generation is stopped. I might simply be doing something wrong in using the library.
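A minimal sketch of how one might check this symptom, assuming the llama.cpp C API around this build (llama_load_model_from_file, llama_tokenize, llama_token_is_eog) and a placeholder model path: it prints whether the token(s) produced for <extra_id_0> are classified as end-of-generation, which is what would make llama-cli stop there.

```cpp
// Diagnostic sketch: check whether the GGUF tokenizer flags <extra_id_0> as EOG.
// Assumes the llama.cpp C API around this build; "model.gguf" is a placeholder path.
#include "llama.h"

#include <cstdio>
#include <string>
#include <vector>

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_load_model_from_file("model.gguf", mparams);
    if (model == nullptr) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    const std::string text = "<extra_id_0>";
    std::vector<llama_token> tokens(8);

    // parse_special = true so the sentinel is tokenized as a special token
    const int n = llama_tokenize(model, text.c_str(), (int) text.size(),
                                 tokens.data(), (int) tokens.size(),
                                 /*add_special=*/false, /*parse_special=*/true);
    for (int i = 0; i < n; ++i) {
        printf("token %d -> id %d, is_eog = %d\n",
               i, tokens[i], llama_token_is_eog(model, tokens[i]) ? 1 : 0);
    }

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```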
Name and Version
version: 3549 (afd27f0)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
What operating system are you seeing the problem on?
No response
Relevant log output
No response