[`LlamaSlowConverter`] Slow to Fast better support #29797

ArthurZucker · 2024-03-22T01:27:02Z

What does this PR do?

Makes sur that the eos token, bos token and unk token are correctly taken when initializing from slow.

HuggingFaceDocBuilderDev · 2024-03-22T01:46:33Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

…converter

amyeroberts

Thanks for fixing!

amyeroberts · 2024-03-28T13:58:50Z

src/transformers/convert_slow_tokenizer.py

+            (self.original_tokenizer.convert_ids_to_tokens(0), 0.0),
+            (self.original_tokenizer.convert_ids_to_tokens(1), 0.0),
+            (self.original_tokenizer.convert_ids_to_tokens(2), 0.0),


I'm assuming having the mapping of ids to tokens here is fine because we're always handling Llama in this case?

src/transformers/convert_slow_tokenizer.py

* fix * fix test * style * nit * rather rely on concert token to id * fix quality * Update src/transformers/convert_slow_tokenizer.py

fix

e067309

ArthurZucker mentioned this pull request Mar 22, 2024

Added support for building an AddedVocabulary based on a pre-existing AddedVocabulary. huggingface/tokenizers#1444

Closed

fix test

12f76f2

ArthurZucker added 3 commits March 22, 2024 10:46

style

f5b1349

nit

d825b27

rather rely on concert token to id

938999b

NielsRogge mentioned this pull request Mar 24, 2024

LLaVA image_token_index is not 64000 but 64002 #29836

Closed

4 tasks

ArthurZucker mentioned this pull request Mar 25, 2024

Tokenizer use_fast=True encode has fatal bug #29483

Closed

2 tasks

Merge branch 'main' of github.com:huggingface/transformers into llam-…

1c840c2

…converter

ArthurZucker requested a review from amyeroberts March 28, 2024 13:03

fix quality

12d27b7

amyeroberts approved these changes Mar 28, 2024

View reviewed changes

ArthurZucker commented Mar 28, 2024

View reviewed changes

src/transformers/convert_slow_tokenizer.py Outdated Show resolved Hide resolved

Update src/transformers/convert_slow_tokenizer.py

d88dfcf

ArthurZucker merged commit 536ea2a into main Mar 28, 2024
21 checks passed

ArthurZucker deleted the llam-converter branch March 28, 2024 15:19

amyeroberts pushed a commit that referenced this pull request Mar 28, 2024

[LlamaSlowConverter] Slow to Fast better support (#29797)

e40fe39

* fix * fix test * style * nit * rather rely on concert token to id * fix quality * Update src/transformers/convert_slow_tokenizer.py

itazap pushed a commit that referenced this pull request May 14, 2024

[LlamaSlowConverter] Slow to Fast better support (#29797)

72fc614

* fix * fix test * style * nit * rather rely on concert token to id * fix quality * Update src/transformers/convert_slow_tokenizer.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[`LlamaSlowConverter`] Slow to Fast better support #29797

[`LlamaSlowConverter`] Slow to Fast better support #29797

ArthurZucker commented Mar 22, 2024 •

edited

Loading

HuggingFaceDocBuilderDev commented Mar 22, 2024

amyeroberts left a comment

amyeroberts Mar 28, 2024

[LlamaSlowConverter] Slow to Fast better support #29797

[LlamaSlowConverter] Slow to Fast better support #29797

Conversation

ArthurZucker commented Mar 22, 2024 • edited Loading

What does this PR do?

HuggingFaceDocBuilderDev commented Mar 22, 2024

amyeroberts left a comment

Choose a reason for hiding this comment

amyeroberts Mar 28, 2024

Choose a reason for hiding this comment

[`LlamaSlowConverter`] Slow to Fast better support #29797

[`LlamaSlowConverter`] Slow to Fast better support #29797

ArthurZucker commented Mar 22, 2024 •

edited

Loading