Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add setup_chat_format for adding new special tokens to model for training chat models #1242

Merged
merged 15 commits into from
Jan 18, 2024

Conversation

philschmid
Copy link
Member

@philschmid philschmid commented Jan 17, 2024

What does this PR do?

This PR adds a new util method setup_chat_format, which automatically defines the chat_template for a tokenizer, adds special tokens, resizes the model embedding layer (optional to a multiple of 64)

It also introduces the ChatMlSpecialTokens dataclass, which is used in the setup_chat_format. This will make it easy to extend to different formats in the future, e.g., llama, but for now, we only add chatml.

Open Discussions

Should we add more dummy tokens to the tokenizer when the embedding layer is extended to a multiple of x? This can lead to downstream issues with llama.cpp

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Contributor

@younesbelkada younesbelkada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot ! I left some early comments, in addition can you:

tests/test_model_utils.py Outdated Show resolved Hide resolved
trl/models/utils.py Show resolved Hide resolved
trl/models/utils.py Outdated Show resolved Hide resolved
Copy link
Contributor

@younesbelkada younesbelkada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot !

docs/source/sft_trainer.mdx Outdated Show resolved Hide resolved
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
@younesbelkada younesbelkada merged commit 928d144 into huggingface:main Jan 18, 2024
9 checks passed
lapp0 pushed a commit to lapp0/trl that referenced this pull request May 10, 2024
…aining chat models (huggingface#1242)

* first draft

* 64

* sourabs suggestion

* wip tests

* make style happy

* add check

* docstring

* fix docstring

* Update tests/test_model_utils.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* move tests

* add todo for abstract class

* make style happy

* add slow tests and imports

* add documentation

* sft_trainer.mdx aktualisieren

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants