🚨 🚨 Allow saving and loading multiple "raw" chat template files #36588
Conversation
Failing tests are unrelated, so cc @ArthurZucker @Cyrilvallez for core maintainer review. I realize this is probably a lower priority than some other stuff right now, though, so don't stress!
cc @zucchini-nlp this should be just about ready for review now - can you take a look at the processor sections before I ping the core maintainers?
Also cc @Wauplin to check the Hub downloading code!
Thanks for the ping! Logic looks good to me. I left some comments mostly about code style but feel free to ignore some if not relevant.
(Note! I did not check the tests)
src/transformers/processing_utils.py
Outdated
    # Legacy format for multiple templates:
    # chat template dicts are saved to chat_template.json as lists of dicts with fixed key names.
Is this format already in use? (multiple templates, single JSON file?)
Asking because I can't see it in the removed code of this PR. If it wasn't possible before, I'd tend to think it shouldn't be added (better to raise an exception suggesting the user pass `save_raw_chat_template=True` instead).
+1, since we don't support multiple templates in apply_chat_template, I'd prefer not to allow saving them yet. We can add multiple templates if there's a need in the future.
Done! I blocked saving multiple templates in the legacy format (while still supporting it in the modern format)
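The agreed rule can be sketched roughly like this (a minimal illustration, not the PR's actual code; the function name, directory name, and flag are all illustrative assumptions):

```python
import json

# Hypothetical sketch: a dict of named templates may only be written with the
# new per-file layout, never to the legacy single-JSON format.
def serialize_chat_templates(chat_template, save_jinja_files: bool) -> dict:
    """Return a mapping of relative filename -> file contents to write."""
    if isinstance(chat_template, dict):
        if not save_jinja_files:
            raise ValueError(
                "Multiple chat templates cannot be saved in the legacy "
                "single-JSON format; save them as raw .jinja files instead."
            )
        # Modern format: one raw .jinja file per named template.
        return {f"chat_templates/{name}.jinja": tmpl for name, tmpl in chat_template.items()}
    if save_jinja_files:
        return {"chat_template.jinja": chat_template}
    # Legacy format: a single template, JSON-encoded under a fixed key.
    return {"chat_template.json": json.dumps({"chat_template": chat_template})}
```

A single template still round-trips through either format; only the multi-template dict is restricted to the new layout.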
src/transformers/processing_utils.py
Outdated
    if isinstance(chat_templates, (list, tuple)):
        # Un-flatten the list storage
        chat_templates = {template["name"]: template["template"] for template in chat_templates}
this could be avoided if we decide not to support this case (i.e. if it's not already supported)
Yep, removed!
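For context, the removed branch handled the "flattened" legacy storage: a dict of named templates serialized as a list of dicts with fixed key names. A minimal round-trip sketch (function names are illustrative, not library helpers):

```python
# Sketch of the legacy flattened storage that the removed branch un-flattened.
def flatten_templates(templates: dict) -> list:
    """Dict of named templates -> list of {"name", "template"} dicts."""
    return [{"name": name, "template": template} for name, template in templates.items()]

def unflatten_templates(flat: list) -> dict:
    """Inverse of flatten_templates, matching the dict comprehension in the diff."""
    return {entry["name"]: entry["template"] for entry in flat}
```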
    if is_local:
        template_dir = Path(pretrained_model_name_or_path, CHAT_TEMPLATE_DIR)
        if template_dir.is_dir():
            for template_file in template_dir.glob("*.jinja"):
                template_name = template_file.name.removesuffix(".jinja")
                vocab_files[f"chat_template_{template_name}"] = (
                    f"{CHAT_TEMPLATE_DIR}/{template_file.name}"
                )
    else:
        for template in list_repo_templates(
            pretrained_model_name_or_path,
            local_files_only=local_files_only,
            revision=revision,
            cache_dir=cache_dir,
        ):
            vocab_files[f"chat_template_{template}"] = f"{CHAT_TEMPLATE_DIR}/{template}.jinja"
I feel like this could be a helper method by itself (almost duplicated code compared to the logic above in `get_processor_dict`).
+1 on all comments from @Wauplin regarding the processor templates. Also, we need to keep an eye on models that need to support templates for both the tokenizer and the processor. Up to now I haven't seen models using different templates for those, but who knows.
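A rough sketch of the kind of helper being suggested, covering only the local-directory branch (the Hub branch would call `list_repo_templates` instead); the function name and directory constant here are assumptions, not the PR's actual code:

```python
from pathlib import Path

CHAT_TEMPLATE_DIR = "chat_templates"  # assumption: stands in for the constant used in the diff

def collect_local_template_files(pretrained_path: str) -> dict:
    """Map `chat_template_<name>` keys to relative .jinja paths, as in the diff above."""
    vocab_files = {}
    template_dir = Path(pretrained_path, CHAT_TEMPLATE_DIR)
    if template_dir.is_dir():
        for template_file in sorted(template_dir.glob("*.jinja")):
            name = template_file.name.removesuffix(".jinja")
            vocab_files[f"chat_template_{name}"] = f"{CHAT_TEMPLATE_DIR}/{template_file.name}"
    return vocab_files
```

Both the tokenizer and processor loading paths could then share this one function rather than duplicating the glob logic.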
    if isinstance(self.chat_template, dict) and "default" in self.chat_template:
        chat_template = self.chat_template["default"]
    elif isinstance(self.chat_template, dict):
        raise ValueError(
            'The processor has multiple chat templates but none of them are named "default". You need to specify'
            ' which one to use by passing the `chat_template` argument. Available templates are: '
            f'{", ".join(self.chat_template.keys())}'
        )
    elif self.chat_template is not None:
ah, i see support is added here as well! Makes sense then, I am just wondering if it is really needed
Yes, this was something I thought about a lot! What I noticed, though, was that a lot of template authors have started writing separate "normal" and "reasoning" templates, in addition to the other template types like "rag" and "tool_use" that we've also seen.
I felt that dropping support for multiple templates would cause problems for us as a result, so I decided to keep it and add support in processors as well as tokenizers.
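The resolution rule being discussed boils down to something like this (a sketch mirroring the diff above, written as an illustrative free function rather than the actual processor method):

```python
def resolve_chat_template(chat_template, requested=None):
    """Pick a template: an explicit name wins, then "default", else error."""
    if isinstance(chat_template, dict):
        if requested is not None:
            return chat_template[requested]
        if "default" in chat_template:
            return chat_template["default"]
        raise ValueError(
            'Multiple chat templates but none of them are named "default". '
            f'Pass the `chat_template` argument. Available: {", ".join(chat_template)}'
        )
    # A single template (or None) is used as-is.
    return chat_template
```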
    # When we save as single files, tokenizers and processors share a chat template, which means
    # the reloaded tokenizer should get the chat template as well
    self.assertEqual(reloaded_processor.chat_template, reloaded_processor.tokenizer.chat_template)
Good point about sharing. We started supporting models that can be loaded either as multimodal or text-only from the same repo, for example Gemma3.
AFAIK in Gemma3 we have the same templates in the tokenizer and in the processor, but I'd need to make sure. I also think this will become a recurring pattern. For now both templates seem to be the same, which makes sense since it's the same LM under the hood. The only difference is that the template should support both the `content: {str}` and `content: {list of dicts}` formats.
Shouldn't be a problem for now, but good to keep an eye on new models.
Yeah, for sure - templates might have to be written to support either content schema in those cases, so they work for both tokenizers and processors. We should definitely watch closely as new models are added.
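To illustrate the two content schemas in question: tokenizers typically receive `content` as a plain string, while processors receive a list of typed parts. A template (or a preprocessing step like this illustrative sketch) that wants to work for both would normalize the list form:

```python
def normalize_content(content):
    """Collapse either content schema down to plain text (illustrative helper).

    Accepts a plain string, or the list-of-dicts form such as
    [{"type": "image"}, {"type": "text", "text": "Describe this."}].
    """
    if isinstance(content, str):
        return content
    # Keep only the text parts; non-text parts (images, audio) are handled elsewhere.
    return " ".join(part["text"] for part in content if part.get("type") == "text")
```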
src/transformers/processing_utils.py
Outdated
    with open(output_chat_template_file, "w", encoding="utf-8") as writer:
        writer.write(chat_template_json_string)
        logger.info(f"chat template saved in {output_chat_template_file}")
    save_as_jinja = kwargs.get("save_raw_chat_template", False)
    - save_as_jinja = kwargs.get("save_raw_chat_template", False)
    + save_as_jinja = kwargs.get("save_chat_template_as_jinja", False)

Related to the `output_chat_template_file_jinja`/`output_chat_template_file_legacy` naming above, I think the name of the kwarg could be more explicit (not 100% happy with my suggestion though...)
I used `save_jinja_files` just because it's shorter, and I think it will make sense to people. `save_raw_chat_template` was definitely confusing!
much better!
LGTM from a huggingface_hub's usage perspective!
(still better to wait for official approval from transformers's team 😄)
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Tests are green, so I'm merging, including making this the default new save format! I'll keep an eye out for bugs/issues; we have time before the next release to smooth things over.
Congrats on getting it to the finish line @Rocketknight1!
…ingface#36588)

* Add saving in the new format (but no loading yet!)
* Add saving in the new format (but no loading yet!)
* A new approach to template files!
* make fixup
* make fixup, set correct dir
* Some progress but need to rework for cached_file
* Rework loading handling again
* Small fixes
* Looks like it's working now!
* make fixup
* Working!
* make fixup
* make fixup
* Add TODO so I don't miss it
* Cleaner control flow with one less indent
* Copy the new logic to processing_utils as well
* Proper support for dicts of templates
* make fixup
* define the file/dir names in a single place
* Update the processor chat template reload test as well
* Add processor loading of multiple templates
* Flatten correctly to match tokenizers
* Better support when files are empty sometimes
* Stop creating those empty templates
* Revert changes now we don't have empty templates
* Revert changes now we don't have empty templates
* Don't support separate template files on the legacy path
* Rework/simplify loading code
* Make sure it's always a chat_template key in chat_template.json
* Update processor handling of multiple templates
* Add a full save-loading test to the tokenizer tests as well
* Correct un-flattening
* New test was incorrect
* Correct error/offline handling
* Better exception handling
* More error handling cleanup
* Add skips for test failing on main
* Reorder to fix errors
* make fixup
* clarify legacy processor file docs and location
* Update src/transformers/processing_utils.py (Co-authored-by: Lucain <lucainp@gmail.com>)
* Update src/transformers/processing_utils.py (Co-authored-by: Lucain <lucainp@gmail.com>)
* Update src/transformers/processing_utils.py (Co-authored-by: Lucain <lucainp@gmail.com>)
* Update src/transformers/processing_utils.py (Co-authored-by: Lucain <lucainp@gmail.com>)
* Rename to _jinja and _legacy
* Stop saving multiple templates in the legacy format
* Cleanup the processing code
* Cleanup the processing code more
* make fixup
* make fixup
* correct reformatting
* Use correct dir name
* Fix import location
* Use save_jinja_files instead of save_raw_chat_template_files
* Correct the test for saving multiple processor templates
* Fix type hint
* Update src/transformers/utils/hub.py (Co-authored-by: Julien Chaumond <julien@huggingface.co>)
* Patch llava_onevision test
* Update src/transformers/processing_utils.py (Co-authored-by: Julien Chaumond <julien@huggingface.co>)
* Update src/transformers/tokenization_utils_base.py (Co-authored-by: Julien Chaumond <julien@huggingface.co>)
* Refactor chat template saving out into a separate function
* Update tests for the new default
* Don't do chat template saving logic when chat template isn't there
* Ensure save_jinja_files is propagated to tokenizer correctly
* Trigger tests
* Update more tests to new default
* Trigger tests

---------

Co-authored-by: Lucain <lucainp@gmail.com>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Extremely draft PR for now, but the idea here is that even when a model has multiple templates, we can save them in a templates/ directory, and finally no longer have any cases where templates have to be saved as single JSON-encoded lines.
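The two on-disk layouts can be sketched like this (directory and file names are illustrative, matching the spirit of the PR rather than its exact constants):

```python
import json
import tempfile
from pathlib import Path

templates = {"default": "{{ messages }}", "rag": "{{ documents }}{{ messages }}"}

root = Path(tempfile.mkdtemp())

# Legacy layout: a single JSON file, so each template body becomes one JSON-encoded line.
(root / "chat_template.json").write_text(
    json.dumps({"chat_template": templates["default"]}), encoding="utf-8"
)

# New layout: one raw, human-readable .jinja file per named template.
template_dir = root / "templates"
template_dir.mkdir()
for name, template in templates.items():
    (template_dir / f"{name}.jinja").write_text(template, encoding="utf-8")

# Loading the new layout is just globbing the directory back into a dict.
reloaded = {p.stem: p.read_text(encoding="utf-8") for p in template_dir.glob("*.jinja")}
assert reloaded == templates
```

The win is diffability: a multi-line Jinja template stays readable in the repo instead of being escaped into one giant JSON string.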