🚨 🚨 Allow saving and loading multiple "raw" chat template files #36588
Conversation
Failing tests are unrelated, so cc @ArthurZucker @Cyrilvallez for core maintainer review. I realize this is probably a lower priority than some other stuff right now, though, so don't stress!
cc @zucchini-nlp this should be just about ready for review now - can you take a look at the processor sections before I ping the core maintainers?
Also cc @Wauplin to check the Hub downloading code!
Thanks for the ping! Logic looks good to me. I left some comments mostly about code style but feel free to ignore some if not relevant.
(Note! I did not check the tests)
src/transformers/processing_utils.py
Outdated
    # Legacy format for multiple templates:
    # chat template dicts are saved to chat_template.json as lists of dicts with fixed key names.
Is this format already in use? (multiple templates, single JSON file?)
Asking because I can't see it in the removed code of this PR. If it wasn't possible before, I'd tend to think it shouldn't be added (better to raise an exception suggesting the user pass `save_raw_chat_template=True` instead).
+1, since we don't support multiple templates in apply_chat_template, I'd prefer not to allow saving them yet. We can add multiple templates if there's a need in the future.
Done! I blocked saving multiple templates in the legacy format (while still supporting it in the modern format)
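The agreed rule can be sketched roughly like this (a minimal illustration, not the PR's actual code; the function name, directory name, and flag are all illustrative assumptions):

```python
import json

# Hypothetical sketch: a dict of named templates may only be written with the
# new per-file layout, never to the legacy single-JSON format.
def serialize_chat_templates(chat_template, save_jinja_files: bool) -> dict:
    """Return a mapping of relative filename -> file contents to write."""
    if isinstance(chat_template, dict):
        if not save_jinja_files:
            raise ValueError(
                "Multiple chat templates cannot be saved in the legacy "
                "single-JSON format; save them as raw .jinja files instead."
            )
        # Modern format: one raw .jinja file per named template.
        return {f"chat_templates/{name}.jinja": tmpl for name, tmpl in chat_template.items()}
    if save_jinja_files:
        return {"chat_template.jinja": chat_template}
    # Legacy format: a single template, JSON-encoded under a fixed key.
    return {"chat_template.json": json.dumps({"chat_template": chat_template})}
```

A single template still round-trips through either format; only the multi-template dict is restricted to the new layout.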
src/transformers/processing_utils.py
Outdated
    if isinstance(chat_templates, (list, tuple)):
        # Un-flatten the list storage
        chat_templates = {template["name"]: template["template"] for template in chat_templates}
this could be avoided if we decide not to support this case (i.e. if it's not already supported)
Yep, removed!
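For context, the removed branch handled the "flattened" legacy storage: a dict of named templates serialized as a list of dicts with fixed key names. A minimal round-trip sketch (function names are illustrative, not library helpers):

```python
# Sketch of the legacy flattened storage that the removed branch un-flattened.
def flatten_templates(templates: dict) -> list:
    """Dict of named templates -> list of {"name", "template"} dicts."""
    return [{"name": name, "template": template} for name, template in templates.items()]

def unflatten_templates(flat: list) -> dict:
    """Inverse of flatten_templates, matching the dict comprehension in the diff."""
    return {entry["name"]: entry["template"] for entry in flat}
```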
    if is_local:
        template_dir = Path(pretrained_model_name_or_path, CHAT_TEMPLATE_DIR)
        if template_dir.is_dir():
            for template_file in template_dir.glob("*.jinja"):
                template_name = template_file.name.removesuffix(".jinja")
                vocab_files[f"chat_template_{template_name}"] = (
                    f"{CHAT_TEMPLATE_DIR}/{template_file.name}"
                )
    else:
        for template in list_repo_templates(
            pretrained_model_name_or_path,
            local_files_only=local_files_only,
            revision=revision,
            cache_dir=cache_dir,
        ):
            vocab_files[f"chat_template_{template}"] = f"{CHAT_TEMPLATE_DIR}/{template}.jinja"
I feel like this could be a helper method by itself (almost duplicated code compared to the logic above in `get_processor_dict`).
+1 on all comments from @Wauplin regarding the processor templates. Also, we need to keep an eye on models that need to support templates for both the tokenizer and the processor. Up to now I haven't seen models using different templates for those, but who knows.
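A rough sketch of the kind of helper being suggested, covering only the local-directory branch (the Hub branch would call `list_repo_templates` instead); the function name and directory constant here are assumptions, not the PR's actual code:

```python
from pathlib import Path

CHAT_TEMPLATE_DIR = "chat_templates"  # assumption: stands in for the constant used in the diff

def collect_local_template_files(pretrained_path: str) -> dict:
    """Map `chat_template_<name>` keys to relative .jinja paths, as in the diff above."""
    vocab_files = {}
    template_dir = Path(pretrained_path, CHAT_TEMPLATE_DIR)
    if template_dir.is_dir():
        for template_file in sorted(template_dir.glob("*.jinja")):
            name = template_file.name.removesuffix(".jinja")
            vocab_files[f"chat_template_{name}"] = f"{CHAT_TEMPLATE_DIR}/{template_file.name}"
    return vocab_files
```

Both the tokenizer and processor loading paths could then share this one function rather than duplicating the glob logic.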
    if isinstance(self.chat_template, dict) and "default" in self.chat_template:
        chat_template = self.chat_template["default"]
    elif isinstance(self.chat_template, dict):
        raise ValueError(
            'The processor has multiple chat templates but none of them are named "default". You need to specify'
            ' which one to use by passing the `chat_template` argument. Available templates are: '
            f'{", ".join(self.chat_template.keys())}'
        )
    elif self.chat_template is not None:
ah, i see support is added here as well! Makes sense then, I am just wondering if it is really needed
Yes, this was something I thought about a lot! What I noticed, though, was that a lot of template authors have started writing separate "normal" and "reasoning" templates, in addition to the other template types like "rag" and "tool_use" that we've also seen.
I felt that dropping support for multiple templates would cause problems for us as a result, so I decided to keep it and add support in processors as well as tokenizers.
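The resolution rule being discussed boils down to something like this (a sketch mirroring the diff above, written as an illustrative free function rather than the actual processor method):

```python
def resolve_chat_template(chat_template, requested=None):
    """Pick a template: an explicit name wins, then "default", else error."""
    if isinstance(chat_template, dict):
        if requested is not None:
            return chat_template[requested]
        if "default" in chat_template:
            return chat_template["default"]
        raise ValueError(
            'Multiple chat templates but none of them are named "default". '
            f'Pass the `chat_template` argument. Available: {", ".join(chat_template)}'
        )
    # A single template (or None) is used as-is.
    return chat_template
```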
    # When we save as single files, tokenizers and processors share a chat template, which means
    # the reloaded tokenizer should get the chat template as well
    self.assertEqual(reloaded_processor.chat_template, reloaded_processor.tokenizer.chat_template)
Good point about sharing. We started supporting models that can be loaded either as multimodal or text-only from the same repo, for example Gemma3.
AFAIK in Gemma3 we have the same templates in the tokenizer and in the processor, but I'd need to make sure. I also think this will become a recurring pattern. For now both templates seem to be the same, which makes sense since it's the same LM under the hood. The only difference is that the template should support both the `content: {str}` and `content: {list of dicts}` formats.
Shouldn't be a problem for now, but good to keep an eye on new models.
Yeah, for sure - templates might have to be written to support either content schema in those cases, so they work for both tokenizers and processors. We should definitely watch closely as new models are added.
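To illustrate the two content schemas in question: tokenizers typically receive `content` as a plain string, while processors receive a list of typed parts. A template (or a preprocessing step like this illustrative sketch) that wants to work for both would normalize the list form:

```python
def normalize_content(content):
    """Collapse either content schema down to plain text (illustrative helper).

    Accepts a plain string, or the list-of-dicts form such as
    [{"type": "image"}, {"type": "text", "text": "Describe this."}].
    """
    if isinstance(content, str):
        return content
    # Keep only the text parts; non-text parts (images, audio) are handled elsewhere.
    return " ".join(part["text"] for part in content if part.get("type") == "text")
```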
src/transformers/processing_utils.py
Outdated
    with open(output_chat_template_file, "w", encoding="utf-8") as writer:
        writer.write(chat_template_json_string)
        logger.info(f"chat template saved in {output_chat_template_file}")
    save_as_jinja = kwargs.get("save_raw_chat_template", False)
    - save_as_jinja = kwargs.get("save_raw_chat_template", False)
    + save_as_jinja = kwargs.get("save_chat_template_as_jinja", False)

Related to the `output_chat_template_file_jinja`/`output_chat_template_file_legacy` naming above, I think the name of the kwarg could be more explicit (not 100% happy with my suggestion though...)
I used `save_jinja_files` just because it's shorter, and I think it will make sense to people. `save_raw_chat_template` was definitely confusing!
much better!
LGTM from a huggingface_hub's usage perspective!
(still better to wait for official approval from transformers's team 😄)
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Tests are green, so I'm merging, including making this the default new save format! I'll keep an eye out for bugs/issues; we have time before the next release to smooth things over.
Congrats on getting it to the finish line @Rocketknight1!
…ingface#36588)

* Add saving in the new format (but no loading yet!)
* Add saving in the new format (but no loading yet!)
* A new approach to template files!
* make fixup
* make fixup, set correct dir
* Some progress but need to rework for cached_file
* Rework loading handling again
* Small fixes
* Looks like it's working now!
* make fixup
* Working!
* make fixup
* make fixup
* Add TODO so I don't miss it
* Cleaner control flow with one less indent
* Copy the new logic to processing_utils as well
* Proper support for dicts of templates
* make fixup
* define the file/dir names in a single place
* Update the processor chat template reload test as well
* Add processor loading of multiple templates
* Flatten correctly to match tokenizers
* Better support when files are empty sometimes
* Stop creating those empty templates
* Revert changes now we don't have empty templates
* Revert changes now we don't have empty templates
* Don't support separate template files on the legacy path
* Rework/simplify loading code
* Make sure it's always a chat_template key in chat_template.json
* Update processor handling of multiple templates
* Add a full save-loading test to the tokenizer tests as well
* Correct un-flattening
* New test was incorrect
* Correct error/offline handling
* Better exception handling
* More error handling cleanup
* Add skips for test failing on main
* Reorder to fix errors
* make fixup
* clarify legacy processor file docs and location
* Update src/transformers/processing_utils.py (Co-authored-by: Lucain <lucainp@gmail.com>)
* Update src/transformers/processing_utils.py (Co-authored-by: Lucain <lucainp@gmail.com>)
* Update src/transformers/processing_utils.py (Co-authored-by: Lucain <lucainp@gmail.com>)
* Update src/transformers/processing_utils.py (Co-authored-by: Lucain <lucainp@gmail.com>)
* Rename to _jinja and _legacy
* Stop saving multiple templates in the legacy format
* Cleanup the processing code
* Cleanup the processing code more
* make fixup
* make fixup
* correct reformatting
* Use correct dir name
* Fix import location
* Use save_jinja_files instead of save_raw_chat_template_files
* Correct the test for saving multiple processor templates
* Fix type hint
* Update src/transformers/utils/hub.py (Co-authored-by: Julien Chaumond <julien@huggingface.co>)
* Patch llava_onevision test
* Update src/transformers/processing_utils.py (Co-authored-by: Julien Chaumond <julien@huggingface.co>)
* Update src/transformers/tokenization_utils_base.py (Co-authored-by: Julien Chaumond <julien@huggingface.co>)
* Refactor chat template saving out into a separate function
* Update tests for the new default
* Don't do chat template saving logic when chat template isn't there
* Ensure save_jinja_files is propagated to tokenizer correctly
* Trigger tests
* Update more tests to new default
* Trigger tests

---------

Co-authored-by: Lucain <lucainp@gmail.com>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Extremely draft PR for now, but the idea here is that even when a model has multiple templates, we can save them in a templates/ directory, and finally no longer have any cases where templates have to be saved as single JSON-encoded lines.
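The two on-disk layouts can be sketched like this (directory and file names are illustrative, matching the spirit of the PR rather than its exact constants):

```python
import json
import tempfile
from pathlib import Path

templates = {"default": "{{ messages }}", "rag": "{{ documents }}{{ messages }}"}

root = Path(tempfile.mkdtemp())

# Legacy layout: a single JSON file, so each template body becomes one JSON-encoded line.
(root / "chat_template.json").write_text(
    json.dumps({"chat_template": templates["default"]}), encoding="utf-8"
)

# New layout: one raw, human-readable .jinja file per named template.
template_dir = root / "templates"
template_dir.mkdir()
for name, template in templates.items():
    (template_dir / f"{name}.jinja").write_text(template, encoding="utf-8")

# Loading the new layout is just globbing the directory back into a dict.
reloaded = {p.stem: p.read_text(encoding="utf-8") for p in template_dir.glob("*.jinja")}
assert reloaded == templates
```

The win is diffability: a multi-line Jinja template stays readable in the repo instead of being escaped into one giant JSON string.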