Conversation

@Rocketknight1 (Member)

Extremely draft PR for now, but the idea here is that even when a model has multiple templates, we can save them in a templates/ directory, and finally no longer have any cases where templates have to be saved as single JSON-encoded lines.
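For context, a rough sketch of the difference between the two storage formats (an illustration only; the exact file and directory names follow the constants referenced later in this thread):

```python
import json

# Legacy format: the whole template is squeezed onto a single JSON-encoded
# line inside chat_template.json, newlines and all.
legacy = json.dumps(
    {"chat_template": "{% for message in messages %}{{ message['content'] }}{% endfor %}"}
)

# New format: each template is a plain .jinja file that can contain real
# newlines, e.g. chat_template.jinja for the default template, plus one file
# per extra template under the templates directory (CHAT_TEMPLATE_DIR in the
# snippets below).
with open("chat_template.jinja", "w", encoding="utf-8") as f:
    f.write("{% for message in messages %}\n{{ message['content'] }}\n{% endfor %}\n")
```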

@Rocketknight1 Rocketknight1 force-pushed the allow-save-loading-multiple-template-files branch from 199d69f to 1df676d Compare March 6, 2025 15:18
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Rocketknight1 Rocketknight1 marked this pull request as ready for review March 7, 2025 12:19
@github-actions github-actions bot requested a review from ArthurZucker March 7, 2025 12:19
@Rocketknight1 Rocketknight1 removed the request for review from ArthurZucker March 7, 2025 12:27
@Rocketknight1 Rocketknight1 force-pushed the allow-save-loading-multiple-template-files branch 2 times, most recently from 344e49e to 0473ebb Compare March 10, 2025 17:48
@Rocketknight1 (Member Author)

Failing tests are unrelated, so cc @ArthurZucker @Cyrilvallez for core maintainer review. I realize this is probably a lower priority than some other stuff right now, though, so don't stress!

@Rocketknight1 Rocketknight1 force-pushed the allow-save-loading-multiple-template-files branch 3 times, most recently from af3f6e4 to 6a6528b Compare March 13, 2025 17:28
@Rocketknight1 Rocketknight1 force-pushed the allow-save-loading-multiple-template-files branch 3 times, most recently from f05c52a to d4bba4a Compare March 24, 2025 19:52
@Rocketknight1 (Member Author)

cc @zucchini-nlp this should be just about ready for review now - can you take a look at the processor sections before I ping the core maintainers?

@Rocketknight1 (Member Author)

Also cc @Wauplin to check the Hub downloading code!

@Wauplin (Contributor) left a comment:

Thanks for the ping! The logic looks good to me. I left some comments, mostly about code style, but feel free to ignore any that aren't relevant.

(Note! I did not check the tests)

Comment on lines 653 to 654
# Legacy format for multiple templates:
# chat template dicts are saved to chat_template.json as lists of dicts with fixed key names.
@Wauplin (Contributor):

Is this format already in use (multiple templates in a single JSON file)?

Asking because I can't see it in the removed code of this PR. If it wasn't possible before, I'd tend to think it shouldn't be added (better to raise an exception suggesting that the user pass `save_raw_chat_template=True` instead).

@zucchini-nlp (Member):

+1. Since we don't support multiple templates in `apply_chat_template`, I'd prefer not to allow saving them yet. We can add multiple templates if there's a need for it in the future.

@Rocketknight1 (Member Author):

Done! I blocked saving multiple templates in the legacy format (while still supporting them in the modern format).
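A minimal sketch of what that guard might look like (the names here are assumptions, not the exact code from the PR):

```python
def _check_legacy_template_save(chat_template, save_jinja_files: bool) -> None:
    """Hypothetical guard: the legacy chat_template.json format only holds a
    single template string, so refuse to serialize a dict of templates there."""
    if isinstance(chat_template, dict) and not save_jinja_files:
        raise ValueError(
            "Multiple chat templates cannot be saved in the legacy format; "
            "save them as separate .jinja files instead."
        )
```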

Comment on lines 859 to 861
if isinstance(chat_templates, (list, tuple)):
    # Un-flatten the list storage
    chat_templates = {template["name"]: template["template"] for template in chat_templates}
@Wauplin (Contributor):

This could be avoided if we decide not to support this case (i.e. if it's not already supported).

@Rocketknight1 (Member Author):

Yep, removed!

Comment on lines +2018 to +2034
if is_local:
    template_dir = Path(pretrained_model_name_or_path, CHAT_TEMPLATE_DIR)
    if template_dir.is_dir():
        for template_file in template_dir.glob("*.jinja"):
            template_name = template_file.name.removesuffix(".jinja")
            vocab_files[f"chat_template_{template_name}"] = f"{CHAT_TEMPLATE_DIR}/{template_file.name}"
else:
    for template in list_repo_templates(
        pretrained_model_name_or_path,
        local_files_only=local_files_only,
        revision=revision,
        cache_dir=cache_dir,
    ):
        vocab_files[f"chat_template_{template}"] = f"{CHAT_TEMPLATE_DIR}/{template}.jinja"

@Wauplin (Contributor):

I feel like this could be a helper method by itself (it's almost duplicated code compared to the logic above in `get_processor_dict`).
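Something like this, perhaps (a sketch only; `CHAT_TEMPLATE_DIR` and `list_repo_templates` are the names used in the diff above, and the signature is an assumption):

```python
from pathlib import Path


def _find_chat_template_files(pretrained_model_name_or_path, is_local, **hub_kwargs):
    """Hypothetical shared helper for the near-duplicated discovery logic:
    map each template name to its chat_template_* file entry."""
    files = {}
    if is_local:
        template_dir = Path(pretrained_model_name_or_path, CHAT_TEMPLATE_DIR)
        if template_dir.is_dir():
            for template_file in template_dir.glob("*.jinja"):
                name = template_file.name.removesuffix(".jinja")
                files[f"chat_template_{name}"] = f"{CHAT_TEMPLATE_DIR}/{template_file.name}"
    else:
        for name in list_repo_templates(pretrained_model_name_or_path, **hub_kwargs):
            files[f"chat_template_{name}"] = f"{CHAT_TEMPLATE_DIR}/{name}.jinja"
    return files
```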

@zucchini-nlp (Member) left a comment:

+1 on all comments from @Wauplin regarding the processor templates. Also, we need to keep an eye on models that need to support templates for both the tokenizer and the processor. Up to now, I haven't seen models using different templates for those, but who knows.

Comment on lines 1394 to 1409
if isinstance(self.chat_template, dict) and "default" in self.chat_template:
    chat_template = self.chat_template["default"]
elif isinstance(self.chat_template, dict):
    raise ValueError(
        'The processor has multiple chat templates but none of them are named "default". You need to specify'
        ' which one to use by passing the `chat_template` argument. Available templates are: '
        f'{", ".join(self.chat_template.keys())}'
    )
elif self.chat_template is not None:
elif self.chat_template is not None:
@zucchini-nlp (Member):

Ah, I see support is added here as well! Makes sense then; I'm just wondering if it's really needed.

@Rocketknight1 (Member Author):

Yes, this was something I thought about a lot! However, what I noticed was that a lot of template authors have started writing separate "normal" and "reasoning" templates, in addition to other template types like "rag" and "tool_use" that we've also seen.

I felt that removing support for multiple templates would cause problems for us as a result, so I decided to keep it and add support in processors as well as tokenizers.
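For example, the kind of multi-template usage this preserves (an illustrative sketch; the repo id and template contents are made up):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("some-org/some-model")  # placeholder repo id
tokenizer.chat_template = {
    "default": "{% for m in messages %}{{ m['role'] }}: {{ m['content'] }}\n{% endfor %}",
    "tool_use": "{% for m in messages %}[TOOLS] {{ m['role'] }}: {{ m['content'] }}\n{% endfor %}",
}

messages = [{"role": "user", "content": "Hello!"}]
# With a dict of templates, the "default" entry is used unless a name is given.
print(tokenizer.apply_chat_template(messages, tokenize=False))
# Select a named template explicitly.
print(tokenizer.apply_chat_template(messages, chat_template="tool_use", tokenize=False))
```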

Comment on lines 764 to 800
# When we save as single files, tokenizers and processors share a chat template, which means
# the reloaded tokenizer should get the chat template as well
self.assertEqual(reloaded_processor.chat_template, reloaded_processor.tokenizer.chat_template)
@zucchini-nlp (Member):

Good point about sharing. We started supporting models that can be loaded either as multimodal or text-only from the same repo, for example Gemma3.

AFAIK in Gemma3 we have the same templates in the tokenizer and in the processor, but I'd need to make sure. I also think this will become a recurring pattern. For now both templates seem to be the same, which makes sense since it's the same LM under the hood. The only difference is that the template should support both `content: str` and `content: list of dicts` formats.

Shouldn't be a problem for now, but it's good to keep an eye on new models.

@Rocketknight1 (Member Author):

Yeah, for sure - templates might have to be written to support either content schema in those cases, so they work for both tokenizers and processors. We should definitely watch closely as new models are added.
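Concretely, the two content schemas such a shared template has to handle (illustrative messages only):

```python
# Text-only usage (tokenizer): content is a plain string.
text_messages = [
    {"role": "user", "content": "Describe this image."},
]

# Multimodal usage (processor): content is a list of typed chunks.
multimodal_messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/cat.png"},
            {"type": "text", "text": "Describe this image."},
        ],
    },
]

# A template covering both can branch on the content type in Jinja, e.g.:
# {% if message['content'] is string %} ... {% else %}loop over the chunks{% endif %}
```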

@Rocketknight1 Rocketknight1 force-pushed the allow-save-loading-multiple-template-files branch from 862010b to 7f144a0 Compare March 26, 2025 18:44
with open(output_chat_template_file, "w", encoding="utf-8") as writer:
    writer.write(chat_template_json_string)
logger.info(f"chat template saved in {output_chat_template_file}")
save_as_jinja = kwargs.get("save_raw_chat_template", False)
@Wauplin (Contributor):

Suggested change:
- save_as_jinja = kwargs.get("save_raw_chat_template", False)
+ save_as_jinja = kwargs.get("save_chat_template_as_jinja", False)

Related to the `output_chat_template_file_jinja`/`output_chat_template_file_legacy` naming above, I think the name of the kwarg could be more explicit (not 100% happy with my suggestion though...).

@Rocketknight1 (Member Author):

I used `save_jinja_files` just because it's shorter, and I think it will make sense to people. `save_raw_chat_template` was definitely confusing!
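So the resulting usage looks roughly like this (a sketch; as noted at merge time below, the PR later makes Jinja files the default save format):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("some-org/some-model")  # placeholder repo id
# Opt in to the new format via the renamed kwarg.
tokenizer.save_pretrained("saved_model", save_jinja_files=True)
# -> writes saved_model/chat_template.jinja (plus one .jinja file per extra
#    named template) instead of a single JSON-encoded line.
```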

@Wauplin (Contributor):

much better!

@Wauplin (Contributor) left a comment:

LGTM from a huggingface_hub usage perspective!

(Still better to wait for official approval from the transformers team 😄)

@Rocketknight1 Rocketknight1 force-pushed the allow-save-loading-multiple-template-files branch 6 times, most recently from 486d57d to 31f545b Compare April 2, 2025 14:19
@Rocketknight1 Rocketknight1 force-pushed the allow-save-loading-multiple-template-files branch from 31f545b to 6e152ab Compare April 4, 2025 13:23
@Rocketknight1 Rocketknight1 force-pushed the allow-save-loading-multiple-template-files branch from 63978d7 to 23c50c5 Compare April 11, 2025 14:34
@Rocketknight1 (Member Author)

Tests are green, so I'm merging, including making this the new default save format! I'll keep an eye out for bugs/issues; we have time before the next release to smooth things over.

@Rocketknight1 Rocketknight1 merged commit bf46e44 into main Apr 11, 2025
21 checks passed
@Rocketknight1 Rocketknight1 deleted the allow-save-loading-multiple-template-files branch April 11, 2025 15:37
@Wauplin (Contributor)

Wauplin commented Apr 15, 2025

Congrats on getting it to the finish line @Rocketknight1!

cyr0930 pushed a commit to cyr0930/transformers that referenced this pull request Apr 18, 2025
…ingface#36588)

* Add saving in the new format (but no loading yet!)

* Add saving in the new format (but no loading yet!)

* A new approach to template files!

* make fixup

* make fixup, set correct dir

* Some progress but need to rework for cached_file

* Rework loading handling again

* Small fixes

* Looks like it's working now!

* make fixup

* Working!

* make fixup

* make fixup

* Add TODO so I don't miss it

* Cleaner control flow with one less indent

* Copy the new logic to processing_utils as well

* Proper support for dicts of templates

* make fixup

* define the file/dir names in a single place

* Update the processor chat template reload test as well

* Add processor loading of multiple templates

* Flatten correctly to match tokenizers

* Better support when files are empty sometimes

* Stop creating those empty templates

* Revert changes now we don't have empty templates

* Revert changes now we don't have empty templates

* Don't support separate template files on the legacy path

* Rework/simplify loading code

* Make sure it's always a chat_template key in chat_template.json

* Update processor handling of multiple templates

* Add a full save-loading test to the tokenizer tests as well

* Correct un-flattening

* New test was incorrect

* Correct error/offline handling

* Better exception handling

* More error handling cleanup

* Add skips for test failing on main

* Reorder to fix errors

* make fixup

* clarify legacy processor file docs and location

* Update src/transformers/processing_utils.py

Co-authored-by: Lucain <lucainp@gmail.com>

* Update src/transformers/processing_utils.py

Co-authored-by: Lucain <lucainp@gmail.com>

* Update src/transformers/processing_utils.py

Co-authored-by: Lucain <lucainp@gmail.com>

* Update src/transformers/processing_utils.py

Co-authored-by: Lucain <lucainp@gmail.com>

* Rename to _jinja and _legacy

* Stop saving multiple templates in the legacy format

* Cleanup the processing code

* Cleanup the processing code more

* make fixup

* make fixup

* correct reformatting

* Use correct dir name

* Fix import location

* Use save_jinja_files instead of save_raw_chat_template_files

* Correct the test for saving multiple processor templates

* Fix type hint

* Update src/transformers/utils/hub.py

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Patch llava_onevision test

* Update src/transformers/processing_utils.py

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Refactor chat template saving out into a separate function

* Update tests for the new default

* Don't do chat template saving logic when chat template isn't there

* Ensure save_jinja_files is propagated to tokenizer correctly

* Trigger tests

* Update more tests to new default

* Trigger tests

---------

Co-authored-by: Lucain <lucainp@gmail.com>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025