
Remove ChatFormat, InstructTemplate, old message converters #1895

Merged 5 commits into pytorch:main on Oct 28, 2024

Conversation

RdoubleA
Contributor

@RdoubleA RdoubleA commented Oct 24, 2024

Context

What is the purpose of this PR? Is it to

  • add a new feature
  • fix a bug
  • update tests and/or documentation
  • other (please add here)

Closes #1839. Closes #1849.

Changelog

What are the changes made in this PR?

  • Delete ChatFormat, InstructTemplate, and all references
  • Delete torchtune/data/_converters.py. These are replaced by the transforms in _messages.py
  • Delete the old dataset tutorial. All of its topics are already covered in the Basics section of the documentation, and the tutorial refers to outdated APIs. Only sample packing was not yet covered elsewhere, so I moved it to its own section under Basics.
  • Update the generate recipe to assume a single prompt string and no chat format or instruct template, since the prompt template is now defined on the tokenizer.
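As a rough illustration of the transform pattern that replaces the deleted converters: a to-messages transform is just a callable that maps a raw dataset sample to structured messages. The class and field names below are simplified stand-ins, not torchtune's exact API (the real Message carries extra fields such as masking flags):

```python
from dataclasses import dataclass
from typing import Any, Mapping

@dataclass
class Message:
    # Minimal stand-in for torchtune's Message class.
    role: str
    content: str

class ShareGPTToMessages:
    """Illustrative message transform: maps a ShareGPT-style sample
    ({"conversations": [{"from": ..., "value": ...}]}) to Message objects."""

    ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}

    def __call__(self, sample: Mapping[str, Any]) -> dict:
        messages = [
            Message(role=self.ROLE_MAP[turn["from"]], content=turn["value"])
            for turn in sample["conversations"]
        ]
        return {"messages": messages}

sample = {"conversations": [
    {"from": "human", "value": "Tell me a joke."},
    {"from": "gpt", "value": "Why did the chicken cross the road?"},
]}
out = ShareGPTToMessages()(sample)
print([(m.role, m.content) for m in out["messages"]])
```

Because the transform is a plain callable on one sample, it composes with any dataset builder, which is what lets the old format-specific converter functions be deleted.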

Test plan

Ran tune run generate --config generation. The output on main is just not good:

INFO:torchtune.utils._logging:Tell me a joke?
One of the most important things to do when doing stand up comedy is to tell a joke. The joke doesn’t have to be the best one ever told, but it must be memorable. The joke should be funny and it should be something that people can relate to.
The most important thing to remember is that you are not alone when you tell a joke. You are not the only person who has ever told a joke. You are not the only person who has ever told a joke. You are not the only person who has ever told a joke. You are not the only person who has ever told a joke.
The joke is not the only thing that matters. It is the story that makes it funny. The story is what makes it funny. The story is what makes it funny. The story is what makes it funny. The story is what makes it funny. The story is what makes it funny. The story is what makes it funny. The story is what makes it funny.
The joke is what makes it funny. The story is what makes it funny. The story is what makes it funny. The story is what makes it funny. The story is what makes it funny. The story is what makes it funny. The story is what makes it funny. The story is what makes it funny. The story is what

After changes, it is somewhat more logical, although it still doesn't know how to stop:

INFO:torchtune.utils._logging:Tell me a joke.
One guy walks into a bar and orders a beer.
The bartender says, "Dang, you look like you need another one."
The guy says, "You know what? You're right."
Another guy walks into the bar and orders a beer.
The bartender says, "You look like you need another one."
The guy says, "You know what? You're right."
So a third guy walks into the bar and orders a beer.
The bartender says, "You look like you need another one."
The guy says, "You know what? You're right."
Then a fourth guy walks in and orders a beer.
The bartender says, "You look like you need another one."
The guy says, "You know what? You're right."
So the bartender says, "I'm not sure why, but you look like you need another one."
The guy says, "You know what? You're right."
So a fifth guy walks in and orders a beer.
The bartender says, "You look like you need another one."
The guy says, "You know what? You're right."
So the bartender pours him a beer and says, "Look,


pytorch-bot bot commented Oct 24, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1895

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 81b6abe with merge base 2c948c6:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 24, 2024
@@ -0,0 +1,52 @@
.. _packing_usage_label:
Collaborator

you snuck this in here you sneaky lil man

i love it

Contributor

heh nice

Contributor Author

ok I just now realized the joke

chat_format = _get_component_from_path(chat_format)
messages = chat_format.format(messages)
return self._tokenizer.tokenize_messages(messages)[0]
messages = [
Collaborator

noobq: Is this identical to free-form generation when a prompt template isn't provided?

Contributor

No, this assumes instruct-based finetuned models I believe.

Contributor

@joecummings joecummings left a comment

No big complaints - thx!


get_sharegpt_messages
get_openai_messages

.. _message_transforms_ref:

Message transforms
Contributor

nit: Can we call these ToMessage transforms to convey immediately that they convert data to message format?

Contributor Author

I've used "Message transforms" throughout the docs, so I'll leave updating all those references for a future PR and keep this as is

@@ -0,0 +1,52 @@
.. _packing_usage_label:
Contributor

heh nice

@@ -27,11 +27,10 @@ tokenizer:
_component_: torchtune.models.llama2.llama2_tokenizer
path: /tmp/Llama-2-7b-hf/tokenizer.model
max_seq_len: null
prompt_template: null

# Generation arguments; defaults taken from gpt-fast
prompt: "Tell me a joke?"
Contributor

I'm wondering if we should just adopt a format like generation_v2 here, where the default makes the roles explicit:

prompt:
  user: Tell me a joke.

Contributor Author

yeah that works. I was just trying to make the minimal changes needed since we are migrating to generation_v2
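To make the config change above concrete: the recipe now passes a single prompt string, and any templating happens on the tokenizer side, so a null prompt_template means the string is tokenized as-is. This is only a sketch of that flow under assumed names (PromptTemplate, tokenize_prompt, and the toy tokenizer are illustrative, not torchtune's exact API):

```python
from typing import List, Optional

class PromptTemplate:
    # Hypothetical template: wraps the raw prompt string.
    # A None format (prompt_template: null in the config) leaves it untouched.
    def __init__(self, fmt: Optional[str] = None):
        self.fmt = fmt

    def apply(self, prompt: str) -> str:
        return prompt if self.fmt is None else self.fmt.format(prompt=prompt)

class Tokenizer:
    def __init__(self, prompt_template: PromptTemplate):
        self.prompt_template = prompt_template

    def encode(self, text: str) -> List[int]:
        # Toy whitespace "tokenizer" standing in for a real SentencePiece model.
        return [hash(tok) % 32000 for tok in text.split()]

    def tokenize_prompt(self, prompt: str) -> List[int]:
        # The recipe only hands over a plain string; the template is
        # the tokenizer's responsibility, not the recipe's.
        return self.encode(self.prompt_template.apply(prompt))

tok = Tokenizer(PromptTemplate(None))  # prompt_template: null
tokens = tok.tokenize_prompt("Tell me a joke?")
```

With a non-null template such as "User: {prompt}\nAssistant:", the same recipe code produces instruct-formatted input without any chat-format logic in the recipe itself.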

@RdoubleA RdoubleA merged commit d3039da into pytorch:main Oct 28, 2024
17 checks passed
@RdoubleA RdoubleA deleted the deprecate_converters branch October 28, 2024 18:30
Issues closed by this PR: “Delete chat_formats.py and instruct_formats.py” and “Remove deprecated _converters.py file”.