[Feature Request] Convert BaseMessage to alpaca format
#1184
Comments
It may not be appropriate to convert BaseMessages directly to Alpaca format, since BaseMessages are part of a conversation structure, with roles and other information that is likely irrelevant to Alpaca format. (How would we convert an Alpaca message back to a BaseMessage?) Typing it this way probably adds unhelpful coupling. I think converting to and from strings is more appropriate. See master...alpaca_conversion_temp (cc @lightaime)
Thanks @CaelumF, the scope of this issue is just to convert BaseMessage to alpaca format. Converting from a string using regex has 2 limitations.
Yeah, when it comes to generating Alpaca items, it makes far more sense to work in JSON, especially when structured output and JSON proficiency are available in the inference model. (I assume JSON is the textual representation you had in mind.) The linked class can be converted to and from JSON, as it's a pydantic class.

But I also assume the plan is to keep the Alpaca entries inside the text portion of the messages as textual representations, rather than adding any specific fields to BaseMessage? That way the source information is always in one place in the form of text, and no type or contextual information constrains the content of those messages (we won't have an AlpacaBaseMessage or similar).

The other textual representation that starts with

Since in this conversion all of the information comes from one field of BaseMessage (content), which is always a string, and since it will sometimes be useful to convert strings from other sources, it feels more versatile and less confusing to define the conversion purely in terms of strings.

I can imagine scenarios with multiple stages of data generation where it is useful to go back from a textual representation to a validated object form too. In general, I like what is communicated by the directions in which things can be converted. The same applies if we want to parse Alpaca items generated by a base model trained on that format, which it seems Alpaca was. (I'm not sure exactly why JSON wasn't always used; maybe it's because of newline handling or something.)

If we want to make the conversion easily discoverable, we can add a to_alpaca function inside BaseMessage that is a single line calling the publicly available conversion function that takes a string, using the message property, to make it clear that only the content comes from the BaseMessage.
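To make the string-based proposal above concrete, here is a minimal sketch. It is not the code on the linked branch: `AlpacaItem`, `alpaca_from_string`, `alpaca_to_string`, and the tiny `Message` class are hypothetical names for illustration, and a stdlib dataclass stands in for the pydantic class mentioned in the comment so the snippet runs without extra dependencies.

```python
import json
from dataclasses import asdict, dataclass

# The three keys of the alpaca format shown in the issue's Motivation section.
ALPACA_FIELDS = ("instruction", "input", "output")


@dataclass
class AlpacaItem:
    """Hypothetical stand-in for the pydantic class linked in the thread."""
    instruction: str
    input: str
    output: str


def alpaca_from_string(text: str) -> AlpacaItem:
    """Parse a JSON string into a checked AlpacaItem (string -> object)."""
    data = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in ALPACA_FIELDS if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    for key in ALPACA_FIELDS:
        if not isinstance(data[key], str):
            raise ValueError(f"field {key!r} must be a string")
    return AlpacaItem(**{key: data[key] for key in ALPACA_FIELDS})


def alpaca_to_string(item: AlpacaItem) -> str:
    """Serialize an AlpacaItem back to its JSON text form (object -> string)."""
    return json.dumps(asdict(item), ensure_ascii=False)


@dataclass
class Message:
    """Hypothetical minimal message; the real BaseMessage carries more fields."""
    role_name: str
    content: str

    def to_alpaca(self) -> AlpacaItem:
        # Thin discoverability wrapper: only `content` feeds the conversion.
        return alpaca_from_string(self.content)
```

Because the conversion functions only see strings, the same code path serves message content and text from any other source, and the wrapper on the message class adds no coupling beyond a single call.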
Required prerequisites
Motivation
alpaca format:
{"instruction": "...", "input": "...", "output": "..."}
Solution
No response
Alternatives
No response
Additional context
No response