Refine system message settings #14877
Conversation
Thank you, looks very good!
Just one comment inline that might be worth discussing, as we currently have pretty strong assumptions on the order of messages we get. Do you think we should be more flexible to the order and improve the algorithm to support an arbitrary order of system, user and assistant messages? Theoretically it is not specified that the system message is the first one.
* @param model the OpenAI model identifier. Currently not used, but allows subclasses to implement model-specific behavior.
* @returns an array of messages formatted for the OpenAI API.
*/
public processMessages(
Suggested change:
- public processMessages(
+ processMessages(
messages: LanguageModelRequestMessage[],
developerMessageSettings: DeveloperMessageSettings
): LanguageModelRequestMessage[] {
if (messages.length > 0 && messages[0].actor === 'system') {
I'm wondering whether we should really limit this behavior to system messages in the first position. Shouldn't we just iterate through the list and apply this logic to all system messages wherever they are?
We could then maybe rename `mergeWithFirstUserMessage` to `mergeWithNextUserMessage`.
Agreed. The interesting cases are then:
- system:1
- system:2
- assistant:1
I would merge this to:
- user:1\n2
- assistant:1
and
- system:1
- system:2
- user:A
I would merge this to:
- user: 1\n2\nA
Reasoning would be that there are APIs which forbid two following user messages.
We can easily achieve this behavior by running from last to first while merging.
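The last-to-first merge described above could be sketched roughly like this (a minimal sketch, not the actual implementation; `Msg` is a hypothetical stand-in for `LanguageModelRequestMessage`):

```typescript
type Msg = { actor: 'system' | 'user' | 'assistant'; text: string };

// Merge system messages by scanning from last to first: a system message
// directly before a user message is prefixed onto it; otherwise it is
// converted into a standalone user message. This avoids producing two
// consecutive user messages, which some APIs forbid.
function mergeSystemMessages(messages: Msg[]): Msg[] {
    const result: Msg[] = [];
    for (let i = messages.length - 1; i >= 0; i--) {
        const msg = messages[i];
        if (msg.actor === 'system') {
            if (result.length > 0 && result[0].actor === 'user') {
                // Prefix the following user message with the system content.
                result[0] = { actor: 'user', text: `${msg.text}\n${result[0].text}` };
            } else {
                // No user message follows: convert to a user message.
                result.unshift({ actor: 'user', text: msg.text });
            }
        } else {
            result.unshift(msg);
        }
    }
    return result;
}
```

With this, `system:1, system:2, user:A` collapses into a single `user: 1\n2\nA` message, matching the second case above.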
Yes, this looks good and makes sense! Thank you!
I changed the value to "mergeWithFollowingUserMessage". It only merges if the next message is a user message. Otherwise, I fear we risk very strange changes in the order of messages. Of course, you can still produce message orders in which this might break specific models, e.g.
- user
- system
- assistant
- user
will become
- user
- user
- assistant
- user
But if you have these super special cases, you can now still override `processMessages`.
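Such an override might look roughly like this (a sketch only — all class and type names here are hypothetical stand-ins for the real provider classes):

```typescript
type Msg = { actor: 'system' | 'user' | 'assistant'; text: string };

// Hypothetical base class with the default pass-through behavior.
class DefaultMessageProcessor {
    processMessages(messages: Msg[]): Msg[] {
        return messages;
    }
}

// A hypothetical subclass for models with unusual ordering requirements:
// instead of merging, it moves all system messages to the front.
class FrontLoadingProcessor extends DefaultMessageProcessor {
    override processMessages(messages: Msg[]): Msg[] {
        const system = messages.filter(m => m.actor === 'system');
        const rest = messages.filter(m => m.actor !== 'system');
        return [...system, ...rest];
    }
}
```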
} else if (developerMessageSettings === 'mergeWithFirstUserMessage') {
const systemMsg = messages[0];
const updatedMessages = messages.slice();
const userIndex = updatedMessages.findIndex((m, index) => index > 0 && m.actor === 'user');
This seems a bit dangerous in a multi-turn conversation where messages arrive in the following order: `user`, `system`, `assistant`, `user`, `assistant`. We may essentially skip over to the next user request.
I think it'd be safer to find the next user and the next assistant message, and if the next assistant message is closer than the next user message (or we receive -1 for user), we insert a user message as we do below.
This way, we prevent merging the system message into the next user/assistant response pair.
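The suggested check could be sketched roughly as follows (an illustrative sketch under the assumptions above, not the actual implementation; `Msg` is a simplified stand-in for `LanguageModelRequestMessage`):

```typescript
type Msg = { actor: 'system' | 'user' | 'assistant'; text: string };

// Merge a leading system message into the next user message only if no
// assistant message comes first; otherwise insert it as a standalone user
// message so it is not merged into a later user/assistant turn.
function mergeLeadingSystem(messages: Msg[]): Msg[] {
    if (messages.length === 0 || messages[0].actor !== 'system') {
        return messages;
    }
    const systemMsg = messages[0];
    const rest = messages.slice(1);
    const nextUser = rest.findIndex(m => m.actor === 'user');
    const nextAssistant = rest.findIndex(m => m.actor === 'assistant');
    if (nextUser !== -1 && (nextAssistant === -1 || nextUser < nextAssistant)) {
        // Safe to merge: the next user message precedes any assistant message.
        rest[nextUser] = { actor: 'user', text: `${systemMsg.text}\n${rest[nextUser].text}` };
        return rest;
    }
    // An assistant message comes first (or there is no user message):
    // convert the system message into a standalone user message instead.
    return [{ actor: 'user', text: systemMsg.text }, ...rest];
}
```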
Thank you! Looks good to me.
@fipro78 Could you please check if the use case you've enabled in #14722 is still functional after this PR? As you'll see in this PR, we had to make the system message handling even more flexible now, as some models (DeepSeek, etc.) have diverging capabilities/requirements when it comes to the system message. Thank you!
Sure. How can I test this?
@fipro78 Thank you! It'd be great if you could build this PR locally and test if you can still correctly configure and connect to an Azure OpenAI instance, in particular regarding the system message setting. We'd like to ensure that we don't break your use case with this PR, but don't have an Azure instance to test this, unfortunately. Thanks a lot!
@JonasHelming @planger So from my point of view, the PR does not break the configuration and connection to an Azure OpenAI instance.
fixed #14867
What it does
Controls the first system message:
- `user`, `system`, and `developer` will be used as a role
- `mergeWithFollowingUserMessage` will prefix the following user message with the system message, or convert the system message to a user message if the next message is not a user message
- `skip` will just remove the system message

Defaults to `developer`.
How to test
Test official and custom OpenAI models. Specifically check the official 'o1-preview' and 'o1-mini' models: the system message should be converted to a user message. All other official OpenAI models should use 'developer'.
To test custom models, you can use:
{
"model": "gpt-4o",
"url": "https://api.openai.com/v1",
"id": "openaicustom",
"apiKey": true,
"enableStreaming": true,
"developerMessageSettings": "valueToTest"
},
DeepSeek reasoner now works with these settings:
{
"model": "deepseek-reasoner",
"url": "https://api.deepseek.com",
"id": "deepseek-resoner",
"apiKey": "YourKey",
"enableStreaming": true,
"developerMessageSettings": "mergeWithFollowingUserMessage"
},
To test, set a breakpoint where the messages are converted (where the requests are sent).
Follow-ups
Breaking changes
Attribution
Review checklist
Reminder for reviewers