
Refine system message settings #14877

Merged · 7 commits merged into master on Feb 19, 2025

Conversation

Contributor

@JonasHelming commented Feb 9, 2025

fixed #14867

What it does

  • replaces supportsDeveloperMessage with developerMessageSettings (enum: 'user', 'system', 'developer', 'mergeWithFollowingUserMessage', 'skip')

This setting controls how the first system message is sent: 'user', 'system', and 'developer' are used directly as the message role; 'mergeWithFollowingUserMessage' prefixes the following user message with the system message, or converts the system message to a user message if the next message is not a user message; 'skip' removes the system message entirely. The default is 'developer' (see the sketch after this list).

  • Extracts message processing into a rebindable OpenAiModelUtils so that adopters can handle future special cases
  • Adds a test case for the message handling
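
For orientation, here is a minimal sketch of how the first system message could be mapped for each setting. This is illustrative only, not the actual OpenAiModelUtils implementation: the query field, the toOpenAiMessage helper, and the returned wire format are assumptions.

// Illustrative message shape; 'actor' matches the field used in processMessages below,
// 'query' is an assumption for the text content.
interface LanguageModelRequestMessage {
    actor: 'system' | 'user' | 'ai';
    query: string;
}

type DeveloperMessageSettings = 'user' | 'system' | 'developer' | 'mergeWithFollowingUserMessage' | 'skip';

// Hypothetical conversion of a request message into the OpenAI wire format.
function toOpenAiMessage(m: LanguageModelRequestMessage): { role: string; content: string } {
    return { role: m.actor === 'ai' ? 'assistant' : m.actor, content: m.query };
}

function applyDeveloperMessageSettings(
    messages: LanguageModelRequestMessage[],
    settings: DeveloperMessageSettings
): { role: string; content: string }[] {
    // Only a leading system message is affected; everything else is passed through.
    if (messages.length === 0 || messages[0].actor !== 'system') {
        return messages.map(toOpenAiMessage);
    }
    const [system, ...rest] = messages;
    switch (settings) {
        case 'user':
        case 'system':
        case 'developer':
            // Use the configured value directly as the role of the first system message.
            return [{ role: settings, content: system.query }, ...rest.map(toOpenAiMessage)];
        case 'skip':
            // Drop the system message entirely.
            return rest.map(toOpenAiMessage);
        case 'mergeWithFollowingUserMessage':
            // Prefix the following user message with the system text; if the next
            // message is not a user message, send the system text as a user message.
            if (rest[0]?.actor === 'user') {
                const merged = { ...rest[0], query: `${system.query}\n${rest[0].query}` };
                return [merged, ...rest.slice(1)].map(toOpenAiMessage);
            }
            return [{ role: 'user', content: system.query }, ...rest.map(toOpenAiMessage)];
    }
}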

How to test

Test official and custom OpenAI models. Specifically, check that for the official models 'o1-preview' and 'o1-mini' the system message is converted to a user message. All other official OpenAI models should use 'developer'.

To test custom models, you can use:

{
  "model": "gpt-4o",
  "url": "https://api.openai.com/v1",
  "id": "openaicustom",
  "apiKey": true,
  "enableStreaming": true,
  "developerMessageSettings": "valueToTest"
},

DeepSeek reasoner now works with these settings:
{
  "model": "deepseek-reasoner",
  "url": "https://api.deepseek.com",
  "id": "deepseek-reasoner",
  "apiKey": "YourKey",
  "enableStreaming": true,
  "developerMessageSettings": "mergeWithFollowingUserMessage"
},

To test, set a breakpoint where the messages are converted (i.e., where the requests are sent).

Follow-ups

Breaking changes

  • This PR introduces breaking changes and requires careful review. If yes, the breaking changes section in the changelog has been updated.

Attribution

Review checklist

Reminder for reviewers

@JonasHelming requested a review from planger on February 9, 2025 at 22:51
Contributor

@planger left a comment

Thank you, looks very good!

Just one comment inline that might be worth discussing, as we currently make pretty strong assumptions about the order of the messages we get. Do you think we should be more flexible about the order and improve the algorithm to support an arbitrary order of system, user, and assistant messages? In theory, it is not specified that the system message is the first one.

* @param model the OpenAI model identifier. Currently not used, but allows subclasses to implement model-specific behavior.
* @returns an array of messages formatted for the OpenAI API.
*/
public processMessages(
Contributor

Suggested change
public processMessages(
processMessages(

messages: LanguageModelRequestMessage[],
developerMessageSettings: DeveloperMessageSettings
): LanguageModelRequestMessage[] {
if (messages.length > 0 && messages[0].actor === 'system') {
Contributor

I'm wondering whether we should really limit this behavior for system messages that are on the first position. Shouldn't we just iterate through the list and apply this logic to all system messages wherever they are?

We could then maybe rename mergeWithFirstUserMessage to mergeWithNextUserMessage.

Contributor Author

Agreed. The interesting cases are then:

  • system:1
  • system:2
  • assistant:1

I would merge this to:

  • user:1\n2
  • assistant:1

and

  • system:1
  • system:2
  • user:A

I would merge this to

  • user: 1\n2\nA

The reasoning is that there are APIs which forbid two consecutive user messages.
We can easily achieve this behavior by running from last to first while merging (see the sketch below).
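
A rough sketch of that last-to-first merge, reusing the message shape from the sketch above (illustrative only, not the code that was merged):

// Walk from the end of the list so that consecutive system messages collapse into the
// following user message, or become a user message themselves if no user message follows.
function mergeSystemMessages(messages: LanguageModelRequestMessage[]): LanguageModelRequestMessage[] {
    const result: LanguageModelRequestMessage[] = [];
    for (let i = messages.length - 1; i >= 0; i--) {
        const current = messages[i];
        if (current.actor !== 'system') {
            result.unshift(current);
        } else if (result[0]?.actor === 'user') {
            // Prefix the user message we already emitted with the system text.
            result[0] = { ...result[0], query: `${current.query}\n${result[0].query}` };
        } else {
            // No user message follows: turn the system message into a user message.
            result.unshift({ ...current, actor: 'user' });
        }
    }
    return result;
}

With this, [system:1, system:2, assistant:1] becomes [user:"1\n2", assistant:1], and [system:1, system:2, user:A] becomes [user:"1\n2\nA"], matching the two cases above.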

@planger
Contributor

Yes, this looks good and makes sense! Thank you!

Contributor Author

I changed the value to "mergeWithFollowingUserMessage". It only merges if the next message is a user message; otherwise, I fear we risk very strange changes in the message order. Of course, you can still produce message orders in which this might break specific models, e.g.

  • user
  • system
  • assistant
  • user

will become

  • user
  • user
  • assistant
  • user

But if you have these super special cases, you can now still override the processMessages.

} else if (developerMessageSettings === 'mergeWithFirstUserMessage') {
    const systemMsg = messages[0];
    const updatedMessages = messages.slice();
    const userIndex = updatedMessages.findIndex((m, index) => index > 0 && m.actor === 'user');
Contributor

This seems a bit dangerous in a multi-turn conversation where the messages have the following order: user, system, assistant, user, assistant. We may essentially skip ahead to the next user request.

I think it'd be safer to find the next user and the next assistant message, and if the next assistant message is closer than the next user message (or we get -1 for the user index), insert a user message as we do below.

This way, we prevent merging the system message into the next user/assistant response pair.
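
For illustration, the suggested safeguard could look roughly like this, using the same message shape as in the sketches above (a sketch of the idea, not the merged code):

// Assumes messages[0] is the leading system message. Only merge it into the next
// user message if no assistant message appears before it; otherwise keep the
// system text as a separate user message instead of merging across a turn boundary.
function mergeSystemIntoNextUserMessage(messages: LanguageModelRequestMessage[]): LanguageModelRequestMessage[] {
    const system = messages[0];
    const rest = messages.slice(1);
    const nextUserIndex = rest.findIndex(m => m.actor === 'user');
    const nextAssistantIndex = rest.findIndex(m => m.actor === 'ai');
    const canMerge = nextUserIndex !== -1 && (nextAssistantIndex === -1 || nextUserIndex < nextAssistantIndex);
    if (!canMerge) {
        // Insert the system text as a standalone user message.
        return [{ ...system, actor: 'user' }, ...rest];
    }
    return rest.map((m, index) =>
        index === nextUserIndex ? { ...m, query: `${system.query}\n${m.query}` } : m
    );
}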

Contributor

@planger left a comment

Thank you! Looks good to me.

@fipro78 Could you please check if the use case you've enabled in #14722 is still functional after this PR? As you'll see in this PR, we had to make the system message handling even more flexible now, as some models (DeepSeek, etc.) have diverging capabilities/requirements when it comes to the system message. Thank you!

Contributor

@fipro78 commented Feb 13, 2025

> Thank you! Looks good to me.
>
> @fipro78 Could you please check if the use case you've enabled in #14722 is still functional after this PR? As you'll see in this PR, we had to make the system message handling even more flexible now, as some models (DeepSeek, etc.) have diverging capabilities/requirements when it comes to the system message. Thank you!

Sure. How can I test this?

Contributor

@planger commented Feb 14, 2025

@fipro78 Thank you! It'd be great if you could build this PR locally and test if you can still correctly configure and connect to an Azure OpenAI instance, in particular regarding the system message setting. We'd like to ensure that we don't break your use case with this PR, but don't have an Azure instance to test this, unfortunately.

Thanks a lot!

Contributor

@fipro78 commented Feb 19, 2025

@JonasHelming @planger
I checked out this PR, built it locally, and tested it with our Azure OpenAI.
If I use "developerMessageSettings": "developer", I get the expected error that the role developer is not supported by this model. After switching to "developerMessageSettings": "system", I get an answer from the model.

So from my point of view, the PR does not break the configuration and connection to an Azure OpenAI instance.

@JonasHelming merged commit 0d40de4 into master on Feb 19, 2025
10 of 11 checks passed
The github-actions bot added this to the 1.59.0 milestone on Feb 19, 2025

Successfully merging this pull request may close these issues.

[Theia AI] Support DeepSeek reasoner (and future models with special "system" prompt requirements)
3 participants