Inference plugin: Add Gemini model adapter #191292
Conversation
x-pack/plugins/inference/server/chat_complete/adapters/gemini/gemini_adapter.ts (outdated, resolved)
Self-review
Note that there are currently no integration / FTR tests anywhere, because we haven't yet figured out how we want (and could) add e2e tests for these features.
interface ToolSchemaAnyOf extends ToolSchemaFragmentBase {
  anyOf: ToolSchemaType[];
}

interface ToolSchemaAllOf extends ToolSchemaFragmentBase {
  allOf: ToolSchemaType[];
}
As discussed with @dgieselaar: Gemini doesn't support type composition for tool definition, so we can't easily have it on the inference API. But given that the feature doesn't seem that useful, it feels like a very acceptable removal.
// systemInstruction is not supported on all gemini versions
// so for now we just always use the old trick of user message + assistant acknowledge.
// See https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference#request
const systemMessages: GeminiMessage[] | undefined = system
  ? [
      { role: 'user', content: system },
      { role: 'assistant', content: 'Understood.' },
    ]
  : undefined;
systemInstruction is available for gemini-1.5-flash, gemini-1.5-pro, and gemini-1.0-pro-002. This is slightly less support than for function invocation (as gemini-1.0-pro-001 isn't in the list), so I kept the hacky user-system-message approach for now, but we might switch to systemInstruction very soon, as the versions not supporting it are discontinued in less than 6 months.
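For context, a minimal sketch of what switching to systemInstruction could look like; the request shape follows the public Vertex AI generateContent format, and the GeminiRequest type and withSystemInstruction helper names here are assumptions, not the adapter's actual code:

```ts
// Hypothetical sketch only: pass the system prompt via `systemInstruction`
// instead of prepending a fake user/assistant exchange. Only valid on models
// that support it (gemini-1.5-flash, gemini-1.5-pro, gemini-1.0-pro-002).
interface GeminiRequest {
  systemInstruction?: { parts: Array<{ text: string }> };
  contents: unknown[];
}

function withSystemInstruction(request: GeminiRequest, system?: string): GeminiRequest {
  if (!system) {
    return request;
  }
  return {
    ...request,
    systemInstruction: { parts: [{ text: system }] },
  };
}
```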
I would not be against dropping this entirely. I think we've mostly been talking about gemini 1.5 (e.g. in https://www.elastic.co/blog/whats-new-elastic-observability-8-15-0 and https://www.elastic.co/blog/whats-new-elastic-security-8-15-0).
As we agreed, done in d8822a4
subActionParams: {
  messages: messagesToGemini({ system, messages }),
  tools: toolsToGemini(tools),
  toolConfig: toolChoiceToConfig(toolChoice),
Function calling is supported on gemini-1.0-pro, gemini-1.5-flash-latest and gemini-1.5-pro-latest, so basically all versions but the -vision one, which is AFAIK very specific and discontinued in ~6 months anyway, so I felt like it was better to implement function calling the right way now rather than re-using the fake function invocation hack that o11y was using.
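For readers unfamiliar with the Gemini side of this, here is a rough sketch of what a toolChoiceToConfig-style conversion might look like; the ToolChoice union is an assumption about the inference API, while the functionCallingConfig fields (mode AUTO / ANY / NONE, allowedFunctionNames) come from the public Vertex AI function-calling docs. The actual implementation in this PR may differ:

```ts
// Hypothetical sketch of a toolChoice -> Gemini toolConfig conversion.
// 'auto' lets the model decide, 'required' forces a function call, 'none'
// disables function calling, and an object forces one specific function.
type ToolChoice = 'auto' | 'required' | 'none' | { function: string };

interface GeminiToolConfig {
  functionCallingConfig: {
    mode: 'AUTO' | 'ANY' | 'NONE';
    allowedFunctionNames?: string[];
  };
}

function toolChoiceToConfig(toolChoice?: ToolChoice): GeminiToolConfig | undefined {
  if (!toolChoice) {
    return undefined;
  }
  if (typeof toolChoice === 'object') {
    return {
      functionCallingConfig: { mode: 'ANY', allowedFunctionNames: [toolChoice.function] },
    };
  }
  switch (toolChoice) {
    case 'required':
      return { functionCallingConfig: { mode: 'ANY' } };
    case 'none':
      return { functionCallingConfig: { mode: 'NONE' } };
    case 'auto':
    default:
      return { functionCallingConfig: { mode: 'AUTO' } };
  }
}
```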
Force-pushed from 0ad2720 to e3486bc
? [
    {
      index: 0,
      toolCallId: '',
I assume this means there is never a tool call id, and Gemini doesn't support parallel function calling at all? If that is the case, can we use generateFakeToolCallId from #190433? (Would be better than an empty string I think)
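(For illustration only, such a helper could be as simple as the sketch below; the real generateFakeToolCallId lives in #190433 and may be implemented differently.)

```ts
// Hypothetical sketch: generate a synthetic tool call id when the provider
// (here Gemini) doesn't return one. The real helper is defined in #190433.
function generateFakeToolCallId(): string {
  return `tool_call_${Math.random().toString(36).slice(2, 10)}`;
}
```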
done in b5145d5
x-pack/plugins/inference/server/chat_complete/adapters/gemini/process_vertex_stream.ts (outdated, resolved)
getTestScheduler().run(({ expectObservable, hot }) => {
  const chunk: GenerateContentResponseChunk = {
    candidates: [{ index: 0, content: { role: 'model', parts: [{ text: 'some chunk' }] } }],
  };

  const source$ = hot<GenerateContentResponseChunk>('--a', { a: chunk });

  const processed$ = source$.pipe(processVertexStream());

  expectObservable(processed$).toBe('--a', {
    a: {
      content: 'some chunk',
      tool_calls: [],
      type: ChatCompletionEventType.ChatCompletionChunk,
    },
  });
});
It's probably my lack of experience with these test helpers, but these tests are hard to grok for me (in terms of the test description vs the implementation of the test, i.e. what does it actually test). Specifically the --a stuff.
Here you go: https://rxjs.dev/guide/testing/marble-testing. --a means that the event a should be emitted on the third tick of the marble observable. In this specific test it's not that useful (I could have used a as a diagram instead) given it's just a map-ish operator we're testing, but I usually still do it just in case.
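For anyone else reading along, a stripped-down marble test showing what the --a diagram means, built directly on rxjs's TestScheduler rather than the plugin's getTestScheduler helper, and assuming a Jest-style expect is available:

```ts
import { map } from 'rxjs';
import { TestScheduler } from 'rxjs/testing';

// '--a' emits the value bound to 'a' on the third virtual-time frame; each '-'
// is one empty frame. A plain 'a' diagram would emit on the first frame.
const scheduler = new TestScheduler((actual, expected) => {
  expect(actual).toEqual(expected);
});

scheduler.run(({ expectObservable, hot }) => {
  const source$ = hot('--a', { a: 1 });
  const doubled$ = source$.pipe(map((value) => value * 2));

  // The output mirrors the input diagram: nothing on frames 1-2, then 2.
  expectObservable(doubled$).toBe('--a', { a: 2 });
});
```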
  };
}

function messagesToGemini({
IIRC, Gemini requires multi-turn messages (user, assistant, user, assistant, ad infinitum). OpenAI doesn't have this guarantee (I think). Should we guarantee multi-turn here, not by throwing, but injecting a user or assistant message? Or maybe we just throw top-level if not multi-turn so we can throw consistently?
I wasn't aware of that, but it now explains why some of my test workflows from yesterday were failing, as I was chaining user messages.
Yeah, we should probably do something about it. For the API's consistency's sake, I think I would rather inject messages to guarantee consistency rather than throwing... But we would also like to warn the developer that what they are doing might not be optimal for the model they are using... so ideally logging a warning or something on top, maybe?
WDYT?
Hmm actually, what is done in the connector is that messages from the same role are merged:
kibana/x-pack/plugins/stack_connectors/server/connector_types/gemini/gemini.ts
Lines 364 to 383 in 0be8295
const correctRole = row.role === 'assistant' ? 'model' : 'user';
// if data is already preformatted by ActionsClientGeminiChatModel
if (row.parts) {
  payload.contents.push(row);
} else {
  if (correctRole === 'user' && previousRole === 'user') {
    /** Append to the previous 'user' content
     * This is to ensure that multiturn requests alternate between user and model
     */
    payload.contents[payload.contents.length - 1].parts[0].text += ` ${row.content}`;
  } else {
    // Add a new entry
    payload.contents.push({
      role: correctRole,
      parts: [
        {
          text: row.content,
        },
      ],
    });
(but only for the user role and if not using the parts "raw" format).
I think I can do something similar in the adapter, by regrouping consecutive same-actor parts under a single message.
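Something along these lines, for instance; this is only a rough sketch of the regrouping idea, the GeminiMessage / GeminiPart shapes are assumptions, and the real implementation in the adapter may differ:

```ts
// Hypothetical sketch: merge consecutive messages with the same role into a
// single message carrying multiple parts, so the request alternates between
// 'user' and 'model' as Gemini expects.
interface GeminiPart {
  text?: string;
  functionResponse?: { name: string; response: Record<string, unknown> };
}

interface GeminiMessage {
  role: 'user' | 'model';
  parts: GeminiPart[];
}

function mergeConsecutiveSameRole(messages: GeminiMessage[]): GeminiMessage[] {
  return messages.reduce<GeminiMessage[]>((output, message) => {
    const previous = output[output.length - 1];
    if (previous && previous.role === message.role) {
      // Same actor as the previous message: append its parts instead of
      // pushing a new entry.
      previous.parts = [...previous.parts, ...message.parts];
    } else {
      output.push({ ...message, parts: [...message.parts] });
    }
    return output;
  }, []);
}
```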
Done in fe00ac6, and tested it: multiple text parts, or mixed text + functionResponse parts, are correctly interpreted by the model.
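For example, after regrouping, a single user turn might carry both kinds of parts; the shape below follows the Vertex AI content format, and the values are made up for illustration:

```ts
// Illustrative only: one merged 'user' message carrying both a plain text part
// and a functionResponse part.
const mergedUserMessage = {
  role: 'user',
  parts: [
    { text: 'Here is the result of the tool call:' },
    {
      functionResponse: {
        name: 'get_weather',
        response: { temperature: 21, unit: 'celsius' },
      },
    },
  ],
};
```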
Force-pushed from 4a91e94 to e3486bc
thanks Pierre!!
💚 Build Succeeded
Summary
Add the gemini model adapter for the inference plugin. Had to perform minor changes on the associated connector.
Also update the codeowner files to add the @elastic/appex-ai-infra team as (one of the) owners of the genAI connectors.