
Inference plugin: Add Gemini model adapter #191292

Merged: 19 commits, Aug 28, 2024

Conversation

@pgayvallet (Contributor) commented Aug 26, 2024

Summary

Adds the Gemini model adapter for the inference plugin. Minor changes to the associated connector were required.

Also updates the CODEOWNERS file to add the @elastic/appex-ai-infra team as one of the owners of the genAI connectors.

@pgayvallet (Contributor Author) commented: /ci

@pgayvallet added the release_note:skip (Skip the PR/issue when compiling release notes), v8.16.0, and Team:AI Infra (AppEx AI Infrastructure Team) labels on Aug 26, 2024
@pgayvallet (Contributor Author) commented /ci three more times to re-trigger CI.

@pgayvallet (Contributor Author) left a comment:

Self-review

Note that there are currently no integration / FTR tests anywhere, because we have not yet figured out how exactly we want to, and could, add e2e tests for these features.

Comment on lines -38 to -44:

```ts
interface ToolSchemaAnyOf extends ToolSchemaFragmentBase {
  anyOf: ToolSchemaType[];
}

interface ToolSchemaAllOf extends ToolSchemaFragmentBase {
  allOf: ToolSchemaType[];
}
```
@pgayvallet (Contributor Author):

As discussed with @dgieselaar: Gemini doesn't support type composition for tool definitions, so we can't easily offer it through the inference API. Given that the feature doesn't seem that useful, this feels like a very acceptable removal.
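For context, a hypothetical tool schema using such composition (not taken from the PR) might look like this; Gemini's function declarations have no equivalent keyword, so there is no direct mapping for it:

```ts
// Hypothetical schema (illustrative only): a parameter that accepts either
// a single id or a list of ids via `anyOf`. This is the kind of composition
// that cannot be expressed in Gemini tool definitions.
const lookupTool = {
  name: 'lookup',
  description: 'Look up one or more ids',
  schema: {
    type: 'object' as const,
    properties: {
      id: {
        anyOf: [
          { type: 'string' as const },
          { type: 'array' as const, items: { type: 'string' as const } },
        ],
      },
    },
  },
};
```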

Comment on lines 174 to 182:

```ts
// systemInstruction is not supported on all gemini versions,
// so for now we just always use the old trick of user message + assistant acknowledge.
// See https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference#request
const systemMessages: GeminiMessage[] | undefined = system
  ? [
      { role: 'user', content: system },
      { role: 'assistant', content: 'Understood.' },
    ]
  : undefined;
```
@pgayvallet (Contributor Author):

systemInstruction is available for gemini-1.5-flash, gemini-1.5-pro, and gemini-1.0-pro-002. That is slightly narrower support than for function invocation (gemini-1.0-pro-001 isn't in the list), so I kept the hacky user-system-message approach for now. We might switch to systemInstruction very soon, though, as the versions that don't support it are discontinued in less than 6 months.
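For reference, a systemInstruction-based variant would look roughly like this (a minimal sketch against the Vertex AI request shape; the types and helper here are illustrative, not the adapter's actual code):

```ts
// Minimal sketch, assuming the Vertex AI request shape: the system prompt is
// passed natively via systemInstruction instead of the user-message +
// "Understood." acknowledgement trick.
interface GeminiRequest {
  systemInstruction?: { parts: Array<{ text: string }> };
  contents: Array<{ role: 'user' | 'model'; parts: Array<{ text: string }> }>;
}

function buildRequest(system: string | undefined, userPrompt: string): GeminiRequest {
  return {
    ...(system ? { systemInstruction: { parts: [{ text: system }] } } : {}),
    contents: [{ role: 'user', parts: [{ text: userPrompt }] }],
  };
}
```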

Member:

I would not be against dropping this entirely. I think we've mostly been talking about Gemini 1.5 (e.g. in https://www.elastic.co/blog/whats-new-elastic-observability-8-15-0 and https://www.elastic.co/blog/whats-new-elastic-security-8-15-0).

@pgayvallet (Contributor Author):

As we agreed, done in d8822a4.

```ts
subActionParams: {
  messages: messagesToGemini({ system, messages }),
  tools: toolsToGemini(tools),
  toolConfig: toolChoiceToConfig(toolChoice),
```
@pgayvallet (Contributor Author):

Function calling is supported on gemini-1.0-pro, gemini-1.5-flash-latest, and gemini-1.5-pro-latest, so basically all versions but the -vision one, which is AFAIK very specific and discontinued in ~6 months anyway. So I felt it was better to implement function calling the right way now rather than reuse the fake function invocation hack that o11y was using.
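To illustrate the kind of conversion the snippet above relies on, a toolsToGemini-style mapping could look like this (a sketch under assumed shapes; the inference-side ToolDefinition and the exact output format are illustrative, not the PR's implementation):

```ts
// Assumed inference-side tool shape (illustrative).
interface ToolDefinition {
  description?: string;
  schema?: Record<string, unknown>; // JSON-schema-like parameters object
}

// Gemini expects tools as a list of functionDeclarations.
function toolsToGemini(tools: Record<string, ToolDefinition> | undefined) {
  if (!tools) {
    return undefined;
  }
  return [
    {
      functionDeclarations: Object.entries(tools).map(([name, { description, schema }]) => ({
        name,
        description: description ?? '',
        parameters: schema,
      })),
    },
  ];
}
```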

@pgayvallet (Contributor Author) commented: /ci

@pgayvallet pgayvallet marked this pull request as ready for review August 26, 2024 21:19
@pgayvallet pgayvallet requested review from a team as code owners August 26, 2024 21:19
@pgayvallet pgayvallet requested a review from a team as a code owner August 27, 2024 06:15
```ts
? [
    {
      index: 0,
      toolCallId: '',
```
Member:

I assume this means there is never a tool call id, and Gemini doesn't support parallel function calling at all? If that is the case, can we use generateFakeToolCallId from #190433? (It would be better than an empty string, I think.)

@pgayvallet (Contributor Author):

> there is never a tool call id

Correct.

> Gemini doesn't support parallel function calling at all

That's my understanding, and this seems to confirm it.

> can we use generateFakeToolCallId from #190433? (Would be better than an empty string I think)

Yeah, I guess that would make sense; will adapt.
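For illustration, a stand-in along the lines of generateFakeToolCallId might be as simple as this (hypothetical; the actual helper from #190433 may differ):

```ts
// Hypothetical sketch: a short random identifier so downstream consumers
// always receive a non-empty tool call id, even though Gemini never emits one.
function generateFakeToolCallId(): string {
  return Math.random().toString(36).slice(2, 10);
}
```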

@pgayvallet (Contributor Author):

Done in b5145d5.

Comment on lines +30 to +46:

```ts
getTestScheduler().run(({ expectObservable, hot }) => {
  const chunk: GenerateContentResponseChunk = {
    candidates: [{ index: 0, content: { role: 'model', parts: [{ text: 'some chunk' }] } }],
  };

  const source$ = hot<GenerateContentResponseChunk>('--a', { a: chunk });

  const processed$ = source$.pipe(processVertexStream());

  expectObservable(processed$).toBe('--a', {
    a: {
      content: 'some chunk',
      tool_calls: [],
      type: ChatCompletionEventType.ChatCompletionChunk,
    },
  });
});
```
Member:

It's probably my lack of experience with these test helpers, but these tests are hard for me to grok (in terms of the test description vs the implementation of the test, i.e. what it actually tests). Specifically the `--a` stuff.

@pgayvallet (Contributor Author) commented Aug 27, 2024:

Here you go: https://rxjs.dev/guide/testing/marble-testing

`--a` means that the event `a` should be emitted on the third tick of the marble observable. In this specific test it's not that useful (I could have used `a` as the diagram instead), given it's just a map-ish operator we're testing, but I usually still do it "just in case".
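For readers unfamiliar with the notation, a minimal self-contained marble test looks like this (simplified relative to the helpers used in this PR, and assuming a Jest-style global expect):

```ts
import { TestScheduler } from 'rxjs/testing';
import { map } from 'rxjs';

// Each character in a marble diagram is one frame; '-' is an empty frame,
// so '--a|' emits 'a' on the third frame and then completes.
const scheduler = new TestScheduler((actual, expected) => {
  expect(actual).toEqual(expected); // Jest-style assertion, assumed available
});

scheduler.run(({ cold, expectObservable }) => {
  const doubled$ = cold('--a|', { a: 1 }).pipe(map((n) => n * 2));
  // The expected diagram mirrors the source: same timing, transformed value.
  expectObservable(doubled$).toBe('--b|', { b: 2 });
});
```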

```ts
  };
}

function messagesToGemini({
```
Member:

IIRC, Gemini requires multi-turn messages (user, assistant, user, assistant, ad infinitum). OpenAI doesn't have this requirement (I think). Should we guarantee multi-turn here, not by throwing, but by injecting a user or assistant message? Or maybe we just throw at the top level if it's not multi-turn, so we can throw consistently?

@pgayvallet (Contributor Author) commented Aug 27, 2024:

I wasn't aware of that, but it now explains why some of my test workflows from yesterday were failing, as I was chaining user messages.

Yeah, we should probably do something about it. For the sake of the API's consistency, I think I would rather inject messages to guarantee alternation than throw... But we would also want to warn the developer that what they are doing might not be optimal for the model they are using... so ideally logging a warning or something on top, maybe?

WDYT?

@pgayvallet (Contributor Author) commented Aug 27, 2024:

Hmm, actually: what the connector does is merge messages from the same role:

```ts
const correctRole = row.role === 'assistant' ? 'model' : 'user';
// if data is already preformatted by ActionsClientGeminiChatModel
if (row.parts) {
  payload.contents.push(row);
} else {
  if (correctRole === 'user' && previousRole === 'user') {
    /** Append to the previous 'user' content
     * This is to ensure that multiturn requests alternate between user and model
     */
    payload.contents[payload.contents.length - 1].parts[0].text += ` ${row.content}`;
  } else {
    // Add a new entry
    payload.contents.push({
      role: correctRole,
      parts: [
        {
          text: row.content,
        },
      ],
    });
  }
}
```

(But only for the user role, and only when not using the raw `parts` format.)

I think I can do something similar in the adapter, by regrouping consecutive same-role parts under a single message.
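That regrouping could look something like the following (a minimal sketch with illustrative types, not the adapter's actual code):

```ts
interface GeminiMessage {
  role: 'user' | 'assistant';
  content: string;
}

// Merge consecutive messages from the same role into one, so that the
// resulting list alternates strictly between roles as Gemini requires.
function mergeConsecutiveSameRole(messages: GeminiMessage[]): GeminiMessage[] {
  return messages.reduce<GeminiMessage[]>((output, message) => {
    const previous = output[output.length - 1];
    if (previous && previous.role === message.role) {
      previous.content += ` ${message.content}`;
    } else {
      output.push({ ...message });
    }
    return output;
  }, []);
}
```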

@pgayvallet (Contributor Author):

Done in fe00ac6, and tested: multiple text parts, and mixed text + functionResponse parts, are correctly interpreted by the model.

@dgieselaar (Member) left a comment:

thanks Pierre!!

@pgayvallet pgayvallet enabled auto-merge (squash) August 28, 2024 11:39
@kibana-ci (Collaborator) commented:

💚 Build Succeeded

Metrics [docs]: ✅ unchanged

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@pgayvallet pgayvallet merged commit 02a3992 into elastic:main Aug 28, 2024
20 checks passed
@kibanamachine added the backport:skip (This commit does not require backporting) label on Aug 28, 2024