Inference plugin: Add Gemini model adapter #191292
Conversation
x-pack/plugins/inference/server/chat_complete/adapters/gemini/gemini_adapter.ts (outdated, resolved)
Self-review
Note that there are currently no integration / FTR tests anywhere, because we haven't yet figured out how we want (and could) add e2e tests for these features.
interface ToolSchemaAnyOf extends ToolSchemaFragmentBase {
  anyOf: ToolSchemaType[];
}

interface ToolSchemaAllOf extends ToolSchemaFragmentBase {
  allOf: ToolSchemaType[];
}
As discussed with @dgieselaar: Gemini doesn't support type composition for tool definition, so we can't easily have it on the inference API. But given that the feature doesn't seem that useful, it feels like a very acceptable removal.
// systemInstruction is not supported on all gemini versions
// so for now we just always use the old trick of user message + assistant acknowledge.
// See https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference#request
const systemMessages: GeminiMessage[] | undefined = system
  ? [
      { role: 'user', content: system },
      { role: 'assistant', content: 'Understood.' },
    ]
  : undefined;
systemInstruction is available for gemini-1.5-flash, gemini-1.5-pro, and gemini-1.0-pro-002. This is slightly less support than for function invocation (as gemini-1.0-pro-001 isn't in the list), so I kept the hacky user-system-message approach for now, but we might switch to systemInstruction very soon, as the versions not supporting it are discontinued in less than 6 months.
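For context, a minimal sketch of what switching to systemInstruction could look like; the request shape follows the public Vertex AI generateContent format, and the GeminiRequest type and withSystemInstruction helper names here are assumptions, not the adapter's actual code:

```ts
// Hypothetical sketch only: pass the system prompt via `systemInstruction`
// instead of prepending a fake user/assistant exchange. Only valid on models
// that support it (gemini-1.5-flash, gemini-1.5-pro, gemini-1.0-pro-002).
interface GeminiRequest {
  systemInstruction?: { parts: Array<{ text: string }> };
  contents: unknown[];
}

function withSystemInstruction(request: GeminiRequest, system?: string): GeminiRequest {
  if (!system) {
    return request;
  }
  return {
    ...request,
    systemInstruction: { parts: [{ text: system }] },
  };
}
```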
I would not be against dropping this entirely. I think we've mostly been talking about gemini 1.5 (e.g. in https://www.elastic.co/blog/whats-new-elastic-observability-8-15-0 and https://www.elastic.co/blog/whats-new-elastic-security-8-15-0).
As we agreed, done in d8822a4
subActionParams: {
  messages: messagesToGemini({ system, messages }),
  tools: toolsToGemini(tools),
  toolConfig: toolChoiceToConfig(toolChoice),
Function calling is supported on gemini-1.0-pro, gemini-1.5-flash-latest and gemini-1.5-pro-latest, so basically all versions but the -vision one, which is AFAIK very specific and discontinued in ~6 months anyway, so I felt like it was better to implement function calling the right way now rather than re-using the fake function invocation hack that o11y was using.
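For readers unfamiliar with the Gemini side of this, here is a rough sketch of what a toolChoiceToConfig-style conversion might look like; the ToolChoice union is an assumption about the inference API, while the functionCallingConfig fields (mode AUTO / ANY / NONE, allowedFunctionNames) come from the public Vertex AI function-calling docs. The actual implementation in this PR may differ:

```ts
// Hypothetical sketch of a toolChoice -> Gemini toolConfig conversion.
// 'auto' lets the model decide, 'required' forces a function call, 'none'
// disables function calling, and an object forces one specific function.
type ToolChoice = 'auto' | 'required' | 'none' | { function: string };

interface GeminiToolConfig {
  functionCallingConfig: {
    mode: 'AUTO' | 'ANY' | 'NONE';
    allowedFunctionNames?: string[];
  };
}

function toolChoiceToConfig(toolChoice?: ToolChoice): GeminiToolConfig | undefined {
  if (!toolChoice) {
    return undefined;
  }
  if (typeof toolChoice === 'object') {
    return {
      functionCallingConfig: { mode: 'ANY', allowedFunctionNames: [toolChoice.function] },
    };
  }
  switch (toolChoice) {
    case 'required':
      return { functionCallingConfig: { mode: 'ANY' } };
    case 'none':
      return { functionCallingConfig: { mode: 'NONE' } };
    case 'auto':
    default:
      return { functionCallingConfig: { mode: 'AUTO' } };
  }
}
```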
Force-pushed from 0ad2720 to e3486bc
? [
    {
      index: 0,
      toolCallId: '',
I assume this means there is never a tool call id, and Gemini doesn't support parallel function calling at all? If that is the case, can we use generateFakeToolCallId from #190433? (Would be better than an empty string I think)
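(For illustration only, such a helper could be as simple as the sketch below; the real generateFakeToolCallId lives in #190433 and may be implemented differently.)

```ts
// Hypothetical sketch: generate a synthetic tool call id when the provider
// (here Gemini) doesn't return one. The real helper is defined in #190433.
function generateFakeToolCallId(): string {
  return `tool_call_${Math.random().toString(36).slice(2, 10)}`;
}
```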
done in b5145d5
x-pack/plugins/inference/server/chat_complete/adapters/gemini/process_vertex_stream.ts (outdated, resolved)
getTestScheduler().run(({ expectObservable, hot }) => {
  const chunk: GenerateContentResponseChunk = {
    candidates: [{ index: 0, content: { role: 'model', parts: [{ text: 'some chunk' }] } }],
  };

  const source$ = hot<GenerateContentResponseChunk>('--a', { a: chunk });

  const processed$ = source$.pipe(processVertexStream());

  expectObservable(processed$).toBe('--a', {
    a: {
      content: 'some chunk',
      tool_calls: [],
      type: ChatCompletionEventType.ChatCompletionChunk,
    },
  });
});
It's probably my lack of experience with these test helpers, but these tests are hard to grok for me (in terms of the test description vs the implementation of the test, i.e. what does it actually test). Specifically the --a stuff.
Here you go: https://rxjs.dev/guide/testing/marble-testing. --a means that the event a should be emitted on the third tick of the marble observable. In this specific test it's not that useful (I could have used a as a diagram instead) given it's just a map-ish operator we're testing, but I usually still do it just in case.
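For anyone else reading along, a stripped-down marble test showing what the --a diagram means, built directly on rxjs's TestScheduler rather than the plugin's getTestScheduler helper, and assuming a Jest-style expect is available:

```ts
import { map } from 'rxjs';
import { TestScheduler } from 'rxjs/testing';

// '--a' emits the value bound to 'a' on the third virtual-time frame; each '-'
// is one empty frame. A plain 'a' diagram would emit on the first frame.
const scheduler = new TestScheduler((actual, expected) => {
  expect(actual).toEqual(expected);
});

scheduler.run(({ expectObservable, hot }) => {
  const source$ = hot('--a', { a: 1 });
  const doubled$ = source$.pipe(map((value) => value * 2));

  // The output mirrors the input diagram: nothing on frames 1-2, then 2.
  expectObservable(doubled$).toBe('--a', { a: 2 });
});
```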
  };
}

function messagesToGemini({
IIRC, Gemini requires multi-turn messages (user, assistant, user, assistant, ad infinitum). OpenAI doesn't have this guarantee (I think). Should we guarantee multi-turn here, not by throwing, but injecting a user or assistant message? Or maybe we just throw top-level if not multi-turn so we can throw consistently?
I wasn't aware of that, but it now explains why some of my test workflows from yesterday were failing, as I was chaining user messages.
Yeah, we should probably do something about it. For the API's consistency's sake, I think I would rather inject messages to guarantee consistency rather than throwing... But we would also like to warn the developer that what they are doing might not be optimal for the model they are using... so ideally logging a warning or something on top, maybe?
WDYT?
Hmm actually, what is done in the connector is that messages from the same role are merged:
kibana/x-pack/plugins/stack_connectors/server/connector_types/gemini/gemini.ts
Lines 364 to 383 in 0be8295
const correctRole = row.role === 'assistant' ? 'model' : 'user';
// if data is already preformatted by ActionsClientGeminiChatModel
if (row.parts) {
  payload.contents.push(row);
} else {
  if (correctRole === 'user' && previousRole === 'user') {
    /** Append to the previous 'user' content
     * This is to ensure that multiturn requests alternate between user and model
     */
    payload.contents[payload.contents.length - 1].parts[0].text += ` ${row.content}`;
  } else {
    // Add a new entry
    payload.contents.push({
      role: correctRole,
      parts: [
        {
          text: row.content,
        },
      ],
    });
(but only for the user role and if not using the parts "raw" format).
I think I can do something similar in the adapter, by regrouping consecutive same-actor parts under a single message.
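Something along these lines, for instance; this is only a rough sketch of the regrouping idea, the GeminiMessage / GeminiPart shapes are assumptions, and the real implementation in the adapter may differ:

```ts
// Hypothetical sketch: merge consecutive messages with the same role into a
// single message carrying multiple parts, so the request alternates between
// 'user' and 'model' as Gemini expects.
interface GeminiPart {
  text?: string;
  functionResponse?: { name: string; response: Record<string, unknown> };
}

interface GeminiMessage {
  role: 'user' | 'model';
  parts: GeminiPart[];
}

function mergeConsecutiveSameRole(messages: GeminiMessage[]): GeminiMessage[] {
  return messages.reduce<GeminiMessage[]>((output, message) => {
    const previous = output[output.length - 1];
    if (previous && previous.role === message.role) {
      // Same actor as the previous message: append its parts instead of
      // pushing a new entry.
      previous.parts = [...previous.parts, ...message.parts];
    } else {
      output.push({ ...message, parts: [...message.parts] });
    }
    return output;
  }, []);
}
```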
Done in fe00ac6, and tested it: multiple text parts, or mixed text + functionResponse parts, are correctly interpreted by the model.
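For example, after regrouping, a single user turn might carry both kinds of parts; the shape below follows the Vertex AI content format, and the values are made up for illustration:

```ts
// Illustrative only: one merged 'user' message carrying both a plain text part
// and a functionResponse part.
const mergedUserMessage = {
  role: 'user',
  parts: [
    { text: 'Here is the result of the tool call:' },
    {
      functionResponse: {
        name: 'get_weather',
        response: { temperature: 21, unit: 'celsius' },
      },
    },
  ],
};
```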
Force-pushed from 4a91e94 to e3486bc
thanks Pierre!!
💚 Build Succeeded
Summary
Add the gemini model adapter for the inference plugin. Had to perform minor changes on the associated connector.
Also update the codeowner files to add the @elastic/appex-ai-infra team as (one of the) owners of the genAI connectors.