[Inference] Implement NL-to-ESQL task #190433
Conversation
@elasticmachine merge upstream

@pgayvallet is currently working on adding gemini/bedrock adapters, I'll put this back in draft until that work has completed.
@@ -8,7 +8,7 @@
 import { filter, OperatorFunction } from 'rxjs';
 import { ChatCompletionEvent, ChatCompletionEventType, ChatCompletionTokenCountEvent } from '.';

-export function withoutTokenCountEvents<T extends ChatCompletionEvent>(): OperatorFunction<
+export function withoutTokenCountEvents<T extends ChatCompletionEvent<any>>(): OperatorFunction<
Note to self: this shouldn't be needed
LGTM in current state.
Left a few nits, comments, and questions for follow-ups (on my side).
import { ChatCompletionEvent, ChatCompletionEventType, ChatCompletionMessageEvent } from '.';
import type { ToolOptions } from './tools';

export function isChatCompletionMessageEvent<T extends ToolOptions<string>>(
NIT: isChatCompletionChunkEvent, isChatCompletionEvent and isChatCompletionMessageEvent could live in the same file (but I'll end up moving them myself anyway)
@@ -8,7 +8,7 @@
 import { filter, OperatorFunction } from 'rxjs';
 import { ChatCompletionEvent, ChatCompletionEventType, ChatCompletionTokenCountEvent } from '.';

-export function withoutTokenCountEvents<T extends ChatCompletionEvent>(): OperatorFunction<
+export function withoutTokenCountEvents<T extends ChatCompletionEvent<any>>(): OperatorFunction<
NIT: would that work with unknown instead of any?
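For illustration, here is a self-contained sketch (with simplified stand-ins for the real event types, not the actual Kibana definitions) of why `unknown` should be sufficient here: the filter only dispatches on the discriminant `type` field, and since every `ChatCompletionChunkEvent<T>` is assignable to `ChatCompletionChunkEvent<unknown>`, no `any` is needed:

```typescript
// Hypothetical, simplified versions of the event types for demonstration.
enum ChatCompletionEventType {
  TokenCount = 'tokenCount',
  Chunk = 'chunk',
}

interface ChatCompletionTokenCountEvent {
  type: ChatCompletionEventType.TokenCount;
  tokens: number;
}

interface ChatCompletionChunkEvent<TContent> {
  type: ChatCompletionEventType.Chunk;
  content: TContent;
}

type ChatCompletionEvent<TContent> =
  | ChatCompletionTokenCountEvent
  | ChatCompletionChunkEvent<TContent>;

// Accepting ChatCompletionEvent<unknown> works for any content type,
// because the guard only inspects the discriminant.
function isTokenCountEvent(
  event: ChatCompletionEvent<unknown>
): event is ChatCompletionTokenCountEvent {
  return event.type === ChatCompletionEventType.TokenCount;
}

// Array-based stand-in for the rxjs operator: drop token-count events,
// keeping the chunk events with their content type intact.
function withoutTokenCountEvents<TContent>(
  events: Array<ChatCompletionEvent<TContent>>
): Array<ChatCompletionChunkEvent<TContent>> {
  return events.filter(
    (event): event is ChatCompletionChunkEvent<TContent> => !isTokenCountEvent(event)
  );
}
```

The upside of `unknown` over `any` is that the guard cannot accidentally poke into `content` without narrowing first.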
 * 2.0.
 */

export {
What? So you mean you're aware that index files are supposed to be used for re-exporting, not living alone in their folder containing concrete stuff?
who'd have thought!
messages: ensureMultiTurn([
  ...(messages || []),
Do you really think we want that done at that level? We already have that done for Gemini at the adapter's level:
kibana/x-pack/plugins/inference/server/chat_complete/adapters/gemini/gemini_adapter.ts (lines 153 to 164 in 48e4dcf)
function messagesToGemini({ messages }: { messages: Message[] }): GeminiMessage[] {
  return messages.map(messageToGeminiMapper()).reduce<GeminiMessage[]>((output, message) => {
    // merging consecutive messages from the same user, as Gemini requires multi-turn messages
    const previousMessage = output.length ? output[output.length - 1] : undefined;
    if (previousMessage?.role === message.role) {
      previousMessage.parts.push(...message.parts);
    } else {
      output.push(message);
    }
    return output;
  }, []);
}
Which is cleaner, as it can work with the underlying format and append things the right way instead of injecting `-` messages in the middle (that's not much noise, but still).
If other LLM services have the same limitation / restriction, I would do the patching at the adapter's level for them too, WDYT?
yeah you're probably right, I was trying to avoid the overhead of doing it three times. Doesn't openai need it as well?
IIRC no, openAI doesn't care about it.
Now regarding the issue of doing it multiple times: I went with the approach of doing it at the adapter's level because I felt like doing it during the conversion to the underlying LLM's syntax was more powerful, but if we want to avoid doing it multiple times, we can do that check / transform in the chat API before calling the adapter.
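A provider-agnostic version of that chat-API-level transform could look roughly like this sketch (hypothetical minimal `Message` shape and function name, not the actual Kibana `ensureMultiTurn`; it merges consecutive same-role messages the way the Gemini adapter does, rather than injecting placeholders):

```typescript
// Hypothetical minimal message shape for demonstration.
interface Message {
  role: 'user' | 'assistant';
  content: string;
}

// Merge consecutive messages from the same role so providers that require
// strictly alternating multi-turn conversations accept the payload.
function ensureAlternatingTurns(messages: Message[]): Message[] {
  return messages.reduce<Message[]>((output, message) => {
    const previous = output[output.length - 1];
    if (previous && previous.role === message.role) {
      // Same role as the previous turn: fold the content into it.
      previous.content = `${previous.content}\n${message.content}`;
    } else {
      output.push({ ...message });
    }
    return output;
  }, []);
}
```

Running it once in the chat API before dispatching to an adapter would avoid repeating the logic per provider, at the cost of operating on the generic format instead of the provider's native one.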
import type { ToolOptions } from '../chat_complete/tools';

export function isOutputCompleteEvent<TId extends string, TToolOptions extends ToolOptions<string>>(
NIT: same, the isOutput*Event helpers don't need a dedicated file for one-line functions.
evaluate: async (input, criteria) => {
  const evaluation = await lastValueFrom(
    outputApi('evaluate', {
      connectorId,
      system: `You are a helpful, respected assistant for evaluating task
      inputs and outputs in the Elastic Platform.

      Your goal is to verify whether the output of a task
As discussed on slack, I'm afraid having a generic evaluation function for all tasks will be too naive. We already see with the NL-to-ESQL test suite that the evaluator gets bamboozled by the observed LLM, because the evaluator has no knowledge of ESQL and no way to retrieve the documentation, so it doesn't spot invalid syntax or wrong parameter usage.
I'll be taking that one as a follow-up
agreed, def room for improvement here
const buildTestDefinitions = (): Section[] => {
  const testDefinitions: Section[] = [
    {
      title: 'ES|QL query generation',
      tests: [
        {
          title: 'Generates a query to show the top 10 domains by doc count',
          question: `For standard Elastic ECS compliant packetbeat data view (\`packetbeat-*\`),
We only have 13 tests here, which, given the very wide area of things we should be testing, is a very small number.
I'm not asking to add more in this PR; it's fine for merging given it's just a port of the existing code from o11y. But I was wondering what the best way would be to add more tests and cover more scenarios. Like, what was the initial strategy when those tests were added / how were they chosen?
these tests have been around for a while and they're probably too focused on the Observability domain, would love to see us add more. As for how they were chosen, it's a bit of a mix between "I want the Assistant to be able to do this" and "I see the Assistant makes this mistake".
-const builtDocsDir = Path.join(__dirname, '../../../../../../../built-docs');
+const builtDocsDir = Path.join(__dirname, '../../../../../../built-docs');
NIT: import { REPO_ROOT } from '@kbn/repo-info';
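To illustrate the nit: resolving against a repo-root constant is stable however deep the calling script lives, whereas chained `../..` segments break whenever the file moves. A sketch with hypothetical paths (the `REPO_ROOT` constant here is a local stand-in for the value `@kbn/repo-info` exports, so the snippet stays self-contained):

```typescript
import * as path from 'node:path';

// Stand-in for `import { REPO_ROOT } from '@kbn/repo-info';` — hypothetical
// checkout location used purely for demonstration.
const REPO_ROOT = '/repo/kibana';

// Brittle: the number of '..' segments must match the file's depth exactly,
// so moving the script silently breaks the path.
const fromDirname = path.posix.resolve(
  '/repo/kibana/x-pack/plugins/inference/scripts/load_esql_docs',
  '../../../../../../built-docs'
);

// Stable: expressed relative to the repo root (built-docs is a sibling
// checkout next to the kibana repo), independent of the caller's location.
const fromRepoRoot = path.posix.join(path.posix.dirname(REPO_ROOT), 'built-docs');
```

Both expressions resolve to the same directory here, but only the second survives the script being moved into a package.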
 yargs(process.argv.slice(2))
   .command(
     '*',
-    'Extract ES|QL documentation for the Observability AI Assistant',
+    'Extract ES|QL documentation',
No objections to me moving the ESQL doc generation script to a package in a follow-up? It's more consistent with how we've been working with scripts in AppEx, and I like the isolation of concerns (especially given we will likely be adding more extraction scripts later, e.g. for the elastic.co doc)
no, that's a good idea!
export type KibanaConfig = ReturnType<typeof readKibanaConfig>;

export const readKibanaConfig = () => {
  const kibanaConfigDir = path.join(__filename, '../../../../../../config');
NIT: REPO_ROOT
import { Message, MessageRole } from './chat_complete';

function isUserMessage(message: Message): boolean {
  return message.role !== MessageRole.Assistant;
Is MessageRole.Tool considered a user message then?
Yes, it is a reply from the user's system.
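Spelling that convention out as a sketch (hypothetical enum values mirroring this thread, not the actual Kibana types): anything not produced by the assistant, including tool results returned by the user's system, is treated as user input.

```typescript
// Hypothetical roles for demonstration.
enum MessageRole {
  User = 'user',
  Assistant = 'assistant',
  Tool = 'tool',
}

interface Message {
  role: MessageRole;
}

// Tool results are replies from the user's system, so the check is
// deliberately "not assistant" rather than "is user".
function isUserMessage(message: Message): boolean {
  return message.role !== MessageRole.Assistant;
}
```

Phrasing the predicate as an exclusion means any future non-assistant role defaults to the user side, which may or may not be the desired behavior.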
Great work Dario, LGTM!
In a draft PR, I've implemented the new NL to ESQL task in a LangChain tool and ran it successfully with the Security solution's default assistant graph 🥳 . I will move forward once you merge: #192042
💚 Build Succeeded
Tested in the UI and LGTM!
## Summary

Follow-up of #190433
Fix #192762

- Cleanup and refactor the documentation generation script
- Make some tweaks to the documentation to improve efficiency and make better use of tokens
- Perform human review of the generated content to make sure everything is accurate

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
# Backport

This will backport the following commits from `main` to `8.x`:

- [[inference] NL-to-ESQL: improve doc generation (#192378)](#192378)

### Questions?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

Co-authored-by: Pierre Gayvallet <pierre.gayvallet@elastic.co>
Implements the NL-to-ESQL task and migrates the Observability AI Assistant to use the new task. Most of the files are simply generated documentation. I've also included two scripts: one to generate the documentation, and another to evaluate the task against a real LLM.
TBD: run evaluation framework in Observability AI Assistant to ensure there are no performance regressions.