Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Inference] Implement NL-to-ESQL task #190433

Merged
merged 43 commits into from
Sep 4, 2024
Merged

Conversation

dgieselaar
Copy link
Member

@dgieselaar dgieselaar commented Aug 13, 2024

Implements the NL-to-ESQL task and migrates the Observability AI Assistant to use the new task. Most of the files are simply generated documentation. I've also included two scripts: one to generate the documentation, and another to evaluate the task against a real LLM.

TBD: run evaluation framework in Observability AI Assistant to ensure there are no performance regressions.

@obltmachine
Copy link

🤖 GitHub comments

Expand to view the GitHub comments

Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

@dgieselaar dgieselaar marked this pull request as ready for review August 13, 2024 17:32
@dgieselaar dgieselaar requested review from a team as code owners August 13, 2024 17:32
@dgieselaar dgieselaar added the release_note:skip Skip the PR/issue when compiling release notes label Aug 13, 2024
@botelastic botelastic bot added ci:project-deploy-observability Create an Observability project Team:Obs AI Assistant Observability AI Assistant labels Aug 13, 2024
@dgieselaar
Copy link
Member Author

@elasticmachine merge upstream

@dgieselaar
Copy link
Member Author

@pgayvallet is currently working on adding gemini/bedrock adapters, I'll put this back in draft until that work has completed.

@dgieselaar dgieselaar marked this pull request as draft August 28, 2024 08:34
@@ -8,7 +8,7 @@
import { filter, OperatorFunction } from 'rxjs';
import { ChatCompletionEvent, ChatCompletionEventType, ChatCompletionTokenCountEvent } from '.';

export function withoutTokenCountEvents<T extends ChatCompletionEvent>(): OperatorFunction<
export function withoutTokenCountEvents<T extends ChatCompletionEvent<any>>(): OperatorFunction<
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: this shouldn't be needed

Copy link
Contributor

@pgayvallet pgayvallet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in current state.

Left a few NITS, comments, and questions for follow-ups (on my side)

import { ChatCompletionEvent, ChatCompletionEventType, ChatCompletionMessageEvent } from '.';
import type { ToolOptions } from './tools';

export function isChatCompletionMessageEvent<T extends ToolOptions<string>>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: isChatCompletionChunkEvent, isChatCompletionEvent and isChatCompletionMessageEvent could live in the same file (but i'll end up moving them myself anyway)

@@ -8,7 +8,7 @@
import { filter, OperatorFunction } from 'rxjs';
import { ChatCompletionEvent, ChatCompletionEventType, ChatCompletionTokenCountEvent } from '.';

export function withoutTokenCountEvents<T extends ChatCompletionEvent>(): OperatorFunction<
export function withoutTokenCountEvents<T extends ChatCompletionEvent<any>>(): OperatorFunction<
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: would that work with unknown instead of any?

* 2.0.
*/

export {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What? So you mean you're aware that index files are supposed to be used for re-exporting, not living alone in their folder containing concrete stuff?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

who'd have thought!

Comment on lines 19 to 20
messages: ensureMultiTurn([
...(messages || []),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really think we want that done at that level? We already have that done for Gemini at the adapter's level:

function messagesToGemini({ messages }: { messages: Message[] }): GeminiMessage[] {
return messages.map(messageToGeminiMapper()).reduce<GeminiMessage[]>((output, message) => {
// merging consecutive messages from the same user, as Gemini requires multi-turn messages
const previousMessage = output.length ? output[output.length - 1] : undefined;
if (previousMessage?.role === message.role) {
previousMessage.parts.push(...message.parts);
} else {
output.push(message);
}
return output;
}, []);
}

Which is cleaner, as it can work with the underlying format and append the things the right way instead of injecting - messages in the middle (that's not much noise, but still).

If other LLM services have the same limitation / restriction, I would do the patching at the adapter's level for them to, WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah you're probably right, I was trying to avoid the overhead of doing it three times, doesn't openai need it as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC no, openAI doesn't care about it.

Now regarding the issue about doing it multiple times: I went with the approach to do it at the adapter's level because I felt like doing it during the conversion to the underlying LLM's syntax was more powerful, but if we want to avoid doing it multiple time, we can do that check / transform in the chat API before calling the adapter.


import type { ToolOptions } from '../chat_complete/tools';

export function isOutputCompleteEvent<TId extends string, TToolOptions extends ToolOptions<string>>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: same, isOutput****Event don't need their dedicated file for a line functions.

Comment on lines 60 to 67
evaluate: async (input, criteria) => {
const evaluation = await lastValueFrom(
outputApi('evaluate', {
connectorId,
system: `You are a helpful, respected assistant for evaluating task
inputs and outputs in the Elastic Platform.

Your goal is to verify whether the output of a task
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed on slack, I'm afraid having a generic evaluation function for all task will be too naive. We already see with the NL-to-ESQL test suite that the evaluator gets bamboozled by the observed LLM because the evaluator has no knowledge of ESQL and no way to retrieve the documentation, so it don't spot invalid syntaxes or wrong parameters usages.

I'll be taking that one as a follow-up

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, def room for improvement here

Comment on lines 87 to 94
const buildTestDefinitions = (): Section[] => {
const testDefinitions: Section[] = [
{
title: 'ES|QL query generation',
tests: [
{
title: 'Generates a query to show the top 10 domains by doc count',
question: `For standard Elastic ECS compliant packetbeat data view (\`packetbeat-*\`),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only have 13 tests here, with given the very wide area of things we should be testing, is a very small number.

I'm not asking to add more in this PR, this is very fine for merging given it's just a port of the existing code from o11y, but I was wondering what the best way to add more tests and to cover more scenarios would be. Like, what was the initial strategy when those tests were added / how were they chosen?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tests have been around for a while and they're probably too focused on the Observability domain, would leave to see us add more. As far as how they were chosen, it's a bit of a mix between "I want the Assistant to be able to do this" and "I see the Assistant makes this mistake".


const builtDocsDir = Path.join(__dirname, '../../../../../../../built-docs');
const builtDocsDir = Path.join(__dirname, '../../../../../../built-docs');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: import { REPO_ROOT } from '@kbn/repo-info';


yargs(process.argv.slice(2))
.command(
'*',
'Extract ES|QL documentation for the Observability AI Assistant',
'Extract ES|QL documentation',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No objections about me moving the ESQL doc generation script to a package in a follow-up? it's more consistent with how we're been working with scripts in AppEx, and I like the isolation of concerns (especially given we will likely be adding more extraction scripts later, e.g. for the elastic.co doc)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, that's a good idea!

export type KibanaConfig = ReturnType<typeof readKibanaConfig>;

export const readKibanaConfig = () => {
const kibanaConfigDir = path.join(__filename, '../../../../../../config');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: REPO_ROOT

import { Message, MessageRole } from './chat_complete';

function isUserMessage(message: Message): boolean {
return message.role !== MessageRole.Assistant;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is MessageRole.Tool considered a user message then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is a reply from the user's system.

Copy link
Contributor

@stephmilovic stephmilovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work Dario, LGTM!

In a draft PR, I've implemented the new NL to ESQL task in a LangChain tool and ran it successfully with the Security solution's default assistant graph 🥳 . I will move forward once you merge: #192042

@kibana-ci
Copy link
Collaborator

kibana-ci commented Sep 4, 2024

💚 Build Succeeded

  • Buildkite Build
  • Commit: fe3596e
  • Kibana Serverless Image: docker.elastic.co/kibana-ci/kibana-serverless:pr-190433-fe3596ee99fd

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
inference 14 15 +1

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
inference 14 39 +25

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id before after diff
inference 11 13 +2

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
inference 5.2KB 5.6KB +390.0B
Unknown metric groups

API count

id before after diff
inference 16 41 +25

ESLint disabled line counts

id before after diff
inference 0 1 +1

Total ESLint disabled count

id before after diff
inference 2 3 +1

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Copy link
Contributor

@neptunian neptunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested in the UI and LGTM!

@dgieselaar dgieselaar merged commit 5c298a1 into elastic:main Sep 4, 2024
23 checks passed
@dgieselaar dgieselaar deleted the nl-to-esql-task branch September 4, 2024 16:30
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Sep 4, 2024
pgayvallet added a commit that referenced this pull request Sep 13, 2024
## Summary

Follow-up of #190433

Fix [#192762](#192762)

- Cleanup and refactor the documentation generation script
- Make some tweak to the documentation to improve efficiency and make a
better user of tokens
- Perform human review of the generated content to make sure everything
is accurate

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Sep 13, 2024
## Summary

Follow-up of elastic#190433

Fix [elastic#192762](elastic#192762)

- Cleanup and refactor the documentation generation script
- Make some tweak to the documentation to improve efficiency and make a
better user of tokens
- Perform human review of the generated content to make sure everything
is accurate

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
(cherry picked from commit 3226eb6)
kibanamachine referenced this pull request Sep 13, 2024
# Backport

This will backport the following commits from `main` to `8.x`:
- [[inference] NL-to-ESQL: improve doc generation
(#192378)](#192378)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"Pierre
Gayvallet","email":"pierre.gayvallet@elastic.co"},"sourceCommit":{"committedDate":"2024-09-13T07:29:29Z","message":"[inference]
NL-to-ESQL: improve doc generation (#192378)\n\n##
Summary\r\n\r\nFollow-up of
https://github.com/elastic/kibana/pull/190433\r\n\r\nFix
[#192762](https://github.com/elastic/kibana/issues/192762)\r\n\r\n-
Cleanup and refactor the documentation generation script\r\n- Make some
tweak to the documentation to improve efficiency and make a\r\nbetter
user of tokens\r\n- Perform human review of the generated content to
make sure everything\r\nis
accurate\r\n\r\n---------\r\n\r\nCo-authored-by: kibanamachine
<42973632+kibanamachine@users.noreply.github.com>","sha":"3226eb691af82882cdc89edd9ddff9abbcac1e5c","branchLabelMapping":{"^v9.0.0$":"main","^v8.16.0$":"8.x","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","v9.0.0","backport:prev-minor","v8.16.0","Team:AI
Infra"],"title":"[inference] NL-to-ESQL: improve doc
generation","number":192378,"url":"https://github.com/elastic/kibana/pull/192378","mergeCommit":{"message":"[inference]
NL-to-ESQL: improve doc generation (#192378)\n\n##
Summary\r\n\r\nFollow-up of
https://github.com/elastic/kibana/pull/190433\r\n\r\nFix
[#192762](https://github.com/elastic/kibana/issues/192762)\r\n\r\n-
Cleanup and refactor the documentation generation script\r\n- Make some
tweak to the documentation to improve efficiency and make a\r\nbetter
user of tokens\r\n- Perform human review of the generated content to
make sure everything\r\nis
accurate\r\n\r\n---------\r\n\r\nCo-authored-by: kibanamachine
<42973632+kibanamachine@users.noreply.github.com>","sha":"3226eb691af82882cdc89edd9ddff9abbcac1e5c"}},"sourceBranch":"main","suggestedTargetBranches":["8.x"],"targetPullRequestStates":[{"branch":"main","label":"v9.0.0","branchLabelMappingKey":"^v9.0.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/192378","number":192378,"mergeCommit":{"message":"[inference]
NL-to-ESQL: improve doc generation (#192378)\n\n##
Summary\r\n\r\nFollow-up of
https://github.com/elastic/kibana/pull/190433\r\n\r\nFix
[#192762](https://github.com/elastic/kibana/issues/192762)\r\n\r\n-
Cleanup and refactor the documentation generation script\r\n- Make some
tweak to the documentation to improve efficiency and make a\r\nbetter
user of tokens\r\n- Perform human review of the generated content to
make sure everything\r\nis
accurate\r\n\r\n---------\r\n\r\nCo-authored-by: kibanamachine
<42973632+kibanamachine@users.noreply.github.com>","sha":"3226eb691af82882cdc89edd9ddff9abbcac1e5c"}},{"branch":"8.x","label":"v8.16.0","branchLabelMappingKey":"^v8.16.0$","isSourceBranch":false,"state":"NOT_CREATED"}]}]
BACKPORT-->

Co-authored-by: Pierre Gayvallet <pierre.gayvallet@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting ci:project-deploy-observability Create an Observability project release_note:skip Skip the PR/issue when compiling release notes Team:Obs AI Assistant Observability AI Assistant v8.16.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants