From 89369838adac6fe9f780ea43829973ebfa4fce1c Mon Sep 17 00:00:00 2001 From: Kibana Machine <42973632+kibanamachine@users.noreply.github.com> Date: Fri, 13 Sep 2024 19:28:34 +1000 Subject: [PATCH] [8.x] [inference] NL-to-ESQL: improve doc generation (#192378) (#192802) # Backport This will backport the following commits from `main` to `8.x`: - [[inference] NL-to-ESQL: improve doc generation (#192378)](https://github.com/elastic/kibana/pull/192378) ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sqren/backport) Co-authored-by: Pierre Gayvallet --- .../evaluation/scenarios/esql/index.spec.ts | 10 +- .../load_esql_docs/extract_doc_entries.ts | 288 +++++++++++ .../load_esql_docs/extract_sections.ts | 41 -- .../scripts/load_esql_docs/generate_doc.ts | 146 ++++++ .../scripts/load_esql_docs/load_esql_docs.ts | 480 +++--------------- .../prompts/convert_to_markdown.ts | 46 ++ .../prompts/create_documentation_page.ts | 60 +++ .../scripts/load_esql_docs/prompts/index.ts | 10 + .../prompts/rewrite_function_page.ts | 238 +++++++++ .../load_esql_docs/sync_built_docs_repo.ts | 85 ++++ .../load_esql_docs/utils/output_executor.ts | 39 ++ .../tasks/nl_to_esql/esql_docs/esql-abs.txt | 20 +- .../tasks/nl_to_esql/esql_docs/esql-acos.txt | 24 +- .../tasks/nl_to_esql/esql_docs/esql-asin.txt | 26 +- .../tasks/nl_to_esql/esql_docs/esql-atan.txt | 18 +- .../tasks/nl_to_esql/esql_docs/esql-avg.txt | 22 +- .../nl_to_esql/esql_docs/esql-bucket.txt | 57 ++- .../tasks/nl_to_esql/esql_docs/esql-case.txt | 22 +- .../tasks/nl_to_esql/esql_docs/esql-cbrt.txt | 23 +- .../tasks/nl_to_esql/esql_docs/esql-ceil.txt | 26 +- .../nl_to_esql/esql_docs/esql-cidr_match.txt | 28 +- .../nl_to_esql/esql_docs/esql-coalesce.txt | 30 +- .../nl_to_esql/esql_docs/esql-concat.txt | 36 +- .../tasks/nl_to_esql/esql_docs/esql-cos.txt | 20 +- .../tasks/nl_to_esql/esql_docs/esql-cosh.txt | 20 +- .../tasks/nl_to_esql/esql_docs/esql-count.txt | 26 +- .../esql_docs/esql-count_distinct.txt 
| 33 +- .../nl_to_esql/esql_docs/esql-date_diff.txt | 34 +- .../esql_docs/esql-date_extract.txt | 26 +- .../nl_to_esql/esql_docs/esql-date_format.txt | 29 +- .../nl_to_esql/esql_docs/esql-date_parse.txt | 23 +- .../nl_to_esql/esql_docs/esql-date_trunc.txt | 28 +- .../nl_to_esql/esql_docs/esql-dissect.txt | 36 +- .../tasks/nl_to_esql/esql_docs/esql-drop.txt | 34 +- .../tasks/nl_to_esql/esql_docs/esql-e.txt | 16 +- .../nl_to_esql/esql_docs/esql-ends_with.txt | 23 +- .../nl_to_esql/esql_docs/esql-enrich.txt | 50 +- .../tasks/nl_to_esql/esql_docs/esql-eval.txt | 53 +- .../tasks/nl_to_esql/esql_docs/esql-exp.txt | 19 +- .../tasks/nl_to_esql/esql_docs/esql-floor.txt | 24 +- .../tasks/nl_to_esql/esql_docs/esql-from.txt | 34 +- .../nl_to_esql/esql_docs/esql-greatest.txt | 27 +- .../tasks/nl_to_esql/esql_docs/esql-grok.txt | 46 +- .../nl_to_esql/esql_docs/esql-ip_prefix.txt | 26 +- .../tasks/nl_to_esql/esql_docs/esql-keep.txt | 37 +- .../tasks/nl_to_esql/esql_docs/esql-least.txt | 22 +- .../tasks/nl_to_esql/esql_docs/esql-left.txt | 24 +- .../nl_to_esql/esql_docs/esql-length.txt | 20 +- .../tasks/nl_to_esql/esql_docs/esql-limit.txt | 57 ++- .../nl_to_esql/esql_docs/esql-locate.txt | 29 +- .../tasks/nl_to_esql/esql_docs/esql-log.txt | 22 +- .../nl_to_esql/esql_docs/esql-lookup.txt | 22 +- .../tasks/nl_to_esql/esql_docs/esql-ltrim.txt | 18 +- .../tasks/nl_to_esql/esql_docs/esql-max.txt | 22 +- .../nl_to_esql/esql_docs/esql-median.txt | 28 +- .../esql-median_absolute_deviation.txt | 27 +- .../tasks/nl_to_esql/esql_docs/esql-min.txt | 22 +- .../nl_to_esql/esql_docs/esql-mv_append.txt | 16 +- .../nl_to_esql/esql_docs/esql-mv_avg.txt | 18 +- .../nl_to_esql/esql_docs/esql-mv_concat.txt | 26 +- .../nl_to_esql/esql_docs/esql-mv_count.txt | 18 +- .../nl_to_esql/esql_docs/esql-mv_dedupe.txt | 22 +- .../nl_to_esql/esql_docs/esql-mv_expand.txt | 33 +- .../nl_to_esql/esql_docs/esql-mv_first.txt | 22 +- .../nl_to_esql/esql_docs/esql-mv_last.txt | 24 +- 
.../nl_to_esql/esql_docs/esql-mv_max.txt | 22 +- .../nl_to_esql/esql_docs/esql-mv_median.txt | 20 +- .../nl_to_esql/esql_docs/esql-mv_min.txt | 18 +- .../esql-mv_pseries_weighted_sum.txt | 22 +- .../nl_to_esql/esql_docs/esql-mv_slice.txt | 26 +- .../nl_to_esql/esql_docs/esql-mv_sort.txt | 34 +- .../nl_to_esql/esql_docs/esql-mv_sum.txt | 18 +- .../nl_to_esql/esql_docs/esql-mv_zip.txt | 32 +- .../tasks/nl_to_esql/esql_docs/esql-now.txt | 16 +- .../nl_to_esql/esql_docs/esql-operators.txt | 207 ++++---- .../nl_to_esql/esql_docs/esql-overview.txt | 50 +- .../nl_to_esql/esql_docs/esql-percentile.txt | 31 +- .../tasks/nl_to_esql/esql_docs/esql-pi.txt | 16 +- .../tasks/nl_to_esql/esql_docs/esql-pow.txt | 22 +- .../nl_to_esql/esql_docs/esql-rename.txt | 44 +- .../nl_to_esql/esql_docs/esql-repeat.txt | 22 +- .../nl_to_esql/esql_docs/esql-replace.txt | 28 +- .../tasks/nl_to_esql/esql_docs/esql-right.txt | 24 +- .../tasks/nl_to_esql/esql_docs/esql-round.txt | 28 +- .../tasks/nl_to_esql/esql_docs/esql-row.txt | 22 +- .../tasks/nl_to_esql/esql_docs/esql-rtrim.txt | 20 +- .../tasks/nl_to_esql/esql_docs/esql-show.txt | 29 +- .../nl_to_esql/esql_docs/esql-signum.txt | 18 +- .../tasks/nl_to_esql/esql_docs/esql-sin.txt | 18 +- .../tasks/nl_to_esql/esql_docs/esql-sinh.txt | 20 +- .../tasks/nl_to_esql/esql_docs/esql-sort.txt | 53 +- .../tasks/nl_to_esql/esql_docs/esql-split.txt | 22 +- .../tasks/nl_to_esql/esql_docs/esql-sqrt.txt | 19 +- .../esql_docs/esql-st_centroid_agg.txt | 20 +- .../nl_to_esql/esql_docs/esql-st_contains.txt | 26 +- .../nl_to_esql/esql_docs/esql-st_disjoint.txt | 26 +- .../nl_to_esql/esql_docs/esql-st_distance.txt | 26 +- .../esql_docs/esql-st_intersects.txt | 22 +- .../nl_to_esql/esql_docs/esql-st_within.txt | 22 +- .../tasks/nl_to_esql/esql_docs/esql-st_x.txt | 20 +- .../tasks/nl_to_esql/esql_docs/esql-st_y.txt | 19 +- .../nl_to_esql/esql_docs/esql-starts_with.txt | 24 +- .../tasks/nl_to_esql/esql_docs/esql-stats.txt | 116 ++++- 
.../nl_to_esql/esql_docs/esql-substring.txt | 29 +- .../tasks/nl_to_esql/esql_docs/esql-sum.txt | 22 +- .../nl_to_esql/esql_docs/esql-syntax.txt | 133 +++-- .../tasks/nl_to_esql/esql_docs/esql-tan.txt | 20 +- .../tasks/nl_to_esql/esql_docs/esql-tanh.txt | 18 +- .../tasks/nl_to_esql/esql_docs/esql-tau.txt | 16 +- .../nl_to_esql/esql_docs/esql-to_boolean.txt | 25 +- .../esql_docs/esql-to_cartesianpoint.txt | 18 +- .../esql_docs/esql-to_cartesianshape.txt | 23 +- .../nl_to_esql/esql_docs/esql-to_datetime.txt | 26 +- .../nl_to_esql/esql_docs/esql-to_degrees.txt | 18 +- .../nl_to_esql/esql_docs/esql-to_double.txt | 23 +- .../nl_to_esql/esql_docs/esql-to_geopoint.txt | 18 +- .../nl_to_esql/esql_docs/esql-to_geoshape.txt | 18 +- .../nl_to_esql/esql_docs/esql-to_integer.txt | 24 +- .../tasks/nl_to_esql/esql_docs/esql-to_ip.txt | 19 +- .../nl_to_esql/esql_docs/esql-to_long.txt | 24 +- .../nl_to_esql/esql_docs/esql-to_lower.txt | 18 +- .../nl_to_esql/esql_docs/esql-to_radians.txt | 18 +- .../nl_to_esql/esql_docs/esql-to_string.txt | 20 +- .../esql_docs/esql-to_unsigned_long.txt | 29 +- .../nl_to_esql/esql_docs/esql-to_upper.txt | 18 +- .../nl_to_esql/esql_docs/esql-to_version.txt | 18 +- .../tasks/nl_to_esql/esql_docs/esql-top.txt | 28 +- .../tasks/nl_to_esql/esql_docs/esql-trim.txt | 18 +- .../nl_to_esql/esql_docs/esql-values.txt | 27 +- .../esql_docs/esql-weighted_avg.txt | 22 +- .../tasks/nl_to_esql/esql_docs/esql-where.txt | 40 +- .../server/tasks/nl_to_esql/index.ts | 18 +- .../tasks/nl_to_esql/system_message.txt | 152 +++--- 133 files changed, 3560 insertions(+), 1493 deletions(-) create mode 100644 x-pack/plugins/inference/scripts/load_esql_docs/extract_doc_entries.ts delete mode 100644 x-pack/plugins/inference/scripts/load_esql_docs/extract_sections.ts create mode 100644 x-pack/plugins/inference/scripts/load_esql_docs/generate_doc.ts create mode 100644 x-pack/plugins/inference/scripts/load_esql_docs/prompts/convert_to_markdown.ts create mode 100644 
x-pack/plugins/inference/scripts/load_esql_docs/prompts/create_documentation_page.ts create mode 100644 x-pack/plugins/inference/scripts/load_esql_docs/prompts/index.ts create mode 100644 x-pack/plugins/inference/scripts/load_esql_docs/prompts/rewrite_function_page.ts create mode 100644 x-pack/plugins/inference/scripts/load_esql_docs/sync_built_docs_repo.ts create mode 100644 x-pack/plugins/inference/scripts/load_esql_docs/utils/output_executor.ts diff --git a/x-pack/plugins/inference/scripts/evaluation/scenarios/esql/index.spec.ts b/x-pack/plugins/inference/scripts/evaluation/scenarios/esql/index.spec.ts index dffca52b10836..83868884e1429 100644 --- a/x-pack/plugins/inference/scripts/evaluation/scenarios/esql/index.spec.ts +++ b/x-pack/plugins/inference/scripts/evaluation/scenarios/esql/index.spec.ts @@ -192,7 +192,7 @@ const buildTestDefinitions = (): Section[] => { { title: 'Generates a query to show employees filtered by name and grouped by hire_date', question: `From the employees index, I want to see how many employees with a "B" in their first name - where hired each month over the past 2 years. + were hired each month over the past 2 years. 
Assume the following fields: - hire_date - first_name @@ -208,10 +208,10 @@ const buildTestDefinitions = (): Section[] => { (which can be read the same backward and forward), and then return their last name and first name - last_name - first_name`, - expected: `FROM employees - | EVAL reversed_last_name = REVERSE(last_name) - | WHERE TO_LOWER(last_name) == TO_LOWER(reversed_last_name) - | KEEP last_name, first_name`, + criteria: [ + `The assistant should not provide an ES|QL query, and explicitly mention that there is no + way to check for palindromes using ES|QL.`, + ], }, { title: 'Generates a query to show the top 10 domains by doc count', diff --git a/x-pack/plugins/inference/scripts/load_esql_docs/extract_doc_entries.ts b/x-pack/plugins/inference/scripts/load_esql_docs/extract_doc_entries.ts new file mode 100644 index 0000000000000..4de5752c8e6b1 --- /dev/null +++ b/x-pack/plugins/inference/scripts/load_esql_docs/extract_doc_entries.ts @@ -0,0 +1,288 @@ +/* + * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one + * or more contributor license agreements. Licensed under the Elastic License + * 2.0; you may not use this file except in compliance with the Elastic License + * 2.0. + */ + +import Fs from 'fs/promises'; +import Path from 'path'; +import fastGlob from 'fast-glob'; +import $, { load, Cheerio, AnyNode } from 'cheerio'; +import { partition } from 'lodash'; +import { ToolingLog } from '@kbn/tooling-log'; +import pLimit from 'p-limit'; +import { ScriptInferenceClient } from '../util/kibana_client'; +import { convertToMarkdownPrompt } from './prompts/convert_to_markdown'; +import { bindOutput, PromptCaller } from './utils/output_executor'; + +/** + * The pages that will be extracted but only used as context + * for the LLM for the enhancement tasks of the documentation entries. 
+ */ +const contextArticles = [ + 'esql.html', + 'esql-syntax.html', + 'esql-kibana.html', + 'esql-query-api.html', + 'esql-limitations.html', + 'esql-cross-clusters.html', + 'esql-examples.html', + 'esql-metadata-fields.html', + 'esql-multi-index.html', +]; + +interface ExtractedPage { + sourceFile: string; + name: string; + content: string; +} + +export interface ExtractedCommandOrFunc { + name: string; + markdownContent: string; + command: boolean; +} + +export interface ExtractionOutput { + commands: ExtractedCommandOrFunc[]; + functions: ExtractedCommandOrFunc[]; + pages: ExtractedPage[]; + skippedFile: string[]; +} + +export async function extractDocEntries({ + builtDocsDir, + log, + inferenceClient, +}: { + builtDocsDir: string; + log: ToolingLog; + inferenceClient: ScriptInferenceClient; +}): Promise<ExtractionOutput> { + const files = await fastGlob(`${builtDocsDir}/html/en/elasticsearch/reference/master/esql*.html`); + if (!files.length) { + throw new Error('No files found'); + } + + const output: ExtractionOutput = { + commands: [], + functions: [], + pages: [], + skippedFile: [], + }; + + const executePrompt = bindOutput({ + output: inferenceClient.output, + connectorId: inferenceClient.getConnectorId(), + }); + + const limiter = pLimit(10); + + await Promise.all( + files.map(async (file) => { + return await processFile({ + file, + log, + executePrompt, + output, + limiter, + }); + }) + ); + + return output; +} + +async function processFile({ + file: fileFullPath, + output, + executePrompt, + log, + limiter, +}: { + file: string; + output: ExtractionOutput; + executePrompt: PromptCaller; + log: ToolingLog; + limiter: pLimit.Limit; +}) { + const basename = Path.basename(fileFullPath); + const fileContent = (await Fs.readFile(fileFullPath)).toString('utf-8'); + + if (basename === 'esql-commands.html') { + // process commands + await processCommands({ + fileContent, + log, + output, + limiter, + executePrompt, + }); + } else if (basename === 'esql-functions-operators.html') 
{ + // process functions / operators + await processFunctionsAndOperators({ + fileContent, + log, + output, + limiter, + executePrompt, + }); + } else if (contextArticles.includes(basename)) { + const $element = load(fileContent)('*'); + output.pages.push({ + sourceFile: basename, + name: basename === 'esql.html' ? 'overview' : basename.substring(5, basename.length - 5), + content: getSimpleText($element), + }); + } else { + output.skippedFile.push(basename); + } +} + +async function processFunctionsAndOperators({ + fileContent, + output, + executePrompt, + log, + limiter, +}: { + fileContent: string; + output: ExtractionOutput; + executePrompt: PromptCaller; + log: ToolingLog; + limiter: pLimit.Limit; +}) { + const $element = load(fileContent.toString())('*'); + + const sections = extractSections($element); + + const searches = [ + 'Binary operators', + 'Equality', + 'Inequality', + 'Less than', + 'Less than or equal to', + 'Greater than', + 'Greater than or equal to', + 'Add +', + 'Subtract -', + 'Multiply *', + 'Divide /', + 'Modulus %', + 'Unary operators', + 'Logical operators', + 'IS NULL and IS NOT NULL', + 'Cast (::)', + ]; + + const matches = ['IN', 'LIKE', 'RLIKE']; + + const [operatorSections, allOtherSections] = partition(sections, (section) => { + return ( + matches.includes(section.title) || + searches.some((search) => section.title.toLowerCase().startsWith(search.toLowerCase())) + ); + }); + + const functionSections = allOtherSections.filter(({ title }) => !!title.match(/^[A-Z_]+$/)); + + const markdownFiles = await Promise.all( + functionSections.map(async (section) => { + return limiter(async () => { + return { + name: section.title, + markdownContent: await executePrompt( + convertToMarkdownPrompt({ htmlContent: section.content }) + ), + command: false, + }; + }); + }) + ); + + output.functions.push(...markdownFiles); + + output.pages.push({ + sourceFile: 'esql-functions-operators.html', + name: 'operators', + content: operatorSections.map(({ 
title, content }) => `${title}\n${content}`).join('\n'), + }); +} + +async function processCommands({ + fileContent, + output, + executePrompt, + log, + limiter, +}: { + fileContent: string; + output: ExtractionOutput; + executePrompt: PromptCaller; + log: ToolingLog; + limiter: pLimit.Limit; +}) { + const $element = load(fileContent.toString())('*'); + + const sections = extractSections($element).filter(({ title }) => !!title.match(/^[A-Z_]+$/)); + + const markdownFiles = await Promise.all( + sections.map(async (section) => { + return limiter(async () => { + return { + name: section.title, + markdownContent: await executePrompt( + convertToMarkdownPrompt({ htmlContent: section.content }) + ), + command: true, + }; + }); + }) + ); + + output.commands.push(...markdownFiles); +} + +function getSimpleText($element: Cheerio<AnyNode>) { + $element.remove('.navfooter'); + $element.remove('#sticky_content'); + $element.remove('.edit_me'); + $element.find('code').each(function () { + $(this).replaceWith('`' + $(this).text() + '`'); + }); + return $element + .find('.section,section,.part') + .last() + .text() + .replaceAll(/([\n]\s*){2,}/g, '\n'); +} + +export function extractSections(cheerio: Cheerio<AnyNode>) { + const sections: Array<{ + title: string; + content: string; + }> = []; + cheerio.find('.section .position-relative').each((index, element) => { + const untilNextHeader = $(element).nextUntil('.position-relative'); + + const title = $(element).text().trim().replace('edit', ''); + + untilNextHeader.find('svg defs').remove(); + untilNextHeader.find('.console_code_copy').remove(); + untilNextHeader.find('.imageblock').remove(); + untilNextHeader.find('table').remove(); + + const htmlContent = untilNextHeader + .map((i, node) => $(node).prop('outerHTML')) + .toArray() + .join(''); + + sections.push({ + title: title === 'STATS ... BY' ? 'STATS' : title, + content: `

${title}

${htmlContent}
`, + }); + }); + + return sections; +} diff --git a/x-pack/plugins/inference/scripts/load_esql_docs/extract_sections.ts b/x-pack/plugins/inference/scripts/load_esql_docs/extract_sections.ts deleted file mode 100644 index c4a2da3f355dd..0000000000000 --- a/x-pack/plugins/inference/scripts/load_esql_docs/extract_sections.ts +++ /dev/null @@ -1,41 +0,0 @@ -/* - * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one - * or more contributor license agreements. Licensed under the Elastic License - * 2.0; you may not use this file except in compliance with the Elastic License - * 2.0. - */ -import $, { AnyNode, Cheerio } from 'cheerio'; - -export function extractSections(cheerio: Cheerio<AnyNode>) { - const sections: Array<{ - title: string; - content: string; - }> = []; - cheerio.find('.section h3').each((index, element) => { - let untilNextHeader = $(element).nextUntil('h3'); - - if (untilNextHeader.length === 0) { - untilNextHeader = $(element).parents('.titlepage').nextUntil('h3'); - } - - if (untilNextHeader.length === 0) { - untilNextHeader = $(element).parents('.titlepage').nextAll(); - } - - const title = $(element).text().trim().replace('edit', ''); - - untilNextHeader.find('table').remove(); - untilNextHeader.find('svg').remove(); - - const text = untilNextHeader.text(); - - const content = text.replaceAll(/([\n]\s*){2,}/g, '\n'); - - sections.push({ - title: title === 'STATS ... BY' ? 'STATS' : title, - content: `${title}\n\n${content}`, - }); - }); - - return sections; -} diff --git a/x-pack/plugins/inference/scripts/load_esql_docs/generate_doc.ts b/x-pack/plugins/inference/scripts/load_esql_docs/generate_doc.ts new file mode 100644 index 0000000000000..2fe10d7ac4a83 --- /dev/null +++ b/x-pack/plugins/inference/scripts/load_esql_docs/generate_doc.ts @@ -0,0 +1,146 @@ +/* + * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one + * or more contributor license agreements. 
Licensed under the Elastic License + * 2.0; you may not use this file except in compliance with the Elastic License + * 2.0. + */ + +import pLimit from 'p-limit'; +import { ToolingLog } from '@kbn/tooling-log'; +import { ScriptInferenceClient } from '../util/kibana_client'; +import type { ExtractionOutput } from './extract_doc_entries'; +import { createDocumentationPagePrompt, rewriteFunctionPagePrompt } from './prompts'; +import { bindOutput } from './utils/output_executor'; + +export interface FileToWrite { + name: string; + content: string; +} + +interface PageGeneration { + outputFileName: string; + sourceFile: string; + instructions: string; +} + +export const generateDoc = async ({ + extraction, + inferenceClient, +}: { + extraction: ExtractionOutput; + inferenceClient: ScriptInferenceClient; + log: ToolingLog; +}) => { + const filesToWrite: FileToWrite[] = []; + + const limiter = pLimit(10); + + const callOutput = bindOutput({ + connectorId: inferenceClient.getConnectorId(), + output: inferenceClient.output, + }); + + const documentation = documentationForFunctionRewrite(extraction); + + await Promise.all( + [...extraction.commands, ...extraction.functions].map(async (func) => { + return limiter(async () => { + const rewrittenContent = await callOutput( + rewriteFunctionPagePrompt({ + content: func.markdownContent, + documentation, + command: func.command, + }) + ); + filesToWrite.push({ + name: fileNameForFunc(func.name), + content: rewrittenContent, + }); + }); + }) + ); + + const pageContentByName = (pageName: string) => + extraction.pages.find((page) => page.name === pageName)!.content; + + const pages: PageGeneration[] = [ + { + sourceFile: 'syntax', + outputFileName: 'esql-syntax.txt', + instructions: ` + Generate a description of Elastic ES|QL syntax. Make sure to reuse as much as possible the provided content of file and be as complete as possible. 
+ For timespan literals, generate at least five examples of full ES|QL queries, using a mix commands and functions, using different intervals and units. + **Make sure you use timespan literals, such as \`1 day\` or \`24h\` or \`7 weeks\` in these examples**. + Combine ISO timestamps with time span literals and NOW(). + Make sure the example queries are using different combinations of syntax, commands and functions for each, and use BUCKET at least twice + When using DATE_TRUNC, make sure you DO NOT wrap the timespan in single or double quotes. + Do not use the Cast operator. In your examples, make sure to only use commands and functions that exist in the provided documentation. + `, + }, + { + sourceFile: 'overview', + outputFileName: 'esql-overview.txt', + instructions: `Generate a description of ES|QL as a language. Ignore links to other documents. + From Limitations, include the known limitations, but ignore limitations that are specific to a command. + Include a summary of what is mentioned in the CROSS_CLUSTER, Kibana and API sections. + Explain how to use the REST API with an example and mention important information for Kibana usage and cross cluster querying.`, + }, + { + sourceFile: 'operators', + outputFileName: 'esql-operators.txt', + instructions: ` + Generate a document describing the operators. + For each type of operator (binary, unary, logical, and the remaining), generate a section. + For each operator, generate at least one full ES|QL query as an example of its usage. + Keep it short, e.g. only a \`\`\`esql\\nFROM ...\\n| WHERE ... 
\`\`\` + `, + }, + ]; + + await Promise.all( + pages.map(async (page) => { + return limiter(async () => { + const pageContent = await callOutput( + createDocumentationPagePrompt({ + documentation, + content: pageContentByName(page.sourceFile), + specificInstructions: page.instructions, + }) + ); + filesToWrite.push({ + name: page.outputFileName, + content: pageContent, + }); + }); + }) + ); + + return filesToWrite; +}; + +const fileNameForFunc = (funcName: string) => + `esql-${funcName.replaceAll(' ', '-').toLowerCase()}.txt`; + +const documentationForFunctionRewrite = (extraction: ExtractionOutput) => { + return JSON.stringify( + { + pages: extraction.pages.filter((page) => { + return !['query-api', 'cross-clusters'].includes(page.name); + }), + commands: extraction.commands, + functions: extraction.functions.filter((func) => { + return [ + 'BUCKET', + 'COUNT', + 'COUNT_DISTINCT', + 'CASE', + 'DATE_EXTRACT', + 'DATE_DIFF', + 'DATE_TRUNC', + ].includes(func.name); + }), + }, + undefined, + 2 + ); +}; diff --git a/x-pack/plugins/inference/scripts/load_esql_docs/load_esql_docs.ts b/x-pack/plugins/inference/scripts/load_esql_docs/load_esql_docs.ts index 3250d06906905..a35491a476040 100644 --- a/x-pack/plugins/inference/scripts/load_esql_docs/load_esql_docs.ts +++ b/x-pack/plugins/inference/scripts/load_esql_docs/load_esql_docs.ts @@ -4,19 +4,13 @@ * 2.0; you may not use this file except in compliance with the Elastic License * 2.0. 
*/ + import { run } from '@kbn/dev-cli-runner'; import { ESQLMessage, EditorError, getAstAndSyntaxErrors } from '@kbn/esql-ast'; import { validateQuery } from '@kbn/esql-validation-autocomplete'; -import $, { load } from 'cheerio'; -import { SingleBar } from 'cli-progress'; -import FastGlob from 'fast-glob'; import Fs from 'fs/promises'; -import { compact, once, partition } from 'lodash'; -import pLimit from 'p-limit'; import Path from 'path'; -import git, { SimpleGitProgressEvent } from 'simple-git'; import yargs, { Argv } from 'yargs'; -import { lastValueFrom } from 'rxjs'; import { REPO_ROOT } from '@kbn/repo-info'; import { INLINE_ESQL_QUERY_REGEX } from '../../common/tasks/nl_to_esql/constants'; import { correctCommonEsqlMistakes } from '../../common/tasks/nl_to_esql/correct_common_esql_mistakes'; @@ -24,7 +18,9 @@ import { connectorIdOption, elasticsearchOption, kibanaOption } from '../util/cl import { getServiceUrls } from '../util/get_service_urls'; import { KibanaClient } from '../util/kibana_client'; import { selectConnector } from '../util/select_connector'; -import { extractSections } from './extract_sections'; +import { syncBuiltDocs } from './sync_built_docs_repo'; +import { extractDocEntries } from './extract_doc_entries'; +import { generateDoc, FileToWrite } from './generate_doc'; yargs(process.argv.slice(2)) .command( @@ -38,16 +34,16 @@ yargs(process.argv.slice(2)) default: process.env.LOG_LEVEL || 'info', choices: ['info', 'debug', 'silent', 'verbose'], }) - .option('only', { - describe: 'Only regenerate these files', - string: true, - array: true, - }) .option('dryRun', { describe: 'Do not write or delete any files', boolean: true, default: false, }) + .option('syncDocs', { + describe: 'Sync doc repository before generation', + boolean: true, + default: true, + }) .option('kibana', kibanaOption) .option('elasticsearch', elasticsearchOption) .option('connectorId', connectorIdOption), @@ -63,431 +59,83 @@ yargs(process.argv.slice(2)) const 
kibanaClient = new KibanaClient(log, serviceUrls.kibanaUrl); const connectors = await kibanaClient.getConnectors(); - if (!connectors.length) { throw new Error('No connectors found'); } - const connector = await selectConnector({ connectors, preferredId: argv.connectorId, log, }); + log.info(`Using connector ${connector.connectorId}`); const chatClient = kibanaClient.createInferenceClient({ connectorId: connector.connectorId, }); - log.info(`Using connector ${connector.connectorId}`); - const builtDocsDir = Path.join(REPO_ROOT, '../built-docs'); + log.info(`Looking in ${builtDocsDir} for built-docs repository`); - log.debug(`Looking in ${builtDocsDir} for built-docs repository`); - - const dirExists = await Fs.stat(builtDocsDir); - - const getProgressHandler = () => { - let stage: string = ''; - let method: string = ''; - const loader: SingleBar = new SingleBar({ - barsize: 25, - format: `{phase} {bar} {percentage}%`, - }); - - const start = once(() => { - loader.start(100, 0, { phase: 'initializing' }); - }); - - return { - progress: (event: SimpleGitProgressEvent) => { - start(); - if (event.stage !== stage || event.method !== method) { - stage = event.stage; - method = event.method; - } - loader.update(event.progress, { phase: event.method + '/' + event.stage }); - }, - stop: () => loader.stop(), - }; - }; - - if (!dirExists) { - log.info('Cloning built-docs repo. 
This will take a while.'); - - const { progress, stop } = getProgressHandler(); - await git(Path.join(builtDocsDir, '..'), { - progress, - }).clone(`https://github.com/elastic/built-docs`, builtDocsDir, ['--depth', '1']); - - stop(); + if (argv.syncDocs) { + log.info(`Running sync for built-docs repository in ${builtDocsDir}...`); + await syncBuiltDocs({ builtDocsDir, log }); } - const { progress, stop } = getProgressHandler(); - - const builtDocsGit = git(builtDocsDir, { progress }); - - log.debug('Initializing simple-git'); - await builtDocsGit.init(); - - log.info('Making sure built-docs is up to date'); - await builtDocsGit.pull(); - - const files = FastGlob.sync( - `${builtDocsDir}/html/en/elasticsearch/reference/master/esql*.html` - ); - - if (!files) { - throw new Error('No files found'); - } - - const fsLimiter = pLimit(10); - - stop(); - - log.info(`Processing ${files.length} files`); - - async function extractContents( - file: string - ): Promise< - Array<{ title: string; content: string; instructions?: string; skip?: boolean }> - > { - const fileContents = await Fs.readFile(file); - const $element = load(fileContents.toString())('*'); - - function getSimpleText() { - $element.remove('.navfooter'); - $element.remove('#sticky_content'); - $element.find('code').each(function () { - $(this).replaceWith('`' + $(this).text() + '`'); - }); - return $element - .find('.section,section,.part') - .last() - .text() - .replaceAll(/([\n]\s*){2,}/g, '\n'); - } - - switch (Path.basename(file)) { - case 'esql-commands.html': - return extractSections($element) - .filter(({ title }) => !!title.match(/^[A-Z_]+$/)) - .map((doc) => ({ - ...doc, - instructions: `For this command, generate a Markdown document containing the following sections: - - ## {Title} - - {What this command does, the use cases, and any limitations from this document or esql-limitations.txt} - - ### Examples - - {example ES|QL queries using this command. 
prefer to copy mentioned queries, but make sure there are at least three different examples, focusing on different usages of this command}`, - })); - - case 'esql-limitations.html': - return [ - { - title: 'Limitations', - content: getSimpleText(), - skip: true, - }, - ]; - - case 'esql-syntax.html': - return [ - { - title: 'Syntax', - content: getSimpleText(), - instructions: `Generate a description of ES|QL syntax. Be as complete as possible. - For timespan literals, generate at least five examples of full ES|QL queries, using a mix commands and functions, using different intervals and units. - **Make sure you use timespan literals, such as \`1 day\` or \`24h\` or \`7 weeks\` in these examples**. - Combine ISO timestamps with time span literals and NOW(). - Make sure the example queries are using different combinations of syntax, commands and functions for each. - When using DATE_TRUNC, make sure you DO NOT wrap the timespan in single or double quotes. - Do not use the Cast operator. - `, - }, - ]; - - case 'esql.html': - return [ - { - title: 'Overview', - content: getSimpleText().replace( - /The ES\|QL documentation is organized in these sections(.*)$/, - '' - ), - instructions: `Generate a description of ES|QL as a language. Ignore links to other documents. From Limitations, include the known limitations, but ignore limitations that are specific to a command. - Include a summary of what is mentioned in the CROSS_CLUSTER, Kibana and API sections. 
Explain how to use the REST API with an example and mention important information for Kibana usage and cross cluster querying.`, - }, - ]; - - case 'esql-cross-clusters.html': - return [ - { - title: 'CROSS_CLUSTER', - content: getSimpleText(), - skip: true, - }, - ]; - - case 'esql-query-api.html': - return [ - { - title: 'API', - content: getSimpleText(), - skip: true, - }, - ]; - - case 'esql-kibana.html': - return [ - { - title: 'Kibana', - content: getSimpleText(), - skip: true, - }, - ]; - - case 'esql-functions-operators.html': - const sections = extractSections($element); - - const searches = [ - 'Binary operators', - 'Equality', - 'Inequality', - 'Less than', - 'Greater than', - 'Add +', - 'Subtract -', - 'Multiply *', - 'Divide /', - 'Modulus %', - 'Unary operators', - 'Logical operators', - 'IS NULL', - 'IS NOT NULL', - 'Cast (::)', - ]; + log.info(`Retrieving and converting documentation from ${builtDocsDir}...`); + const extraction = await extractDocEntries({ + builtDocsDir, + inferenceClient: chatClient, + log, + }); - const matches = ['IN', 'LIKE', 'RLIKE']; + log.info(`Rewriting documentation...`); + const docFiles = await generateDoc({ + extraction, + inferenceClient: chatClient, + log, + }); - const [operatorSections, allOtherSections] = partition(sections, (section) => { - return ( - matches.includes(section.title) || - searches.some((search) => - section.title.toLowerCase().startsWith(search.toLowerCase()) - ) + log.info(`Correcting common ESQL mistakes...`); + docFiles.forEach((docFile) => { + docFile.content = docFile.content.replaceAll( + INLINE_ESQL_QUERY_REGEX, + (match, query) => { + const correctionResult = correctCommonEsqlMistakes(query); + if (correctionResult.isCorrection) { + log.info( + `Corrected ES|QL, from:\n${correctionResult.input}\nto:\n${correctionResult.output}` ); - }); - - return allOtherSections - .map((section) => ({ - ...section, - instructions: `For each function, use the following template: - - ## {Title} - - 
{description of what this function does} - - ### Examples - - {at least two examples of full ES|QL queries. prefer the ones in the document verbatim} - `, - })) - .concat({ - title: 'Operators', - content: operatorSections - .map(({ title, content }) => `${title}\n${content}`) - .join('\n'), - instructions: - 'Generate a document describing the operators. For each type of operator (binary, unary, logical, and the remaining), generate a section. For each operator, generate at least one full ES|QL query as an example of its usage. Keep it short, e.g. only a ```esql\nFROM ...\n| WHERE ... ```', - }); - - default: - log.debug('Dropping file', file); - break; - } - return []; - } - - const documents = await Promise.all( - files.map((file) => fsLimiter(() => extractContents(file))) - ); - - const flattened = documents.flat().filter((doc) => { - // ES|QL aggregate functions, ES|QL mathematical functions, ES|QL string functions etc - const isOverviewArticle = - doc.title.startsWith('ES|QL') || - doc.title === 'Functions overview' || - doc.title === 'Operators overview'; - - if (isOverviewArticle) { - log.debug('Dropping overview article', doc.title); - } - return !isOverviewArticle; + } + return '```esql\n' + correctionResult.output + '\n```'; + } + ); }); const outDir = Path.join(__dirname, '../../server/tasks/nl_to_esql/esql_docs'); if (!argv.dryRun) { - log.info(`Writing ${flattened.length} documents to disk to ${outDir}`); - } - - if (!argv.only && !argv.dryRun) { - log.debug(`Clearing ${outDir}`); - - await Fs.readdir(outDir, { recursive: true }) - .then((filesInDir) => { - const limiter = pLimit(10); - return Promise.all(filesInDir.map((file) => limiter(() => Fs.unlink(file)))); - }) - .catch((error) => (error.code === 'ENOENT' ? Promise.resolve() : error)); - } + log.info(`Writing ${docFiles.length} documents to disk to ${outDir}`); - if (!argv.dryRun) { await Fs.mkdir(outDir).catch((error) => error.code === 'EEXIST' ? 
Promise.resolve() : error ); - } - const chatLimiter = pLimit(10); - - const allContent = flattened - .map((doc) => `## ${doc.title}\n\n${doc.content}\n\(end of ${doc.title})`) - .join('\n\n'); - - const allErrors: Array<{ - title: string; - fileName: string; - errors: Array<{ query: string; errors: Array }>; - }> = []; - - async function writeFile(doc: { title: string; content: string }) { - const fileName = Path.join( - outDir, - `esql-${doc.title.replaceAll(' ', '-').toLowerCase()}.txt` - ); - - doc.content = doc.content.replaceAll(INLINE_ESQL_QUERY_REGEX, (match, query) => { - const correctionResult = correctCommonEsqlMistakes(query); - if (correctionResult.isCorrection) { - log.info( - `Corrected ES|QL, from:\n${correctionResult.input}\nto:\n${correctionResult.output}` - ); - } - return '```esql\n' + correctionResult.output + '\n```'; - }); - - const queriesWithSyntaxErrors = compact( - await Promise.all( - Array.from(doc.content.matchAll(INLINE_ESQL_QUERY_REGEX)).map( - async ([match, query]) => { - const { errors, warnings } = await validateQuery(query, getAstAndSyntaxErrors, { - // setting this to true, we don't want to validate the index / fields existence - ignoreOnMissingCallbacks: true, - }); - const all = [...errors, ...warnings]; - if (all.length) { - log.warning( - `Error in ${fileName}:\n${JSON.stringify({ errors, warnings }, null, 2)}` - ); - return { - errors: all, - query, - }; - } - } - ) - ) + await Promise.all( + docFiles.map(async (file) => { + const fileName = Path.join(outDir, file.name); + await Fs.writeFile(fileName, file.content); + }) ); - - if (queriesWithSyntaxErrors.length) { - allErrors.push({ - title: doc.title, - fileName, - errors: queriesWithSyntaxErrors, - }); - } - - if (!argv.dryRun) { - await Fs.writeFile(fileName, doc.content); - } } - await Promise.all( - flattened.map(async (doc) => { - if (doc.skip || (argv.only && !argv.only.includes(doc.title))) { - return undefined; - } - - if (!doc.instructions) { - return 
fsLimiter(() => writeFile(doc)); - } - - return chatLimiter(async () => { - try { - const response = await lastValueFrom( - chatClient.output('generate_markdown', { - connectorId: chatClient.getConnectorId(), - system: `## System instructions - - Your job is to generate Markdown documentation off of content that is scraped from the Elasticsearch website. - - The documentation is about ES|QL, or the Elasticsearch Query Language, which is a new piped language that can be - used for loading, extracting and transforming data stored in Elasticsearch. The audience for the documentation - you generate, is intended for an LLM, to be able to answer questions about ES|QL or generate and execute ES|QL - queries. - - If you need to generate example queries, make sure they are different, in that they use different commands, and arguments, - to show case how a command, function or operator can be used in different ways. - - When you generate a complete ES|QL query, always wrap it in code blocks with the language being \`esql\`.. Here's an example: - - \`\`\`esql - FROM logs-* - | WHERE @timestamp <= NOW() - \`\`\` - - **If you are describing the syntax of a command, only wrap it in SINGLE backticks. - Leave out the esql part**. Eg: - ### Syntax: - - \`DISSECT input "pattern" [APPEND_SEPARATOR=""]\` - - #### Context - - These is the entire documentation, use it as context for answering questions - - ${allContent} - `, - input: `Generate Markdown for the following document: - - ## ${doc.title} - - ### Instructions - - ${doc.instructions} - - ### Content of file - - ${doc.content}`, - }) - ); - - return fsLimiter(() => - writeFile({ title: doc.title, content: response.content! 
}) - ); - } catch (error) { - log.error(`Error processing ${doc.title}: ${error.message}`); - } - }); - }) - ); + log.info(`Checking syntax...`); + const syntaxErrors = ( + await Promise.all(docFiles.map(async (file) => await findEsqlSyntaxError(file))) + ).flat(); log.warning( `Please verify the following queries that had syntax errors\n${JSON.stringify( - allErrors, + syntaxErrors, null, 2 )}` @@ -498,3 +146,31 @@ yargs(process.argv.slice(2)) } ) .parse(); + +interface SyntaxError { + query: string; + errors: Array; +} + +const findEsqlSyntaxError = async (doc: FileToWrite): Promise => { + return Array.from(doc.content.matchAll(INLINE_ESQL_QUERY_REGEX)).reduce( + async (listP, [match, query]) => { + const list = await listP; + const { errors, warnings } = await validateQuery(query, getAstAndSyntaxErrors, { + // setting this to true, we don't want to validate the index / fields existence + ignoreOnMissingCallbacks: true, + }); + + const all = [...errors, ...warnings]; + if (all.length) { + list.push({ + errors: all, + query, + }); + } + + return list; + }, + Promise.resolve([] as SyntaxError[]) + ); +}; diff --git a/x-pack/plugins/inference/scripts/load_esql_docs/prompts/convert_to_markdown.ts b/x-pack/plugins/inference/scripts/load_esql_docs/prompts/convert_to_markdown.ts new file mode 100644 index 0000000000000..cef4a07fa712e --- /dev/null +++ b/x-pack/plugins/inference/scripts/load_esql_docs/prompts/convert_to_markdown.ts @@ -0,0 +1,46 @@ +/* + * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one + * or more contributor license agreements. Licensed under the Elastic License + * 2.0; you may not use this file except in compliance with the Elastic License + * 2.0. + */ + +import type { PromptTemplate } from '../utils/output_executor'; + +/** + * Prompt used to ask the LLM to convert a raw html content to markdown. 
+ */ +export const convertToMarkdownPrompt: PromptTemplate<{ + htmlContent: string; +}> = ({ htmlContent }) => { + return { + system: ` + You are a helpful assistant specialized + in converting html fragment extracted from online documentation into equivalent Markdown documents. + + Please respond exclusively with the requested Markdown document, without + adding your thoughts or any non-markdown reply. + + - Ignore all links (just use their text content when relevant) + - Blockquotes (>) are not wanted, so don't generate any + - Use title2 (##) for the main title of the document + - Use title3 (###) for the section titles, such as "Syntax", "Parameters", "Examples" and so on. + - Use title4 (####) for subsections, such as parameter names or example titles + - HTML tables that are below code snippets are example of results. Please convert them to Markdown table + - for elements, only keep the text content of the underlying elements + + All the code snippets are for ESQL, so please use the following format for all snippets: + + \`\`\`esql + + \`\`\` + + `, + input: ` + Here is the html documentation to convert to markdown: + \`\`\`html + ${htmlContent} + \`\`\` + `, + }; +}; diff --git a/x-pack/plugins/inference/scripts/load_esql_docs/prompts/create_documentation_page.ts b/x-pack/plugins/inference/scripts/load_esql_docs/prompts/create_documentation_page.ts new file mode 100644 index 0000000000000..228c96ef18b44 --- /dev/null +++ b/x-pack/plugins/inference/scripts/load_esql_docs/prompts/create_documentation_page.ts @@ -0,0 +1,60 @@ +/* + * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one + * or more contributor license agreements. Licensed under the Elastic License + * 2.0; you may not use this file except in compliance with the Elastic License + * 2.0. 
+ */ + +import type { PromptTemplate } from '../utils/output_executor'; + +/** + * Prompt used to ask the LLM to create a documentation page from the provided content + */ +export const createDocumentationPagePrompt: PromptTemplate<{ + content: string; + documentation: string; + specificInstructions: string; +}> = ({ content, documentation, specificInstructions }) => { + return { + system: ` + You are a helpful assistant specialized in checking and improving technical documentation + about ES|QL, the new Query language from Elasticsearch written in Markdown format. + + Your job is to generate technical documentation in Markdown format based on content that is scraped from the Elasticsearch website. + + The documentation is about ES|QL, or the Elasticsearch Query Language, which is a new piped language that can be + used for loading, extracting and transforming data stored in Elasticsearch. The audience for the documentation + you generate, is intended for an LLM, to be able to answer questions about ES|QL or generate and execute ES|QL + queries. + + If you need to generate example queries, make sure they are different, in that they use different commands, and arguments, + to show case how a command, function or operator can be used in different ways. + + When you generate a complete ES|QL query, always wrap it in code blocks with the language being \`esql\`.. Here's an example: + + \`\`\`esql + FROM logs-* + | WHERE @timestamp <= NOW() + \`\`\` + + #### Context + + This is the entire documentation, in JSON format. Use it as context for answering questions + + \`\`\`json + ${documentation} + \`\`\` +`, + input: ` + ${specificInstructions} + + Use this document as main source to generate your markdown document: + + \`\`\`markdown + ${content} + \`\`\` + + But also add relevant content from the documentation you have access to. 
+ `, + }; +}; diff --git a/x-pack/plugins/inference/scripts/load_esql_docs/prompts/index.ts b/x-pack/plugins/inference/scripts/load_esql_docs/prompts/index.ts new file mode 100644 index 0000000000000..f5b54643fb3cb --- /dev/null +++ b/x-pack/plugins/inference/scripts/load_esql_docs/prompts/index.ts @@ -0,0 +1,10 @@ +/* + * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one + * or more contributor license agreements. Licensed under the Elastic License + * 2.0; you may not use this file except in compliance with the Elastic License + * 2.0. + */ + +export { createDocumentationPagePrompt } from './create_documentation_page'; +export { rewriteFunctionPagePrompt } from './rewrite_function_page'; +export { convertToMarkdownPrompt } from './convert_to_markdown'; diff --git a/x-pack/plugins/inference/scripts/load_esql_docs/prompts/rewrite_function_page.ts b/x-pack/plugins/inference/scripts/load_esql_docs/prompts/rewrite_function_page.ts new file mode 100644 index 0000000000000..230145b7a4135 --- /dev/null +++ b/x-pack/plugins/inference/scripts/load_esql_docs/prompts/rewrite_function_page.ts @@ -0,0 +1,238 @@ +/* + * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one + * or more contributor license agreements. Licensed under the Elastic License + * 2.0; you may not use this file except in compliance with the Elastic License + * 2.0. + */ + +import type { PromptTemplate } from '../utils/output_executor'; + +/** + * Prompt used to ask the LLM to improve a function or command page + */ +export const rewriteFunctionPagePrompt: PromptTemplate<{ + content: string; + documentation: string; + command: boolean; +}> = ({ content, documentation, command: isCommand }) => { + const entityName = isCommand ? 'command' : 'function'; + return { + system: ` + You are a helpful assistant specialized in rewriting technical documentation articles + about ES|QL, the new Query language from Elasticsearch written in Markdown format. 
+ + An ES|QL query is composed of a source command followed by an optional + series of processing commands, separated by a pipe character: |. For + example: + + | + | + + An example of what an ES|QL query looks like: + + \`\`\`esql + FROM employees + | WHERE still_hired == true + | EVAL hired = DATE_FORMAT("YYYY", hire_date) + | STATS avg_salary = AVG(salary) BY languages + | EVAL avg_salary = ROUND(avg_salary) + | EVAL lang_code = TO_STRING(languages) + | ENRICH languages_policy ON lang_code WITH lang = language_name + | WHERE lang IS NOT NULL + | KEEP avg_salary, lang + | SORT avg_salary ASC + | LIMIT 3 + \`\`\` + + You will be given a technical documentation article about a specific ES|QL ${entityName}, + please rewrite it using the following template: + + \`\`\`markdown + # {title of the ${entityName}} + + {short description of what the ${entityName} does} + + ## Syntax + + {syntax used for the ${entityName}. Just re-use the content from the original article} + + ### Parameters + + {foreach parameters} + #### {parameter name} + + {if the parameter is optional, mention it. otherwise don't mention it's not optional} + + {short explanation of what the parameter does} + + {end foreach argument} + + ## Examples + + {list of examples from the source doc} + \`\`\` + + Additional instructions: + + - Follow the template, and DO NOT add any section, unless explicitly asked for in the instructions. + + - DO NOT modify the main title of the page, it must only be the command name, e.g. "## AVG" + + - Do NOT mention "ES|QL" in the description + - GOOD: "The AVG ${entityName} calculates [...]" + - BAD: "The AVG ${entityName} in ES|QL calculates [...]" + + - Move the description section at the beginning of the file (but remove the title). 
+ - This means there is no longer a "Description" section after the "Parameters" one + + - For the "Syntax" section, if you need to escape code blocks, use single ticks and not triple ticks + - GOOD: \`AVG(number)\` + - BAD: \`\`\`AVG(number)\`\`\` + + - For the "Parameters" section + - if there is a description of the parameter in the source document, re-use it. Else, use your own words. + + - For the "Examples" section: + - Re-use as much as possible examples from the source document + - DO NOT modify the syntax of the examples. The syntax is correct, don't try to fix it. + - For each example, add a short, entity-dense sentence explaining what the example does. + - GOOD: "Calculate the average salary change" + - BAD: "Calculate the average salary change. This example uses the \`MV_AVG\` function to first average the multiple values per employee, and then uses the result with the \`AVG\` function:" + + - If any limitations impacting this ${entityName} are mentioned in this document or other ones, such + as the "esql-limitations.html" file, please add a "Limitations" section at the bottom of the file + and mention them. Otherwise, don't say or mention that there are no limitations. + + - When you generate a complete ES|QL query for the examples, always wrap it in code blocks + with the language being \`esql\`. + + An example of rewrite would be: + + Source: + + ///// + ${source} + ///// + + Output: + + ///// + ${output} + ///// + + + Please answer exclusively with the content of the output document, without any additional messages, + information, though or reasoning. DO NOT wrap the output with \`\`\`markdown. + + The full documentation, in JSON format: + \`\`\`json + ${documentation} + \`\`\` + + Please use it to search for limitations or additional information or examples when rewriting the article. 
+ `, + input: ` + This is the technical document page you need to rewrite: + + \`\`\`markdown + ${content} + \`\`\` + `, + }; +}; + +const source = ` +## DISSECT + +DISSECT enables you to extract structured data out of a string. + +### Syntax + +\`\`\`esql +DISSECT input \"pattern\" [APPEND_SEPARATOR=\"\"] +\`\`\` + +### Parameters + +#### input + +The column that contains the string you want to structure. If the column has multiple values, DISSECT will process each value. + +#### pattern + +A dissect pattern. If a field name conflicts with an existing column, the existing column is dropped. If a field name is used more than once, only the rightmost duplicate creates a column. + +#### + +A string used as the separator between appended values, when using the append modifier. + +### Description + +DISSECT enables you to extract structured data out of a string. DISSECT matches the string against a delimiter-based pattern, and extracts the specified keys as columns. + +Refer to Process data with DISSECT for the syntax of dissect patterns. + +### Examples + +The following example parses a string that contains a timestamp, some text, and an IP address: + +\`\`\`esql +ROW a = \"2023-01-23T12:15:00.000Z - some text - 127.0.0.1\" +| DISSECT a \"%{date} - %{msg} - %{ip}\" +| KEEP date, msg, ip +\`\`\` + +By default, DISSECT outputs keyword string columns. To convert to another type, use Type conversion functions: + +\`\`\`esql +ROW a = \"2023-01-23T12:15:00.000Z - some text - 127.0.0.1\" +| DISSECT a \"%{date} - %{msg} - %{ip}\" +| KEEP date, msg, ip +| EVAL date = TO_DATETIME(date) +\`\`\` + +`; + +const output = ` + # DISSECT + +The DISSECT command is used to extract structured data from a string. +It matches the string against a delimiter-based pattern and extracts the specified keys as columns. + +## Syntax + +\`DISSECT input "pattern" [APPEND_SEPARATOR=""]\` + +### Parameters + +#### input + +The column containing the string you want to structure. 
If the column has multiple values, DISSECT will process each value. + +#### pattern + +A dissect pattern. If a field name conflicts with an existing column, the existing column is dropped. If a field name is used more than once, only the rightmost duplicate creates a column. + +#### + +A string used as the separator between appended values, when using the append modifier. + +## Examples + +The following example parses a string that contains a timestamp, some text, and an IP address: + +\`\`\`esql +ROW a = "2023-01-23T12:15:00.000Z - some text - 127.0.0.1" +| DISSECT a "%{date} - %{msg} - %{ip}" +| KEEP date, msg, ip +\`\`\` + +By default, DISSECT outputs keyword string columns. To convert to another type, use Type conversion functions: + +\`\`\`esql +ROW a = "2023-01-23T12:15:00.000Z - some text - 127.0.0.1" +| DISSECT a "%{date} - %{msg} - %{ip}" +| KEEP date, msg, ip +| EVAL date = TO_DATETIME(date) +\`\`\` +`; diff --git a/x-pack/plugins/inference/scripts/load_esql_docs/sync_built_docs_repo.ts b/x-pack/plugins/inference/scripts/load_esql_docs/sync_built_docs_repo.ts new file mode 100644 index 0000000000000..930d8ad2bf2af --- /dev/null +++ b/x-pack/plugins/inference/scripts/load_esql_docs/sync_built_docs_repo.ts @@ -0,0 +1,85 @@ +/* + * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one + * or more contributor license agreements. Licensed under the Elastic License + * 2.0; you may not use this file except in compliance with the Elastic License + * 2.0. + */ + +import Path from 'path'; +import Fs from 'fs/promises'; +import git, { SimpleGitProgressEvent } from 'simple-git'; +import { SingleBar } from 'cli-progress'; +import { once } from 'lodash'; +import { ToolingLog } from '@kbn/tooling-log'; + +export const syncBuiltDocs = async ({ + builtDocsDir, + log, +}: { + builtDocsDir: string; + log: ToolingLog; +}) => { + const dirExists = await exists(builtDocsDir); + + if (!dirExists) { + log.info('Cloning built-docs repo. 
This will take a while.'); + + const { progress, stop } = getProgressHandler(); + await git(Path.join(builtDocsDir, '..'), { + progress, + }).clone(`https://github.com/elastic/built-docs`, builtDocsDir, ['--depth', '1']); + + stop(); + } + + const { progress, stop } = getProgressHandler(); + + const builtDocsGit = git(builtDocsDir, { progress }); + + log.debug('Initializing simple-git'); + await builtDocsGit.init(); + + log.info('Making sure built-docs is up to date'); + await builtDocsGit.pull(); + + stop(); +}; + +const exists = async (path: string): Promise<boolean> => { + let dirExists = true; + try { + await Fs.stat(path); + } catch (e) { + if (e.code === 'ENOENT') { + dirExists = false; + } else { + throw e; + } + } + return dirExists; +}; + +const getProgressHandler = () => { + let stage: string = ''; + let method: string = ''; + const loader: SingleBar = new SingleBar({ + barsize: 25, + format: `{phase} {bar} {percentage}%`, + }); + + const start = once(() => { + loader.start(100, 0, { phase: 'initializing' }); + }); + + return { + progress: (event: SimpleGitProgressEvent) => { + start(); + if (event.stage !== stage || event.method !== method) { + stage = event.stage; + method = event.method; + } + loader.update(event.progress, { phase: event.method + '/' + event.stage }); + }, + stop: () => loader.stop(), + }; +}; diff --git a/x-pack/plugins/inference/scripts/load_esql_docs/utils/output_executor.ts b/x-pack/plugins/inference/scripts/load_esql_docs/utils/output_executor.ts new file mode 100644 index 0000000000000..6697446f93cec --- /dev/null +++ b/x-pack/plugins/inference/scripts/load_esql_docs/utils/output_executor.ts @@ -0,0 +1,39 @@ +/* + * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one + * or more contributor license agreements. Licensed under the Elastic License + * 2.0; you may not use this file except in compliance with the Elastic License + * 2.0. 
+ */ + +import { lastValueFrom } from 'rxjs'; +import type { OutputAPI } from '../../../common/output'; + +export interface Prompt { + system?: string; + input: string; +} + +export type PromptTemplate<Input> = (input: Input) => Prompt; + +export type PromptCaller = (prompt: Prompt) => Promise<string>; + +export type PromptCallerFactory = ({ + connectorId, + output, +}: { + connectorId: string; + output: OutputAPI; +}) => PromptCaller; + +export const bindOutput: PromptCallerFactory = ({ connectorId, output }) => { + return async ({ input, system }) => { + const response = await lastValueFrom( + output('', { + connectorId, + input, + system, + }) + ); + return response.content ?? ''; + }; +}; diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-abs.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-abs.txt index 6a970dc5700fe..0700d970972a4 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-abs.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-abs.txt @@ -1,14 +1,28 @@ -## ABS +# ABS -The `ABS` function returns the absolute value of a numeric expression. If the input is null, the function returns null. +The ABS function returns the absolute value of a given number. -### Examples +## Syntax + +`ABS(number)` + +### Parameters + +#### number + +A numeric expression. If the parameter is `null`, the function will also return `null`. 
+ +## Examples + +In this example, the ABS function is used to calculate the absolute value of -1.0: ```esql ROW number = -1.0 | EVAL abs_number = ABS(number) ``` +In the following example, the ABS function is used to calculate the absolute value of the height of employees: + ```esql FROM employees | KEEP first_name, last_name, height diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-acos.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-acos.txt index 3460483c15870..370e43bcf850f 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-acos.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-acos.txt @@ -1,15 +1,27 @@ -## ACOS +# ACOS -The `ACOS` function returns the arccosine of a number as an angle, expressed in radians. The input number must be between -1 and 1. If the input is null, the function returns null. +The ACOS function returns the arccosine of a given number, expressed in radians. -### Examples +## Syntax + +`ACOS(number)` + +### Parameters + +#### number + +This is a number between -1 and 1. If the parameter is `null`, the function will also return `null`. + +## Examples + +In this example, the ACOS function calculates the arccosine of 0.9. ```esql -ROW a = .9 -| EVAL acos = ACOS(a) +ROW a=.9 +| EVAL acos=ACOS(a) ``` ```esql ROW b = -0.5 | EVAL acos_b = ACOS(b) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-asin.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-asin.txt index ad4fb8fe8d310..a7901b95b8931 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-asin.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-asin.txt @@ -1,15 +1,29 @@ -## ASIN +# ASIN -The `ASIN` function returns the arcsine of the input numeric expression as an angle, expressed in radians. 
+The ASIN function returns the arcsine of a given numeric expression as an angle, expressed in radians. -### Examples +## Syntax + +`ASIN(number)` + +### Parameters + +#### number + +This is a numeric value ranging between -1 and 1. If the parameter is `null`, the function will also return `null`. + +## Examples + +In this example, the ASIN function calculates the arcsine of 0.9: ```esql -ROW a = .9 -| EVAL asin = ASIN(a) +ROW a=.9 +| EVAL asin=ASIN(a) ``` +In this example, the ASIN function calculates the arcsine of -0.5: + ```esql ROW a = -.5 | EVAL asin = ASIN(a) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-atan.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-atan.txt index fbeee5e84f2f3..a8b6f3dfa547c 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-atan.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-atan.txt @@ -1,8 +1,18 @@ -## ATAN +# ATAN -The `ATAN` function returns the arctangent of the input numeric expression as an angle, expressed in radians. +The ATAN function returns the arctangent of a given numeric expression, expressed in radians. -### Examples +## Syntax + +`ATAN(number)` + +### Parameters + +#### number + +This is a numeric expression. If the parameter is `null`, the function will also return `null`. + +## Examples ```esql ROW a=12.9 @@ -12,4 +22,4 @@ ROW a=12.9 ```esql ROW x=5.0, y=3.0 | EVAL atan_yx = ATAN(y / x) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-avg.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-avg.txt index 943a12c4aaa90..b9d209b4ef3a1 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-avg.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-avg.txt @@ -1,15 +1,29 @@ -## AVG +# AVG -The `AVG` function calculates the average of a numeric field. 
+The AVG function calculates the average of a numeric field. -### Examples +## Syntax + +`AVG(number)` + +### Parameters + +#### number + +The numeric field for which the average is calculated. + +## Examples + +Calculate the average height of employees: ```esql FROM employees | STATS AVG(height) ``` +The AVG function can be used with inline functions. For example: + ```esql FROM employees | STATS avg_salary_change = ROUND(AVG(MV_AVG(salary_change)), 10) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-bucket.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-bucket.txt index 945a4328d7728..585a0321ef818 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-bucket.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-bucket.txt @@ -1,8 +1,34 @@ -## BUCKET +# BUCKET -The `BUCKET` function creates groups of values—buckets—out of a datetime or numeric input. The size of the buckets can either be provided directly or chosen based on a recommended count and values range. +The BUCKET function allows you to create groups of values, known as buckets, from a datetime or numeric input. The size of the buckets can be specified directly or determined based on a recommended count and values range. -### Examples +## Syntax + +`BUCKET(field, buckets, from, to)` + +### Parameters + +#### field + +A numeric or date expression from which to derive buckets. + +#### buckets + +The target number of buckets, or the desired bucket size if `from` and `to` parameters are omitted. + +#### from + +The start of the range. This can be a number, a date, or a date expressed as a string. + +#### to + +The end of the range. This can be a number, a date, or a date expressed as a string. 
+ +## Examples + +BUCKET can operate in two modes: one where the bucket size is computed based on a bucket count recommendation and a range, and another where the bucket size is provided directly. + +For instance, asking for at most 20 buckets over a year results in monthly buckets: ```esql FROM employees @@ -11,33 +37,34 @@ FROM employees | SORT hire_date ``` -```esql -FROM employees -| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z" -| STATS hires_per_month = COUNT(*) BY month = BUCKET(hire_date, 20, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z") -| SORT month -``` +If the desired bucket size is known in advance, simply provide it as the second argument, leaving the range out: ```esql FROM employees | WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z" -| STATS hires_per_week = COUNT(*) BY week = BUCKET(hire_date, 100, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z") +| STATS hires_per_week = COUNT(*) BY week = BUCKET(hire_date, 1 week) | SORT week ``` +BUCKET can also operate on numeric fields. For example, to create a salary histogram: + ```esql FROM employees -| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z" -| STATS hires_per_week = COUNT(*) BY week = BUCKET(hire_date, 1 week) -| SORT week +| STATS COUNT(*) BY bs = BUCKET(salary, 20, 25324, 74999) +| SORT bs ``` +BUCKET may be used in both the aggregating and grouping part of the STATS ... BY ... command provided that in the aggregating part the function is referenced by an alias defined in the grouping part, or that it is invoked with the exact same expression: + ```esql FROM employees -| STATS COUNT(*) BY bs = BUCKET(salary, 20, 25324, 74999) -| SORT bs +| STATS s1 = b1 + 1, s2 = BUCKET(salary / 1000 + 999, 50.) + 2 BY b1 = BUCKET(salary / 100 + 99, 50.), b2 = BUCKET(salary / 1000 + 999, 50.) 
+| SORT b1, b2 +| KEEP s1, b1, s2, b2 ``` +More examples: + ```esql FROM employees | WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z" diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-case.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-case.txt index 4c9cc07e669db..110f0ee1a242b 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-case.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-case.txt @@ -1,8 +1,22 @@ -## CASE +# CASE -The `CASE` function accepts pairs of conditions and values. The function returns the value that belongs to the first condition that evaluates to true. If the number of arguments is odd, the last argument is the default value which is returned when no condition matches. If the number of arguments is even, and no condition matches, the function returns null. +The CASE function accepts pairs of conditions and values. It returns the value that corresponds to the first condition that evaluates to `true`. If no condition matches, the function returns a default value or `null` if the number of arguments is even. -### Examples +## Syntax + +`CASE(condition, trueValue)` + +### Parameters + +#### condition + +A condition to evaluate. + +#### trueValue + +The value that is returned when the corresponding condition is the first to evaluate to `true`. If no condition matches, the default value is returned. 
+ +## Examples Determine whether employees are monolingual, bilingual, or polyglot: @@ -32,6 +46,6 @@ Calculate an hourly error rate as a percentage of the total number of log messag FROM sample_data | EVAL error = CASE(message LIKE "*error*", 1, 0) | EVAL hour = DATE_TRUNC(1 hour, @timestamp) -| STATS error_rate = AVG(error) BY hour +| STATS error_rate = AVG(error) by hour | SORT hour ``` \ No newline at end of file diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-cbrt.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-cbrt.txt index 44ecddefc290d..6bbff254f2b16 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-cbrt.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-cbrt.txt @@ -1,15 +1,20 @@ -## CBRT +# CBRT -The `CBRT` function returns the cube root of a number. The input can be any numeric value, and the return value is always a double. Cube roots of infinities are null. +The CBRT function calculates the cube root of a given number. -### Examples +## Syntax + +`CBRT(number)` + +### Parameters + +#### number + +This is a numeric expression. If the parameter is `null`, the function will also return `null`. + +## Examples ```esql ROW d = 1000.0 -| EVAL c = CBRT(d) +| EVAL c = cbrt(d) ``` - -```esql -ROW value = 27.0 -| EVAL cube_root = CBRT(value) -``` \ No newline at end of file diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-ceil.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-ceil.txt index 3713fa2cf4cba..438d02cb6646d 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-ceil.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-ceil.txt @@ -1,16 +1,24 @@ -## CEIL +# CEIL -The `CEIL` function rounds a number up to the nearest integer. This operation is a no-op for long (including unsigned) and integer types. 
For double types, it picks the closest double value to the integer, similar to `Math.ceil`. +The CEIL function rounds a number up to the nearest integer. -### Examples +## Syntax + +`CEIL(number)` + +### Parameters + +#### number + +This is a numeric expression. If the parameter is `null`, the function will also return `null`. + +## Examples ```esql ROW a=1.8 -| EVAL a = CEIL(a) +| EVAL a=CEIL(a) ``` -```esql -FROM employees -| KEEP first_name, last_name, height -| EVAL height_ceil = CEIL(height) -``` \ No newline at end of file +## Limitations + +- the CEIL function does not perform any operation for `long` (including unsigned) and `integer` types. For `double` type, it picks the closest `double` value to the integer, similar to the Math.ceil function in other programming languages. diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-cidr_match.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-cidr_match.txt index 2e5e306d01c01..2dcb0ec9b6824 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-cidr_match.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-cidr_match.txt @@ -1,17 +1,35 @@ -## CIDR_MATCH +# CIDR_MATCH -The `CIDR_MATCH` function returns true if the provided IP is contained in one of the provided CIDR blocks. +The CIDR_MATCH function checks if a given IP address falls within one or more specified CIDR blocks. -### Examples +## Syntax + +`CIDR_MATCH(ip, blockX)` + +### Parameters + +#### ip + +The IP address to be checked. This function supports both IPv4 and IPv6 addresses. + +#### blockX + +The CIDR block(s) against which the IP address is to be checked. 
+ +## Examples + +The following example checks if the IP address 'ip1' falls within the CIDR blocks "127.0.0.2/32": ```esql FROM hosts -| WHERE CIDR_MATCH(ip1, "127.0.0.2/32", "127.0.0.3/32") +| WHERE CIDR_MATCH(ip1, "127.0.0.2/32") | KEEP card, host, ip0, ip1 ``` +The function also supports passing multiple blockX: + ```esql FROM network_logs | WHERE CIDR_MATCH(source_ip, "192.168.1.0/24", "10.0.0.0/8") | KEEP timestamp, source_ip, destination_ip, action -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-coalesce.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-coalesce.txt index 057efa96da3bd..ac0162fae33e6 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-coalesce.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-coalesce.txt @@ -1,15 +1,37 @@ -## COALESCE +# COALESCE -The `COALESCE` function returns the first of its arguments that is not null. If all arguments are null, it returns null. +The COALESCE function returns the first non-null argument from the list of provided arguments. -### Examples +## Syntax + +`COALESCE(first, rest)` + +### Parameters + +#### first + +The first expression to evaluate. + +#### rest + +The subsequent expressions to evaluate. + +### Description + +The COALESCE function evaluates the provided expressions in order and returns the first non-null value it encounters. If all the expressions evaluate to null, the function returns null. + +## Examples + +In the following example, the COALESCE function evaluates the expressions 'a' and 'b'. Since 'a' is null, the function returns the value of 'b'. 
```esql ROW a=null, b="b" | EVAL COALESCE(a, b) ``` +COALESCE supports any number of rest parameters: + ```esql ROW x=null, y=null, z="z" | EVAL first_non_null = COALESCE(x, y, z) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-concat.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-concat.txt index 435a8458ff05c..ca464bc74d0c6 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-concat.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-concat.txt @@ -1,16 +1,36 @@ -## CONCAT +# CONCAT -The `CONCAT` function concatenates two or more strings. +The CONCAT function combines two or more strings into one. -### Examples +## Syntax + +`CONCAT(string1, string2, [...stringN])` + +### Parameters + +#### string1 + +The first string to concatenate. + +#### string2 + +The second string to concatenate. + +## Examples + +The following example concatenates the `street_1` and `street_2` fields: + +```esql +FROM address +| KEEP street_1, street_2 +| EVAL fullstreet = CONCAT(street_1, street_2) +``` + + +CONCAT supports any number of string parameters. The following example concatenates the `first_name` and `last_name` fields with a space in between: ```esql FROM employees | KEEP first_name, last_name | EVAL fullname = CONCAT(first_name, " ", last_name) ``` - -```esql -ROW part1 = "Hello", part2 = "World" -| EVAL greeting = CONCAT(part1, " ", part2) -``` \ No newline at end of file diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-cos.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-cos.txt index e554a886c5cab..519af24229150 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-cos.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-cos.txt @@ -1,15 +1,25 @@ -## COS +# COS -The `COS` function returns the cosine of an angle, expressed in radians. 
If the input angle is null, the function returns null. +The COS function calculates the cosine of a given angle. -### Examples +## Syntax + +`COS(angle)` + +### Parameters + +#### angle + +The angle for which the cosine is to be calculated, expressed in radians. If the parameter is `null`, the function will return `null`. + +## Examples ```esql ROW a=1.8 -| EVAL cos = COS(a) +| EVAL cos=COS(a) ``` ```esql ROW angle=0.5 | EVAL cosine_value = COS(angle) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-cosh.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-cosh.txt index c1eda78d10f2b..ec9e8906a9467 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-cosh.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-cosh.txt @@ -1,15 +1,25 @@ -## COSH +# COSH -Returns the hyperbolic cosine of an angle. +The COSH function calculates the hyperbolic cosine of a given angle. -### Examples +## Syntax + +`COSH(angle)` + +### Parameters + +#### angle + +The angle in radians for which the hyperbolic cosine is to be calculated. If the angle is null, the function will return null. + +## Examples ```esql ROW a=1.8 -| EVAL cosh = COSH(a) +| EVAL cosh=COSH(a) ``` ```esql ROW angle=0.5 | EVAL hyperbolic_cosine = COSH(angle) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-count.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-count.txt index 407caa4c0f0c6..dace14b709204 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-count.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-count.txt @@ -1,31 +1,51 @@ -## COUNT +# COUNT -The `COUNT` function returns the total number (count) of input values. If the `field` parameter is omitted, it is equivalent to `COUNT(*)`, which counts the number of rows. 
+The COUNT function returns the total number of input values. -### Examples +## Syntax + +`COUNT(field)` + +### Parameters + +#### field + +This is an expression that outputs values to be counted. If it's omitted, it's equivalent to `COUNT(*)`, which counts the number of rows. + +## Examples + +Count the number of specific field values: ```esql FROM employees | STATS COUNT(height) ``` +Count the number of rows using `COUNT()` or `COUNT(*)`: + ```esql FROM employees | STATS count = COUNT(*) BY languages | SORT languages DESC ``` +The expression can use inline functions. In this example, a string is split into multiple values using the `SPLIT` function, and the values are counted: + ```esql ROW words="foo;bar;baz;qux;quux;foo" | STATS word_count = COUNT(SPLIT(words, ";")) ``` +To count the number of times an expression returns `TRUE`, use a `WHERE` command to remove rows that shouldn’t be included: + ```esql ROW n=1 | WHERE n < 0 | STATS COUNT(n) ``` +To count the same stream of data based on two different expressions, use the pattern `COUNT(<expression> OR NULL)`: + ```esql ROW n=1 | STATS COUNT(n > 0 OR NULL), COUNT(n < 0 OR NULL) diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-count_distinct.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-count_distinct.txt index ec7c373e340be..f6918b6651562 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-count_distinct.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-count_distinct.txt @@ -1,31 +1,46 @@ -## COUNT_DISTINCT +# COUNT_DISTINCT -The `COUNT_DISTINCT` function returns the approximate number of distinct values in a column or literal. It uses the HyperLogLog++ algorithm to count based on the hashes of the values, providing configurable precision to trade memory for accuracy. This function is particularly useful for high-cardinality sets and large values, as it maintains fixed memory usage regardless of the number of unique values. 
+The COUNT_DISTINCT function calculates the approximate number of distinct values in a specified field. -### Examples +## Syntax + +`COUNT_DISTINCT(field, precision)` + +### Parameters + +#### field + +The column or literal for which to count the number of distinct values. + +#### precision + +(Optional) The precision threshold. The counts are approximate. The maximum supported value is 40000. Thresholds above this number will have the same effect as a threshold of 40000. The default value is 3000. + +## Examples + +The following example calculates the number of distinct values in the `ip0` and `ip1` fields: ```esql FROM hosts | STATS COUNT_DISTINCT(ip0), COUNT_DISTINCT(ip1) ``` +You can also specify a precision threshold. In the following example, the precision threshold for `ip0` is set to 80000 and for `ip1` to 5: + ```esql FROM hosts | STATS COUNT_DISTINCT(ip0, 80000), COUNT_DISTINCT(ip1, 5) ``` +The COUNT_DISTINCT function can also be used with inline functions. This example splits a string into multiple values using the `SPLIT` function and counts the unique values: + ```esql ROW words="foo;bar;baz;qux;quux;foo" | STATS distinct_word_count = COUNT_DISTINCT(SPLIT(words, ";")) ``` -### Additional Information - -- **Precision Threshold**: The `COUNT_DISTINCT` function takes an optional second parameter to configure the precision threshold. The maximum supported value is 40000, and the default value is 3000. This threshold allows you to trade memory for accuracy, defining a unique count below which counts are expected to be close to accurate. Above this value, counts might become a bit more fuzzy. -- **Algorithm**: The function is based on the HyperLogLog++ algorithm, which provides excellent accuracy on low-cardinality sets and fixed memory usage. The memory usage depends on the configured precision, requiring about `c * 8` bytes for a precision threshold of `c`. 
- ### Notes - Computing exact counts requires loading values into a set and returning its size, which doesn't scale well for high-cardinality sets or large values due to memory usage and communication overhead. - The HyperLogLog++ algorithm's accuracy depends on the leading zeros of hashed values, and the exact distributions of hashes in a dataset can affect the accuracy of the cardinality. -- Even with a low threshold, the error remains very low (1-6%) even when counting millions of items. \ No newline at end of file +- Even with a low threshold, the error remains very low (1-6%) even when counting millions of items. diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-date_diff.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-date_diff.txt index 20a261e53a100..7c0652aa4c067 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-date_diff.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-date_diff.txt @@ -1,8 +1,28 @@ -## DATE_DIFF +# DATE_DIFF -The `DATE_DIFF` function subtracts the `startTimestamp` from the `endTimestamp` and returns the difference in multiples of the specified unit. If `startTimestamp` is later than the `endTimestamp`, negative values are returned. Note that while there is an overlap between the function’s supported units and ES|QL’s supported time span literals, these sets are distinct and not interchangeable. Similarly, the supported abbreviations are conveniently shared with implementations of this function in other established products and not necessarily common with the date-time nomenclature used by Elasticsearch. +The DATE_DIFF function calculates the difference between two timestamps and returns the difference in multiples of the specified `unit`. -### Examples +## Syntax + +`DATE_DIFF(unit, startTimestamp, endTimestamp)` + +### Parameters + +#### unit + +The unit of time in which the difference will be calculated. 
+ +#### startTimestamp + +The starting timestamp for the calculation. + +#### endTimestamp + +The ending timestamp for the calculation. + +## Examples + +The following example demonstrates how to use the DATE_DIFF function to calculate the difference between two timestamps in microseconds: ```esql ROW date1 = TO_DATETIME("2023-12-02T11:00:00.000Z"), date2 = TO_DATETIME("2023-12-02T11:00:00.001Z") @@ -12,4 +32,10 @@ ROW date1 = TO_DATETIME("2023-12-02T11:00:00.000Z"), date2 = TO_DATETIME("2023-1 ```esql ROW date1 = TO_DATETIME("2023-01-01T00:00:00.000Z"), date2 = TO_DATETIME("2023-12-31T23:59:59.999Z") | EVAL dd_days = DATE_DIFF("days", date1, date2) -``` \ No newline at end of file +``` + +## Notes + +- If the `startTimestamp` is later than the `endTimestamp`, the function will return a negative value. + +- It's important to note that while there is some overlap between the units supported by this function and ESQL's time span literals, these sets are not interchangeable. Also, the abbreviations supported by this function are shared with other established products and may not align with the date-time nomenclature used by Elasticsearch. diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-date_extract.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-date_extract.txt index e064e1e09a91b..fa2cf8c0c88a6 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-date_extract.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-date_extract.txt @@ -1,15 +1,33 @@ -## DATE_EXTRACT +# DATE_EXTRACT -The `DATE_EXTRACT` function extracts specific parts of a date, such as the year, month, day, or hour. It can be used to retrieve various components of a date based on the specified `datePart`. +The DATE_EXTRACT function is used to extract specific parts of a date. 
-### Examples +## Syntax + +`DATE_EXTRACT(datePart, date)` + +### Parameters + +#### datePart + +This is the part of the date you want to extract, such as "year", "month" or "hour_of_day". + +#### date + +This is the date expression. + +## Examples + +To extract the year from a date: ```esql ROW date = DATE_PARSE("yyyy-MM-dd", "2022-05-06") | EVAL year = DATE_EXTRACT("year", date) ``` +To find all events that occurred outside of business hours (before 9 AM or after 5 PM), on any given date: + ```esql FROM sample_data | WHERE DATE_EXTRACT("hour_of_day", @timestamp) < 9 OR DATE_EXTRACT("hour_of_day", @timestamp) >= 17 -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-date_format.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-date_format.txt index 26149e8ce0d28..4b8a8a174ab80 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-date_format.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-date_format.txt @@ -1,17 +1,28 @@ -## DATE_FORMAT +# DATE_FORMAT -The `DATE_FORMAT` function returns a string representation of a date in the provided format. If no format is specified, the default format `yyyy-MM-dd'T'HH:mm:ss.SSSZ` is used. If the date expression is null, the function returns null. +The DATE_FORMAT function returns a string representation of a date, formatted according to the provided format. -### Examples +## Syntax + +`DATE_FORMAT(dateFormat, date)` + +### Parameters + +#### dateFormat + +This is an optional parameter that specifies the desired date format. 
+ +## Examples + +In this example, the `hire_date` field is formatted according to the "YYYY-MM-dd" format, and the result is stored in the `hired` field: ```esql FROM employees | KEEP first_name, last_name, hire_date | EVAL hired = DATE_FORMAT("YYYY-MM-dd", hire_date) ``` - -```esql -FROM employees -| KEEP first_name, last_name, hire_date -| EVAL hired = DATE_FORMAT("yyyy/MM/dd", hire_date) -``` \ No newline at end of file diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-date_parse.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-date_parse.txt index 4d2843deed440..f62cf0c5f9a4c 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-date_parse.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-date_parse.txt @@ -1,15 +1,28 @@ -## DATE_PARSE +# DATE_PARSE -The `DATE_PARSE` function returns a date by parsing the second argument using the format specified in the first argument. +The DATE_PARSE function is used to convert a date string into a date format based on the provided pattern. -### Examples +## Syntax + +`DATE_PARSE(datePattern, dateString)` + +### Parameters + +#### datePattern + +This is the format of the date. If `null` is provided, the function will return `null`. + +#### dateString + +This is the date expression in string format. 
+ +## Examples ```esql ROW date_string = "2022-05-06" | EVAL date = DATE_PARSE("yyyy-MM-dd", date_string) ``` -```esql ROW date_string = "2023-12-25" | EVAL date = DATE_PARSE("yyyy-MM-dd", date_string) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-date_trunc.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-date_trunc.txt index 28c15f62c5c53..bd1d4b68043b1 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-date_trunc.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-date_trunc.txt @@ -1,8 +1,24 @@ -## DATE_TRUNC +# DATE_TRUNC -The `DATE_TRUNC` function rounds down a date to the closest interval. +The DATE_TRUNC function rounds down a date to the nearest specified interval. -### Examples +## Syntax + +`DATE_TRUNC(interval, date)` + +### Parameters + +#### interval + +This is the interval to which the date will be rounded down. It is expressed using the timespan literal syntax. + +#### date + +This is the date expression that will be rounded down. + +## Examples + +The following example rounds down the hire_date to the nearest year: ```esql FROM employees @@ -10,7 +26,7 @@ FROM employees | EVAL year_hired = DATE_TRUNC(1 year, hire_date) ``` -Combine `DATE_TRUNC` with `STATS ... BY` to create date histograms. For example, the number of hires per year: +You can combine DATE_TRUNC with STATS ... BY to create date histograms. 
For example, the number of hires per year: ```esql FROM employees @@ -19,7 +35,7 @@ FROM employees | SORT year ``` -Or an hourly error rate: +Or, you can calculate an hourly error rate: ```esql FROM sample_data @@ -27,4 +43,4 @@ FROM sample_data | EVAL hour = DATE_TRUNC(1 hour, @timestamp) | STATS error_rate = AVG(error) BY hour | SORT hour -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-dissect.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-dissect.txt index 5ce173f0e801d..8f4a822e52f07 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-dissect.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-dissect.txt @@ -1,29 +1,32 @@ -## DISSECT +# DISSECT -DISSECT enables you to extract structured data out of a string. It matches the string against a delimiter-based pattern and extracts the specified keys as columns. This command is particularly useful for parsing log files, structured text, or any other string data where fields are separated by specific delimiters. +The DISSECT command is used to extract structured data from a string. It matches the string against a delimiter-based pattern and extracts the specified keys as columns. ### Use Cases - **Log Parsing**: Extracting timestamps, log levels, and messages from log entries. - **Data Transformation**: Converting unstructured text data into structured columns for further analysis. - **Data Cleaning**: Removing or reformatting specific parts of a string to make the data more usable. -### Limitations -- If a field name conflicts with an existing column, the existing column is dropped. -- If a field name is used more than once, only the rightmost duplicate creates a column. -- DISSECT does not support reference keys. - -### Syntax +## Syntax `DISSECT input "pattern" [APPEND_SEPARATOR="<separator>"]` ### Parameters -- **input**: The column that contains the string you want to structure. 
If the column has multiple values, DISSECT will process each value. -- **pattern**: A dissect pattern. -- **<separator>**: A string used as the separator between appended values, when using the append modifier. -### Examples +#### input + +The column containing the string you want to structure. If the column has multiple values, DISSECT will process each value. + +#### pattern + +A dissect pattern. If a field name conflicts with an existing column, the existing column is dropped. If a field name is used more than once, only the rightmost duplicate creates a column. + +#### <separator> + +A string used as the separator between appended values, when using the append modifier. + +## Examples -#### Example 1: Basic Usage The following example parses a string that contains a timestamp, some text, and an IP address: ```esql @@ -32,7 +35,6 @@ ROW a = "2023-01-23T12:15:00.000Z - some text - 127.0.0.1" | KEEP date, msg, ip ``` -#### Example 2: Type Conversion By default, DISSECT outputs keyword string columns. To convert to another type, use Type conversion functions: ```esql @@ -42,7 +44,6 @@ ROW a = "2023-01-23T12:15:00.000Z - some text - 127.0.0.1" | EVAL date = TO_DATETIME(date) ``` -#### Example 3: Using Append Separator In this example, we use the `APPEND_SEPARATOR` to concatenate values with a custom separator: ```esql @@ -51,4 +52,7 @@ ROW a = "2023-01-23T12:15:00.000Z - some text - 127.0.0.1" | KEEP date, msg, ip ``` -These examples showcase different ways to use the DISSECT command to parse and transform string data in Elasticsearch. \ No newline at end of file +### Limitations +- If a field name conflicts with an existing column, the existing column is dropped. +- If a field name is used more than once, only the rightmost duplicate creates a column. +- DISSECT does not support reference keys. 
diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-drop.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-drop.txt index 9bc678ef29c2f..2e36f20474aef 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-drop.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-drop.txt @@ -1,35 +1,33 @@ -## DROP +# DROP -The `DROP` processing command in ES|QL is used to remove one or more columns from the result set. This command is particularly useful when you want to exclude certain fields from your query results, either to simplify the output or to reduce the amount of data being processed and transferred. The `DROP` command supports the use of wildcards, allowing you to remove multiple columns that match a specific pattern. +The DROP command is used to eliminate one or more columns from the data. -### Use Cases -- **Simplifying Output:** Remove unnecessary columns to make the result set easier to read and analyze. -- **Data Reduction:** Exclude large or irrelevant fields to reduce the amount of data processed and transferred. -- **Pattern Matching:** Use wildcards to efficiently drop multiple columns that share a common naming pattern. +## Syntax -### Limitations -- The `DROP` command does not support nested fields. -- It cannot be used to drop columns of unsupported types as specified in the ES|QL limitations. +`DROP columns` + +### Parameters + +#### columns -### Examples +This is a list of columns, separated by commas, that you want to remove. Wildcards are supported. -#### Example 1: Dropping a Single Column -This example demonstrates how to drop a single column named `height` from the `employees` index. +## Examples + +In the following example, the 'height' column is removed from the data: ```esql FROM employees | DROP height ``` -#### Example 2: Dropping Multiple Columns Using Wildcards -This example shows how to use wildcards to drop all columns that start with `height`. 
+You can also use wildcards to remove all columns that match a certain pattern. In the following example, all columns that start with 'height' are removed: ```esql FROM employees | DROP height* ``` -#### Example 3: Dropping Multiple Specific Columns This example demonstrates how to drop multiple specific columns by listing them in a comma-separated format. ```esql @@ -37,7 +35,6 @@ FROM employees | DROP height, weight, age ``` -#### Example 4: Dropping Columns with Complex Patterns This example shows how to drop columns that match a more complex pattern using wildcards. ```esql @@ -45,7 +42,6 @@ FROM employees | DROP emp_* ``` -#### Example 5: Combining DROP with Other Commands This example demonstrates how to use the `DROP` command in conjunction with other commands like `KEEP` and `SORT`. ```esql @@ -55,4 +51,6 @@ FROM employees | SORT height DESC ``` -By using the `DROP` command, you can effectively manage the columns in your result set, making your ES|QL queries more efficient and easier to work with. \ No newline at end of file +### Limitations +- The `DROP` command does not support nested fields. +- It cannot be used to drop columns of unsupported types as specified in the ES|QL limitations. diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-e.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-e.txt index 7f81d56ab63c2..2ab4a7e3449da 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-e.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-e.txt @@ -1,8 +1,16 @@ -## E +# E -The `E` function returns Euler’s number, which is a mathematical constant approximately equal to 2.71828. It is the base of the natural logarithm. +The E function returns Euler's number. -### Examples +## Syntax + +`E()` + +### Parameters + +This function does not require any parameters. 
+ +## Examples ```esql ROW E() @@ -12,4 +20,4 @@ ROW E() FROM employees | EVAL euler_number = E() | KEEP euler_number -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-ends_with.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-ends_with.txt index 7607666f70213..0ceafe99f528d 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-ends_with.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-ends_with.txt @@ -1,8 +1,23 @@ -## ENDS_WITH +# ENDS_WITH -The `ENDS_WITH` function returns a boolean that indicates whether a keyword string ends with another string. +The ENDS_WITH function checks if a given string ends with a specified suffix. + +## Syntax + +`ENDS_WITH(str, suffix)` + +### Parameters + +#### str + +This is the string expression that you want to check. + +#### suffix + +The string expression that will be checked if it is the ending of the first string. + +## Examples -### Examples ```esql FROM employees @@ -14,4 +29,4 @@ FROM employees FROM employees | KEEP first_name | EVAL fn_E = ENDS_WITH(first_name, "a") -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-enrich.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-enrich.txt index 0db6c10e0d44f..9587732048639 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-enrich.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-enrich.txt @@ -1,48 +1,60 @@ -## ENRICH +# ENRICH -ENRICH enables you to add data from existing indices as new columns using an enrich policy. This command is useful for enriching your dataset with additional information from other indices, which can be particularly beneficial for data analysis and reporting. Before using the ENRICH command, you need to create and execute an enrich policy. 
+The ENRICH command allows you to add data from existing indices as new columns using an enrich policy. -### Use Cases -- **Data Enrichment**: Add supplementary data to your existing dataset for more comprehensive analysis. -- **Cross-Cluster Enrichment**: Enrich data across multiple clusters using the `mode` parameter. -- **Custom Column Names**: Rename columns to avoid conflicts or for better readability. +## Syntax -### Limitations -- The ENRICH command only supports enrich policies of type `match`. -- ENRICH only supports enriching on a column of type `keyword`. +`ENRICH policy [ON match_field] [WITH [new_name1 = ]field1, [new_name2 = ]field2, ...]` + +### Parameters + +#### policy + +The name of the enrich policy. You need to create and execute the enrich policy first. + +#### match_field + +The match field. ENRICH uses its value to look for records in the enrich index. If not specified, the match will be performed on the column with the same name as the `match_field` defined in the enrich policy. -### Examples +#### new_nameX -#### Example 1: Basic Enrichment -The following example uses the `languages_policy` enrich policy to add a new column for each enrich field defined in the policy. The match is performed using the `match_field` defined in the enrich policy and requires that the input table has a column with the same name (`language_code` in this example). +Allows you to change the name of the column that’s added for each of the enrich fields. Defaults to the enrich field name. If a column has the same name as the new name, it will be discarded. If a name (new or original) occurs more than once, only the rightmost duplicate creates a new column. + +#### fieldX + +The enrich fields from the enrich index that are added to the result as new columns. If a column with the same name as the enrich field already exists, the existing column will be replaced by the new column. If not specified, each of the enrich fields defined in the policy is added. 
A column with the same name as the enrich field will be dropped unless the enrich field is renamed. + +## Examples + +The following example uses the `languages_policy` enrich policy to add a new column for each enrich field defined in the policy. The match is performed using the `match_field` defined in the enrich policy and requires that the input table has a column with the same name (`language_code` in this example). ENRICH will look for records in the enrich index based on the match field value. ```esql ROW language_code = "1" | ENRICH languages_policy ``` -#### Example 2: Using a Different Match Field -To use a column with a different name than the `match_field` defined in the policy as the match field, use the `ON` parameter. +To use a column with a different name than the `match_field` defined in the policy as the match field, use `ON match_field`: ```esql ROW a = "1" | ENRICH languages_policy ON a ``` -#### Example 3: Selecting Specific Enrich Fields -By default, each of the enrich fields defined in the policy is added as a column. To explicitly select the enrich fields that are added, use the `WITH` parameter. +By default, each of the enrich fields defined in the policy is added as a column. To explicitly select the enrich fields that are added, use `WITH field1, field2, ...`: ```esql ROW a = "1" | ENRICH languages_policy ON a WITH language_name ``` -#### Example 4: Renaming Enrich Fields -You can rename the columns that are added using the `WITH new_name=` syntax. +You can rename the columns that are added using `WITH new_name=field1`: ```esql ROW a = "1" | ENRICH languages_policy ON a WITH name = language_name ``` -In case of name collisions, the newly created columns will override existing columns. \ No newline at end of file +### Limitations +- In case of name collisions, the newly created columns will override existing columns. +- The ENRICH command only supports enrich policies of type `match`. +- ENRICH only supports enriching on a column of type `keyword`.
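As a rough illustration of the ENRICH syntax documented in the file above (`ENRICH policy [ON match_field] [WITH [new_name1 = ]field1, ...]`), the following TypeScript sketch assembles an ENRICH clause from its parts. The `buildEnrichClause` helper and the `EnrichField` type are hypothetical names introduced only for this example; they are not part of this patch or of any Kibana API.

```typescript
// Hypothetical helper (an assumption for illustration, not part of this patch):
// builds an ENRICH clause following the documented syntax
// `ENRICH policy [ON match_field] [WITH [new_name1 = ]field1, ...]`.
interface EnrichField {
  field: string; // enrich field defined in the policy
  newName?: string; // optional name for the column that is added
}

function buildEnrichClause(
  policy: string,
  matchField?: string,
  fields: EnrichField[] = []
): string {
  const parts: string[] = [`ENRICH ${policy}`];
  // ON is optional: when omitted, the policy's own match_field is used.
  if (matchField) {
    parts.push(`ON ${matchField}`);
  }
  // WITH is optional: when omitted, every enrich field in the policy is added.
  if (fields.length > 0) {
    const withList = fields
      .map(({ field, newName }) => (newName ? `${newName} = ${field}` : field))
      .join(', ');
    parts.push(`WITH ${withList}`);
  }
  return parts.join(' ');
}

// Mirrors the renaming example from the ENRICH documentation.
console.log(
  buildEnrichClause('languages_policy', 'a', [{ field: 'language_name', newName: 'name' }])
);
// prints "ENRICH languages_policy ON a WITH name = language_name"
```

The sketch only concatenates strings; it does not validate that the policy exists or that it is of type `match`, which the Limitations list above still requires.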
diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-eval.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-eval.txt index a7ad446cbbde9..ee512ededc6c4 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-eval.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-eval.txt @@ -1,20 +1,36 @@ -## EVAL +# EVAL -The `EVAL` processing command enables you to append new columns with calculated values. This command is useful for creating new data points derived from existing columns, such as performing arithmetic operations, applying functions, or using expressions. +The EVAL command allows you to append new columns with calculated values to your data. -### Use Cases -- **Data Transformation**: Create new columns based on existing data, such as converting units or calculating derived metrics. -- **Data Enrichment**: Add additional context to your data by computing new values. -- **Data Cleaning**: Standardize or normalize data by applying transformations. +## Syntax -### Limitations -- If a column with the same name already exists, the existing column is dropped. -- If a column name is used more than once, only the rightmost duplicate creates a column. +`EVAL [column1 =] value1[, ..., [columnN =] valueN]` + +### Parameters + +#### {columnX} + +This is the name of the column. If a column with the same name already exists, it will be replaced. If a column name is used more than once, only the rightmost duplicate will create a column. + +#### {valueX} -### Examples +This is the value for the column. It can be a literal, an expression, or a function. Columns defined to the left of this one can be used. -#### Example 1: Converting Height to Different Units -This example demonstrates how to convert the height from meters to feet and centimeters. 
+## Notes + +EVAL supports the following types of functions: +- Mathematical functions +- String functions +- Date-time functions +- Type conversion functions +- Conditional functions and expressions +- Multi-value functions + +Aggregation functions are NOT supported for EVAL. + +## Examples + +The following example multiplies the `height` column by 3.281 and 100 to create new columns `height_feet` and `height_cm`: ```esql FROM employees @@ -23,8 +39,7 @@ FROM employees | EVAL height_feet = height * 3.281, height_cm = height * 100 ``` -#### Example 2: Overwriting an Existing Column -In this example, the `height` column is overwritten with its value in feet. +If the specified column already exists, the existing column will be replaced, and the new column will be appended to the table: ```esql FROM employees @@ -33,8 +48,7 @@ FROM employees | EVAL height = height * 3.281 ``` -#### Example 3: Using an Expression as Column Name -Here, a new column is created with a name equal to the expression used to calculate its value. +Specifying the output column name is optional. If not specified, the new column name is equal to the expression. The following query adds a column named `height*3.281`: ```esql FROM employees @@ -43,8 +57,7 @@ FROM employees | EVAL height * 3.281 ``` -#### Example 4: Using Special Characters in Column Names -This example shows how to handle special characters in column names by quoting them with backticks. +Because this name contains special characters, it needs to be quoted with backticks (`) when using it in subsequent commands: ```esql FROM employees @@ -52,4 +65,6 @@ FROM employees | STATS avg_height_feet = AVG(`height * 3.281`) ``` -These examples illustrate the versatility of the `EVAL` command in transforming and enriching your data within Elasticsearch. \ No newline at end of file +### Limitations +- If a column with the same name already exists, the existing column is dropped.
+- If a column name is used more than once, only the rightmost duplicate creates a column. diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-exp.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-exp.txt index 89a2c612b08b7..0f55dc85702e5 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-exp.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-exp.txt @@ -1,8 +1,19 @@ -## EXP +# EXP -The `EXP` function returns the value of Euler's number (e) raised to the power of the given numeric expression. If the input is null, the function returns null. +The EXP function calculates the value of Euler's number (e) raised to the power of a given number. + +## Syntax + +`EXP(number)` + +### Parameters + +#### number + +A numeric expression. If the parameter is `null`, the function will also return `null`. + +## Examples -### Examples ```esql ROW d = 5.0 @@ -12,4 +23,4 @@ ROW d = 5.0 ```esql ROW value = 2.0 | EVAL result = EXP(value) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-floor.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-floor.txt index d3f50c55d0091..eac92ffc434b5 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-floor.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-floor.txt @@ -1,16 +1,30 @@ -## FLOOR +# FLOOR -The `FLOOR` function rounds a number down to the nearest integer. This operation is a no-op for long (including unsigned) and integer types. For double types, it picks the closest double value to the integer, similar to `Math.floor`. +The FLOOR function rounds a number down to the nearest integer. -### Examples +## Syntax + +`FLOOR(number)` + +### Parameters + +#### number + +This is a numeric expression. If the parameter is `null`, the function will return `null`. 
+ +## Examples ```esql ROW a=1.8 -| EVAL a = FLOOR(a) +| EVAL a=FLOOR(a) ``` ```esql FROM employees | KEEP first_name, last_name, height | EVAL height_floor = FLOOR(height) -``` \ No newline at end of file +``` + +## Notes + +- The FLOOR function is a no-operation for `long` (including unsigned) and `integer` types. For `double` type, this function picks the closest `double` value to the integer, similar to the Math.floor method in programming languages. diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-from.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-from.txt index 7847e7c847655..2f3618dce2412 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-from.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-from.txt @@ -1,21 +1,24 @@ -## FROM +# FROM -The `FROM` source command returns a table with data from a data stream, index, or alias. Each row in the resulting table represents a document, and each column corresponds to a field that can be accessed by the name of that field. This command is fundamental for querying data in Elasticsearch using ES|QL. +The `FROM` command retrieves a table of data from a specified data stream, index, or alias. -### Use Cases +## Syntax -- **Basic Data Retrieval**: Fetch data from a specific index or data stream. -- **Time Series Data**: Use date math to access indices relevant to specific time periods. -- **Multiple Indices**: Query multiple data streams, indices, or aliases using comma-separated lists or wildcards. -- **Remote Clusters**: Query data streams and indices on remote clusters. -- **Metadata Retrieval**: Retrieve specific metadata fields using the `METADATA` directive. +`FROM index_pattern [METADATA fields]` -### Limitations +### Parameters -- By default, an ES|QL query without an explicit `LIMIT` uses an implicit limit of 1000 rows. This applies to the `FROM` command as well. 
-- Queries do not return more than 10,000 rows, regardless of the `LIMIT` command’s value. +#### index_pattern + +This parameter represents a list of indices, data streams, or aliases. It supports the use of wildcards and date math. + +#### fields + +This is a comma-separated list of metadata fields to be retrieved. + +## Description -### Examples +The `FROM` command retrieves a table of data from a specified data stream, index, or alias. Each row in the resulting table represents a document, and each column corresponds to a field. The field can be accessed using its name. #### Basic Data Retrieval ```esql @@ -50,4 +53,9 @@ FROM employees METADATA _id Use enclosing double quotes (") or three enclosing double quotes (""") to escape index names that contain special characters: ```esql FROM "this=that","""this[that""" -``` \ No newline at end of file +``` + +### Limitations + +- By default, an ES|QL query without an explicit `LIMIT` uses an implicit limit of 1000 rows. This applies to the `FROM` command as well. +- Queries do not return more than 10,000 rows, regardless of the `LIMIT` command’s value. diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-greatest.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-greatest.txt index 17217e8e84682..feb119185c72b 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-greatest.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-greatest.txt @@ -1,8 +1,22 @@ -## GREATEST +# GREATEST -The `GREATEST` function returns the maximum value from multiple columns. This is similar to `MV_MAX` except it is intended to run on multiple columns at once. When run on keyword or text fields, this function returns the last string in alphabetical order. When run on boolean columns, it will return `true` if any values are `true`. +The GREATEST function returns the maximum value from multiple columns. 
-### Examples +## Syntax + +`GREATEST(first, rest)` + +### Parameters + +#### first + +The first column to evaluate. + +#### rest + +The remaining columns to evaluate. + +## Examples ```esql ROW a = 10, b = 20 @@ -12,4 +26,9 @@ ROW a = 10, b = 20 ```esql ROW x = "apple", y = "banana", z = "cherry" | EVAL max_fruit = GREATEST(x, y, z) -``` \ No newline at end of file +``` + +## Notes + +- When applied to `keyword` or `text` fields, the GREATEST function returns the last string in alphabetical order. +- When applied to `boolean` columns, it returns `true` if any values are `true`. diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-grok.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-grok.txt index cc357b986a58b..2f7fa48df693f 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-grok.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-grok.txt @@ -1,22 +1,24 @@ -## GROK +# GROK -GROK enables you to extract structured data out of a string. It matches the string against patterns based on regular expressions and extracts the specified patterns as columns. This command is useful for parsing logs, extracting fields from text, and structuring unstructured data. +The GROK command is used to extract structured data from a string. It matches the string against patterns based on regular expressions and extracts the specified patterns as columns. -### Use Cases -- **Log Parsing**: Extracting timestamps, IP addresses, and other fields from log entries. -- **Data Structuring**: Converting unstructured text data into structured columns. -- **Field Extraction**: Extracting specific fields from a string for further analysis. +## Syntax -### Limitations -- If a field name conflicts with an existing column, the existing column is discarded. -- If a field name is used more than once, a multi-valued column will be created with one value per each occurrence of the field name. 
-- The `GROK` command does not support configuring custom patterns or multiple patterns. -- The `GROK` command is not subject to Grok watchdog settings. +`GROK input "pattern"` + +### Parameters + +#### input + +The column containing the string you want to structure. If the column has multiple values, GROK will process each value. -### Examples +#### pattern -#### Example 1: Basic GROK Usage -This example parses a string that contains a timestamp, an IP address, an email address, and a number. +A grok pattern. If a field name conflicts with an existing column, the existing column is dropped. If a field name is used more than once, a multi-valued column is created with one value per each occurrence of the field name. + +## Examples + +The following example parses a string that contains a timestamp, an IP address, an email address, and a number: ```esql ROW a = "2023-01-23T12:15:00.000Z 127.0.0.1 some.email@foo.com 42" @@ -24,8 +26,7 @@ ROW a = "2023-01-23T12:15:00.000Z 127.0.0.1 some.email@foo.com 42" | KEEP date, ip, email, num ``` -#### Example 2: Type Conversion with GROK -By default, GROK outputs keyword string columns. To convert to other types, append `:type` to the semantics in the pattern. +By default, GROK outputs keyword string columns. `int` and `float` types can be converted by appending `:type` to the semantics in the pattern. For example `{NUMBER:num:int}`: ```esql ROW a = "2023-01-23T12:15:00.000Z 127.0.0.1 some.email@foo.com 42" @@ -33,8 +34,7 @@ ROW a = "2023-01-23T12:15:00.000Z 127.0.0.1 some.email@foo.com 42" | KEEP date, ip, email, num ``` -#### Example 3: Using Type Conversion Functions -For other type conversions, use Type conversion functions. 
+For other type conversions, use Type conversion functions: ```esql ROW a = "2023-01-23T12:15:00.000Z 127.0.0.1 some.email@foo.com 42" @@ -43,8 +43,7 @@ ROW a = "2023-01-23T12:15:00.000Z 127.0.0.1 some.email@foo.com 42" | EVAL date = TO_DATETIME(date) ``` -#### Example 4: Handling Multi-Valued Columns -If a field name is used more than once, GROK creates a multi-valued column. +If a field name is used more than once, GROK creates a multi-valued column: ```esql FROM addresses @@ -52,4 +51,9 @@ FROM addresses | GROK zip_code "%{WORD:zip_parts} %{WORD:zip_parts}" ``` -These examples showcase different usages of the GROK command, from basic extraction to handling type conversions and multi-valued columns. \ No newline at end of file +### Limitations + +- If a field name conflicts with an existing column, the existing column is discarded. +- If a field name is used more than once, a multi-valued column will be created with one value per each occurrence of the field name. +- The `GROK` command does not support configuring custom patterns or multiple patterns. +- The `GROK` command is not subject to Grok watchdog settings. diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-ip_prefix.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-ip_prefix.txt index 65d4ccbf5d4b3..e06773023ebdd 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-ip_prefix.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-ip_prefix.txt @@ -1,8 +1,26 @@ -## IP_PREFIX +# IP_PREFIX -The `IP_PREFIX` function truncates an IP address to a given prefix length. It supports both IPv4 and IPv6 addresses. +The IP_PREFIX function truncates an IP address to a specified prefix length. -### Examples +## Syntax + +`IP_PREFIX(ip, prefixLengthV4, prefixLengthV6)` + +### Parameters + +#### ip + +The IP address that you want to truncate. This function supports both IPv4 and IPv6 addresses. 
+ +#### prefixLengthV4 + +The prefix length for IPv4 addresses. + +#### prefixLengthV6 + +The prefix length for IPv6 addresses. + +## Examples ```esql ROW ip4 = TO_IP("1.2.3.4"), ip6 = TO_IP("fe80::cae2:65ff:fece:feb9") @@ -13,4 +31,4 @@ ROW ip4 = TO_IP("1.2.3.4"), ip6 = TO_IP("fe80::cae2:65ff:fece:feb9") FROM network_logs | EVAL truncated_ip = IP_PREFIX(ip_address, 16, 0) | KEEP ip_address, truncated_ip -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-keep.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-keep.txt index fbf2466d26c6e..84d8207bdf934 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-keep.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-keep.txt @@ -1,17 +1,32 @@ -## KEEP +# KEEP -The `KEEP` processing command in ES|QL enables you to specify which columns are returned and the order in which they are returned. This command is particularly useful when you want to focus on specific fields in your dataset, either by explicitly naming them or by using wildcard patterns. The `KEEP` command supports a variety of use cases, such as filtering out unnecessary columns, reordering columns for better readability, and ensuring that only relevant data is processed in subsequent commands. +The KEEP command allows you to specify which columns to return and in what order. -### Use Cases -- **Selective Column Retrieval**: Retrieve only the columns you need for analysis, reducing the amount of data processed. -- **Column Reordering**: Specify the order in which columns should appear in the result set. -- **Wildcard Support**: Use wildcards to include multiple columns that match a pattern, simplifying queries when dealing with numerous fields. +## Syntax -### Limitations -- **Precedence Rules**: When a field name matches multiple expressions, precedence rules are applied. 
Complete field names take the highest precedence, followed by partial wildcard expressions, and finally, the wildcard `*`. -- **Column Conflicts**: If a field matches two expressions with the same precedence, the rightmost expression wins. +`KEEP columns` -### Examples +### Parameters + +#### columns + +A comma-separated list of columns to retain. Wildcards are supported. If an existing column matches multiple provided wildcards or column names, certain rules apply. + +## Note + +The KEEP command is used to specify which columns to return and their order. + +When a field name matches multiple expressions, precedence rules are applied. Fields are added in the order they appear. If one field matches multiple expressions, the following precedence rules apply (from highest to lowest priority): + +1. Complete field name (no wildcards) +2. Partial wildcard expressions (for example: `fieldNam*`) +3. Wildcard only (`*`) + +If a field matches two expressions with the same precedence, the rightmost expression wins. + +Important: only the columns in the KEEP command can be used after a KEEP command. + +## Examples #### Example 1: Specifying Columns Explicitly This example demonstrates how to explicitly specify the columns to be returned. @@ -60,5 +75,3 @@ This example illustrates how the simple wildcard expression `*` has the lowest p FROM employees | KEEP *, first_name ``` - -These examples showcase the versatility and utility of the `KEEP` command in various scenarios, making it a powerful tool for data manipulation in ES|QL. 
\ No newline at end of file diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-least.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-least.txt index f756820f7840d..7e0f77bc911eb 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-least.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-least.txt @@ -1,8 +1,22 @@ -## LEAST +# LEAST -Returns the minimum value from multiple columns. This is similar to `MV_MIN` except it is intended to run on multiple columns at once. +The LEAST function returns the smallest value from multiple columns. -### Examples +## Syntax + +`LEAST(first, rest)` + +### Parameters + +#### first + +The first column to evaluate. + +#### rest + +The remaining columns to evaluate. + +## Examples ```esql ROW a = 10, b = 20 @@ -12,4 +26,4 @@ ROW a = 10, b = 20 ```esql ROW x = 5, y = 15, z = 10 | EVAL min_value = LEAST(x, y, z) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-left.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-left.txt index 5164a100ea22b..74997f638f463 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-left.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-left.txt @@ -1,8 +1,24 @@ -## LEFT +# LEFT -The `LEFT` function returns the substring that extracts a specified number of characters from a string, starting from the left. +The LEFT function returns a substring from the beginning of a specified string. -### Examples +## Syntax + +`LEFT(string, length)` + +### Parameters + +#### string + +The string from which a substring will be extracted. + +#### length + +The number of characters to extract from the string. 
+ +## Examples + +The following example extracts the first three characters from the `last_name` field: ```esql FROM employees @@ -16,4 +32,4 @@ FROM employees ROW full_name = "John Doe" | EVAL first_name = LEFT(full_name, 4) | KEEP first_name -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-length.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-length.txt index ea692e7fe9eae..996464cf42ba1 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-length.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-length.txt @@ -1,8 +1,20 @@ -## LENGTH +# LENGTH -The `LENGTH` function returns the character length of a string. If the input string is null, the function returns null. +The LENGTH function calculates the character length of a given string. -### Examples +## Syntax + +`LENGTH(string)` + +### Parameters + +#### string + +The string expression for which the length is to be calculated. + +## Examples + +The following example calculates the character length of the `first_name` field: ```esql FROM employees @@ -13,4 +25,4 @@ FROM employees ```esql ROW message = "Hello, World!" | EVAL message_length = LENGTH(message) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-limit.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-limit.txt index da1a0f85a8782..1a77939b4afbd 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-limit.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-limit.txt @@ -1,24 +1,19 @@ -## LIMIT +# LIMIT -The `LIMIT` processing command in ES|QL is used to restrict the number of rows returned by a query. This is particularly useful when you want to control the volume of data retrieved, either for performance reasons or to focus on a specific subset of the data. 
+The LIMIT command is used to restrict the number of rows returned by a query. -### Use Cases -- **Performance Optimization**: By limiting the number of rows returned, you can improve query performance and reduce the load on the Elasticsearch cluster. -- **Data Sampling**: Useful for retrieving a sample of data for analysis or debugging. -- **Pagination**: Helps in implementing pagination by limiting the number of rows per page. +## Syntax -### Limitations -- **Maximum Rows**: Queries do not return more than 10,000 rows, regardless of the `LIMIT` command’s value. This limit only applies to the number of rows that are retrieved by the query. Queries and aggregations run on the full data set. -- **Overcoming Limitations**: To overcome this limitation, you can: - - Reduce the result set size by modifying the query to only return relevant data using the `WHERE` command. - - Shift any post-query processing to the query itself using the `STATS ... BY` command to aggregate data in the query. -- **Dynamic Cluster Settings**: The default and maximum limits can be changed using these dynamic cluster settings: - - `esql.query.result_truncation_default_size` - - `esql.query.result_truncation_max_size` +`LIMIT max_number_of_rows` -### Examples +### Parameters + +#### max_number_of_rows + +This parameter specifies the maximum number of rows to be returned. + +## Examples -#### Example 1: Basic Usage This example demonstrates how to limit the number of rows returned to 5. ```esql @@ -27,8 +22,7 @@ FROM employees | LIMIT 5 ``` -#### Example 2: Limiting Rows After Filtering -This example shows how to limit the number of rows after applying a filter. +This example shows how to limit the number of rows after applying a filter: ```esql FROM employees @@ -36,8 +30,7 @@ FROM employees | LIMIT 10 ``` -#### Example 3: Limiting Rows with Aggregation -This example demonstrates limiting the number of rows after performing an aggregation. 
+This example demonstrates limiting the number of rows after performing an aggregation: ```esql FROM employees @@ -45,8 +38,7 @@ FROM employees | LIMIT 3 ``` -#### Example 4: Limiting Rows with Sorting -This example shows how to limit the number of rows after sorting the data. +This example shows how to limit the number of rows after sorting the data: ```esql FROM employees @@ -54,8 +46,7 @@ FROM employees | LIMIT 7 ``` -#### Example 5: Limiting Rows with Multiple Commands -This example demonstrates the use of `LIMIT` in conjunction with multiple other commands. +This example demonstrates the use of `LIMIT` in conjunction with multiple other commands: ```esql FROM employees @@ -65,4 +56,20 @@ FROM employees | LIMIT 5 ``` -By using the `LIMIT` command, you can effectively manage the volume of data returned by your ES|QL queries, ensuring better performance and more focused results. \ No newline at end of file +## Limitations + +There is no way to achieve pagination with LIMIT because there is no offset parameter. + +A query will never return more than 10,000 rows. This limitation only applies to the number of rows retrieved by the query. The query and any aggregations will still run on the full dataset. + +To work around this limitation: + +- Reduce the size of the result set by modifying the query to only return relevant data. This can be achieved by using the WHERE command to select a smaller subset of the data. +- Shift any post-query processing to the query itself. The ES|QL STATS ... BY command can be used to aggregate data within the query.
+ +## Notes + +The default and maximum limits can be adjusted using the following dynamic cluster settings: + +- `esql.query.result_truncation_default_size` +- `esql.query.result_truncation_max_size` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-locate.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-locate.txt index e62ea05fcc3ab..1dafd3fa8c998 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-locate.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-locate.txt @@ -1,18 +1,26 @@ -## LOCATE +# LOCATE -The `LOCATE` function returns an integer that indicates the position of a keyword substring within another string. +The LOCATE function returns the position of a specified substring within a string. -### Syntax +## Syntax `LOCATE(string, substring, start)` ### Parameters -- `string`: An input string. -- `substring`: A substring to locate in the input string. -- `start`: The start index. +#### string -### Examples +The string in which you want to search for the substring. + +#### substring + +The substring you want to find in the string. + +#### start + +The starting index for the search. + +## Examples ```esql ROW a = "hello" @@ -22,4 +30,9 @@ ROW a = "hello" ```esql ROW phrase = "Elasticsearch is powerful" | EVAL position = LOCATE(phrase, "powerful") -``` \ No newline at end of file +``` + +## Notes + +- String positions start from `1`. +- If the substring cannot be found, the function returns `0`. diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-log.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-log.txt index b41fef3adc86d..0c476551c02d4 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-log.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-log.txt @@ -1,8 +1,22 @@ -## LOG +# LOG -The `LOG` function returns the logarithm of a value to a specified base. 
The input can be any numeric value, and the return value is always a double. Logs of zero, negative numbers, and base of one return null as well as a warning. +The LOG function calculates the logarithm of a given value to a specified base. -### Examples +## Syntax + +`LOG(base, number)` + +### Parameters + +#### base + +The base of the logarithm. If the base is `null`, the function will return `null`. If the base is not provided, the function will return the natural logarithm (base e) of the value. + +#### number + +The numeric value for which the logarithm is to be calculated. If the number is `null`, the function will return `null`. + +## Examples ```esql ROW base = 2.0, value = 8.0 @@ -12,4 +26,4 @@ ROW base = 2.0, value = 8.0 ```esql ROW value = 100 | EVAL s = LOG(value) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-lookup.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-lookup.txt index fc9312674db81..d6923ba0bb25f 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-lookup.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-lookup.txt @@ -1,10 +1,22 @@ -## LOOKUP +# LOOKUP -The `LOOKUP` command in ES|QL is highly experimental and only available in SNAPSHOT versions. It matches values from the input against a table provided in the request, adding the other fields from the table to the output. This command is useful for enriching your dataset with additional information from a predefined table. However, it is important to note that if the table’s column names conflict with existing columns, the existing columns will be dropped. +The LOOKUP command is a highly experimental feature currently only available in SNAPSHOT versions. It matches values from the input against a provided table, appending the other fields from the table to the output. 
-### Examples +## Syntax -Here are some example ES|QL queries using the `LOOKUP` command: +`LOOKUP table ON match_field1[, match_field2, ...]` + +### Parameters + +#### table + +The name of the table provided in the request to match against. If the table’s column names conflict with existing columns, the existing columns will be dropped. + +#### match_field + +The fields in the input to match against the table. + +## Examples 1. **Basic Lookup Example:** ```esql @@ -98,4 +110,4 @@ A Fire Upon the Deep|Vernor Vinge |Diamond Dune |Frank Herbert |The New Wave Revelation Space |Alastair Reynolds|Diamond Leviathan Wakes |James S.A. Corey |Hadron -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-ltrim.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-ltrim.txt index 7a34fe57f9801..29e266a197b32 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-ltrim.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-ltrim.txt @@ -1,8 +1,18 @@ -## LTRIM +# LTRIM -Removes leading whitespaces from a string. +The LTRIM function is used to remove leading whitespaces from a string. -### Examples +## Syntax + +`LTRIM(string)` + +### Parameters + +#### string + +This is the string expression from which you want to remove leading whitespaces. If the string is `null`, the function will return `null`. 
+ +## Examples ```esql ROW message = " some text ", color = " red " @@ -16,4 +26,4 @@ ROW message = " some text ", color = " red " ROW text = " example text " | EVAL trimmed_text = LTRIM(text) | EVAL formatted_text = CONCAT("Trimmed: '", trimmed_text, "'") -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-max.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-max.txt index 381c66afa9bb1..8f30ac8ac94c8 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-max.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-max.txt @@ -1,15 +1,29 @@ -## MAX +# MAX -The `MAX` function returns the maximum value of a specified field. +The MAX function calculates the maximum value of a specified field. -### Examples +## Syntax + +`MAX(field)` + +### Parameters + +#### field + +The field for which the maximum value is to be calculated. + +## Examples + +Calculate the maximum number of languages known by employees: ```esql FROM employees | STATS MAX(languages) ``` +The MAX function can be used with inline functions: + ```esql FROM employees | STATS max_avg_salary_change = MAX(MV_AVG(salary_change)) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-median.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-median.txt index 5da7a9be4fdb3..0e7b1900bd003 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-median.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-median.txt @@ -1,15 +1,33 @@ -## MEDIAN +# MEDIAN -The `MEDIAN` function returns the value that is greater than half of all values and less than half of all values, also known as the 50% PERCENTILE. Like `PERCENTILE`, `MEDIAN` is usually approximate. It is also non-deterministic, meaning you can get slightly different results using the same data. 
+The MEDIAN function calculates the median value of a numeric field. The median is the value that is greater than half of all values and less than half of all values, also known as the 50% percentile. -### Examples +## Syntax + +`MEDIAN(number)` + +### Parameters + +#### number + +The numeric field for which the median is calculated. + +## Examples + +Calculate the median salary: ```esql FROM employees -| STATS MEDIAN(salary), PERCENTILE(salary, 50) +| STATS MEDIAN(salary) ``` +Calculate the median of the maximum values of a multivalued column: + ```esql FROM employees | STATS median_max_salary_change = MEDIAN(MV_MAX(salary_change)) -``` \ No newline at end of file +``` + +## Limitations + +- The MEDIAN function is usually approximate and non-deterministic. This means you can get slightly different results using the same data. diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-median_absolute_deviation.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-median_absolute_deviation.txt index 07da947d5494c..6bd3de7db5cf6 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-median_absolute_deviation.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-median_absolute_deviation.txt @@ -1,15 +1,34 @@ -## MEDIAN_ABSOLUTE_DEVIATION +# MEDIAN_ABSOLUTE_DEVIATION -The `MEDIAN_ABSOLUTE_DEVIATION` function returns the median absolute deviation, a measure of variability. It is a robust statistic, meaning that it is useful for describing data that may have outliers, or may not be normally distributed. For such data, it can be more descriptive than standard deviation. It is calculated as the median of each data point’s deviation from the median of the entire sample. That is, for a random variable X, the median absolute deviation is median(|median(X) - X|). Like `PERCENTILE`, `MEDIAN_ABSOLUTE_DEVIATION` is usually approximate. 
+The MEDIAN_ABSOLUTE_DEVIATION function calculates the median absolute deviation, a measure of variability. It is particularly useful for describing data that may have outliers or may not follow a normal distribution. In such cases, it can be more descriptive than standard deviation. The function computes the median of each data point’s deviation from the median of the entire sample. -### Examples +## Syntax + +`MEDIAN_ABSOLUTE_DEVIATION(number)` + +### Parameters + +#### number + +The numeric expression for which the median absolute deviation is to be calculated. + +## Examples + +Calculate the median salary and the median absolute deviation of salaries: ```esql FROM employees | STATS MEDIAN(salary), MEDIAN_ABSOLUTE_DEVIATION(salary) ``` +Calculate the median absolute deviation of the maximum values of a multivalued column: + ```esql FROM employees | STATS m_a_d_max_salary_change = MEDIAN_ABSOLUTE_DEVIATION(MV_MAX(salary_change)) -``` \ No newline at end of file +``` + +## Limitations + +- The `MEDIAN_ABSOLUTE_DEVIATION` function is non-deterministic, which means you can get slightly different results using the same data. +- The `MEDIAN_ABSOLUTE_DEVIATION` function is usually approximate, which means the results may not be exact. diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-min.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-min.txt index 043ad01280ad8..6b4848c7fc9a7 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-min.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-min.txt @@ -1,15 +1,29 @@ -## MIN +# MIN -The `MIN` function returns the minimum value of a specified field. +The MIN function calculates the minimum value of a specified field. -### Examples +## Syntax + +`MIN(field)` + +### Parameters + +#### field + +The field for which the minimum value is to be calculated. 
+ +## Examples + +Calculate the minimum number of languages spoken by employees: ```esql FROM employees | STATS MIN(languages) ``` +The MIN function can be used with inline functions: + ```esql FROM employees | STATS min_avg_salary_change = MIN(MV_AVG(salary_change)) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_append.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_append.txt index 9926157ce96c4..46196cf32931b 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_append.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_append.txt @@ -1,8 +1,16 @@ -## MV_APPEND +# MV_APPEND -The `MV_APPEND` function concatenates values of two multi-value fields. +MV_APPEND is a function that concatenates the values of two multi-value fields. -### Examples +## Syntax + +`MV_APPEND(field1, field2)` + +### Parameters + +#### field1 + +The first multi-value field to be concatenated. ```esql ROW a = ["foo", "bar"], b = ["baz", "qux"] @@ -14,4 +22,4 @@ ROW a = ["foo", "bar"], b = ["baz", "qux"] ROW x = [1, 2, 3], y = [4, 5, 6] | EVAL z = MV_APPEND(x, y) | KEEP x, y, z -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_avg.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_avg.txt index 431c4ec6b2891..81d9eb231311b 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_avg.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_avg.txt @@ -1,8 +1,18 @@ -## MV_AVG +# MV_AVG -The `MV_AVG` function converts a multivalued field into a single-valued field containing the average of all the values. +The MV_AVG function calculates the average of all values in a multivalued field and returns a single value. -### Examples +## Syntax + +`MV_AVG(number)` + +### Parameters + +#### number + +A multivalued expression. 
+ +## Examples ```esql ROW a=[3, 5, 1, 6] @@ -12,4 +22,4 @@ ROW a=[3, 5, 1, 6] ```esql ROW scores=[10, 20, 30, 40] | EVAL average_score = MV_AVG(scores) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_concat.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_concat.txt index 32c029703257d..7a4d9fff9466a 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_concat.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_concat.txt @@ -1,15 +1,33 @@ -## MV_CONCAT +# MV_CONCAT -Converts a multivalued string expression into a single valued column containing the concatenation of all values separated by a delimiter. +MV_CONCAT is a function that transforms a multivalued string expression into a single valued column. It concatenates all values and separates them with a specified delimiter. -### Examples +## Syntax + +`MV_CONCAT(string, delim)` + +### Parameters + +#### string + +A multivalue expression. + +#### delim + +This is the delimiter that separates the concatenated values. 
+ +## Examples + +The following example concatenates the values in the array ["foo", "zoo", "bar"] with a comma and a space as the delimiter: ```esql ROW a=["foo", "zoo", "bar"] | EVAL j = MV_CONCAT(a, ", ") ``` +If you want to concatenate non-string columns, you need to convert them to strings first using the `TO_STRING` function: + ```esql ROW a=[10, 9, 8] | EVAL j = MV_CONCAT(TO_STRING(a), ", ") -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_count.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_count.txt index a8f8d0c5149ad..808563d91b3bf 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_count.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_count.txt @@ -1,8 +1,18 @@ -## MV_COUNT +# MV_COUNT -The `MV_COUNT` function converts a multivalued expression into a single-valued column containing a count of the number of values. +The MV_COUNT function calculates the total number of values in a multivalued expression. -### Examples +## Syntax + +`MV_COUNT(field)` + +### Parameters + +#### field + +A multivalued expression. + +## Examples ```esql ROW a=["foo", "zoo", "bar"] @@ -12,4 +22,4 @@ ROW a=["foo", "zoo", "bar"] ```esql ROW b=["apple", "banana", "cherry", "date"] | EVAL count_b = MV_COUNT(b) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_dedupe.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_dedupe.txt index 297179f995dff..644ddd6d5f405 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_dedupe.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_dedupe.txt @@ -1,8 +1,18 @@ -## MV_DEDUPE +# MV_DEDUPE -Removes duplicate values from a multivalued field. `MV_DEDUPE` may, but won’t always, sort the values in the column. 
+The MV_DEDUPE function is used to eliminate duplicate values from a multivalued field. -### Examples +## Syntax + +`MV_DEDUPE(field)` + +### Parameters + +#### field + +This is a multivalue expression. + +## Examples ```esql ROW a=["foo", "foo", "bar", "foo"] @@ -12,4 +22,8 @@ ROW a=["foo", "foo", "bar", "foo"] ```esql ROW b=["apple", "apple", "banana", "apple", "banana"] | EVAL dedupe_b = MV_DEDUPE(b) -``` \ No newline at end of file +``` + +## Notes + +While MV_DEDUPE may sort the values in the column, it's not guaranteed to always do so. diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_expand.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_expand.txt index 76528b5e22654..3248391d3d658 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_expand.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_expand.txt @@ -1,27 +1,27 @@ -## MV_EXPAND +# MV_EXPAND -The `MV_EXPAND` processing command expands multivalued columns into one row per value, duplicating other columns. This command is useful when you need to normalize data that contains multivalued fields, making it easier to perform operations on each individual value. +The MV_EXPAND command is used to expand multivalued columns into individual rows, replicating the other columns for each new row. -### Use Cases -- **Normalization**: Transform multivalued fields into single-valued rows for easier analysis and processing. -- **Data Transformation**: Prepare data for further operations like sorting, filtering, or aggregating by expanding multivalued fields. -- **Data Cleaning**: Simplify complex data structures by breaking down multivalued fields into individual rows. +## Syntax -### Limitations -- This functionality is in technical preview and may be changed or removed in a future release. 
Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features. +`MV_EXPAND column` -### Examples +### Parameters -#### Example 1: Basic Expansion -Expanding a multivalued column `a` into individual rows. +#### column + +This is the multivalued column that you want to expand. + +## Examples + +Expanding a multivalued column `a` into individual rows: ```esql ROW a=[1,2,3], b="b", j=["a","b"] | MV_EXPAND a ``` -#### Example 2: Expanding Multiple Columns -Expanding two multivalued columns `a` and `j` into individual rows. +Expanding two multivalued columns `a` and `j` into individual rows: ```esql ROW a=[1,2,3], b="b", j=["a","b"] @@ -29,8 +29,7 @@ ROW a=[1,2,3], b="b", j=["a","b"] | MV_EXPAND j ``` -#### Example 3: Combining with Other Commands -Expanding a multivalued column and then filtering the results. +Expanding a multivalued column and then filtering the results: ```esql ROW a=[1,2,3,4,5], b="b" @@ -38,4 +37,6 @@ ROW a=[1,2,3,4,5], b="b" | WHERE a > 2 ``` -These examples demonstrate different ways to use the `MV_EXPAND` command to transform and analyze data with multivalued fields. \ No newline at end of file +## Notes + +This feature is currently in technical preview and may be subject to changes or removal in future releases. diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_first.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_first.txt index 1969ad30226ac..7b04ce040c7b0 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_first.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_first.txt @@ -1,8 +1,18 @@ -## MV_FIRST +# MV_FIRST -The `MV_FIRST` function converts a multivalued expression into a single-valued column containing the first value. This is most useful when reading from a function that emits multivalued columns in a known order like `SPLIT`. 
The order that multivalued fields are read from underlying storage is not guaranteed. It is frequently ascending, but don’t rely on that. If you need the minimum value, use `MV_MIN` instead of `MV_FIRST`. `MV_MIN` has optimizations for sorted values so there isn’t a performance benefit to `MV_FIRST`. +The MV_FIRST function converts a multivalued expression into a single valued column containing the first value. -### Examples +## Syntax + +`MV_FIRST(field)` + +### Parameters + +#### field + +A multivalue expression. + +## Examples ```esql ROW a="foo;bar;baz" @@ -12,4 +22,8 @@ ROW a="foo;bar;baz" ```esql ROW b="apple;banana;cherry" | EVAL first_b = MV_FIRST(SPLIT(b, ";")) -``` \ No newline at end of file +``` + +## Notes + +The MV_FIRST function is particularly useful when reading from a function that emits multivalued columns in a known order, such as SPLIT. However, it's important to note that the order in which multivalued fields are read from underlying storage is not guaranteed. While it's often ascending, this should not be relied upon. If you need the minimum value, use the MV_MIN function instead of MV_FIRST. MV_MIN has optimizations for sorted values, so there isn't a performance benefit to MV_FIRST. diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_last.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_last.txt index f6331ab55a7eb..2a9efa61ea0d6 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_last.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_last.txt @@ -1,8 +1,20 @@ -## MV_LAST +# MV_LAST -The `MV_LAST` function converts a multivalue expression into a single valued column containing the last value. This is most useful when reading from a function that emits multivalued columns in a known order like `SPLIT`. The order that multivalued fields are read from underlying storage is not guaranteed. 
It is frequently ascending, but don’t rely on that. If you need the maximum value, use `MV_MAX` instead of `MV_LAST`. `MV_MAX` has optimizations for sorted values so there isn’t a performance benefit to `MV_LAST`. +The MV_LAST function converts a multivalued expression into a single valued column containing the last value. -### Examples +## Syntax + +`MV_LAST(field)` + +### Parameters + +#### field + +A multivalue expression. + + + +## Examples ```esql ROW a="foo;bar;baz" @@ -12,4 +24,8 @@ ROW a="foo;bar;baz" ```esql ROW a="apple;banana;cherry" | EVAL last_fruit = MV_LAST(SPLIT(a, ";")) -``` \ No newline at end of file +``` + +## Notes + +The MV_LAST function is particularly useful when reading from a function that emits multivalued columns in a known order, such as SPLIT. However, the order in which multivalued fields are read from underlying storage is not guaranteed. It is often ascending, but this should not be relied upon. If you need the maximum value, use the MV_MAX function instead of MV_LAST. MV_MAX has optimizations for sorted values, so there is no performance benefit to using MV_LAST. diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_max.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_max.txt index 4c6d50ec151ee..03f894ce203a8 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_max.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_max.txt @@ -1,15 +1,29 @@ -## MV_MAX +# MV_MAX -The `MV_MAX` function converts a multivalued expression into a single valued column containing the maximum value. +MV_MAX function converts a multivalued expression into a single valued column containing the maximum value. -### Examples +## Syntax + +`MV_MAX(field)` + +### Parameters + +#### field + +A multivalue expression. 
+ +## Examples + +The following example demonstrates the use of the MV_MAX function: ```esql ROW a=[3, 5, 1] | EVAL max_a = MV_MAX(a) ``` +The MV_MAX function can be used with any column type, including `keyword` columns. In such cases, it selects the last string, comparing their UTF-8 representation byte by byte: + ```esql ROW a=["foo", "zoo", "bar"] | EVAL max_a = MV_MAX(a) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_median.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_median.txt index 6702441a82bca..013cce53deded 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_median.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_median.txt @@ -1,17 +1,27 @@ -## MV_MEDIAN +# MV_MEDIAN -The `MV_MEDIAN` function converts a multivalued field into a single valued field containing the median value. +The MV_MEDIAN function converts a multivalued field into a single valued field containing the median value. -### Examples +## Syntax + +`MV_MEDIAN(number)` + +### Parameters + +#### number + +A multivalue expression. + +## Examples ```esql ROW a=[3, 5, 1] | EVAL median_a = MV_MEDIAN(a) ``` -If the row has an even number of values for a column, the result will be the average of the middle two entries. If the column is not floating point, the average rounds down: +If the row has an even number of values for a column, the result will be the average of the middle two entries.
If the column is not floating point, the average rounds **down**: ```esql ROW a=[3, 7, 1, 6] | EVAL median_a = MV_MEDIAN(a) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_min.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_min.txt index 386f5d424cef8..97cb8db004cda 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_min.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_min.txt @@ -1,8 +1,18 @@ -## MV_MIN +# MV_MIN -The `MV_MIN` function converts a multivalued expression into a single valued column containing the minimum value. +The MV_MIN function converts a multivalued expression into a single valued column containing the minimum value. -### Examples +## Syntax + +`MV_MIN(field)` + +### Parameters + +#### field + +This is a multivalue expression. + +## Examples ```esql ROW a=[2, 1] @@ -12,4 +22,4 @@ ROW a=[2, 1] ```esql ROW a=["foo", "bar"] | EVAL min_a = MV_MIN(a) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_pseries_weighted_sum.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_pseries_weighted_sum.txt index 1b1fc706b8d3d..85845e5fa8de1 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_pseries_weighted_sum.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_pseries_weighted_sum.txt @@ -1,8 +1,22 @@ -## MV_PSERIES_WEIGHTED_SUM +# MV_PSERIES_WEIGHTED_SUM -Converts a multivalued expression into a single-valued column by multiplying every element on the input list by its corresponding term in P-Series and computing the sum. +The MV_PSERIES_WEIGHTED_SUM function transforms a multivalued expression into a single-valued column. It does this by multiplying each element in the input list by its corresponding term in a P-Series and then calculating the sum. 
-### Examples +## Syntax + +`MV_PSERIES_WEIGHTED_SUM(number, p)` + +### Parameters + +#### number + +This is a multivalue expression. + +#### p + +A number that represents the *p* parameter in the P-Series. It influences the contribution of each element to the weighted sum. + +## Examples ```esql ROW a = [70.0, 45.0, 21.0, 21.0, 21.0] @@ -14,4 +28,4 @@ ROW a = [70.0, 45.0, 21.0, 21.0, 21.0] ROW b = [10.0, 20.0, 30.0, 40.0, 50.0] | EVAL weighted_sum = MV_PSERIES_WEIGHTED_SUM(b, 2.0) | KEEP weighted_sum -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_slice.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_slice.txt index 4b93d9703095b..fccf790cfb79e 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_slice.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_slice.txt @@ -1,8 +1,26 @@ -## MV_SLICE +# MV_SLICE -The `MV_SLICE` function returns a subset of the multivalued field using the start and end index values. +The MV_SLICE function is used to extract a subset of a multivalued field using specified start and end index values. -### Examples +## Syntax + +`MV_SLICE(field, start, end)` + +### Parameters + +#### field + +This is a multivalue expression. If `null`, the function will return `null`. + +#### start + +This is the start position. If `null`, the function will return `null`. The start argument can be negative, where an index of -1 is used to specify the last value in the list. + +#### end + +This is the end position (included). This parameter is optional; if omitted, the position at `start` is returned. The end argument can be negative, where an index of -1 is used to specify the last value in the list. 
+ +## Examples ```esql ROW a = [1, 2, 2, 3] @@ -12,4 +30,4 @@ ROW a = [1, 2, 2, 3] ```esql ROW a = [1, 2, 2, 3] | EVAL a1 = MV_SLICE(a, -2), a2 = MV_SLICE(a, -3, -1) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_sort.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_sort.txt index 14d41a8fd8d56..a2191a59214d8 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_sort.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_sort.txt @@ -1,15 +1,35 @@ -## MV_SORT +# MV_SORT -The `MV_SORT` function sorts a multivalued field in lexicographical order. The valid options for the sort order are `ASC` (ascending) and `DESC` (descending), with the default being `ASC`. +The MV_SORT function sorts a multivalued field in lexicographical order. -### Examples +## Syntax + +`MV_SORT(field, order)` + +### Parameters + +#### field + +This is a multivalue expression. If the value is `null`, the function will return `null`. + +#### order + +This parameter determines the sort order. The valid options are `ASC` and `DESC`. If not specified, the default is `ASC`. 
+ +## Examples + +Without order parameter + +```esql +ROW names = ["Alice", "Bob", "Charlie"] +| EVAL sorted_names = mv_sort(names) +``` + +With order parameter ```esql ROW a = [4, 2, -3, 2] | EVAL sa = mv_sort(a), sd = mv_sort(a, "DESC") ``` -```esql -ROW names = ["Alice", "Bob", "Charlie"] -| EVAL sorted_names = mv_sort(names) -``` \ No newline at end of file + diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_sum.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_sum.txt index 8ee548edccc9c..7e09a7ceaff06 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_sum.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_sum.txt @@ -1,8 +1,18 @@ -## MV_SUM +# MV_SUM -The `MV_SUM` function converts a multivalued field into a single valued field containing the sum of all of the values. +The MV_SUM function converts a multivalued field into a single valued field containing the sum of all the values. -### Examples +## Syntax + +`MV_SUM(number)` + +### Parameters + +#### number + +This is a multivalue expression. + +## Examples ```esql ROW a=[3, 5, 6] @@ -12,4 +22,4 @@ ROW a=[3, 5, 6] ```esql ROW numbers=[1, 2, 3, 4, 5] | EVAL total_sum = MV_SUM(numbers) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_zip.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_zip.txt index 953519b4bd3fe..c6349624e05af 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_zip.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-mv_zip.txt @@ -1,17 +1,37 @@ -## MV_ZIP +# MV_ZIP -The `MV_ZIP` function combines the values from two multivalued fields with a delimiter that joins them together. +The MV_ZIP function combines the values from two multivalued fields with a specified delimiter. 
-### Examples +## Syntax + +`MV_ZIP(string1, string2, delim)` + +### Parameters + +#### string1 + +A multivalue expression. + +#### string2 + +A multivalue expression. + +#### delim + +An optional parameter that specifies the delimiter used to join the values. If omitted, a comma (`,`) is used as the default delimiter. + +## Examples + +The following example demonstrates how to use the MV_ZIP function: ```esql ROW a = ["x", "y", "z"], b = ["1", "2"] -| EVAL c = mv_zip(a, b, "-") +| EVAL c = MV_ZIP(a, b, "-") | KEEP a, b, c ``` ```esql ROW names = ["Alice", "Bob", "Charlie"], ids = ["001", "002", "003"] -| EVAL combined = mv_zip(names, ids, ":") +| EVAL combined = MV_ZIP(names, ids, ":") | KEEP names, ids, combined -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-now.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-now.txt index 165bcfe7af1dd..15fd1e506a1d4 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-now.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-now.txt @@ -1,8 +1,16 @@ -## NOW +# NOW -The `NOW` function returns the current date and time. +The NOW function returns the current date and time. -### Examples +## Syntax + +`NOW()` + +### Parameters + +This function does not require any parameters. 
+ +## Examples ```esql ROW current_date = NOW() @@ -11,4 +19,4 @@ ROW current_date = NOW() ```esql FROM sample_data | WHERE @timestamp > NOW() - 1 hour -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-operators.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-operators.txt index cc6c7f5bdf348..0e79037636072 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-operators.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-operators.txt @@ -1,204 +1,224 @@ # ES|QL Operators +This document provides an overview of the operators supported by ES|QL. + ## Binary Operators -### Equality (`==`) -Check if two fields are equal. If either field is multivalued, the result is null. This is pushed to the underlying search index if one side of the comparison is constant and the other side is a field in the index that has both an index and doc_values. +### Equality `==` + +The equality operator checks if the values of two operands are equal. + +Example: -#### Example: ```esql FROM employees -| WHERE first_name == "John" -| KEEP first_name, last_name +| WHERE emp_no == 10001 ``` -### Inequality (`!=`) -Check if two fields are unequal. If either field is multivalued, the result is null. This is pushed to the underlying search index if one side of the comparison is constant and the other side is a field in the index that has both an index and doc_values. +### Inequality `!=` + +The inequality operator checks if the values of two operands are not equal. + +Example: -#### Example: ```esql FROM employees -| WHERE first_name != "John" -| KEEP first_name, last_name +| WHERE emp_no != 10001 ``` -### Less than (`<`) -Check if one field is less than another. If either field is multivalued, the result is null.
This is pushed to the underlying search index if one side of the comparison is constant and the other side is a field in the index that has both an index and doc_values. +### Less Than `<` + +The less than operator checks if the value of the left operand is less than the value of the right operand. + +Example: -#### Example: ```esql FROM employees -| WHERE age < 30 -| KEEP first_name, last_name, age +| WHERE salary < 50000 ``` -### Less than or equal to (`<=`) -Check if one field is less than or equal to another. If either field is multivalued, the result is null. This is pushed to the underlying search index if one side of the comparison is constant and the other side is a field in the index that has both an index and doc_values. +### Less Than or Equal To `<=` + +This operator checks if the value of the left operand is less than or equal to the value of the right operand. + +Example: -#### Example: ```esql FROM employees -| WHERE age <= 30 -| KEEP first_name, last_name, age +| WHERE salary <= 50000 ``` -### Greater than (`>`) -Check if one field is greater than another. If either field is multivalued, the result is null. This is pushed to the underlying search index if one side of the comparison is constant and the other side is a field in the index that has both an index and doc_values. +### Greater Than `>` + +The greater than operator checks if the value of the left operand is greater than the value of the right operand. + +Example: -#### Example: ```esql FROM employees -| WHERE age > 30 -| KEEP first_name, last_name, age +| WHERE salary > 50000 ``` -### Greater than or equal to (`>=`) -Check if one field is greater than or equal to another. If either field is multivalued, the result is null. This is pushed to the underlying search index if one side of the comparison is constant and the other side is a field in the index that has both an index and doc_values. 
+### Greater Than or Equal To `>=` + +This operator checks if the value of the left operand is greater than or equal to the value of the right operand. + +Example: -#### Example: ```esql FROM employees -| WHERE age >= 30 -| KEEP first_name, last_name, age +| WHERE salary >= 50000 ``` -### Add (`+`) -Add two numbers together. If either field is multivalued, the result is null. +### Add `+` + +The add operator adds the values of the operands. + +Example: -#### Example: ```esql FROM employees -| EVAL total_salary = base_salary + bonus -| KEEP first_name, last_name, total_salary +| EVAL total_compensation = salary + bonus ``` -### Subtract (`-`) -Subtract one number from another. If either field is multivalued, the result is null. +### Subtract `-` + +The subtract operator subtracts the right-hand operand from the left-hand operand. + +Example: -#### Example: ```esql FROM employees -| EVAL net_salary = gross_salary - tax -| KEEP first_name, last_name, net_salary +| EVAL remaining_salary = salary - tax ``` -### Multiply (`*`) -Multiply two numbers together. If either field is multivalued, the result is null. +### Multiply `*` + +The multiply operator multiplies the values of the operands. + +Example: -#### Example: ```esql FROM employees -| EVAL annual_salary = monthly_salary * 12 -| KEEP first_name, last_name, annual_salary +| EVAL yearly_salary = salary * 12 ``` -### Divide (`/`) -Divide one number by another. If either field is multivalued, the result is null. Division of two integer types will yield an integer result, rounding towards 0. If you need floating point division, cast one of the arguments to a `DOUBLE`. +### Divide `/` + +The divide operator divides the left-hand operand by the right-hand operand. 
+ +Example: -#### Example: ```esql FROM employees -| EVAL average_salary = total_salary / months_worked -| KEEP first_name, last_name, average_salary +| EVAL monthly_salary = salary / 12 ``` -### Modulus (`%`) -Divide one number by another and return the remainder. If either field is multivalued, the result is null. +### Modulus `%` + +The modulus operator returns the remainder of the division of the left operand by the right operand. + +Example: -#### Example: ```esql FROM employees -| EVAL remainder = total_days % 7 -| KEEP first_name, last_name, remainder +| EVAL remainder = salary % 12 ``` ## Unary Operators ### Negation (`-`) -The only unary operator is negation. -#### Example: +Example: + ```esql FROM employees | EVAL negative_salary = -salary -| KEEP first_name, last_name, negative_salary ``` ## Logical Operators ### AND + Logical AND operator. -#### Example: +Example: + ```esql FROM employees -| WHERE age > 30 AND department == "Engineering" -| KEEP first_name, last_name, age, department +| WHERE salary > 50000 AND bonus > 10000 ``` ### OR + Logical OR operator. -#### Example: +Example: + ```esql FROM employees -| WHERE age > 30 OR department == "Engineering" -| KEEP first_name, last_name, age, department +| WHERE salary > 50000 OR bonus > 10000 ``` ### NOT + Logical NOT operator. -#### Example: +Example: + ```esql FROM employees -| WHERE NOT (age > 30) -| KEEP first_name, last_name, age +| WHERE NOT (salary > 50000) ``` ## Other Operators ### IS NULL and IS NOT NULL -For NULL comparison, use the `IS NULL` and `IS NOT NULL` predicates. -#### Example: +The `IS NULL` operator returns true if the value is null. + +Example: + ```esql FROM employees -| WHERE birth_date IS NULL -| KEEP first_name, last_name -| SORT first_name -| LIMIT 3 +| WHERE manager IS NULL ``` +The `IS NOT NULL` operator returns true if the value is not null. 
+ +Example: + ```esql FROM employees -| WHERE is_rehired IS NOT NULL -| STATS COUNT(emp_no) +| WHERE manager IS NOT NULL ``` -### Cast (`::`) -The `::` operator provides a convenient alternative syntax to the `TO_` conversion functions. +### IN + +The `IN` operator checks if a value is within a set of values (literals, fields or expressions). + +Example: -#### Example: ```esql -ROW ver = CONCAT(("0"::INT + 1)::STRING, ".2.3")::VERSION +FROM employees +| WHERE department IN ("Sales", "Marketing", "HR") ``` -### IN -The `IN` operator allows testing whether a field or expression equals an element in a list of literals, fields, or expressions. - -#### Example: ```esql ROW a = 1, b = 4, c = 3 | WHERE c-a IN (3, b / 2, a) ``` ### LIKE -Use `LIKE` to filter data based on string patterns using wildcards. The following wildcard characters are supported: + +Use `LIKE` to filter data based on string patterns using wildcards. + +The following wildcard characters are supported: - `*` matches zero or more characters. - `?` matches one character. -#### Example: +Example: + ```esql FROM employees | WHERE first_name LIKE "?b*" @@ -206,11 +226,24 @@ FROM employees ``` ### RLIKE + Use `RLIKE` to filter data based on string patterns using regular expressions. -#### Example: +Example: + ```esql FROM employees | WHERE first_name RLIKE ".leja.*" | KEEP first_name, last_name -``` \ No newline at end of file +``` + +### Cast `::` + +The `::` operator provides a convenient alternative syntax to the `TO_` conversion functions. 
+ +Example: + +```esql +FROM employees +| EVAL salary = salary::double +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-overview.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-overview.txt index 32e82c7986480..952ba28dd0b8e 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-overview.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-overview.txt @@ -1,19 +1,13 @@ -## Overview +## ES|QL Overview ### ES|QL -The Elasticsearch Query Language (ES|QL) provides a powerful way to filter, transform, and analyze data stored in Elasticsearch, and in the future in other runtimes. It is designed to be easy to learn and use by end users, SRE teams, application developers, and administrators. +The Elasticsearch Query Language (ES|QL) provides a powerful way to filter, transform, and analyze data stored in Elasticsearch. It is designed to be easy to learn and use by all types of end users. Users can author ES|QL queries to find specific events, perform statistical analysis, and generate visualizations. It supports a wide range of commands and functions that enable users to perform various data operations, such as filtering, aggregation, time-series analysis, and more. ES|QL makes use of "pipes" (`|`) to manipulate and transform data in a step-by-step fashion. This approach allows users to compose a series of operations, where the output of one operation becomes the input for the next, enabling complex data transformations and analysis. -### The ES|QL Compute Engine - -ES|QL is more than a language: it represents a significant investment in new compute capabilities within Elasticsearch. To achieve both the functional and performance requirements for ES|QL, it was necessary to build an entirely new compute architecture. ES|QL search, aggregation, and transformation functions are directly executed within Elasticsearch itself. 
Query expressions are not transpiled to Query DSL for execution. This approach allows ES|QL to be extremely performant and versatile. - -The new ES|QL execution engine was designed with performance in mind — it operates on blocks at a time instead of per row, targets vectorization and cache locality, and embraces specialization and multi-threading. It is a separate component from the existing Elasticsearch aggregation framework with different performance characteristics. - ### Known Limitations #### Result Set Size Limit @@ -79,7 +73,7 @@ ES|QL only supports the UTC timezone. ### Cross-Cluster Querying -Using ES|QL across clusters allows you to execute a single query across multiple clusters. This feature is in technical preview and may be changed or removed in a future release. +Using ES|QL across clusters allows you to execute a single query across multiple clusters. This feature is in technical preview and may be changed or removed in a future release. #### Prerequisites @@ -98,7 +92,7 @@ FROM cluster_one:my-index-000001 ### Using ES|QL in Kibana -ES|QL can be used in Kibana to query and aggregate data, create visualizations, and set up alerts. +ES|QL can be used in Kibana to query and aggregate data, create visualizations, and set up alerts. #### Important Information @@ -106,39 +100,3 @@ ES|QL can be used in Kibana to query and aggregate data, create visualizations, - The query bar in Discover allows you to write and execute ES|QL queries. - The results table shows up to 10,000 rows, and Discover shows no more than 50 columns. - You can create visualizations and alerts based on ES|QL queries. - -### Using the REST API - -The ES|QL query API allows you to execute ES|QL queries via REST API. 
- -#### Example - -```javascript -const response = await client.esql.query({ - query: ` - FROM library - | EVAL year = DATE_TRUNC(1 YEARS, release_date) - | STATS MAX(page_count) BY year - | SORT year - | LIMIT 5 - `, -}); -console.log(response); -``` - -#### Request - -`POST /_query` - -#### Request Body - -- `query` (Required): The ES|QL query to run. -- `format` (Optional): Format for the response. -- `params` (Optional): Values for parameters in the query. -- `profile` (Optional): If `true`, includes a `profile` object with information about query execution. - -#### Response - -- `columns`: Column `name` and `type` for each column returned in `values`. -- `rows`: Values for the search results. -- `profile`: Profile describing the execution of the query (if `profile` was sent in the request). \ No newline at end of file diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-percentile.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-percentile.txt index 4873cace16392..499e666de2f03 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-percentile.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-percentile.txt @@ -1,8 +1,22 @@ -## PERCENTILE +# PERCENTILE -The `PERCENTILE` function returns the value at which a certain percentage of observed values occur. For example, the 95th percentile is the value which is greater than 95% of the observed values and the 50th percentile is the MEDIAN. +The PERCENTILE function calculates the value at a specified percentile of observed values. -### Examples +## Syntax + +`PERCENTILE(number, percentile)` + +### Parameters + +#### number + +The numeric expression that represents the set of values to be analyzed. + +#### percentile + +The percentile to compute. The value should be between 0 and 100. 
+ +## Examples ```esql FROM employees @@ -14,13 +28,8 @@ FROM employees | STATS p80_max_salary_change = PERCENTILE(MV_MAX(salary_change), 80) ``` -PERCENTILE is usually approximate. There are many different algorithms to calculate percentiles. The naive implementation simply stores all the values in a sorted array. To find the 50th percentile, you simply find the value that is at `my_array[count(my_array) * 0.5]`. Clearly, the naive implementation does not scale — the sorted array grows linearly with the number of values in your dataset. To calculate percentiles across potentially billions of values in an Elasticsearch cluster, approximate percentiles are calculated. The algorithm used by the percentile metric is called TDigest (introduced by Ted Dunning in Computing Accurate Quantiles using T-Digests). - -When using this metric, there are a few guidelines to keep in mind: -- Accuracy is proportional to q(1-q). This means that extreme percentiles (e.g. 99%) are more accurate than less extreme percentiles, such as the median. -- For small sets of values, percentiles are highly accurate (and potentially 100% accurate if the data is small enough). -- As the quantity of values in a bucket grows, the algorithm begins to approximate the percentiles. It is effectively trading accuracy for memory savings. The exact level of inaccuracy is difficult to generalize, since it depends on your data distribution and volume of data being aggregated. +## Notes -The following chart shows the relative error on a uniform distribution depending on the number of collected values and the requested percentile. It shows how precision is better for extreme percentiles. The reason why error diminishes for a large number of values is that the law of large numbers makes the distribution of values more and more uniform and the t-digest tree can do a better job at summarizing it. It would not be the case on more skewed distributions. +- PERCENTILE is usually approximate. 
-PERCENTILE is also non-deterministic. This means you can get slightly different results using the same data. \ No newline at end of file +- PERCENTILE is also non-deterministic. This means you can get slightly different results using the same data. diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-pi.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-pi.txt index 2afabb44200ea..e97942a90a15f 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-pi.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-pi.txt @@ -1,8 +1,16 @@ -## PI +# PI -The `PI` function returns Pi, the ratio of a circle’s circumference to its diameter. +The PI function returns the mathematical constant Pi, which is the ratio of a circle's circumference to its diameter. -### Examples +## Syntax + +`PI()` + +### Parameters + +This function does not require any parameters. + +## Examples ```esql ROW PI() @@ -12,4 +20,4 @@ ROW PI() FROM employees | EVAL pi_value = PI() | KEEP pi_value -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-pow.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-pow.txt index 43e021e4883e4..22a0ca966e0d0 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-pow.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-pow.txt @@ -1,8 +1,22 @@ -## POW +# POW -The `POW` function returns the value of a base raised to the power of an exponent. It is still possible to overflow a double result here; in that case, null will be returned. +The POW function calculates the value of a base number raised to the power of an exponent number. -### Examples +## Syntax + +`POW(base, exponent)` + +### Parameters + +#### base + +This is a numeric expression for the base. + +#### exponent + +This is a numeric expression for the exponent. 
+ +## Examples ```esql ROW base = 2.0, exponent = 2 @@ -12,4 +26,4 @@ ROW base = 2.0, exponent = 2 ```esql ROW base = 4, exponent = 0.5 | EVAL s = POW(base, exponent) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-rename.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-rename.txt index 0e9dd3258b3c8..bd482258c21f6 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-rename.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-rename.txt @@ -1,37 +1,35 @@ -## RENAME +# RENAME -The `RENAME` processing command in ES|QL is used to rename one or more columns in a dataset. This command is particularly useful when you need to standardize column names, make them more readable, or avoid conflicts with existing column names. If a column with the new name already exists, it will be replaced by the new column. If multiple columns are renamed to the same name, all but the rightmost column with the same new name are dropped. +The RENAME command is used to change the names of one or more columns in a table. -### Examples +## Syntax -Here are some example ES|QL queries using the `RENAME` command: +`RENAME old_name1 AS new_name1[, ..., old_nameN AS new_nameN]` + +### Parameters + +#### old_nameX + +This is the current name of the column that you want to rename. + +#### new_nameX + +This is the new name that you want to assign to the column. If a column with the new name already exists, the existing column will be replaced. If multiple columns are renamed to the same name, all but the rightmost column with the same new name will be dropped. + +## Examples -1. **Renaming a single column:** +The following example renames the column "still_hired" to "employed": - ```esql +```esql FROM employees | KEEP first_name, last_name, still_hired -| RENAME still_hired AS employed +| RENAME still_hired AS employed ``` -2. 
**Renaming multiple columns in a single command:** +You can rename multiple columns with a single RENAME command: - ```esql +```esql FROM employees | KEEP first_name, last_name | RENAME first_name AS fn, last_name AS ln ``` - -### Syntax - -`RENAME old_name1 AS new_name1[, ..., old_nameN AS new_nameN]` - -### Parameters - -- **old_nameX**: The name of a column you want to rename. -- **new_nameX**: The new name of the column. If it conflicts with an existing column name, the existing column is dropped. If multiple columns are renamed to the same name, all but the rightmost column with the same new name are dropped. - -### Limitations - -- If a column with the new name already exists, it will be replaced by the new column. -- If multiple columns are renamed to the same name, all but the rightmost column with the same new name are dropped. diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-repeat.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-repeat.txt index aa702c052e1b7..f6a830f6ee3f5 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-repeat.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-repeat.txt @@ -1,8 +1,22 @@ -## REPEAT +# REPEAT -The `REPEAT` function returns a string constructed by concatenating the input string with itself the specified number of times. +The REPEAT function generates a string by repeating a specified string a certain number of times. -### Examples +## Syntax + +`REPEAT(string, number)` + +### Parameters + +#### string + +The string that you want to repeat. + +#### number + +The number of times you want to repeat the string. + +## Examples ```esql ROW a = "Hello!" @@ -12,4 +26,4 @@ ROW a = "Hello!" 
```esql ROW greeting = "Hi" | EVAL repeated_greeting = REPEAT(greeting, 5) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-replace.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-replace.txt index 931fcab1d25b9..930efd579f610 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-replace.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-replace.txt @@ -1,8 +1,28 @@ -## REPLACE +# REPLACE -The `REPLACE` function substitutes in the string `str` any match of the regular expression `regex` with the replacement string `newStr`. +The REPLACE function substitutes any match of a regular expression within a string with a replacement string. -### Examples +## Syntax + +`REPLACE(string, regex, newString)` + +### Parameters + +#### string + +The string expression where the replacement will occur. + +#### regex + +The regular expression that will be matched in the string. + +#### newString + +The string that will replace the matched regular expression in the string. + +## Examples + +The following example replaces any occurrence of the word "World" with the word "Universe": ```esql ROW str = "Hello World" @@ -16,4 +36,4 @@ Another example could be replacing digits in a string with a specific character: ROW str = "User123" | EVAL str = REPLACE(str, "\\d", "*") | KEEP str -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-right.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-right.txt index 99e1fbf2d3c1b..081fe025522c5 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-right.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-right.txt @@ -1,8 +1,24 @@ -## RIGHT +# RIGHT -The `RIGHT` function returns a substring that extracts a specified number of characters from a string, starting from the right. 
+The RIGHT function extracts a specified number of characters from the end of a string. -### Examples +## Syntax + +`RIGHT(string, length)` + +### Parameters + +#### string + +The string from which a substring is to be returned. + +#### length + +The number of characters to return from the end of the string. + +## Examples + +The following example extracts the last three characters from the `last_name` field: ```esql FROM employees @@ -16,4 +32,4 @@ FROM employees ROW full_name = "John Doe" | EVAL last_part = RIGHT(full_name, 4) | KEEP last_part -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-round.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-round.txt index a3efefb84d2d0..1e62d7a4c4915 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-round.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-round.txt @@ -1,8 +1,24 @@ -## ROUND +# ROUND -The `ROUND` function rounds a number to the specified number of decimal places. By default, it rounds to 0 decimal places, which returns the nearest integer. If the precision is a negative number, it rounds to the number of digits left of the decimal point. If the input value is null, the function returns null. +The ROUND function rounds a numeric value to a specified number of decimal places. -### Examples +## Syntax + +`ROUND(number, decimals)` + +### Parameters + +#### number + +The numeric value to be rounded. + +#### decimals + +The number of decimal places to which the number should be rounded. The default value is 0. 
+ +## Examples + +The following example rounds the height of employees to one decimal place after converting it from meters to feet: ```esql FROM employees @@ -14,4 +30,8 @@ FROM employees FROM sales | KEEP product_name, revenue | EVAL rounded_revenue = ROUND(revenue, -2) -``` \ No newline at end of file +``` + +## Notes + +If "decimals" is a negative number, the ROUND function rounds to the number of digits left of the decimal point. diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-row.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-row.txt index 079668328f76d..26b994ecbee6c 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-row.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-row.txt @@ -1,10 +1,22 @@ -## ROW +# ROW -The `ROW` source command produces a row with one or more columns with values that you specify. This can be useful for testing. The command allows you to create a row with specified column names and values, which can be literals, expressions, or functions. In case of duplicate column names, only the rightmost duplicate creates a column. +The ROW command is used to generate a row with one or more columns with specified values. This can be particularly useful for testing purposes. -### Examples +## Syntax -Here are some example ES|QL queries using the `ROW` command: +`ROW column1 = value1[, ..., columnN = valueN]` + +### Parameters + +#### {column name} + +This is the name of the column. If there are duplicate column names, only the rightmost duplicate will create a column. + +#### {value} + +This is the value for the column. It can be a literal, an expression, or a function. + +## Examples 1. Creating a row with simple literal values: ```esql @@ -29,4 +41,4 @@ ROW x = 5, y = [3, 4], z = TO_STRING(123) 5. 
Using nested functions within a row: ```esql ROW a = ABS(-10), b = CONCAT("Hello", " ", "World"), c = TO_BOOLEAN("true") -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-rtrim.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-rtrim.txt index 8060580a76ae0..1a57382fe8c3e 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-rtrim.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-rtrim.txt @@ -1,8 +1,20 @@ -## RTRIM +# RTRIM -Removes trailing whitespaces from a string. +The RTRIM function is used to remove trailing whitespaces from a string. -### Examples +## Syntax + +`RTRIM(string)` + +### Parameters + +#### string + +This is the string expression from which trailing whitespaces will be removed. + +## Examples + +The following example demonstrates how to use the RTRIM function: ```esql ROW message = " some text ", color = " red " @@ -10,4 +22,4 @@ ROW message = " some text ", color = " red " | EVAL color = RTRIM(color) | EVAL message = CONCAT("'", message, "'") | EVAL color = CONCAT("'", color, "'") -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-show.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-show.txt index ed27f65613931..13e046076e30b 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-show.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-show.txt @@ -1,24 +1,21 @@ -## SHOW +# SHOW -The `SHOW` source command returns information about the deployment and its capabilities. This command is useful for retrieving metadata about the Elasticsearch deployment, such as the version, build date, and hash. It is particularly helpful for administrators and developers who need to verify the deployment details or troubleshoot issues. 
The `SHOW` command has a limitation in that it can only be used with the `INFO` item. +The SHOW command retrieves details about the deployment and its capabilities. -### Examples +## Syntax -Here are some example ES|QL queries using the `SHOW` command: +`SHOW item` -1. Retrieve the deployment’s version, build date, and hash: - ```esql -SHOW INFO -``` +### Parameters -2. Use the `SHOW` command in a multi-line query for better readability: - ```esql -SHOW INFO -``` +#### item -3. Another example of using the `SHOW` command to get deployment information: - ```esql +The only acceptable value is `INFO`. + +## Examples + +Retrieve the deployment’s version, build date, and hash: + +```esql SHOW INFO ``` - -These examples demonstrate the primary usage of the `SHOW` command to retrieve deployment information. \ No newline at end of file diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-signum.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-signum.txt index 4a1bab62699af..083d913dd99b4 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-signum.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-signum.txt @@ -1,8 +1,18 @@ -## SIGNUM +# SIGNUM -The `SIGNUM` function returns the sign of the given number. It returns -1 for negative numbers, 0 for 0, and 1 for positive numbers. +The SIGNUM function returns the sign of a given number. It outputs `-1` for negative numbers, `0` for `0`, and `1` for positive numbers. -### Examples +## Syntax + +`SIGNUM(number)` + +### Parameters + +#### number + +A numeric expression. 
+ +## Examples ```esql ROW d = 100.0 @@ -12,4 +22,4 @@ ROW d = 100.0 ```esql ROW d = -50.0 | EVAL s = SIGNUM(d) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-sin.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-sin.txt index 2083ea8f29dad..3d28baafd53d5 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-sin.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-sin.txt @@ -1,8 +1,18 @@ -## SIN +# SIN -The `SIN` function returns the sine trigonometric function of an angle, expressed in radians. If the input angle is null, the function returns null. +The SIN function calculates the sine of a given angle. -### Examples +## Syntax + +`SIN(angle)` + +### Parameters + +#### angle + +The angle for which the sine value is to be calculated. The angle should be in radians. + +## Examples ```esql ROW a=1.8 @@ -12,4 +22,4 @@ ROW a=1.8 ```esql ROW angle=0.5 | EVAL sine_value = SIN(angle) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-sinh.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-sinh.txt index 189fdb8a8b82f..eaec5ceb54862 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-sinh.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-sinh.txt @@ -1,15 +1,25 @@ -## SINH +# SINH -The `SINH` function returns the hyperbolic sine of an angle. +The SINH function calculates the hyperbolic sine of a given angle. -### Examples +## Syntax + +`SINH(angle)` + +### Parameters + +#### angle + +The angle in radians for which the hyperbolic sine is to be calculated. If the parameter is null, the function will return null. 
+ +## Examples ```esql ROW a=1.8 -| EVAL sinh = SINH(a) +| EVAL sinh=SINH(a) ``` ```esql ROW angle=0.5 | EVAL hyperbolic_sine = SINH(angle) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-sort.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-sort.txt index 0d68d505a5d78..593d94021b71b 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-sort.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-sort.txt @@ -1,21 +1,20 @@ -## SORT +# SORT -The `SORT` processing command in ES|QL is used to sort a table based on one or more columns. This command is essential for organizing data in a specific order, which can be particularly useful for reporting, data analysis, and visualization. The default sort order is ascending, but you can specify descending order using `DESC`. Additionally, you can handle null values explicitly by using `NULLS FIRST` or `NULLS LAST`. +The SORT command is used to arrange a table based on one or more columns. -### Use Cases -- **Organizing Data**: Sort data to make it easier to read and analyze. -- **Reporting**: Generate reports where data needs to be presented in a specific order. -- **Data Analysis**: Facilitate data analysis by sorting data based on key metrics. -- **Visualization**: Prepare data for visualizations that require sorted input. +## Syntax -### Limitations -- **Multivalued Columns**: When sorting on multivalued columns, the lowest value is used for ascending order and the highest value for descending order. -- **Null Values**: By default, null values are treated as larger than any other value. This can be changed using `NULLS FIRST` or `NULLS LAST`. +`SORT column1 [ASC/DESC][NULLS FIRST/NULLS LAST][, ..., columnN [ASC/DESC][NULLS FIRST/NULLS LAST]]` + +### Parameters + +#### columnX -### Examples +The column on which the sorting is to be performed. 
+ -#### Basic Sorting -Sort the `employees` table by the `height` column in ascending order: +## Examples + +Sort a table based on the 'height' column: ```esql FROM employees @@ -23,8 +22,7 @@ FROM employees | SORT height ``` -#### Explicit Ascending Order -Sort the `employees` table by the `height` column in descending order: +Explicitly sort in descending order with `DESC`: ```esql FROM employees @@ -32,8 +30,7 @@ FROM employees | SORT height DESC ``` -#### Multiple Sort Expressions -Sort the `employees` table by the `height` column in descending order and use `first_name` as a tie breaker in ascending order: +Provide additional sort expressions to act as tie breakers: ```esql FROM employees @@ -41,8 +38,7 @@ FROM employees | SORT height DESC, first_name ASC ``` -#### Sorting Null Values First -Sort the `employees` table by the `first_name` column in ascending order, placing null values first: +Sort `null` values first using `NULLS FIRST`: ```esql FROM employees @@ -50,4 +46,21 @@ FROM employees | SORT first_name ASC NULLS FIRST ``` -These examples demonstrate the versatility of the `SORT` command in organizing data for various analytical and reporting needs. \ No newline at end of file +## Notes + +If SORT is used right after a KEEP command, make sure it only uses column names in KEEP, +or move the SORT before the KEEP. For example: +- not correct: `KEEP date | SORT @timestamp` +- correct: `SORT @timestamp | KEEP date` + +By default, the sorting order is ascending. You can specify an explicit sort order by using `ASC` for ascending or `DESC` for descending. + +If two rows have the same sort key, they are considered equal. You can provide additional sort expressions to act as tie breakers. + +When sorting on multivalued columns, the lowest value is used when sorting in ascending order and the highest value is used when sorting in descending order. + +By default, `null` values are treated as being larger than any other value.
This means that with an ascending sort order, `null` values are sorted last, and with a descending sort order, `null` values are sorted first. You can change this by providing `NULLS FIRST` or `NULLS LAST`.
+
+## Limitations
+- **Multivalued Columns**: When sorting on multivalued columns, the lowest value is used for ascending order and the highest value for descending order.
+- **Null Values**: By default, null values are treated as larger than any other value. This can be changed using `NULLS FIRST` or `NULLS LAST`.
diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-split.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-split.txt
index d18750ac146f6..14ff9ecb94bcd 100644
--- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-split.txt
+++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-split.txt
@@ -1,8 +1,22 @@
-## SPLIT
+# SPLIT
 
-The `SPLIT` function splits a single-valued string into multiple strings based on a specified delimiter.
+The SPLIT function is used to divide a single string into multiple strings.
 
-### Examples
+## Syntax
+
+`SPLIT(string, delim)`
+
+### Parameters
+
+#### string
+
+This is the string expression that you want to split.
+
+#### delim
+
+This is the delimiter used to split the string. Currently, only single byte delimiters are supported.
+
+## Examples
 
 ```esql
 ROW words="foo;bar;baz;qux;quux;corge"
@@ -12,4 +26,4 @@ ROW words="foo;bar;baz;qux;quux;corge"
 ```esql
 ROW sentence="hello world;this is ES|QL"
 | EVAL words = SPLIT(sentence, " ")
-```
\ No newline at end of file
+```
diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-sqrt.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-sqrt.txt
index 4988c31564633..d1839b7d6d06a 100644
--- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-sqrt.txt
+++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-sqrt.txt
@@ -1,8 +1,19 @@
-## SQRT
+# SQRT
 
-The `SQRT` function returns the square root of a number. The input can be any numeric value, and the return value is always a double. Square roots of negative numbers and infinities are null.
+The SQRT function calculates the square root of a given number.
+
+## Syntax
+
+`SQRT(number)`
+
+### Parameters
+
+#### number
+
+This is a numeric expression.
+
+## Examples
 
-### Examples
 
 ```esql
 ROW d = 100.0
@@ -13,4 +24,4 @@ ROW d = 100.0
 FROM employees
 | KEEP first_name, last_name, height
 | EVAL sqrt_height = SQRT(height)
-```
\ No newline at end of file
+```
diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_centroid_agg.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_centroid_agg.txt
index a58a2d6550e8a..b9baa82bfe5ae 100644
--- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_centroid_agg.txt
+++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_centroid_agg.txt
@@ -1,8 +1,20 @@
-## ST_CENTROID_AGG
+# ST_CENTROID_AGG
 
-The `ST_CENTROID_AGG` function calculates the spatial centroid over a field with spatial point geometry type.
+The ST_CENTROID_AGG function calculates the spatial centroid over a field with spatial point geometry type.
 
-### Examples
+## Syntax
+
+`ST_CENTROID_AGG(field)`
+
+### Parameters
+
+#### field
+
+The field parameter represents the column that contains the spatial point geometry data.
+
+## Examples
+
+Here is an example of how to use the ST_CENTROID_AGG function:
 
 ```esql
 FROM airports
@@ -12,4 +24,4 @@ FROM airports
 ```esql
 FROM city_boundaries
 | STATS city_centroid = ST_CENTROID_AGG(boundary)
-```
\ No newline at end of file
+```
diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_contains.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_contains.txt
index 8d1dc8da115fc..50f5608b22046 100644
--- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_contains.txt
+++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_contains.txt
@@ -1,8 +1,22 @@
-## ST_CONTAINS
+# ST_CONTAINS
 
-Returns whether the first geometry contains the second geometry. This is the inverse of the `ST_WITHIN` function.
+The ST_CONTAINS function determines if the first specified geometry encompasses the second one. This function is the inverse of the ST_WITHIN function.
 
-### Examples
+## Syntax
+
+`ST_CONTAINS(geomA, geomB)`
+
+### Parameters
+
+#### geomA
+
+This is an expression of type `geo_point`, `cartesian_point`, `geo_shape`, or `cartesian_shape`.
+
+#### geomB
+
+This is an expression of type `geo_point`, `cartesian_point`, `geo_shape`, or `cartesian_shape`.
+
+## Examples
 
 ```esql
 FROM airport_city_boundaries
@@ -14,4 +28,8 @@ FROM airport_city_boundaries
 FROM regions
 | WHERE ST_CONTAINS(region_boundary, TO_GEOSHAPE("POLYGON((30 10, 40 40, 20 40, 10 20, 30 10))"))
 | KEEP region_name, region_code, region_boundary
-```
\ No newline at end of file
+```
+
+## Limitations
+
+It's important to note that the second parameter must have the same coordinate system as the first. Therefore, it's not possible to combine `geo_*` and `cartesian_*` parameters.
diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_disjoint.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_disjoint.txt
index 7f22061024330..41433a4069f0c 100644
--- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_disjoint.txt
+++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_disjoint.txt
@@ -1,8 +1,22 @@
-## ST_DISJOINT
+# ST_DISJOINT
 
-The `ST_DISJOINT` function returns whether two geometries or geometry columns are disjoint. This is the inverse of the `ST_INTERSECTS` function. In mathematical terms: `ST_Disjoint(A, B) ⇔ A ⋂ B = ∅`.
+The ST_DISJOINT function checks if two geometries or geometry columns are disjoint, meaning they do not intersect. This function is the inverse of the ST_INTERSECTS function. In mathematical terms, if A and B are two geometries, ST_Disjoint(A, B) is true if and only if the intersection of A and B is empty.
 
-### Examples
+## Syntax
+
+`ST_DISJOINT(geomA, geomB)`
+
+### Parameters
+
+#### geomA
+
+This is an expression of type `geo_point`, `cartesian_point`, `geo_shape`, or `cartesian_shape`.
+
+#### geomB
+
+This is an expression of type `geo_point`, `cartesian_point`, `geo_shape`, or `cartesian_shape`.
+
+## Examples
 
 ```esql
 FROM airport_city_boundaries
@@ -14,4 +28,8 @@ FROM airport_city_boundaries
 FROM airport_city_boundaries
 | WHERE ST_DISJOINT(city_boundary, TO_GEOSHAPE("POLYGON((30 10, 40 40, 20 40, 10 20, 30 10))"))
 | KEEP abbrev, airport, region, city, city_location
-```
\ No newline at end of file
+```
+
+## Limitations
+
+It's important to note that the second parameter must have the same coordinate system as the first. This means you cannot combine `geo_*` and `cartesian_*` parameters.
diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_distance.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_distance.txt
index a1d3e05842c4a..5a007367dc0b8 100644
--- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_distance.txt
+++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_distance.txt
@@ -1,8 +1,22 @@
-## ST_DISTANCE
+# ST_DISTANCE
 
-The `ST_DISTANCE` function computes the distance between two points. For cartesian geometries, this is the pythagorean distance in the same units as the original coordinates. For geographic geometries, this is the circular distance along the great circle in meters.
+The ST_DISTANCE function calculates the distance between two points.
 
-### Examples
+## Syntax
+
+`ST_DISTANCE(geomA, geomB)`
+
+### Parameters
+
+#### geomA
+
+This is an expression of type `geo_point` or `cartesian_point`.
+
+#### geomB
+
+This is an expression of type `geo_point` or `cartesian_point`.
+
+## Examples
 
 ```esql
 FROM airports
@@ -16,4 +30,8 @@ FROM airports
 | WHERE abbrev == "JFK"
 | EVAL distance = ST_DISTANCE(location, city_location)
 | KEEP abbrev, name, location, city_location, distance
-```
\ No newline at end of file
+```
+
+## Limitations
+
+- It's important to note that the second parameter must have the same coordinate system as the first. Therefore, it's not possible to combine `geo_point` and `cartesian_point` parameters.
diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_intersects.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_intersects.txt
index 46df07d8d9f67..63e2ff127d1a9 100644
--- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_intersects.txt
+++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_intersects.txt
@@ -1,8 +1,22 @@
-## ST_INTERSECTS
+# ST_INTERSECTS
 
-The `ST_INTERSECTS` function returns true if two geometries intersect.
They intersect if they have any point in common, including their interior points (points along lines or within polygons). This is the inverse of the `ST_DISJOINT` function. In mathematical terms: `ST_Intersects(A, B) ⇔ A ⋂ B ≠ ∅`.
+The ST_INTERSECTS function checks if two geometries intersect. They intersect if they share any point, including points within their interiors (points along lines or within polygons). This function is the inverse of the ST_DISJOINT function. In mathematical terms, ST_Intersects(A, B) is true if the intersection of A and B is not empty.
 
-### Examples
+## Syntax
+
+`ST_INTERSECTS(geomA, geomB)`
+
+### Parameters
+
+#### geomA
+
+This is an expression of type `geo_point`, `cartesian_point`, `geo_shape`, or `cartesian_shape`. If `null`, the function returns `null`.
+
+#### geomB
+
+This is an expression of type `geo_point`, `cartesian_point`, `geo_shape`, or `cartesian_shape`. If `null`, the function returns `null`. The second parameter must also have the same coordinate system as the first. This means it is not possible to combine `geo_*` and `cartesian_*` parameters.
+
+## Examples
 
 ```esql
 FROM airports
@@ -13,4 +27,4 @@ FROM airports
 FROM city_boundaries
 | WHERE ST_INTERSECTS(boundary, TO_GEOSHAPE("POLYGON((10 10, 20 10, 20 20, 10 20, 10 10))"))
 | KEEP city_name, boundary
-```
\ No newline at end of file
+```
diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_within.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_within.txt
index 24883a731e24b..20e61d6a234df 100644
--- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_within.txt
+++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_within.txt
@@ -1,8 +1,22 @@
-## ST_WITHIN
+# ST_WITHIN
 
-The `ST_WITHIN` function returns whether the first geometry is within the second geometry. This is the inverse of the `ST_CONTAINS` function.
+The ST_WITHIN function checks if the first geometry is located within the second geometry.
 
-### Examples
+## Syntax
+
+`ST_WITHIN(geomA, geomB)`
+
+### Parameters
+
+#### geomA
+
+This is an expression of type `geo_point`, `cartesian_point`, `geo_shape`, or `cartesian_shape`. If the value is `null`, the function will return `null`.
+
+#### geomB
+
+This is an expression of type `geo_point`, `cartesian_point`, `geo_shape`, or `cartesian_shape`. If the value is `null`, the function will return `null`. It's important to note that the second parameter must have the same coordinate system as the first. This means you cannot combine `geo_*` and `cartesian_*` parameters.
+
+## Examples
 
 ```esql
 FROM airport_city_boundaries
@@ -14,4 +28,4 @@ FROM airport_city_boundaries
 FROM parks
 | WHERE ST_WITHIN(park_boundary, TO_GEOSHAPE("POLYGON((40.7128 -74.0060, 40.7128 -73.9352, 40.7306 -73.9352, 40.7306 -74.0060, 40.7128 -74.0060))"))
 | KEEP park_name, park_boundary
-```
\ No newline at end of file
+```
diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_x.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_x.txt
index 11b569d7db065..18e35874333fb 100644
--- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_x.txt
+++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_x.txt
@@ -1,8 +1,20 @@
-## ST_X
+# ST_X
 
-The `ST_X` function extracts the x coordinate from the supplied point. If the point is of type `geo_point`, this is equivalent to extracting the longitude value.
+The ST_X function extracts the `x` coordinate from a given point.
 
-### Examples
+## Syntax
+
+`ST_X(point)`
+
+### Parameters
+
+#### point
+
+This is an expression of type `geo_point` or `cartesian_point`.
+
+## Examples
+
+Here is an example of how to use the ST_X function:
 
 ```esql
 ROW point = TO_GEOPOINT("POINT(42.97109629958868 14.7552534006536)")
@@ -12,4 +24,4 @@ ROW point = TO_GEOPOINT("POINT(42.97109629958868 14.7552534006536)")
 ```esql
 ROW point = TO_CARTESIANPOINT("POINT(100.0 200.0)")
 | EVAL x = ST_X(point), y = ST_Y(point)
-```
\ No newline at end of file
+```
diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_y.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_y.txt
index be0b96539bf15..1e918c4a1913a 100644
--- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_y.txt
+++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-st_y.txt
@@ -1,8 +1,19 @@
-## ST_Y
+# ST_Y
 
-The `ST_Y` function extracts the y coordinate from the supplied point. If the point is of type `geo_point`, this is equivalent to extracting the latitude value.
+The ST_Y function extracts the `y` coordinate from a given point.
+
+## Syntax
+
+`ST_Y(point)`
+
+### Parameters
+
+#### point
+
+This is an expression of type `geo_point` or `cartesian_point`.
+
+## Examples
 
-### Examples
 
 ```esql
 ROW point = TO_GEOPOINT("POINT(42.97109629958868 14.7552534006536)")
@@ -12,4 +23,4 @@ ROW point = TO_GEOPOINT("POINT(42.97109629958868 14.7552534006536)")
 ```esql
 ROW point = TO_GEOPOINT("POINT(34.052235 -118.243683)")
 | EVAL latitude = ST_Y(point)
-```
\ No newline at end of file
+```
diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-starts_with.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-starts_with.txt
index a293c13d1d706..31578d3786ee1 100644
--- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-starts_with.txt
+++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-starts_with.txt
@@ -1,8 +1,24 @@
-## STARTS_WITH
+# STARTS_WITH
 
-The `STARTS_WITH` function returns a boolean that indicates whether a keyword string starts with another string.
+The STARTS_WITH function returns a boolean value indicating whether a keyword string begins with a specified string.
 
-### Examples
+## Syntax
+
+`STARTS_WITH(str, prefix)`
+
+### Parameters
+
+#### str
+
+This is a string expression.
+
+#### prefix
+
+This is a string expression that will be checked if it is the starting sequence of the `str` parameter.
+
+## Examples
+
+The following example checks if the `last_name` of employees starts with the letter "B":
 
 ```esql
 FROM employees
@@ -15,4 +31,4 @@ FROM employees
 | KEEP first_name, last_name
 | EVAL fn_S = STARTS_WITH(first_name, "A")
 | WHERE fn_S
-```
\ No newline at end of file
+```
diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-stats.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-stats.txt
index a85669b4b3fa1..795213778c87b 100644
--- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-stats.txt
+++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-stats.txt
@@ -1,21 +1,55 @@
-## STATS
+# STATS ... BY
 
-The `STATS ...
BY` processing command in ES|QL groups rows according to a common value and calculates one or more aggregated values over the grouped rows. This command is highly useful for performing statistical analysis and aggregations on datasets. It supports a variety of aggregation functions such as `AVG`, `COUNT`, `COUNT_DISTINCT`, `MAX`, `MEDIAN`, `MIN`, `SUM`, and more.
+The `STATS ... BY` command groups rows based on a common value and calculates one or more aggregated values over these grouped rows.
 
-### Use Cases
-- **Statistical Analysis**: Calculate average, sum, count, and other statistical measures over grouped data.
-- **Data Aggregation**: Aggregate data based on specific fields to derive meaningful insights.
-- **Time-Series Analysis**: Group data by time intervals to analyze trends over time.
+## Syntax
 
-### Limitations
-- **Performance**: `STATS` without any groups is much faster than adding a group. Grouping on a single expression is more optimized than grouping on multiple expressions.
-- **Multivalue Fields**: If the grouping key is multivalued, the input row is included in all groups.
-- **Technical Preview**: Some functions like `PERCENTILE`, `ST_CENTROID_AGG`, `VALUES`, and `WEIGHTED_AVG` are in technical preview and may change in future releases.
+```esql
+STATS [column1 =] expression1[, ..., [columnN =] expressionN] [BY grouping_expression1[, ..., grouping_expressionN]]
+```
+
+### Parameters
+
+#### columnX
+
+The name for the aggregated value in the output. If not provided, the name defaults to the corresponding expression (`expressionX`).
+
+#### expressionX
+
+An expression that computes an aggregated value.
+
+#### grouping_expressionX
 
-### Examples
+An expression that outputs the values to group by. If its name coincides with one of the computed columns, that column will be ignored.
 
-#### Example 1: Grouping by a Single Column
-Calculate the count of employees grouped by languages:
+## Description
+
+The `STATS ...
BY` command groups rows based on a common value and calculates one or more aggregated values over these grouped rows.
+If `BY` is omitted, the output table contains exactly one row with the aggregations applied over the entire dataset.
+
+The following aggregation functions are supported:
+
+- `AVG`
+- `COUNT`
+- `COUNT_DISTINCT`
+- `MAX`
+- `MEDIAN`
+- `MEDIAN_ABSOLUTE_DEVIATION`
+- `MIN`
+- `PERCENTILE`
+- `ST_CENTROID_AGG`
+- `SUM`
+- `TOP`
+- `VALUES`
+- `WEIGHTED_AVG`
+
+> Note: `STATS` without any groups is significantly faster than adding a group.
+
+> Note: Grouping on a single expression is currently much more optimized than grouping on many expressions. In some tests, grouping on a single `keyword` column was found to be five times faster than grouping on two `keyword` columns. Do not attempt to work around this by combining the two columns together with a function like `CONCAT` and then grouping - this will not be faster.
+
+## Examples
+
+Calculate a statistic and group by the values of another column:
 
 ```esql
 FROM employees
@@ -23,24 +57,27 @@ FROM employees
 | SORT languages
 ```
 
-#### Example 2: Aggregation Without Grouping
-Calculate the average number of languages spoken by employees:
+Omitting `BY` returns one row with the aggregations applied over the entire dataset:
 
 ```esql
 FROM employees
 | STATS avg_lang = AVG(languages)
 ```
 
-#### Example 3: Multiple Aggregations
-Calculate both the average and maximum number of languages spoken by employees:
+It’s possible to calculate multiple values:
 
 ```esql
 FROM employees
 | STATS avg_lang = AVG(languages), max_lang = MAX(languages)
 ```
 
-#### Example 4: Grouping by Multiple Values
-Calculate the average salary grouped by the year of hire and language:
+If the grouping key is multivalued then the input row is in all groups:
+
+```esql
+ROW i=1, a=["a", "b"] | STATS MIN(i) BY a | SORT a ASC
+```
+
+It’s also possible to group by multiple values:
 
 ```esql
 FROM employees
@@ -50,8 +87,20 @@ FROM employees
 | SORT
hired, languages.long
 ```
 
-#### Example 5: Grouping by an Expression
-Group employees by the first letter of their last name and count them:
+If all grouping keys are multivalued then the input row is in all groups:
+
+```esql
+ROW i=1, a=["a", "b"], b=[2, 3] | STATS MIN(i) BY a, b | SORT a ASC, b ASC
+```
+
+Both the aggregating functions and the grouping expressions accept other functions. This is useful for using `STATS...BY` on multivalue columns.
+
+```esql
+FROM employees
+| STATS avg_salary_change = ROUND(AVG(MV_AVG(salary_change)), 10)
+```
+
+An example of grouping by an expression is grouping employees on the first letter of their last name:
 
 ```esql
 FROM employees
@@ -59,13 +108,26 @@ FROM employees
 | SORT `LEFT(last_name, 1)`
 ```
 
-#### Example 6: Using Multivalue Columns
-Calculate the minimum value of a multivalue column:
+Specifying the output column name is optional. If not specified, the new column name is equal to the expression. The following query returns a column named `AVG(salary)`:
 
 ```esql
-ROW i=1, a=["a", "b"], b=[2, 3]
-| STATS MIN(i) BY a, b
-| SORT a ASC, b ASC
+FROM employees
+| STATS AVG(salary)
 ```
 
-These examples showcase the versatility and power of the `STATS ... BY` command in performing various types of data aggregations and statistical analyses.
\ No newline at end of file
+Because this name contains special characters, it needs to be quoted with backticks (`) when using it in subsequent commands:
+
+```esql
+FROM employees
+| STATS AVG(salary)
+| EVAL avg_salary_rounded = ROUND(`AVG(salary)`)
+```
+
+## Notes
+
+- If multiple columns share the same name, all but the rightmost column with this name are ignored.
+
+### Limitations
+
+- **Performance**: `STATS` without any groups is much faster than adding a group. Grouping on a single expression is more optimized than grouping on multiple expressions.
+- **Multivalue Fields**: If the grouping key is multivalued, the input row is included in all groups.
diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-substring.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-substring.txt
index a03574e7d2cef..a47ec9546047a 100644
--- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-substring.txt
+++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-substring.txt
@@ -1,20 +1,29 @@
-## SUBSTRING
+# SUBSTRING
 
-The `SUBSTRING` function returns a substring of a string, specified by a start position and an optional length.
+The SUBSTRING function extracts a portion of a string, as specified by a starting position and an optional length.
 
-### Syntax
+## Syntax
 
 `SUBSTRING(string, start, [length])`
 
 ### Parameters
 
-- `string`: String expression. If null, the function returns null.
-- `start`: Start position.
-- `length`: Length of the substring from the start position. Optional; if omitted, all positions after start are returned.
+#### string
 
-### Examples
+The string expression from which to extract the substring.
 
-This example returns the first three characters of every last name:
+#### start
+
+The starting position for the substring extraction.
+
+#### length
+
+The length of the substring to be extracted from the starting position.
+This parameter is optional. If it's omitted, the function will return all positions following the start position.
+
+## Examples
+
+The following example returns the first three characters of every last name:
 
 ```esql
 FROM employees
@@ -30,10 +39,10 @@ FROM employees
 | EVAL ln_sub = SUBSTRING(last_name, -3, 3)
 ```
 
-If length is omitted, `SUBSTRING` returns the remainder of the string. This example returns all characters except for the first:
+If the length parameter is omitted, the SUBSTRING function returns the remainder of the string.
This example returns all characters except for the first:
 
 ```esql
 FROM employees
 | KEEP last_name
 | EVAL ln_sub = SUBSTRING(last_name, 2)
-```
\ No newline at end of file
+```
diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-sum.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-sum.txt
index c893aeb160d00..9e782699db9ba 100644
--- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-sum.txt
+++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-sum.txt
@@ -1,15 +1,29 @@
-## SUM
+# SUM
 
-The `SUM` function calculates the sum of a numeric expression.
+The SUM function calculates the total sum of a numeric expression.
 
-### Examples
+## Syntax
+
+`SUM(number)`
+
+### Parameters
+
+#### number
+
+The numeric expression that you want to calculate the sum of.
+
+## Examples
+
+Calculate the sum of a numeric field:
 
 ```esql
 FROM employees
 | STATS SUM(languages)
 ```
 
+The SUM function can be used with inline functions:
+
 ```esql
 FROM employees
 | STATS total_salary_changes = SUM(MV_MAX(salary_change))
-```
\ No newline at end of file
+```
diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-syntax.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-syntax.txt
index e1e339239a713..8259b8d8d5546 100644
--- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-syntax.txt
+++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-syntax.txt
@@ -1,44 +1,28 @@
-## Syntax
+# ES|QL Syntax Guide
 
-### Instructions
+This guide provides an overview of the ES|QL syntax and examples of its usage.
 
-Generate a description of ES|QL syntax. Be as complete as possible.
-For timespan literals, generate at least five examples of full ES|QL queries, using a mix of commands and functions, using different intervals and units.
-**Make sure you use timespan literals, such as `1 day` or `24h` or `7 weeks` in these examples**.
-Combine ISO timestamps with time span literals and NOW().
-Make sure the example queries are using different combinations of syntax, commands, and functions for each.
-When using DATE_TRUNC, make sure you DO NOT wrap the timespan in single or double quotes.
-Do not use the Cast operator.
+## Basic Syntax
 
-### Content of file
+An ES|QL query is composed of a source command followed by an optional series of processing commands, separated by a pipe character (`|`). For example:
 
-### ES|QL Syntax Reference
-
-#### Basic Syntax
-
-An ES|QL query is composed of a source command followed by an optional series of processing commands, separated by a pipe character: `|`. For example:
-
-```
+```esql
 source-command
 | processing-command1
 | processing-command2
 ```
 
-The result of a query is the table produced by the final processing command. For an overview of all supported commands, functions, and operators, refer to Commands and Functions and operators.
+The result of a query is the table produced by the final processing command.
 
-For readability, this documentation puts each processing command on a new line. However, you can write an ES|QL query as a single line. The following query is identical to the previous one:
+For readability, each processing command is typically written on a new line. However, an ES|QL query can also be written as a single line:
 
-```
-source-command
-| processing-command1
-| processing-command2
+```esql
+source-command | processing-command1 | processing-command2
 ```
 
-#### Identifiers
+## Identifiers
 
-Identifiers need to be quoted with backticks (```) if:
-- They don’t start with a letter, `_` or `@`
-- Any of the other characters is not a letter, number, or `_`
+Identifiers in ES|QL need to be quoted with backticks (```) if they don’t start with a letter, `_` or `@`, or if any of the other characters is not a letter, number, or `_`.
 
 For example:
 
@@ -47,7 +31,9 @@ FROM index
 | KEEP `1.field`
 ```
 
-When referencing a function alias that itself uses a quoted identifier, the backticks of the quoted identifier need to be escaped with another backtick. For example:
+When referencing a function alias that itself uses a quoted identifier, the backticks of the quoted identifier need to be escaped with another backtick.
+
+For example:
 
 ```esql
 FROM index
@@ -55,16 +41,15 @@ FROM index
 | EVAL my_count = `COUNT(``1.field``)`
 ```
 
-#### Literals
+## Literals
 
 ES|QL currently supports numeric and string literals.
 
-##### String Literals
+### String Literals
 
-A string literal is a sequence of unicode characters delimited by double quotes (`"`).
+A string literal is a sequence of unicode characters delimited by double quotes (`"`). For example:
 
 ```esql
-// Filter by a string value
 FROM index
 | WHERE first_name == "Georgi"
 ```
@@ -75,11 +60,13 @@ If the literal string itself contains quotes, these need to be escaped (`\\"`).
 ROW name = """Indiana "Indy" Jones"""
 ```
 
-The special characters CR, LF, and TAB can be provided with the usual escaping: `\r`, `\n`, `\t`, respectively.
+The special characters CR, LF and TAB can be provided with the usual escaping: `\r`, `\n`, `\t`, respectively.
+
+### Numerical Literals
 
-##### Numerical Literals
+The numeric literals are accepted in decimal and in the scientific notation with the exponent marker (`e` or `E`), starting either with a digit, decimal point `.` or the negative sign `-`.
 
-The numeric literals are accepted in decimal and in the scientific notation with the exponent marker (`e` or `E`), starting either with a digit, decimal point `.` or the negative sign `-`:
+For example:
 
 - `1969` -- integer notation
 - `3.14` -- decimal notation
@@ -88,36 +75,14 @@ The numeric literals are accepted in decimal and in the scientific notation with
 - `1.2e-3` -- scientific notation with decimal point
 - `-.1e2` -- scientific notation starting with the negative sign
 
-The integer numeric literals are implicitly converted to the `integer`, `long` or the `double` type, whichever can first accommodate the literal’s value. The floating point literals are implicitly converted to the `double` type. To obtain constant values of different types, use one of the numeric conversion functions.
-
-#### Comments
-
-ES|QL uses C++ style comments:
-- Double slash `//` for single line comments
-- `/*` and `*/` for block comments
+The integer numeric literals are implicitly converted to the `integer`, `long` or the `double` type, whichever can first accommodate the literal’s value.
 
-```esql
-// Query the employees index
-FROM employees
-| WHERE height > 2
-```
+## Timespan Literals
 
-```esql
-FROM /* Query the employees index */ employees
-| WHERE height > 2
-```
-
-```esql
-FROM employees
-/* Query the
- * employees
- * index */
-| WHERE height > 2
-```
+Datetime intervals and timespans can be expressed using timespan literals. Timespan literals are a combination of a number and a qualifier.
 
-#### Timespan Literals
+These qualifiers are supported:
 
-Datetime intervals and timespans can be expressed using timespan literals. Timespan literals are a combination of a number and a qualifier. These qualifiers are supported:
 - `millisecond`/`milliseconds`/`ms`
 - `second`/`seconds`/`sec`/`s`
 - `minute`/`minutes`/`min`
@@ -129,15 +94,18 @@ Datetime intervals and timespans can be expressed using timespan literals.
Times
 - `year`/`years`/`yr`/`y`
 
 Timespan literals are not whitespace sensitive. These expressions are all valid:
-- `1day`
-- `1 day`
-- `1 day`
 
-#### Example Queries Using Timespan Literals
+- 1day
+- 1 day
+- 1 day
+
+## Example Queries with Timespan Literals
+
+Here are some example queries that use timespan literals:
 
 1. Retrieve logs from the last 24 hours and calculate the average response time:
 
- ```esql
+```esql
 FROM logs-*
 | WHERE @timestamp > NOW() - 24h
 | STATS avg_response_time = AVG(response_time)
@@ -145,7 +113,7 @@ FROM logs-*
 
 2. Get the count of events per day for the last 7 days:
 
- ```esql
+```esql
 FROM events
 | WHERE @timestamp > NOW() - 7 days
 | STATS daily_count = COUNT(*) BY day = DATE_TRUNC(1 day, @timestamp)
@@ -154,7 +122,7 @@ FROM events
 
 3. Find the maximum temperature recorded in the last month:
 
- ```esql
+```esql
 FROM weather_data
 | WHERE @timestamp > NOW() - 1 month
 | STATS max_temp = MAX(temperature)
@@ -162,7 +130,7 @@ FROM weather_data
 
 4. Calculate the total sales for each week in the last quarter:
 
- ```esql
+```esql
 FROM sales
 | WHERE @timestamp > NOW() - 1 quarter
 | STATS weekly_sales = SUM(sales_amount) BY week = DATE_TRUNC(1 week, @timestamp)
@@ -171,11 +139,34 @@ FROM sales
 
 5. Retrieve error logs from the last 15 minutes and group by error type:
 
- ```esql
+```esql
 FROM error_logs
 | WHERE @timestamp > NOW() - 15 minutes
 | STATS error_count = COUNT(*) BY error_type
 | SORT error_count DESC
 ```
 
-These examples demonstrate the use of timespan literals in combination with various ES|QL commands and functions to perform different types of data queries and transformations.
+#### Comments + +ES|QL uses C++ style comments: +- Double slash `//` for single line comments +- `/*` and `*/` for block comments + +```esql +// Query the employees index +FROM employees +| WHERE height > 2 +``` + +```esql +FROM /* Query the employees index */ employees +| WHERE height > 2 +``` + +```esql +FROM employees +/* Query the + * employees + * index */ +| WHERE height > 2 +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-tan.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-tan.txt index 8541f193d89a4..56940c07ba0a4 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-tan.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-tan.txt @@ -1,15 +1,25 @@ -## TAN +# TAN -The `TAN` function returns the Tangent trigonometric function of an angle. +The TAN function calculates the tangent of a given angle. -### Examples +## Syntax + +`TAN(angle)` + +### Parameters + +#### angle + +The angle for which the tangent is to be calculated. The angle should be in radians. If the angle is `null`, the function will return `null`. + +## Examples ```esql ROW a=1.8 -| EVAL tan = TAN(a) +| EVAL tan=TAN(a) ``` ```esql ROW angle=0.5 | EVAL tangent = TAN(angle) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-tanh.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-tanh.txt index 45fc52fa501a7..311608f9afbac 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-tanh.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-tanh.txt @@ -1,8 +1,18 @@ -## TANH +# TANH -The `TANH` function returns the hyperbolic tangent of an angle, expressed in radians. If the input angle is null, the function returns null. +The TANH function calculates the hyperbolic tangent of a given angle. 
-### Examples +## Syntax + +`TANH(angle)` + +### Parameters + +#### angle + +This is the angle in radians for which you want to calculate the hyperbolic tangent. + +## Examples ```esql ROW a=1.8 @@ -12,4 +22,4 @@ ROW a=1.8 ```esql ROW angle=0.5 | EVAL hyperbolic_tangent = TANH(angle) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-tau.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-tau.txt index 149d1333b2a1a..c18de72c77c69 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-tau.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-tau.txt @@ -1,8 +1,16 @@ -## TAU +# TAU -The `TAU` function returns the ratio of a circle’s circumference to its radius. +TAU function returns the mathematical constant τ (tau), which is the ratio of a circle's circumference to its radius. -### Examples +## Syntax + +`TAU()` + +### Parameters + +This function does not require any parameters. + +## Examples ```esql ROW TAU() @@ -12,4 +20,4 @@ ROW TAU() FROM sample_data | EVAL tau_value = TAU() | KEEP tau_value -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_boolean.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_boolean.txt index 9a243ba128fb9..2dee65a690118 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_boolean.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_boolean.txt @@ -1,8 +1,20 @@ -## TO_BOOLEAN +# TO_BOOLEAN -The `TO_BOOLEAN` function converts an input value to a boolean value. A string value of "true" will be case-insensitively converted to the Boolean `true`. For anything else, including the empty string, the function will return `false`. The numerical value of `0` will be converted to `false`, and anything else will be converted to `true`. 
+The TO_BOOLEAN function converts an input value into a boolean value. -### Examples +## Syntax + +`TO_BOOLEAN(field)` + +### Parameters + +#### field + +The input value. This can be a single or multi-valued column or an expression. + +## Examples + +The following example demonstrates the use of the TO_BOOLEAN function: ```esql ROW str = ["true", "TRuE", "false", "", "yes", "1"] @@ -12,4 +24,9 @@ ROW str = ["true", "TRuE", "false", "", "yes", "1"] ```esql ROW num = [0, 1, 2, -1] | EVAL bool = TO_BOOLEAN(num) -``` \ No newline at end of file +``` + +## Notes + +- A string value of `true` is case-insensitively converted to the boolean `true`. For any other value, including an empty string, the function returns `false`. +- A numerical value of `0` is converted to `false`, while any other numerical value is converted to `true`. diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_cartesianpoint.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_cartesianpoint.txt index db56fd5fc67f4..edf1d2020882f 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_cartesianpoint.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_cartesianpoint.txt @@ -1,8 +1,18 @@ -## TO_CARTESIANPOINT +# TO_CARTESIANPOINT -Converts an input value to a `cartesian_point` value. A string will only be successfully converted if it respects the WKT Point format. +The TO_CARTESIANPOINT function converts an input value into a `cartesian_point` value. -### Examples +## Syntax + +`TO_CARTESIANPOINT(field)` + +### Parameters + +#### field + +This is the input value. It can be a single or multi-valued column or an expression. 
+ +## Examples ```esql ROW wkt = ["POINT(4297.11 -1475.53)", "POINT(7580.93 2272.77)"] @@ -14,4 +24,4 @@ ROW wkt = ["POINT(4297.11 -1475.53)", "POINT(7580.93 2272.77)"] ROW wkt = ["POINT(1000.0 2000.0)", "POINT(3000.0 4000.0)"] | MV_EXPAND wkt | EVAL pt = TO_CARTESIANPOINT(wkt) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_cartesianshape.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_cartesianshape.txt index f32af0a6d4805..8fb5ad5be2579 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_cartesianshape.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_cartesianshape.txt @@ -1,8 +1,18 @@ -## TO_CARTESIANSHAPE +# TO_CARTESIANSHAPE -Converts an input value to a `cartesian_shape` value. A string will only be successfully converted if it respects the WKT format. +The TO_CARTESIANSHAPE function converts an input value into a `cartesian_shape` value. -### Examples +## Syntax + +`TO_CARTESIANSHAPE(field)` + +### Parameters + +#### field + +The input value. This can be a single or multi-valued column or an expression. + +## Examples ```esql ROW wkt = ["POINT(4297.11 -1475.53)", "POLYGON ((3339584.72 1118889.97, 4452779.63 4865942.27, 2226389.81 4865942.27, 1113194.90 2273030.92, 3339584.72 1118889.97))"] @@ -14,4 +24,9 @@ ROW wkt = ["POINT(4297.11 -1475.53)", "POLYGON ((3339584.72 1118889.97, 4452779. ROW wkt = ["POINT(1000.0 2000.0)", "POLYGON ((1000.0 2000.0, 2000.0 3000.0, 3000.0 4000.0, 1000.0 2000.0))"] | MV_EXPAND wkt | EVAL geom = TO_CARTESIANSHAPE(wkt) -``` \ No newline at end of file +``` + +## Notes + +- The input value can be a single or multi-valued column or an expression. +- The function will only successfully convert a string if it adheres to the WKT format. 
diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_datetime.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_datetime.txt index 7e0cd1fc82c06..579765a4685f5 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_datetime.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_datetime.txt @@ -1,19 +1,35 @@ -## TO_DATETIME +# TO_DATETIME -Converts an input value to a date value. A string will only be successfully converted if it’s respecting the format `yyyy-MM-dd'T'HH:mm:ss.SSS'Z'`. To convert dates in other formats, use `DATE_PARSE`. +The TO_DATETIME function converts an input value into a date value. -### Examples +## Syntax + +`TO_DATETIME(field)` + +### Parameters + +#### field + +The input value to be converted. This can be a single or multi-valued column or an expression. + +## Examples + +The following example converts a string into a date value: ```esql ROW string = ["1953-09-02T00:00:00.000Z", "1964-06-02T00:00:00.000Z", "1964-06-02 00:00:00"] | EVAL datetime = TO_DATETIME(string) ``` -Note that in this example, the last value in the source multi-valued field has not been converted. The reason being that if the date format is not respected, the conversion will result in a null value. When this happens a Warning header is added to the response. The header will provide information on the source of the failure: "Line 1:112: evaluation of [TO_DATETIME(string)] failed, treating result as null. Only first 20 failures recorded." A following header will contain the failure reason and the offending value: "java.lang.IllegalArgumentException: failed to parse date field [1964-06-02 00:00:00] with format [yyyy-MM-dd'T'HH:mm:ss.SSS'Z']". +If the input parameter is of a numeric type, its value will be interpreted as milliseconds since the Unix epoch. 
For example: ```esql ROW int = [0, 1] | EVAL dt = TO_DATETIME(int) ``` -If the input parameter is of a numeric type, its value will be interpreted as milliseconds since the Unix epoch. \ No newline at end of file +## Notes + +- TO_DATETIME converts an input value into a date value. A string will only be successfully converted if it follows the format `yyyy-MM-dd'T'HH:mm:ss.SSS'Z'`. To convert dates in other formats, use the `DATE_PARSE` function. + +- When converting from nanosecond resolution to millisecond resolution with this function, the nanosecond date is truncated, not rounded. diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_degrees.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_degrees.txt index 287e526f143e5..a6c75fb627f73 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_degrees.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_degrees.txt @@ -1,8 +1,18 @@ -## TO_DEGREES +# TO_DEGREES -Converts a number in radians to degrees. +The TO_DEGREES function converts a numerical value from radians to degrees. -### Examples +## Syntax + +`TO_DEGREES(number)` + +### Parameters + +#### number + +This is the input value. It can be a single or multi-valued column or an expression. 
+ +## Examples ```esql ROW rad = [1.57, 3.14, 4.71] @@ -12,4 +22,4 @@ ROW rad = [1.57, 3.14, 4.71] ```esql ROW angle_in_radians = 1.0 | EVAL angle_in_degrees = TO_DEGREES(angle_in_radians) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_double.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_double.txt index c835ee9531281..2a802fbc1c6a0 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_double.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_double.txt @@ -1,14 +1,25 @@ -## TO_DOUBLE +# TO_DOUBLE -Converts an input value to a double value. If the input parameter is of a date type, its value will be interpreted as milliseconds since the Unix epoch, converted to double. Boolean true will be converted to double 1.0, false to 0.0. +The TO_DOUBLE function converts an input value into a double value. -### Examples +## Syntax + +`TO_DOUBLE(field)` + +### Parameters + +#### field + +The input value. This can be a single or multi-valued column or an expression. + +## Examples ```esql ROW str1 = "5.20128E11", str2 = "foo" | EVAL dbl = TO_DOUBLE("520128000000"), dbl1 = TO_DOUBLE(str1), dbl2 = TO_DOUBLE(str2) ``` -Note that in this example, the last conversion of the string isn’t possible. When this happens, the result is a null value. In this case, a Warning header is added to the response. The header will provide information on the source of the failure: -"Line 1:115: evaluation of [TO_DOUBLE(str2)] failed, treating result as null. Only first 20 failures recorded." A following header will contain the failure reason and the offending value: -"java.lang.NumberFormatException: For input string: "foo"" \ No newline at end of file +## Notes + +- If the input parameter is of a date type, its value will be interpreted as milliseconds since the Unix epoch and converted to a double. 
+- A boolean value of true will be converted to a double value of 1.0, and false will be converted to 0.0. diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_geopoint.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_geopoint.txt index 27cc2b7569742..f5f57af2a1ca0 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_geopoint.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_geopoint.txt @@ -1,8 +1,18 @@ -## TO_GEOPOINT +# TO_GEOPOINT -Converts an input value to a `geo_point` value. A string will only be successfully converted if it respects the WKT Point format. +The TO_GEOPOINT function converts an input value into a `geo_point` value. -### Examples +## Syntax + +`TO_GEOPOINT(field)` + +### Parameters + +#### field + +This is the input value. It can be a single or multi-valued column or an expression. + +## Examples ```esql ROW wkt = "POINT(42.97109630194 14.7552534413725)" @@ -12,4 +22,4 @@ ROW wkt = "POINT(42.97109630194 14.7552534413725)" ```esql ROW wkt = "POINT(34.052235 -118.243683)" | EVAL pt = TO_GEOPOINT(wkt) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_geoshape.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_geoshape.txt index 9160e081f6477..47f4baa3a8df1 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_geoshape.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_geoshape.txt @@ -1,8 +1,18 @@ -## TO_GEOSHAPE +# TO_GEOSHAPE -Converts an input value to a `geo_shape` value. A string will only be successfully converted if it respects the WKT format. +The TO_GEOSHAPE function converts an input value into a `geo_shape` value. -### Examples +## Syntax + +`TO_GEOSHAPE(field)` + +### Parameters + +#### field + +This is the input value. It can be a single or multi-valued column or an expression. 
+ +## Examples ```esql ROW wkt = "POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))" @@ -12,4 +22,4 @@ ROW wkt = "POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))" ```esql ROW wkt = "LINESTRING (30 10, 10 30, 40 40)" | EVAL geom = TO_GEOSHAPE(wkt) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_integer.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_integer.txt index 3944c64ee004b..9a4098ff61fa3 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_integer.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_integer.txt @@ -1,12 +1,28 @@ -## TO_INTEGER +# TO_INTEGER -Converts an input value to an integer value. If the input parameter is of a date type, its value will be interpreted as milliseconds since the Unix epoch, converted to integer. Boolean true will be converted to integer 1, false to 0. +The TO_INTEGER function converts an input value into an integer. -### Examples +## Syntax + +`TO_INTEGER(field)` + +### Parameters + +#### field + +The input value. This can be a single or multi-valued column or an expression. + +## Description + +The TO_INTEGER function converts an input value into an integer. + +## Examples ```esql ROW long = [5013792, 2147483647, 501379200000] | EVAL int = TO_INTEGER(long) ``` -Note that in this example, the last value of the multi-valued field cannot be converted as an integer. When this happens, the result is a null value. In this case, a Warning header is added to the response. The header will provide information on the source of the failure: "Line 1:61: evaluation of [TO_INTEGER(long)] failed, treating result as null. Only first 20 failures recorded." 
A following header will contain the failure reason and the offending value: "org.elasticsearch.xpack.esql.core.InvalidArgumentException: [501379200000] out of [integer] range" \ No newline at end of file +## Notes + +- If the input parameter is of a date type, its value is interpreted as milliseconds since the Unix epoch and converted to an integer. A boolean value of true is converted to integer 1, and false is converted to 0. diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_ip.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_ip.txt index af3d8b754d88e..3bffd3a4e7d0f 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_ip.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_ip.txt @@ -1,8 +1,18 @@ -## TO_IP +# TO_IP -The `TO_IP` function converts an input string to an IP value. +The TO_IP function converts an input string into an IP value. -### Examples +## Syntax + +`TO_IP(field)` + +### Parameters + +#### field + +This is the input value. It can be a single or multi-valued column or an expression. + +## Examples ```esql ROW str1 = "1.1.1.1", str2 = "foo" @@ -10,12 +20,9 @@ ROW str1 = "1.1.1.1", str2 = "foo" | WHERE CIDR_MATCH(ip1, "1.0.0.0/8") ``` -Note that in this example, the last conversion of the string isn’t possible. When this happens, the result is a null value. In this case, a Warning header is added to the response. The header will provide information on the source of the failure: "Line 1:68: evaluation of [TO_IP(str2)] failed, treating result as null. Only first 20 failures recorded." A following header will contain the failure reason and the offending value: "java.lang.IllegalArgumentException: 'foo' is not an IP string literal." - ```esql ROW ip_str = "192.168.1.1" | EVAL ip = TO_IP(ip_str) | KEEP ip ``` -In this example, the string "192.168.1.1" is successfully converted to an IP value and kept in the result set. 
\ No newline at end of file diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_long.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_long.txt index b5fd5788be666..ec844c50461f5 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_long.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_long.txt @@ -1,17 +1,29 @@ -## TO_LONG +# TO_LONG -Converts an input value to a long value. If the input parameter is of a date type, its value will be interpreted as milliseconds since the Unix epoch, converted to long. Boolean true will be converted to long 1, false to 0. +The TO_LONG function converts an input value into a long value. -### Examples +## Syntax + +`TO_LONG(field)` + +### Parameters + +#### field + +The input value. This can be a single or multi-valued column or an expression. + +## Examples ```esql ROW str1 = "2147483648", str2 = "2147483648.2", str3 = "foo" | EVAL long1 = TO_LONG(str1), long2 = TO_LONG(str2), long3 = TO_LONG(str3) ``` -Note that in this example, the last conversion of the string isn’t possible. When this happens, the result is a null value. In this case, a Warning header is added to the response. The header will provide information on the source of the failure: "Line 1:113: evaluation of [TO_LONG(str3)] failed, treating result as null. Only first 20 failures recorded." A following header will contain the failure reason and the offending value: "java.lang.NumberFormatException: For input string: "foo"" - ```esql ROW str1 = "1234567890", str2 = "9876543210" | EVAL long1 = TO_LONG(str1), long2 = TO_LONG(str2) -``` \ No newline at end of file +``` + +## Notes + +- If the input parameter is of a date type, its value is interpreted as milliseconds since the Unix epoch and converted to a long value. A boolean value of true is converted to a long value of 1, and false is converted to 0. 
diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_lower.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_lower.txt index 26be9d76088df..9bc7d3caeab4a 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_lower.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_lower.txt @@ -1,8 +1,18 @@ -## TO_LOWER +# TO_LOWER -The `TO_LOWER` function returns a new string representing the input string converted to lower case. +The TO_LOWER function converts the input string to lowercase. -### Examples +## Syntax + +`TO_LOWER(str)` + +### Parameters + +#### str + +The string expression that you want to convert to lowercase. + +## Examples ```esql ROW message = "Some Text" @@ -14,4 +24,4 @@ FROM employees | KEEP first_name, last_name | EVAL first_name_lower = TO_LOWER(first_name) | EVAL last_name_lower = TO_LOWER(last_name) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_radians.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_radians.txt index d0c90d3739998..6af4d409fe70d 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_radians.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_radians.txt @@ -1,8 +1,18 @@ -## TO_RADIANS +# TO_RADIANS -Converts a number in degrees to radians. +The TO_RADIANS function converts a numerical value from degrees to radians. -### Examples +## Syntax + +`TO_RADIANS(number)` + +### Parameters + +#### number + +This is the input value. It can be a single or multi-valued column or an expression. 
+ +## Examples ```esql ROW deg = [90.0, 180.0, 270.0] @@ -12,4 +22,4 @@ ROW deg = [90.0, 180.0, 270.0] ```esql ROW angle_deg = 45.0 | EVAL angle_rad = TO_RADIANS(angle_deg) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_string.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_string.txt index b2ee6a7a7f106..9fa0a489741fe 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_string.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_string.txt @@ -1,14 +1,28 @@ -## TO_STRING +# TO_STRING -Converts an input value into a string. +The TO_STRING function converts an input value into a string. -### Examples +## Syntax + +`TO_STRING(field)` + +### Parameters + +#### field + +This is the input value. It can be a single or multi-valued column or an expression. + +## Examples + +Here is an example of how to use the TO_STRING function: ```esql ROW a=10 | EVAL j = TO_STRING(a) ``` +The TO_STRING function also works well on multi-valued fields: + ```esql ROW a=[10, 9, 8] | EVAL j = TO_STRING(a) diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_unsigned_long.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_unsigned_long.txt index 7702b6f8da228..5922a3cb741bd 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_unsigned_long.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_unsigned_long.txt @@ -1,20 +1,31 @@ -## TO_UNSIGNED_LONG +# TO_UNSIGNED_LONG -Converts an input value to an unsigned long value. If the input parameter is of a date type, its value will be interpreted as milliseconds since the Unix epoch, converted to unsigned long. Boolean true will be converted to unsigned long 1, false to 0. +The TO_UNSIGNED_LONG function converts an input value into an unsigned long value. 
-### Examples +## Syntax + +`TO_UNSIGNED_LONG(field)` + +### Parameters + +#### field + +The input value. This can be a single or multi-valued column or an expression. + +## Examples + +The following example demonstrates the use of the TO_UNSIGNED_LONG function: ```esql ROW str1 = "2147483648", str2 = "2147483648.2", str3 = "foo" | EVAL long1 = TO_UNSIGNED_LONG(str1), long2 = TO_ULONG(str2), long3 = TO_UL(str3) ``` -Note that in this example, the last conversion of the string isn’t possible. When this happens, the result is a null value. In this case, a Warning header is added to the response. The header will provide information on the source of the failure: -"Line 1:133: evaluation of [TO_UL(str3)] failed, treating result as null. Only first 20 failures recorded." -A following header will contain the failure reason and the offending value: -"java.lang.NumberFormatException: Character f is neither a decimal digit number, decimal point, nor "e" notation exponential mark." - ```esql ROW date1 = TO_DATETIME("2023-12-02T11:00:00.000Z"), date2 = TO_DATETIME("2023-12-02T11:00:00.001Z") | EVAL long_date1 = TO_UNSIGNED_LONG(date1), long_date2 = TO_UNSIGNED_LONG(date2) -``` \ No newline at end of file +``` + +## Notes + +If the input parameter is of a date type, its value will be interpreted as milliseconds since the Unix epoch and then converted to an unsigned long. A boolean value of true will be converted to an unsigned long value of 1, and false will be converted to 0. 
diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_upper.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_upper.txt index d563e9efa59b4..b4783943c4d98 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_upper.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_upper.txt @@ -1,8 +1,18 @@ -## TO_UPPER +# TO_UPPER -The `TO_UPPER` function returns a new string representing the input string converted to upper case. +The TO_UPPER function converts the input string to uppercase. -### Examples +## Syntax + +`TO_UPPER(str)` + +### Parameters + +#### str + +The string expression that you want to convert to uppercase. + +## Examples ```esql ROW message = "Some Text" @@ -14,4 +24,4 @@ FROM employees | KEEP first_name, last_name | EVAL first_name_upper = TO_UPPER(first_name) | EVAL last_name_upper = TO_UPPER(last_name) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_version.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_version.txt index 2fb20b0686723..447a1752622a1 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_version.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-to_version.txt @@ -1,8 +1,18 @@ -## TO_VERSION +# TO_VERSION -Converts an input string to a version value. +The TO_VERSION function converts an input string into a version value. -### Examples +## Syntax + +`TO_VERSION(field)` + +### Parameters + +#### field + +The input value to be converted. This can be a single or multi-valued column or an expression. 
+ +## Examples ```esql ROW v = TO_VERSION("1.2.3") @@ -11,4 +21,4 @@ ROW v = TO_VERSION("1.2.3") ```esql ROW version_string = "2.3.4" | EVAL version = TO_VERSION(version_string) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-top.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-top.txt index 6b18788f4b6ac..b22fb7c9b54d1 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-top.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-top.txt @@ -1,25 +1,37 @@ -## TOP +# TOP -The `TOP` function collects the top values for a specified field. It includes repeated values and allows you to specify the maximum number of values to collect and the order in which to sort them (either ascending or descending). +The TOP function collects the top values for a specified field. -### Syntax +## Syntax `TOP(field, limit, order)` ### Parameters -- **field**: The field to collect the top values for. -- **limit**: The maximum number of values to collect. -- **order**: The order to calculate the top values. Either `asc` or `desc`. +#### field -### Examples +The field for which the top values are to be collected. + +#### limit + +The maximum number of values to be collected. + +#### order + +The order in which the top values are calculated. It can be either `asc` (ascending) or `desc` (descending). 
+ +## Examples + +Collect the top 3 salaries from the employees data: ```esql FROM employees | STATS top_salaries = TOP(salary, 3, "desc"), top_salary = MAX(salary) ``` +Collect the top 5 products in ascending order: + ```esql FROM sales | STATS top_products = TOP(product_id, 5, "asc"), max_sales = MAX(sales_amount) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-trim.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-trim.txt index bec5cce253909..ad4aa621a009a 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-trim.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-trim.txt @@ -1,8 +1,18 @@ -## TRIM +# TRIM -Removes leading and trailing whitespaces from a string. +The TRIM function removes leading and trailing whitespaces from a string. -### Examples +## Syntax + +`TRIM(string)` + +### Parameters + +#### string + +This is the string expression that you want to trim. + +## Examples ```esql ROW message = " some text ", color = " red " @@ -14,4 +24,4 @@ ROW message = " some text ", color = " red " ROW text = " example text ", label = " label " | EVAL text = TRIM(text) | EVAL label = TRIM(label) -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-values.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-values.txt index a4acd2c397104..1ea45c3642e0a 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-values.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-values.txt @@ -1,10 +1,24 @@ -## VALUES +# VALUES -`VALUES` returns all values in a group as a multivalued field. The order of the returned values isn’t guaranteed. If you need the values returned in order, use `MV_SORT`. +The VALUES function returns all values in a group as a multivalued field. -**Note:** Do not use `VALUES` in production environments. 
This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features. +## Syntax -### Examples +`VALUES (field)` + +### Parameters + +#### field + +The field for which all values are to be returned. + +## Description + +The VALUES function is used to return all values in a group as a multivalued field. It's important to note that the order of the returned values is not guaranteed. If you need the values returned in a specific order, you should use the `MV_SORT` function. + +## Examples + +The following example demonstrates how to use the VALUES function: ```esql FROM employees @@ -13,4 +27,7 @@ FROM employees | SORT first_letter ``` -This can use a significant amount of memory and ES|QL doesn’t yet grow aggregations beyond memory. So this aggregation will work until it is used to collect more values than can fit into memory. Once it collects too many values it will fail the query with a Circuit Breaker Error. \ No newline at end of file +## Limitations + +- This functionality is in technical preview and may be changed or removed in a future release +- The VALUES function can consume a significant amount of memory. ES|QL does not currently support growing aggregations beyond memory. Therefore, if the function collects more values than can fit into memory, it will fail the query with a Circuit Breaker Error. 
diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-weighted_avg.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-weighted_avg.txt index 9030159ff728c..611edc73cb983 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-weighted_avg.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-weighted_avg.txt @@ -1,8 +1,22 @@ -## WEIGHTED_AVG +# WEIGHTED_AVG -The `WEIGHTED_AVG` function calculates the weighted average of a numeric expression. +The WEIGHTED_AVG function calculates the weighted average of a numeric expression. -### Examples +## Syntax + +`WEIGHTED_AVG(number, weight)` + +### Parameters + +#### number + +A numeric value that you want to calculate the weighted average for. + +#### weight + +A numeric value that represents the weight of the corresponding number. + +## Examples ```esql FROM employees @@ -18,4 +32,4 @@ FROM sales | EVAL weighted_sales = ROUND(weighted_sales, 2) | KEEP weighted_sales, region | SORT region -``` \ No newline at end of file +``` diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-where.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-where.txt index 7a772aed34bb3..b9b70ebad625e 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-where.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/esql_docs/esql-where.txt @@ -1,18 +1,30 @@ -## WHERE +# WHERE -The `WHERE` processing command produces a table that contains all the rows from the input table for which the provided condition evaluates to true. This command is essential for filtering data based on specific criteria, making it highly useful for narrowing down datasets to relevant records. It supports various functions and operators, including boolean expressions, date math, string patterns, and NULL comparisons. 
+The WHERE command filters the rows in a table based on a specified condition, returning only those rows where the condition evaluates to `true`. -### Use Cases -- Filtering records based on boolean conditions. -- Retrieving data within a specific time range using date math. -- Filtering data based on string patterns using wildcards or regular expressions. -- Performing NULL comparisons. -- Testing whether a field or expression equals an element in a list of literals, fields, or expressions. +## Syntax -### Limitations -- The `WHERE` command is subject to the limitations of ES|QL, such as the maximum number of rows returned (10,000) and the types of fields supported. +`WHERE expression` + +### Parameters + +#### expression + +A boolean expression that defines the condition for filtering the rows. + +## Notes + +WHERE supports the following types of functions: +- Mathematical functions +- String functions +- Date-time functions +- Type conversion functions +- Conditional functions and expressions +- Multi-value functions -### Examples +Aggregation functions are not supported for WHERE. + +## Examples #### Example 1: Filtering Based on Boolean Condition ```esql @@ -20,7 +32,9 @@ FROM employees | KEEP first_name, last_name, still_hired | WHERE still_hired == true ``` + If `still_hired` is a boolean field, this can be simplified to: + ```esql FROM employees | KEEP first_name, last_name, still_hired @@ -48,6 +62,7 @@ FROM employees | SORT first_name | LIMIT 3 ``` + ```esql FROM employees | WHERE is_rehired IS NOT NULL @@ -74,4 +89,5 @@ ROW a = 1, b = 4, c = 3 | WHERE c-a IN (3, b / 2, a) ``` -For a complete list of all functions and operators, refer to the Functions overview and Operators documentation. \ No newline at end of file +### Limitations +- The `WHERE` command is subject to the maximum number of rows limitation (10,000). 
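Note on the `index.ts` hunk in this patch: the `mapValues(pick(...))` lookup is replaced by a `reduce` that emits a placeholder entry for every requested keyword that has no documentation page. The following is a minimal, dependency-free sketch of that behaviour, not the plugin's actual code. The `EsqlDocs` type, the `collectDocumentation` name, and the sample data are assumptions made for illustration, and `Object.prototype.hasOwnProperty` stands in for lodash's `has` (which additionally resolves dotted paths).

```typescript
// Assumed shape of the loaded docs, inferred from `esqlDocs[keyword].data` in the diff.
type EsqlDocs = Record<string, { data: string }>;

// Hypothetical helper: returns one entry per requested keyword, using a
// "does not exist" notice when no documentation page is available.
function collectDocumentation(esqlDocs: EsqlDocs, keywords: string[]): Record<string, string> {
  return keywords.reduce<Record<string, string>>((documentation, keyword) => {
    // Own-property check so inherited keys such as "constructor" are not treated as docs.
    const entry = Object.prototype.hasOwnProperty.call(esqlDocs, keyword)
      ? esqlDocs[keyword]
      : undefined;
    documentation[keyword] =
      entry !== undefined
        ? entry.data
        : `## ${keyword}\n\nThere is no ${keyword} function or command in ES|QL. Do NOT try to use it.`;
    return documentation;
  }, {});
}

// Sample data (made up): one known keyword and one unknown keyword.
const sampleDocs: EsqlDocs = { WHERE: { data: '# WHERE\n\nFilters rows.' } };
const result = collectDocumentation(sampleDocs, ['WHERE', 'FOOBAR']);
```

With the earlier `pick`-based lookup, an unknown keyword was silently dropped from the result. Returning an explicit notice instead gives the model a direct signal that the keyword it asked about is not part of ES|QL.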
diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/index.ts b/x-pack/plugins/inference/server/tasks/nl_to_esql/index.ts index 92f36c1ccef89..2fcc204a9f47a 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/index.ts +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/index.ts @@ -6,7 +6,7 @@ */ import type { Logger } from '@kbn/logging'; -import { isEmpty, mapValues, pick } from 'lodash'; +import { isEmpty, has } from 'lodash'; import { Observable, from, map, merge, of, switchMap } from 'rxjs'; import { ToolSchema, generateFakeToolCallId, isChatCompletionMessageEvent } from '../../../common'; import { @@ -86,7 +86,21 @@ export function naturalLanguageToEsql({ 'OPERATORS', ].map((keyword) => keyword.toUpperCase()); - const requestedDocumentation = mapValues(pick(esqlDocs, keywords), ({ data }) => data); + const requestedDocumentation = keywords.reduce<Record<string, string>>( + (documentation, keyword) => { + if (has(esqlDocs, keyword)) { + documentation[keyword] = esqlDocs[keyword].data; + } else { + documentation[keyword] = ` + ## ${keyword} + + There is no ${keyword} function or command in ES|QL. Do NOT try to use it. + `; + } + return documentation; + }, + {} + ); const fakeRequestDocsToolCall = { function: { diff --git a/x-pack/plugins/inference/server/tasks/nl_to_esql/system_message.txt b/x-pack/plugins/inference/server/tasks/nl_to_esql/system_message.txt index b5d333296c696..2efa08a6288c0 100644 --- a/x-pack/plugins/inference/server/tasks/nl_to_esql/system_message.txt +++ b/x-pack/plugins/inference/server/tasks/nl_to_esql/system_message.txt @@ -1,88 +1,67 @@ -# System instructions - You are a helpful assistant for generating and executing ES|QL queries. -Your goal is to help the user construct and possibly execute an ES|QL -query for their data. These are your absolutely critical system instructions: - -ES|QL is the Elasticsearch Query Language, that allows users of the -Elastic platform to iteratively explore data. 
An ES|QL query consists -of a series of commands, separated by pipes. Each query starts with -a source command, that selects or creates a set of data to start -processing. This source command is then followed by one or more -processing commands, which can transform the data returned by the -previous command. +Your goal is to help the user construct an ES|QL query for their data. -Make sure you write a query using ONLY commands specified in this -conversation and present in the documentation. +VERY IMPORTANT: When writing ES|QL queries, make sure to ONLY use commands, functions +and operators listed in the current documentation. # Limitations -ES|QL currently does not support pagination. +- ES|QL currently does not support pagination. +- A query will never return more than 10000 rows. # Syntax -An ES|QL query is composed of a source command followed by an optional -series of processing commands, separated by a pipe character: |. For -example: +An ES|QL query is composed of a source command followed by a series +of processing commands, separated by a pipe character: |. + +For example: | | -Binary operators: ==, !=, <, <=, >, >=. -Logical operators are supported: AND, OR, NOT -Predicates: IS NULL, IS NOT NULL -Timestamp literal syntax: NOW() - 15 days, 24 hours, 1 week - ## Source commands -Source commands select a data source. There are three source commands: -- FROM: selects an index -- ROW: creates data from the command -- SHOW: returns information about the deployment +Source commands select a data source. + +There are three source commands: +- FROM: Selects one or multiple indices, data streams or aliases to use as source. +- ROW: Produces a row with one or more columns with values that you specify. +- SHOW: returns information about the deployment. ## Processing commands ES|QL processing commands change an input table by adding, removing, or -changing rows and columns. The following commands are available: +changing rows and columns. 
+ +The following processing commands are available: -- DISSECT: extracts structured data out of a string, using a dissect -pattern. +- DISSECT: extracts structured data out of a string, using a dissect pattern - DROP: drops one or more columns - ENRICH: adds data from existing indices as new columns -- EVAL: adds a new column with calculated values. Supported functions for - EVAL are: - - Mathematical functions - - String functions - - Date-time functions - - Type conversation functions - - Conditional functions and expressions - - Multi-value functions -Aggregation functions are not supported for EVAL. +- EVAL: adds a new column with calculated values, using various types of functions - GROK: extracts structured data out of a string, using a grok pattern - KEEP: keeps one or more columns, drop the ones that are not kept - only the columns in the KEEP command can be used after a KEEP command -- LIMIT: returns the first n number of rows. The maximum value for this -is 10000. +- LIMIT: returns the first n number of rows. The maximum value for this is 10000 - MV_EXPAND: expands multi-value columns into a single row per value - RENAME: renames a column - STATS ... BY: groups rows according to a common value and calculates one or more aggregated values over the grouped rows. STATS supports aggregation function and can group using grouping functions. - SORT: sorts the row in a table by a column. Expressions are not supported. - If SORT is used right after a KEEP command, make sure it only uses column names in KEEP, - or move the SORT before the KEEP (e.g. not correct: KEEP date | SORT @timestamp, correct: SORT @timestamp | KEEP date) -- WHERE: produces a table that contains all the rows from the input table - for which the provided condition returns true. WHERE supports the same - functions as EVAL. +- WHERE: Filters rows based on a boolean condition. WHERE supports the same functions as EVAL. ## Functions and operators ### Grouping functions +The STATS ... 
BY command supports these grouping functions: + BUCKET: Creates groups of values out of a datetime or numeric input. ### Aggregation functions +The STATS ... BY command supports these aggregation functions: + AVG COUNT COUNT_DISTINCT @@ -97,6 +76,24 @@ TOP VALUES WEIGHTED_AVG +### Conditional functions and expressions + +Conditional functions return one of their arguments by evaluating in an if-else manner. + +CASE +COALESCE +GREATEST +LEAST + +### Date-time functions + +DATE_DIFF +DATE_EXTRACT +DATE_FORMAT +DATE_PARSE +DATE_TRUNC +NOW + ### Mathematical functions ABS @@ -124,27 +121,24 @@ TAU ### String functions CONCAT +ENDS_WITH +FROM_BASE64 LEFT LENGTH +LOCATE LTRIM +REPEAT REPLACE RIGHT RTRIM SPLIT +STARTS_WITH SUBSTRING +TO_BASE64 TO_LOWER TO_UPPER TRIM -### Date-time functions - -DATE_DIFF -DATE_EXTRACT -DATE_FORMAT -DATE_PARSE -DATE_TRUNC -NOW - ### Type conversion functions TO_BOOLEAN @@ -163,16 +157,14 @@ TO_STRING TO_UNSIGNED_LONG TO_VERSION +### IP Functions -### Conditional functions and expressions - -CASE -COALESCE -GREATEST -LEAST +CIDR_MATCH +IP_PREFIX ### Multivalue functions +MV_APPEND MV_AVG MV_CONCAT MV_COUNT @@ -182,36 +174,37 @@ MV_LAST MV_MAX MV_MEDIAN MV_MIN +MV_SORT +MV_SLICE MV_SUM +MV_ZIP ### Operators -Binary operators -Unary operators -Logical operators -IS NULL and IS NOT NULL predicates -CIDR_MATCH -ENDS_WITH +Binary operators: ==, !=, <, <=, >, >=, +, -, *, /, % +Logical operators: AND, OR, NOT +Predicates: IS NULL, IS NOT NULL +Unary operators: - + IN -LIKE -RLIKE -STARTS_WITH +LIKE: filter data based on string patterns using wildcards +RLIKE: filter data based on string patterns using regular expressions # Usage examples -Here are some examples of queries: +Here are some examples of ES|QL queries: ```esql FROM employees - | WHERE country == "NL" AND gender == "M" - | STATS COUNT(*) +| WHERE country == "NL" AND gender == "M" +| STATS COUNT(*) ``` ```esql FROM employees - | EVAL trunk_worked_seconds = avg_worked_seconds / 100000000 * 
100000000 - | STATS c = count(languages.long) BY languages.long, trunk_worked_seconds - | SORT c desc, languages.long, trunk_worked_seconds +| EVAL trunk_worked_seconds = avg_worked_seconds / 100000000 * 100000000 +| STATS c = count(languages.long) BY languages.long, trunk_worked_seconds +| SORT c desc, languages.long, trunk_worked_seconds ``` *Extracting structured data from logs using DISSECT* @@ -246,11 +239,6 @@ FROM employees | SORT b ``` -**Creating inline data using ROW** -```esql -ROW a = 1, b = "two", c = null -``` - ```esql FROM employees | EVAL is_recent_hire = CASE(hire_date <= "2023-01-01T00:00:00Z", 1, 0)