[Security Assistant] Enables automatic setup of Knowledge Base and LangGraph code paths for 8.15 (#188168)
…Generation (#188492)

## Summary

This PR updates the pre-packaged ESQL examples used by the ESQL Query Generation tool, as provided by @jamesspi. The number of examples has stayed the same, as have the file names -- so I've only updated the raw content here.

> [!NOTE]
> Since we're enabling the new `kbDataClient` with #188168 for `8.15`, there is no need for a delete/re-install for pre-existing deployments to use these new example queries, as the Knowledge Base will be rebuilt on an upgrade to `8.15`.

Token length changes, as calculated using the [GPT-4 Tokenizer](https://platform.openai.com/tokenizer):

<details><summary>Existing Example Queries / Tokens: 1,108 / Characters: 4,151</summary> <p> ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM logs-* | WHERE NOT CIDR_MATCH(destination.ip, "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16") | STATS destcount = COUNT(destination.ip) by user.name, host.name | ENRICH ldap_lookup_new ON user.name | WHERE group.name IS NOT NULL | EVAL follow_up = CASE( destcount >= 100, "true", "false") | SORT destcount desc | KEEP destcount, host.name, user.name, group.name, follow_up ``` [[esql-example-queries]] The following is an example ES|QL query: ``` from logs-* | grok dns.question.name "%{DATA}\\.%{GREEDYDATA:dns.question.registered_domain:string}" | stats unique_queries = count_distinct(dns.question.name) by dns.question.registered_domain, process.name | where unique_queries > 5 | sort unique_queries desc ``` [[esql-example-queries]] The following is an example ES|QL query: ``` from logs-* | where event.code is not null | stats event_code_count = count(event.code) by event.code,host.name | enrich win_events on event.code with EVENT_DESCRIPTION | where EVENT_DESCRIPTION is not null and host.name is not null | rename EVENT_DESCRIPTION as event.description | sort event_code_count desc | keep event_code_count,event.code,host.name,event.description ``` [[esql-example-queries]] The following is an example 
ES|QL query: ``` from logs-* | where event.category == "file" and event.action == "creation" | stats filecount = count(file.name) by process.name,host.name | dissect process.name "%{process}.%{extension}" | eval proclength = length(process.name) | where proclength > 10 | sort filecount,proclength desc | limit 10 | keep host.name,process.name,filecount,process,extension,fullproc,proclength ``` [[esql-example-queries]] The following is an example ES|QL query: ``` from logs-* | where process.name == "curl.exe" | stats bytes = sum(destination.bytes) by destination.address | eval kb = bytes/1024 | sort kb desc | limit 10 | keep kb,destination.address ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM metrics-apm* | WHERE metricset.name == "transaction" AND metricset.interval == "1m" | EVAL bucket = AUTO_BUCKET(transaction.duration.histogram, 50, <start-date>, <end-date>) | STATS avg_duration = AVG(transaction.duration.histogram) BY bucket ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM packetbeat-* | STATS doc_count = COUNT(destination.domain) BY destination.domain | SORT doc_count DESC | LIMIT 10 ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM employees | EVAL hire_date_formatted = DATE_FORMAT(hire_date, "MMMM yyyy") | SORT hire_date | KEEP emp_no, hire_date_formatted | LIMIT 5 ``` [[esql-example-queries]] The following is NOT an example of an ES|QL query: ``` Pagination is not supported ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM logs-* | WHERE @timestamp >= NOW() - 15 minutes | EVAL bucket = DATE_TRUNC(1 minute, @timestamp) | STATS avg_cpu = AVG(system.cpu.total.norm.pct) BY bucket, host.name | LIMIT 10 ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM traces-apm* | WHERE @timestamp >= NOW() - 24 hours | EVAL successful = CASE(event.outcome == "success", 1, 0), failed = CASE(event.outcome == "failure", 1, 0) | STATS 
success_rate = AVG(successful), avg_duration = AVG(transaction.duration), total_requests = COUNT(transaction.id) BY service.name ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM metricbeat* | EVAL cpu_pct_normalized = (system.cpu.user.pct + system.cpu.system.pct) / system.cpu.cores | STATS AVG(cpu_pct_normalized) BY host.name ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM postgres-logs | DISSECT message "%{} duration: %{query_duration} ms" | EVAL query_duration_num = TO_DOUBLE(query_duration) | STATS avg_duration = AVG(query_duration_num) ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM nyc_taxis | WHERE DATE_EXTRACT(drop_off_time, "hour") >= 6 AND DATE_EXTRACT(drop_off_time, "hour") < 10 | LIMIT 10 ``` ``` </p> </details> <details><summary>8.15 Example Queries / Tokens: 4,847 / Characters:16671</summary> <p> ``` // 1. regex to extract from dns.question.registered_domain // Helpful when asking how to use GROK to extract values via REGEX from logs-* | where dns.question.name like "?*" | grok dns.question.name """(?<dns_registered_domain>[a-zA-Z0-9]+\.[a-z-A-Z]{2,3}$)""" | keep dns_registered_domain | limit 10 // 2. 
hunting scheduled task with suspicious actions via registry.data.bytes // Helpful when answering questions on regex based searches and replacements (RLIKE and REPLACE), base64 conversions, and dealing with case sensitivity from logs-* | where host.os.type == "windows" and event.category == "registry" and event.action == "modification" and registry.path like """HKLM\\SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\Schedule\\TaskCache\\Tasks\\*Actions*""" | eval scheduled_task_action = replace(TO_LOWER(FROM_BASE64(registry.data.bytes)), """\u0000""", "") | eval scheduled_task_action = replace(scheduled_task_action, """(\u0003\fauthorfff|\u0003\fauthorff\u000e)""", "") | where scheduled_task_action rlike """.*(users\\public\\|\\appdata\\roaming|programdata|powershell|rundll32|regsvr32|mshta.exe|cscript.exe|wscript.exe|cmd.exe|forfiles|msiexec).*""" and not scheduled_task_action like "localsystem*" | keep scheduled_task_action, registry.path, agent.id | stats count_agents = count_distinct(agent.id) by scheduled_task_action | where count_agents == 1 // 3. suspicious powershell cmds from base64 encoded cmdline // Helpful when answering questions on regex based searches and replacements, base64 conversions, and dealing with case sensitivity (TO_LOWER and TO_UPPER commands) from logs-* | where host.os.type == "windows" and event.category == "process" and event.action == "start" and TO_LOWER(process.name) == "powershell.exe" and process.command_line rlike ".+ -(e|E).*" | keep agent.id, process.command_line | grok process.command_line """(?<base64_data>([A-Za-z0-9+/]+={1,2}$|[A-Za-z0-9+/]{100,}))""" | where base64_data is not null | eval decoded_base64_cmdline = replace(TO_LOWER(FROM_BASE64(base64_data)), """\u0000""", "") | where decoded_base64_cmdline rlike """.*(http|webclient|download|mppreference|sockets|bxor|.replace|reflection|assembly|load|bits|start-proc|iwr|frombase64).*""" | keep agent.id, process.command_line, decoded_base64_cmdline //4. 
Detect masquerading attempts as native Windows binaries //MITRE Tactics: "Defense Evasion" from logs-* | where event.type == "start" and event.action == "start" and host.os.name == "Windows" and not starts_with(process.executable, "C:\\Program Files\\WindowsApps\\") and not starts_with(process.executable, "C:\\Windows\\System32\\DriverStore\\") and process.name != "setup.exe" | keep process.name.caseless, process.executable.caseless, process.code_signature.subject_name, process.code_signature.trusted, process.code_signature.exists, host.id | eval system_bin = case(starts_with(process.executable.caseless, "c:\\windows\\system32") and starts_with(process.code_signature.subject_name, "Microsoft") and process.code_signature.trusted == true, process.name.caseless, null), non_system_bin = case(process.code_signature.exists == false or process.code_signature.trusted != true or not starts_with(process.code_signature.subject_name, "Microsoft"), process.name.caseless, null) | stats count_system_bin = count(system_bin), count_non_system_bin = count(non_system_bin) by process.name.caseless, host.id | where count_system_bin >= 1 and count_non_system_bin >= 1 //5. 
Detect DLL Hijack via Masquerading as Microsoft Native Libraries // Helpful when asking how to use ENRICH query results with enrich policies from logs-* | where host.os.family == "windows" and event.action == "load" and process.code_signature.status == "trusted" and dll.code_signature.status != "trusted" and not dll.path rlike """[c-fC-F]:\\(Windows|windows|WINDOWS)\\(System32|SysWOW64|system32|syswow64)\\[a-zA-Z0-9_]+.dll""" | keep dll.name, dll.path, dll.hash.sha256, process.executable, host.id | ENRICH libs-policy-defend | where native == "yes" and not starts_with(dll.path, "C:\\Windows\\assembly\\NativeImages") | eval process_path = replace(process.executable, """([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}|ns[a-z][A-Z0-9]{3,4}\.tmp|DX[A-Z0-9]{3,4}\.tmp|7z[A-Z0-9]{3,5}\.tmp|[0-9\.\-\_]{3,})""", ""), dll_path = replace(dll.path, """([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}|ns[a-z][A-Z0-9]{3,4}\.tmp|DX[A-Z0-9]{3,4}\.tmp|7z[A-Z0-9]{3,5}\.tmp|[0-9\.\-\_]{3,})""", "") | stats host_count = count_distinct(host.id) by dll.name, dll_path, process_path, dll.hash.sha256 | sort host_count asc //6. 
Potential Exfiltration by process total egress bytes // Helpful when asking how to filter/search on IP address (CIDR_MATCH) fields and aggregating/grouping //MITRE Tactics: "Command and Control", "Exfiltration" from logs-* | where host.os.family == "windows" and event.category == "network" and event.action == "disconnect_received" and not CIDR_MATCH(destination.ip, "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.0.0/29", "192.0.0.8/32", "192.0.0.9/32", "192.0.0.10/32", "192.0.0.170/32", "192.0.0.171/32", "192.0.2.0/24", "192.31.196.0/24", "192.52.193.0/24", "192.168.0.0/16", "192.88.99.0/24", "224.0.0.0/4", "100.64.0.0/10", "192.175.48.0/24","198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24", "240.0.0.0/4", "::1","FE80::/10", "FF00::/8") | keep source.bytes, destination.address, process.executable, process.entity_id | stats total_bytes_out = sum(source.bytes) by process.entity_id, destination.address, process.executable /* more than 1GB out by same process.pid in 8 hours */ | where total_bytes_out >= 1073741824 //7. 
Windows logon activity by source IP // Helpful when answering questions about the CASE command (as well as conditional outputs/if statements) //MITRE Tactics: "Credential Access" from logs-* | where host.os.family == "windows" and event.category == "authentication" and event.action in ("logon-failed", "logged-in") and winlog.logon.type == "Network" and source.ip is not null and /* noisy failure status codes often associated to authentication misconfiguration */ not (event.action == "logon-failed" and winlog.event_data.Status in ("0xC000015B", "0XC000005E", "0XC0000133", "0XC0000192")) | eval failed = case(event.action == "logon-failed", source.ip, null), success = case(event.action == "logged-in", source.ip, null) | stats count_failed = count(failed), count_success = count(success), count_user = count_distinct(winlog.event_data.TargetUserName) by source.ip /* below threshold should be adjusted to your env logon patterns */ | where count_failed >= 100 and count_success <= 10 and count_user >= 20 //8. 
High count of network connection over extended period by process //Helpful when answering questions about IP searches/filters, field converstions(to_double, to_int), and running multiple aggregations //MITRE Tactics: "Command and Control" from logs-* | where host.os.family == "windows" and event.category == "network" and network.direction == "egress" and (process.executable like "C:\\\\Windows\\\\System32*" or process.executable like "C:\\\\Windows\\\\SysWOW64\\\\*") and not user.id in ("S-1-5-19", "S-1-5-20") and /* multiple Windows svchost services perform long term connection to MS ASN, can be covered in a dedicated hunt */ not (process.name == "svchost.exe" and user.id == "S-1-5-18") and /* excluding private IP ranges */ not CIDR_MATCH(destination.ip, "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.0.0/29", "192.0.0.8/32", "192.0.0.9/32", "192.0.0.10/32", "192.0.0.170/32", "192.0.0.171/32", "192.0.2.0/24", "192.31.196.0/24", "192.52.193.0/24", "192.168.0.0/16", "192.88.99.0/24", "224.0.0.0/4", "100.64.0.0/10", "192.175.48.0/24","198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24", "240.0.0.0/4", "::1","FE80::/10", "FF00::/8") | keep source.bytes, destination.address, process.name, process.entity_id, @timestamp /* calc total duration , total MB out and the number of connections per hour */ | stats total_bytes_out = sum(source.bytes), count_connections = count(*), start_time = min(@timestamp), end_time = max(@timestamp) by process.entity_id, destination.address, process.name | eval dur = TO_DOUBLE(end_time)-TO_DOUBLE(start_time), duration_hours=TO_INT(dur/3600000), MB_out=TO_DOUBLE(total_bytes_out) / (1024*1024), number_of_con_per_hour = (count_connections / duration_hours) | keep process.entity_id, process.name, duration_hours, destination.address, MB_out, count_connections, number_of_con_per_hour /* threshold is set to 120 connections per minute , you can adjust it to your env/FP rate */ | where duration_hours >= 1 and 
number_of_con_per_hour >= 120 //9. Persistence via Suspicious Launch Agent or Launch Daemon with low occurrence //Helpful when answering questions on concatenating fields, dealing with time based searches //MITRE Tactics: "Persistence" from logs-* | where @timestamp > now() - 7 day | where host.os.family == "macos" and event.category == "file" and event.action == "launch_daemon" and (Persistence.runatload == true or Persistence.keepalive == true) and process.executable is not null | eval args = MV_CONCAT(Persistence.args, ",") /* normalizing users home profile */ | eval args = replace(args, """/Users/[a-zA-Z0-9ñ\.\-\_\$~ ]+/""", "/Users/user/") | stats agents = count_distinct(host.id), total = count(*) by process.name, Persistence.name, args | where starts_with(args, "/") and agents == 1 and total == 1 //10. Suspicious Network Connections by unsigned macO //Helpful when answering questions on IP filtering, calculating the time difference between timestamps, aggregations, and field conversions //MITRE Tactics: "Command and Control" from logs-* | where host.os.family == "macos" and event.category == "network" and (process.code_signature.exists == false or process.code_signature.trusted != true) and /* excluding private IP ranges */ not CIDR_MATCH(destination.ip, "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.0.0/29", "192.0.0.8/32", "192.0.0.9/32", "192.0.0.10/32", "192.0.0.170/32", "192.0.0.171/32", "192.0.2.0/24", "192.31.196.0/24", "192.52.193.0/24", "192.168.0.0/16", "192.88.99.0/24", "224.0.0.0/4", "100.64.0.0/10", "192.175.48.0/24","198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24", "240.0.0.0/4", "::1","FE80::/10", "FF00::/8") | keep source.bytes, destination.address, process.name, process.entity_id, @timestamp /* calc total duration , total MB out and the number of connections per hour */ | stats total_bytes_out = sum(source.bytes), count_connections = count(*), start_time = min(@timestamp), end_time = max(@timestamp) 
by process.entity_id, destination.address, process.name | eval dur = TO_DOUBLE(end_time)-TO_DOUBLE(start_time), duration_hours=TO_INT(dur/3600000), MB_out=TO_DOUBLE(total_bytes_out) / (1024*1024), number_of_con_per_hour = (count_connections / duration_hours) | keep process.entity_id, process.name, duration_hours, destination.address, MB_out, count_connections, number_of_con_per_hour /* threshold is set to 120 connections per minute , you can adjust it to your env/FP rate */ | where duration_hours >= 8 and number_of_con_per_hour >= 120 //11. Unusual file creations by web server user //Helpful when answering questions on using the LIKE command (wildcard searches) and aggregations FROM logs-* | WHERE @timestamp > NOW() - 50 day | WHERE host.os.type == "linux" and event.type == "creation" and user.name in ("www-data", "apache", "nginx", "httpd", "tomcat", "lighttpd", "glassfish", "weblogic") and ( file.path like "/var/www/*" or file.path like "/var/tmp/*" or file.path like "/tmp/*" or file.path like "/dev/shm/*" ) | STATS file_count = COUNT(file.path), host_count = COUNT(host.name) by file.path, host.name, process.name, user.name // Alter this threshold to make sense for your environment | WHERE file_count <= 5 | SORT file_count asc | LIMIT 100 //12. Segmentation Fault & Potential Buffer Overflow Hunting //Helpful when answering questions on extractions with GROK FROM logs-* | WHERE host.os.type == "linux" and process.name == "kernel" and message like "*segfault*" | GROK message "\\[%{NUMBER:timestamp}\\] %{WORD:process}\\[%{NUMBER:pid}\\]: segfault at %{BASE16NUM:segfault_address} ip %{BASE16NUM:instruction_pointer} sp %{BASE16NUM:stack_pointer} error %{NUMBER:error_code} in %{DATA:so_file}\\[%{BASE16NUM:so_base_address}\\+%{BASE16NUM:so_offset}\\]" | KEEP timestamp, process, pid, so_file, segfault_address, instruction_pointer, stack_pointer, error_code, so_base_address, so_offset //13. 
Persistence via Systemd (timers) //Helpful when answering questions on using the CASE command (conditional statements), searching lists using the IN command, wildcard searches with the LIKE command and aggregations FROM logs-* | WHERE host.os.type == "linux" and event.type in ("creation", "change") and ( // System-wide/user-specific services/timers (root permissions required) file.path like "/run/systemd/system/*" or file.path like "/etc/systemd/system/*" or file.path like "/etc/systemd/user/*" or file.path like "/usr/local/lib/systemd/system/*" or file.path like "/lib/systemd/system/*" or file.path like "/usr/lib/systemd/system/*" or file.path like "/usr/lib/systemd/user/*" or // user-specific services/timers (user permissions required) file.path like "/home/*/.config/systemd/user/*" or file.path like "/home/*/.local/share/systemd/user/*" or // System-wide generators (root permissions required) file.path like "/etc/systemd/system-generators/*" or file.path like "/usr/local/lib/systemd/system-generators/*" or file.path like "/lib/systemd/system-generators/*" or file.path like "/etc/systemd/user-generators/*" or file.path like "/usr/local/lib/systemd/user-generators/*" or file.path like "/usr/lib/systemd/user-generators/*" ) and not ( process.name in ( "dpkg", "dockerd", "yum", "dnf", "snapd", "pacman", "pamac-daemon", "netplan", "systemd", "generate" ) or process.executable == "/proc/self/exe" or process.executable like "/dev/fd/*" or file.extension in ("dpkg-remove", "swx", "swp") ) | EVAL persistence = CASE( // System-wide/user-specific services/timers (root permissions required) file.path like "/run/systemd/system/*" or file.path like "/etc/systemd/system/*" or file.path like "/etc/systemd/user/*" or file.path like "/usr/local/lib/systemd/system/*" or file.path like "/lib/systemd/system/*" or file.path like "/usr/lib/systemd/system/*" or file.path like "/usr/lib/systemd/user/*" or // user-specific services/timers (user permissions required) file.path like 
"/home/*/.config/systemd/user/*" or file.path like "/home/*/.local/share/systemd/user/*" or // System-wide generators (root permissions required) file.path like "/etc/systemd/system-generators/*" or file.path like "/usr/local/lib/systemd/system-generators/*" or file.path like "/lib/systemd/system-generators/*" or file.path like "/etc/systemd/user-generators/*" or file.path like "/usr/local/lib/systemd/user-generators/*" or file.path like "/usr/lib/systemd/user-generators/*", process.name, null ) | STATS cc = COUNT(*), pers_count = COUNT(persistence), agent_count = COUNT(agent.id) by process.executable, file.path, host.name, user.name | WHERE pers_count > 0 and pers_count <= 20 and agent_count <= 3 | SORT cc asc | LIMIT 100 //14. Low Frequency AWS EC2 Admin Password Retrieval Attempts from Unusual ARNs //Helpful when answering questions on extracting fields with the dissect command and aggregations. Also an example for hunting for cloud threats from logs-* | where event.provider == "ec2.amazonaws.com" and event.action == "GetPasswordData" and aws.cloudtrail.error_code == "Client.UnauthorizedOperation" and aws.cloudtrail.user_identity.type == "AssumedRole" | dissect aws.cloudtrail.request_parameters "{%{key}=%{instance_id}}" | dissect aws.cloudtrail.user_identity.session_context.session_issuer.arn "%{?keyword1}:%{?keyword2}:%{?keyword3}::%{account_id}:%{keyword4}/%{arn_name}" | dissect user.id "%{principal_id}:%{session_name}" | keep aws.cloudtrail.user_identity.session_context.session_issuer.principal_id, instance_id, account_id, arn_name, source.ip, principal_id, session_name, user.name | stats instance_counts = count_distinct(arn_name) by instance_id, user.name, source.ip, session_name | where instance_counts < 5 | sort instance_counts desc ``` </p> </details>
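The token counts in the two summary blocks above come from the official GPT-4 tokenizer; as a quick sanity check, the common rough heuristic of ~4 characters per token lands in the same ballpark. A minimal sketch (a heuristic only, not the actual BPE tokenizer the PR used):

```python
def estimate_tokens(char_count: int) -> int:
    """Rough token estimate: GPT-style BPE averages ~4 chars/token for English text and code."""
    return round(char_count / 4)

# Character/token counts as reported in this PR's summary blocks.
existing_chars, existing_tokens = 4151, 1108   # existing example queries
updated_chars, updated_tokens = 16671, 4847    # 8.15 example queries

print(estimate_tokens(existing_chars), "vs reported", existing_tokens)
print(estimate_tokens(updated_chars), "vs reported", updated_tokens)
```

Both estimates fall within roughly 15% of the reported counts; either way, the 8.15 example set is about 4x the token footprint of the old one, which is worth keeping in mind for prompt-budget purposes.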
…Generation (elastic#188492) ## Summary This PR updates the pre-packaged ESQL examples used by the ESQL Query Generation tool as provided by @jamesspi. The number of examples have stayed the same, as have the file names -- so I've only updated the raw content here. > [!NOTE] > Since we're enabling the new `kbDataClient` with elastic#188168 for `8.15`, there is no need for a delete/re-install for pre-existing deployments to use these new example queries, as the Knowledge Base will be rebuilt on an upgrade to `8.15`. Token length changes as calculated using the [GPT-4 Tokenizer](https://platform.openai.com/tokenizer): <details><summary>Existing Example Queries / Tokens: 1,108 / Characters: 4151</summary> <p> ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM logs-* | WHERE NOT CIDR_MATCH(destination.ip, "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16") | STATS destcount = COUNT(destination.ip) by user.name, host.name | ENRICH ldap_lookup_new ON user.name | WHERE group.name IS NOT NULL | EVAL follow_up = CASE( destcount >= 100, "true", "false") | SORT destcount desc | KEEP destcount, host.name, user.name, group.name, follow_up ``` [[esql-example-queries]] The following is an example ES|QL query: ``` from logs-* | grok dns.question.name "%{DATA}\\.%{GREEDYDATA:dns.question.registered_domain:string}" | stats unique_queries = count_distinct(dns.question.name) by dns.question.registered_domain, process.name | where unique_queries > 5 | sort unique_queries desc ``` [[esql-example-queries]] The following is an example ES|QL query: ``` from logs-* | where event.code is not null | stats event_code_count = count(event.code) by event.code,host.name | enrich win_events on event.code with EVENT_DESCRIPTION | where EVENT_DESCRIPTION is not null and host.name is not null | rename EVENT_DESCRIPTION as event.description | sort event_code_count desc | keep event_code_count,event.code,host.name,event.description ``` [[esql-example-queries]] The following is 
an example ES|QL query: ``` from logs-* | where event.category == "file" and event.action == "creation" | stats filecount = count(file.name) by process.name,host.name | dissect process.name "%{process}.%{extension}" | eval proclength = length(process.name) | where proclength > 10 | sort filecount,proclength desc | limit 10 | keep host.name,process.name,filecount,process,extension,fullproc,proclength ``` [[esql-example-queries]] The following is an example ES|QL query: ``` from logs-* | where process.name == "curl.exe" | stats bytes = sum(destination.bytes) by destination.address | eval kb = bytes/1024 | sort kb desc | limit 10 | keep kb,destination.address ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM metrics-apm* | WHERE metricset.name == "transaction" AND metricset.interval == "1m" | EVAL bucket = AUTO_BUCKET(transaction.duration.histogram, 50, <start-date>, <end-date>) | STATS avg_duration = AVG(transaction.duration.histogram) BY bucket ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM packetbeat-* | STATS doc_count = COUNT(destination.domain) BY destination.domain | SORT doc_count DESC | LIMIT 10 ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM employees | EVAL hire_date_formatted = DATE_FORMAT(hire_date, "MMMM yyyy") | SORT hire_date | KEEP emp_no, hire_date_formatted | LIMIT 5 ``` [[esql-example-queries]] The following is NOT an example of an ES|QL query: ``` Pagination is not supported ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM logs-* | WHERE @timestamp >= NOW() - 15 minutes | EVAL bucket = DATE_TRUNC(1 minute, @timestamp) | STATS avg_cpu = AVG(system.cpu.total.norm.pct) BY bucket, host.name | LIMIT 10 ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM traces-apm* | WHERE @timestamp >= NOW() - 24 hours | EVAL successful = CASE(event.outcome == "success", 1, 0), failed = CASE(event.outcome == "failure", 1, 
0) | STATS success_rate = AVG(successful), avg_duration = AVG(transaction.duration), total_requests = COUNT(transaction.id) BY service.name ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM metricbeat* | EVAL cpu_pct_normalized = (system.cpu.user.pct + system.cpu.system.pct) / system.cpu.cores | STATS AVG(cpu_pct_normalized) BY host.name ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM postgres-logs | DISSECT message "%{} duration: %{query_duration} ms" | EVAL query_duration_num = TO_DOUBLE(query_duration) | STATS avg_duration = AVG(query_duration_num) ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM nyc_taxis | WHERE DATE_EXTRACT(drop_off_time, "hour") >= 6 AND DATE_EXTRACT(drop_off_time, "hour") < 10 | LIMIT 10 ``` ``` </p> </details> <details><summary>8.15 Example Queries / Tokens: 4,847 / Characters:16671</summary> <p> ``` // 1. regex to extract from dns.question.registered_domain // Helpful when asking how to use GROK to extract values via REGEX from logs-* | where dns.question.name like "?*" | grok dns.question.name """(?<dns_registered_domain>[a-zA-Z0-9]+\.[a-z-A-Z]{2,3}$)""" | keep dns_registered_domain | limit 10 // 2. 
hunting scheduled task with suspicious actions via registry.data.bytes // Helpful when answering questions on regex based searches and replacements (RLIKE and REPLACE), base64 conversions, and dealing with case sensitivity from logs-* | where host.os.type == "windows" and event.category == "registry" and event.action == "modification" and registry.path like """HKLM\\SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\Schedule\\TaskCache\\Tasks\\*Actions*""" | eval scheduled_task_action = replace(TO_LOWER(FROM_BASE64(registry.data.bytes)), """\u0000""", "") | eval scheduled_task_action = replace(scheduled_task_action, """(\u0003\fauthorfff|\u0003\fauthorff\u000e)""", "") | where scheduled_task_action rlike """.*(users\\public\\|\\appdata\\roaming|programdata|powershell|rundll32|regsvr32|mshta.exe|cscript.exe|wscript.exe|cmd.exe|forfiles|msiexec).*""" and not scheduled_task_action like "localsystem*" | keep scheduled_task_action, registry.path, agent.id | stats count_agents = count_distinct(agent.id) by scheduled_task_action | where count_agents == 1 // 3. suspicious powershell cmds from base64 encoded cmdline // Helpful when answering questions on regex based searches and replacements, base64 conversions, and dealing with case sensitivity (TO_LOWER and TO_UPPER commands) from logs-* | where host.os.type == "windows" and event.category == "process" and event.action == "start" and TO_LOWER(process.name) == "powershell.exe" and process.command_line rlike ".+ -(e|E).*" | keep agent.id, process.command_line | grok process.command_line """(?<base64_data>([A-Za-z0-9+/]+={1,2}$|[A-Za-z0-9+/]{100,}))""" | where base64_data is not null | eval decoded_base64_cmdline = replace(TO_LOWER(FROM_BASE64(base64_data)), """\u0000""", "") | where decoded_base64_cmdline rlike """.*(http|webclient|download|mppreference|sockets|bxor|.replace|reflection|assembly|load|bits|start-proc|iwr|frombase64).*""" | keep agent.id, process.command_line, decoded_base64_cmdline //4. 
Detect masquerading attempts as native Windows binaries //MITRE Tactics: "Defense Evasion" from logs-* | where event.type == "start" and event.action == "start" and host.os.name == "Windows" and not starts_with(process.executable, "C:\\Program Files\\WindowsApps\\") and not starts_with(process.executable, "C:\\Windows\\System32\\DriverStore\\") and process.name != "setup.exe" | keep process.name.caseless, process.executable.caseless, process.code_signature.subject_name, process.code_signature.trusted, process.code_signature.exists, host.id | eval system_bin = case(starts_with(process.executable.caseless, "c:\\windows\\system32") and starts_with(process.code_signature.subject_name, "Microsoft") and process.code_signature.trusted == true, process.name.caseless, null), non_system_bin = case(process.code_signature.exists == false or process.code_signature.trusted != true or not starts_with(process.code_signature.subject_name, "Microsoft"), process.name.caseless, null) | stats count_system_bin = count(system_bin), count_non_system_bin = count(non_system_bin) by process.name.caseless, host.id | where count_system_bin >= 1 and count_non_system_bin >= 1 //5. 
Detect DLL Hijack via Masquerading as Microsoft Native Libraries // Helpful when asking how to use ENRICH query results with enrich policies from logs-* | where host.os.family == "windows" and event.action == "load" and process.code_signature.status == "trusted" and dll.code_signature.status != "trusted" and not dll.path rlike """[c-fC-F]:\\(Windows|windows|WINDOWS)\\(System32|SysWOW64|system32|syswow64)\\[a-zA-Z0-9_]+.dll""" | keep dll.name, dll.path, dll.hash.sha256, process.executable, host.id | ENRICH libs-policy-defend | where native == "yes" and not starts_with(dll.path, "C:\\Windows\\assembly\\NativeImages") | eval process_path = replace(process.executable, """([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}|ns[a-z][A-Z0-9]{3,4}\.tmp|DX[A-Z0-9]{3,4}\.tmp|7z[A-Z0-9]{3,5}\.tmp|[0-9\.\-\_]{3,})""", ""), dll_path = replace(dll.path, """([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}|ns[a-z][A-Z0-9]{3,4}\.tmp|DX[A-Z0-9]{3,4}\.tmp|7z[A-Z0-9]{3,5}\.tmp|[0-9\.\-\_]{3,})""", "") | stats host_count = count_distinct(host.id) by dll.name, dll_path, process_path, dll.hash.sha256 | sort host_count asc //6. 
Potential Exfiltration by process total egress bytes // Helpful when asking how to filter/search on IP address (CIDR_MATCH) fields and aggregating/grouping //MITRE Tactics: "Command and Control", "Exfiltration" from logs-* | where host.os.family == "windows" and event.category == "network" and event.action == "disconnect_received" and not CIDR_MATCH(destination.ip, "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.0.0/29", "192.0.0.8/32", "192.0.0.9/32", "192.0.0.10/32", "192.0.0.170/32", "192.0.0.171/32", "192.0.2.0/24", "192.31.196.0/24", "192.52.193.0/24", "192.168.0.0/16", "192.88.99.0/24", "224.0.0.0/4", "100.64.0.0/10", "192.175.48.0/24","198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24", "240.0.0.0/4", "::1","FE80::/10", "FF00::/8") | keep source.bytes, destination.address, process.executable, process.entity_id | stats total_bytes_out = sum(source.bytes) by process.entity_id, destination.address, process.executable /* more than 1GB out by same process.pid in 8 hours */ | where total_bytes_out >= 1073741824 //7. 
Windows logon activity by source IP // Helpful when answering questions about the CASE command (as well as conditional outputs/if statements) //MITRE Tactics: "Credential Access" from logs-* | where host.os.family == "windows" and event.category == "authentication" and event.action in ("logon-failed", "logged-in") and winlog.logon.type == "Network" and source.ip is not null and /* noisy failure status codes often associated to authentication misconfiguration */ not (event.action == "logon-failed" and winlog.event_data.Status in ("0xC000015B", "0XC000005E", "0XC0000133", "0XC0000192")) | eval failed = case(event.action == "logon-failed", source.ip, null), success = case(event.action == "logged-in", source.ip, null) | stats count_failed = count(failed), count_success = count(success), count_user = count_distinct(winlog.event_data.TargetUserName) by source.ip /* below threshold should be adjusted to your env logon patterns */ | where count_failed >= 100 and count_success <= 10 and count_user >= 20 //8. 
High count of network connections over extended period by process //Helpful when answering questions about IP searches/filters, field conversions (to_double, to_int), and running multiple aggregations //MITRE Tactics: "Command and Control" from logs-* | where host.os.family == "windows" and event.category == "network" and network.direction == "egress" and (process.executable like "C:\\\\Windows\\\\System32*" or process.executable like "C:\\\\Windows\\\\SysWOW64\\\\*") and not user.id in ("S-1-5-19", "S-1-5-20") and /* multiple Windows svchost services perform long term connection to MS ASN, can be covered in a dedicated hunt */ not (process.name == "svchost.exe" and user.id == "S-1-5-18") and /* excluding private IP ranges */ not CIDR_MATCH(destination.ip, "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.0.0/29", "192.0.0.8/32", "192.0.0.9/32", "192.0.0.10/32", "192.0.0.170/32", "192.0.0.171/32", "192.0.2.0/24", "192.31.196.0/24", "192.52.193.0/24", "192.168.0.0/16", "192.88.99.0/24", "224.0.0.0/4", "100.64.0.0/10", "192.175.48.0/24","198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24", "240.0.0.0/4", "::1","FE80::/10", "FF00::/8") | keep source.bytes, destination.address, process.name, process.entity_id, @timestamp /* calc total duration , total MB out and the number of connections per hour */ | stats total_bytes_out = sum(source.bytes), count_connections = count(*), start_time = min(@timestamp), end_time = max(@timestamp) by process.entity_id, destination.address, process.name | eval dur = TO_DOUBLE(end_time)-TO_DOUBLE(start_time), duration_hours=TO_INT(dur/3600000), MB_out=TO_DOUBLE(total_bytes_out) / (1024*1024), number_of_con_per_hour = (count_connections / duration_hours) | keep process.entity_id, process.name, duration_hours, destination.address, MB_out, count_connections, number_of_con_per_hour /* threshold is set to 120 connections per minute , you can adjust it to your env/FP rate */ | where duration_hours >= 1 and 
number_of_con_per_hour >= 120 //9. Persistence via Suspicious Launch Agent or Launch Daemon with low occurrence //Helpful when answering questions on concatenating fields, dealing with time-based searches //MITRE Tactics: "Persistence" from logs-* | where @timestamp > now() - 7 day | where host.os.family == "macos" and event.category == "file" and event.action == "launch_daemon" and (Persistence.runatload == true or Persistence.keepalive == true) and process.executable is not null | eval args = MV_CONCAT(Persistence.args, ",") /* normalizing users home profile */ | eval args = replace(args, """/Users/[a-zA-Z0-9ñ\.\-\_\$~ ]+/""", "/Users/user/") | stats agents = count_distinct(host.id), total = count(*) by process.name, Persistence.name, args | where starts_with(args, "/") and agents == 1 and total == 1 //10. Suspicious Network Connections by unsigned macOS //Helpful when answering questions on IP filtering, calculating the time difference between timestamps, aggregations, and field conversions //MITRE Tactics: "Command and Control" from logs-* | where host.os.family == "macos" and event.category == "network" and (process.code_signature.exists == false or process.code_signature.trusted != true) and /* excluding private IP ranges */ not CIDR_MATCH(destination.ip, "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.0.0/29", "192.0.0.8/32", "192.0.0.9/32", "192.0.0.10/32", "192.0.0.170/32", "192.0.0.171/32", "192.0.2.0/24", "192.31.196.0/24", "192.52.193.0/24", "192.168.0.0/16", "192.88.99.0/24", "224.0.0.0/4", "100.64.0.0/10", "192.175.48.0/24","198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24", "240.0.0.0/4", "::1","FE80::/10", "FF00::/8") | keep source.bytes, destination.address, process.name, process.entity_id, @timestamp /* calc total duration , total MB out and the number of connections per hour */ | stats total_bytes_out = sum(source.bytes), count_connections = count(*), start_time = min(@timestamp), end_time = max(@timestamp) 
by process.entity_id, destination.address, process.name | eval dur = TO_DOUBLE(end_time)-TO_DOUBLE(start_time), duration_hours=TO_INT(dur/3600000), MB_out=TO_DOUBLE(total_bytes_out) / (1024*1024), number_of_con_per_hour = (count_connections / duration_hours) | keep process.entity_id, process.name, duration_hours, destination.address, MB_out, count_connections, number_of_con_per_hour /* threshold is set to 120 connections per minute , you can adjust it to your env/FP rate */ | where duration_hours >= 8 and number_of_con_per_hour >= 120 //11. Unusual file creations by web server user //Helpful when answering questions on using the LIKE command (wildcard searches) and aggregations FROM logs-* | WHERE @timestamp > NOW() - 50 day | WHERE host.os.type == "linux" and event.type == "creation" and user.name in ("www-data", "apache", "nginx", "httpd", "tomcat", "lighttpd", "glassfish", "weblogic") and ( file.path like "/var/www/*" or file.path like "/var/tmp/*" or file.path like "/tmp/*" or file.path like "/dev/shm/*" ) | STATS file_count = COUNT(file.path), host_count = COUNT(host.name) by file.path, host.name, process.name, user.name // Alter this threshold to make sense for your environment | WHERE file_count <= 5 | SORT file_count asc | LIMIT 100 //12. Segmentation Fault & Potential Buffer Overflow Hunting //Helpful when answering questions on extractions with GROK FROM logs-* | WHERE host.os.type == "linux" and process.name == "kernel" and message like "*segfault*" | GROK message "\\[%{NUMBER:timestamp}\\] %{WORD:process}\\[%{NUMBER:pid}\\]: segfault at %{BASE16NUM:segfault_address} ip %{BASE16NUM:instruction_pointer} sp %{BASE16NUM:stack_pointer} error %{NUMBER:error_code} in %{DATA:so_file}\\[%{BASE16NUM:so_base_address}\\+%{BASE16NUM:so_offset}\\]" | KEEP timestamp, process, pid, so_file, segfault_address, instruction_pointer, stack_pointer, error_code, so_base_address, so_offset //13. 
Persistence via Systemd (timers) //Helpful when answering questions on using the CASE command (conditional statements), searching lists using the IN command, wildcard searches with the LIKE command and aggregations FROM logs-* | WHERE host.os.type == "linux" and event.type in ("creation", "change") and ( // System-wide/user-specific services/timers (root permissions required) file.path like "/run/systemd/system/*" or file.path like "/etc/systemd/system/*" or file.path like "/etc/systemd/user/*" or file.path like "/usr/local/lib/systemd/system/*" or file.path like "/lib/systemd/system/*" or file.path like "/usr/lib/systemd/system/*" or file.path like "/usr/lib/systemd/user/*" or // user-specific services/timers (user permissions required) file.path like "/home/*/.config/systemd/user/*" or file.path like "/home/*/.local/share/systemd/user/*" or // System-wide generators (root permissions required) file.path like "/etc/systemd/system-generators/*" or file.path like "/usr/local/lib/systemd/system-generators/*" or file.path like "/lib/systemd/system-generators/*" or file.path like "/etc/systemd/user-generators/*" or file.path like "/usr/local/lib/systemd/user-generators/*" or file.path like "/usr/lib/systemd/user-generators/*" ) and not ( process.name in ( "dpkg", "dockerd", "yum", "dnf", "snapd", "pacman", "pamac-daemon", "netplan", "systemd", "generate" ) or process.executable == "/proc/self/exe" or process.executable like "/dev/fd/*" or file.extension in ("dpkg-remove", "swx", "swp") ) | EVAL persistence = CASE( // System-wide/user-specific services/timers (root permissions required) file.path like "/run/systemd/system/*" or file.path like "/etc/systemd/system/*" or file.path like "/etc/systemd/user/*" or file.path like "/usr/local/lib/systemd/system/*" or file.path like "/lib/systemd/system/*" or file.path like "/usr/lib/systemd/system/*" or file.path like "/usr/lib/systemd/user/*" or // user-specific services/timers (user permissions required) file.path like 
"/home/*/.config/systemd/user/*" or file.path like "/home/*/.local/share/systemd/user/*" or // System-wide generators (root permissions required) file.path like "/etc/systemd/system-generators/*" or file.path like "/usr/local/lib/systemd/system-generators/*" or file.path like "/lib/systemd/system-generators/*" or file.path like "/etc/systemd/user-generators/*" or file.path like "/usr/local/lib/systemd/user-generators/*" or file.path like "/usr/lib/systemd/user-generators/*", process.name, null ) | STATS cc = COUNT(*), pers_count = COUNT(persistence), agent_count = COUNT(agent.id) by process.executable, file.path, host.name, user.name | WHERE pers_count > 0 and pers_count <= 20 and agent_count <= 3 | SORT cc asc | LIMIT 100 //14. Low Frequency AWS EC2 Admin Password Retrieval Attempts from Unusual ARNs //Helpful when answering questions on extracting fields with the dissect command and aggregations. Also an example for hunting for cloud threats from logs-* | where event.provider == "ec2.amazonaws.com" and event.action == "GetPasswordData" and aws.cloudtrail.error_code == "Client.UnauthorizedOperation" and aws.cloudtrail.user_identity.type == "AssumedRole" | dissect aws.cloudtrail.request_parameters "{%{key}=%{instance_id}}" | dissect aws.cloudtrail.user_identity.session_context.session_issuer.arn "%{?keyword1}:%{?keyword2}:%{?keyword3}::%{account_id}:%{keyword4}/%{arn_name}" | dissect user.id "%{principal_id}:%{session_name}" | keep aws.cloudtrail.user_identity.session_context.session_issuer.principal_id, instance_id, account_id, arn_name, source.ip, principal_id, session_name, user.name | stats instance_counts = count_distinct(arn_name) by instance_id, user.name, source.ip, session_name | where instance_counts < 5 | sort instance_counts desc ``` </p> </details> (cherry picked from commit 6137f81)
…Generation (elastic#188492) ## Summary This PR updates the pre-packaged ESQL examples used by the ESQL Query Generation tool as provided by @jamesspi. The number of examples have stayed the same, as have the file names -- so I've only updated the raw content here. > [!NOTE] > Since we're enabling the new `kbDataClient` with elastic#188168 for `8.15`, there is no need for a delete/re-install for pre-existing deployments to use these new example queries, as the Knowledge Base will be rebuilt on an upgrade to `8.15`. Token length changes as calculated using the [GPT-4 Tokenizer](https://platform.openai.com/tokenizer): <details><summary>Existing Example Queries / Tokens: 1,108 / Characters: 4151</summary> <p> ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM logs-* | WHERE NOT CIDR_MATCH(destination.ip, "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16") | STATS destcount = COUNT(destination.ip) by user.name, host.name | ENRICH ldap_lookup_new ON user.name | WHERE group.name IS NOT NULL | EVAL follow_up = CASE( destcount >= 100, "true", "false") | SORT destcount desc | KEEP destcount, host.name, user.name, group.name, follow_up ``` [[esql-example-queries]] The following is an example ES|QL query: ``` from logs-* | grok dns.question.name "%{DATA}\\.%{GREEDYDATA:dns.question.registered_domain:string}" | stats unique_queries = count_distinct(dns.question.name) by dns.question.registered_domain, process.name | where unique_queries > 5 | sort unique_queries desc ``` [[esql-example-queries]] The following is an example ES|QL query: ``` from logs-* | where event.code is not null | stats event_code_count = count(event.code) by event.code,host.name | enrich win_events on event.code with EVENT_DESCRIPTION | where EVENT_DESCRIPTION is not null and host.name is not null | rename EVENT_DESCRIPTION as event.description | sort event_code_count desc | keep event_code_count,event.code,host.name,event.description ``` [[esql-example-queries]] The following is 
an example ES|QL query: ``` from logs-* | where event.category == "file" and event.action == "creation" | stats filecount = count(file.name) by process.name,host.name | dissect process.name "%{process}.%{extension}" | eval proclength = length(process.name) | where proclength > 10 | sort filecount,proclength desc | limit 10 | keep host.name,process.name,filecount,process,extension,fullproc,proclength ``` [[esql-example-queries]] The following is an example ES|QL query: ``` from logs-* | where process.name == "curl.exe" | stats bytes = sum(destination.bytes) by destination.address | eval kb = bytes/1024 | sort kb desc | limit 10 | keep kb,destination.address ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM metrics-apm* | WHERE metricset.name == "transaction" AND metricset.interval == "1m" | EVAL bucket = AUTO_BUCKET(transaction.duration.histogram, 50, <start-date>, <end-date>) | STATS avg_duration = AVG(transaction.duration.histogram) BY bucket ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM packetbeat-* | STATS doc_count = COUNT(destination.domain) BY destination.domain | SORT doc_count DESC | LIMIT 10 ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM employees | EVAL hire_date_formatted = DATE_FORMAT(hire_date, "MMMM yyyy") | SORT hire_date | KEEP emp_no, hire_date_formatted | LIMIT 5 ``` [[esql-example-queries]] The following is NOT an example of an ES|QL query: ``` Pagination is not supported ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM logs-* | WHERE @timestamp >= NOW() - 15 minutes | EVAL bucket = DATE_TRUNC(1 minute, @timestamp) | STATS avg_cpu = AVG(system.cpu.total.norm.pct) BY bucket, host.name | LIMIT 10 ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM traces-apm* | WHERE @timestamp >= NOW() - 24 hours | EVAL successful = CASE(event.outcome == "success", 1, 0), failed = CASE(event.outcome == "failure", 1, 
0) | STATS success_rate = AVG(successful), avg_duration = AVG(transaction.duration), total_requests = COUNT(transaction.id) BY service.name ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM metricbeat* | EVAL cpu_pct_normalized = (system.cpu.user.pct + system.cpu.system.pct) / system.cpu.cores | STATS AVG(cpu_pct_normalized) BY host.name ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM postgres-logs | DISSECT message "%{} duration: %{query_duration} ms" | EVAL query_duration_num = TO_DOUBLE(query_duration) | STATS avg_duration = AVG(query_duration_num) ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM nyc_taxis | WHERE DATE_EXTRACT(drop_off_time, "hour") >= 6 AND DATE_EXTRACT(drop_off_time, "hour") < 10 | LIMIT 10 ``` ``` </p> </details> <details><summary>8.15 Example Queries / Tokens: 4,847 / Characters:16671</summary> <p> ``` // 1. regex to extract from dns.question.registered_domain // Helpful when asking how to use GROK to extract values via REGEX from logs-* | where dns.question.name like "?*" | grok dns.question.name """(?<dns_registered_domain>[a-zA-Z0-9]+\.[a-z-A-Z]{2,3}$)""" | keep dns_registered_domain | limit 10 // 2. 
hunting scheduled task with suspicious actions via registry.data.bytes // Helpful when answering questions on regex based searches and replacements (RLIKE and REPLACE), base64 conversions, and dealing with case sensitivity from logs-* | where host.os.type == "windows" and event.category == "registry" and event.action == "modification" and registry.path like """HKLM\\SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\Schedule\\TaskCache\\Tasks\\*Actions*""" | eval scheduled_task_action = replace(TO_LOWER(FROM_BASE64(registry.data.bytes)), """\u0000""", "") | eval scheduled_task_action = replace(scheduled_task_action, """(\u0003\fauthorfff|\u0003\fauthorff\u000e)""", "") | where scheduled_task_action rlike """.*(users\\public\\|\\appdata\\roaming|programdata|powershell|rundll32|regsvr32|mshta.exe|cscript.exe|wscript.exe|cmd.exe|forfiles|msiexec).*""" and not scheduled_task_action like "localsystem*" | keep scheduled_task_action, registry.path, agent.id | stats count_agents = count_distinct(agent.id) by scheduled_task_action | where count_agents == 1 // 3. suspicious powershell cmds from base64 encoded cmdline // Helpful when answering questions on regex based searches and replacements, base64 conversions, and dealing with case sensitivity (TO_LOWER and TO_UPPER commands) from logs-* | where host.os.type == "windows" and event.category == "process" and event.action == "start" and TO_LOWER(process.name) == "powershell.exe" and process.command_line rlike ".+ -(e|E).*" | keep agent.id, process.command_line | grok process.command_line """(?<base64_data>([A-Za-z0-9+/]+={1,2}$|[A-Za-z0-9+/]{100,}))""" | where base64_data is not null | eval decoded_base64_cmdline = replace(TO_LOWER(FROM_BASE64(base64_data)), """\u0000""", "") | where decoded_base64_cmdline rlike """.*(http|webclient|download|mppreference|sockets|bxor|.replace|reflection|assembly|load|bits|start-proc|iwr|frombase64).*""" | keep agent.id, process.command_line, decoded_base64_cmdline //4. 
Detect masquerading attempts as native Windows binaries //MITRE Tactics: "Defense Evasion" from logs-* | where event.type == "start" and event.action == "start" and host.os.name == "Windows" and not starts_with(process.executable, "C:\\Program Files\\WindowsApps\\") and not starts_with(process.executable, "C:\\Windows\\System32\\DriverStore\\") and process.name != "setup.exe" | keep process.name.caseless, process.executable.caseless, process.code_signature.subject_name, process.code_signature.trusted, process.code_signature.exists, host.id | eval system_bin = case(starts_with(process.executable.caseless, "c:\\windows\\system32") and starts_with(process.code_signature.subject_name, "Microsoft") and process.code_signature.trusted == true, process.name.caseless, null), non_system_bin = case(process.code_signature.exists == false or process.code_signature.trusted != true or not starts_with(process.code_signature.subject_name, "Microsoft"), process.name.caseless, null) | stats count_system_bin = count(system_bin), count_non_system_bin = count(non_system_bin) by process.name.caseless, host.id | where count_system_bin >= 1 and count_non_system_bin >= 1 //5. 
Detect DLL Hijack via Masquerading as Microsoft Native Libraries // Helpful when asking how to use ENRICH query results with enrich policies from logs-* | where host.os.family == "windows" and event.action == "load" and process.code_signature.status == "trusted" and dll.code_signature.status != "trusted" and not dll.path rlike """[c-fC-F]:\\(Windows|windows|WINDOWS)\\(System32|SysWOW64|system32|syswow64)\\[a-zA-Z0-9_]+.dll""" | keep dll.name, dll.path, dll.hash.sha256, process.executable, host.id | ENRICH libs-policy-defend | where native == "yes" and not starts_with(dll.path, "C:\\Windows\\assembly\\NativeImages") | eval process_path = replace(process.executable, """([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}|ns[a-z][A-Z0-9]{3,4}\.tmp|DX[A-Z0-9]{3,4}\.tmp|7z[A-Z0-9]{3,5}\.tmp|[0-9\.\-\_]{3,})""", ""), dll_path = replace(dll.path, """([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}|ns[a-z][A-Z0-9]{3,4}\.tmp|DX[A-Z0-9]{3,4}\.tmp|7z[A-Z0-9]{3,5}\.tmp|[0-9\.\-\_]{3,})""", "") | stats host_count = count_distinct(host.id) by dll.name, dll_path, process_path, dll.hash.sha256 | sort host_count asc //6. 
Potential Exfiltration by process total egress bytes // Helpful when asking how to filter/search on IP address (CIDR_MATCH) fields and aggregating/grouping //MITRE Tactics: "Command and Control", "Exfiltration" from logs-* | where host.os.family == "windows" and event.category == "network" and event.action == "disconnect_received" and not CIDR_MATCH(destination.ip, "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.0.0/29", "192.0.0.8/32", "192.0.0.9/32", "192.0.0.10/32", "192.0.0.170/32", "192.0.0.171/32", "192.0.2.0/24", "192.31.196.0/24", "192.52.193.0/24", "192.168.0.0/16", "192.88.99.0/24", "224.0.0.0/4", "100.64.0.0/10", "192.175.48.0/24","198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24", "240.0.0.0/4", "::1","FE80::/10", "FF00::/8") | keep source.bytes, destination.address, process.executable, process.entity_id | stats total_bytes_out = sum(source.bytes) by process.entity_id, destination.address, process.executable /* more than 1GB out by same process.pid in 8 hours */ | where total_bytes_out >= 1073741824 //7. 
Windows logon activity by source IP // Helpful when answering questions about the CASE command (as well as conditional outputs/if statements) //MITRE Tactics: "Credential Access" from logs-* | where host.os.family == "windows" and event.category == "authentication" and event.action in ("logon-failed", "logged-in") and winlog.logon.type == "Network" and source.ip is not null and /* noisy failure status codes often associated to authentication misconfiguration */ not (event.action == "logon-failed" and winlog.event_data.Status in ("0xC000015B", "0XC000005E", "0XC0000133", "0XC0000192")) | eval failed = case(event.action == "logon-failed", source.ip, null), success = case(event.action == "logged-in", source.ip, null) | stats count_failed = count(failed), count_success = count(success), count_user = count_distinct(winlog.event_data.TargetUserName) by source.ip /* below threshold should be adjusted to your env logon patterns */ | where count_failed >= 100 and count_success <= 10 and count_user >= 20 //8. 
High count of network connection over extended period by process //Helpful when answering questions about IP searches/filters, field converstions(to_double, to_int), and running multiple aggregations //MITRE Tactics: "Command and Control" from logs-* | where host.os.family == "windows" and event.category == "network" and network.direction == "egress" and (process.executable like "C:\\\\Windows\\\\System32*" or process.executable like "C:\\\\Windows\\\\SysWOW64\\\\*") and not user.id in ("S-1-5-19", "S-1-5-20") and /* multiple Windows svchost services perform long term connection to MS ASN, can be covered in a dedicated hunt */ not (process.name == "svchost.exe" and user.id == "S-1-5-18") and /* excluding private IP ranges */ not CIDR_MATCH(destination.ip, "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.0.0/29", "192.0.0.8/32", "192.0.0.9/32", "192.0.0.10/32", "192.0.0.170/32", "192.0.0.171/32", "192.0.2.0/24", "192.31.196.0/24", "192.52.193.0/24", "192.168.0.0/16", "192.88.99.0/24", "224.0.0.0/4", "100.64.0.0/10", "192.175.48.0/24","198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24", "240.0.0.0/4", "::1","FE80::/10", "FF00::/8") | keep source.bytes, destination.address, process.name, process.entity_id, @timestamp /* calc total duration , total MB out and the number of connections per hour */ | stats total_bytes_out = sum(source.bytes), count_connections = count(*), start_time = min(@timestamp), end_time = max(@timestamp) by process.entity_id, destination.address, process.name | eval dur = TO_DOUBLE(end_time)-TO_DOUBLE(start_time), duration_hours=TO_INT(dur/3600000), MB_out=TO_DOUBLE(total_bytes_out) / (1024*1024), number_of_con_per_hour = (count_connections / duration_hours) | keep process.entity_id, process.name, duration_hours, destination.address, MB_out, count_connections, number_of_con_per_hour /* threshold is set to 120 connections per minute , you can adjust it to your env/FP rate */ | where duration_hours >= 1 and 
number_of_con_per_hour >= 120 //9. Persistence via Suspicious Launch Agent or Launch Daemon with low occurrence //Helpful when answering questions on concatenating fields, dealing with time based searches //MITRE Tactics: "Persistence" from logs-* | where @timestamp > now() - 7 day | where host.os.family == "macos" and event.category == "file" and event.action == "launch_daemon" and (Persistence.runatload == true or Persistence.keepalive == true) and process.executable is not null | eval args = MV_CONCAT(Persistence.args, ",") /* normalizing users home profile */ | eval args = replace(args, """/Users/[a-zA-Z0-9ñ\.\-\_\$~ ]+/""", "/Users/user/") | stats agents = count_distinct(host.id), total = count(*) by process.name, Persistence.name, args | where starts_with(args, "/") and agents == 1 and total == 1 //10. Suspicious Network Connections by unsigned macO //Helpful when answering questions on IP filtering, calculating the time difference between timestamps, aggregations, and field conversions //MITRE Tactics: "Command and Control" from logs-* | where host.os.family == "macos" and event.category == "network" and (process.code_signature.exists == false or process.code_signature.trusted != true) and /* excluding private IP ranges */ not CIDR_MATCH(destination.ip, "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.0.0/29", "192.0.0.8/32", "192.0.0.9/32", "192.0.0.10/32", "192.0.0.170/32", "192.0.0.171/32", "192.0.2.0/24", "192.31.196.0/24", "192.52.193.0/24", "192.168.0.0/16", "192.88.99.0/24", "224.0.0.0/4", "100.64.0.0/10", "192.175.48.0/24","198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24", "240.0.0.0/4", "::1","FE80::/10", "FF00::/8") | keep source.bytes, destination.address, process.name, process.entity_id, @timestamp /* calc total duration , total MB out and the number of connections per hour */ | stats total_bytes_out = sum(source.bytes), count_connections = count(*), start_time = min(@timestamp), end_time = max(@timestamp) 
by process.entity_id, destination.address, process.name | eval dur = TO_DOUBLE(end_time)-TO_DOUBLE(start_time), duration_hours=TO_INT(dur/3600000), MB_out=TO_DOUBLE(total_bytes_out) / (1024*1024), number_of_con_per_hour = (count_connections / duration_hours) | keep process.entity_id, process.name, duration_hours, destination.address, MB_out, count_connections, number_of_con_per_hour /* threshold is set to 120 connections per minute , you can adjust it to your env/FP rate */ | where duration_hours >= 8 and number_of_con_per_hour >= 120 //11. Unusual file creations by web server user //Helpful when answering questions on using the LIKE command (wildcard searches) and aggregations FROM logs-* | WHERE @timestamp > NOW() - 50 day | WHERE host.os.type == "linux" and event.type == "creation" and user.name in ("www-data", "apache", "nginx", "httpd", "tomcat", "lighttpd", "glassfish", "weblogic") and ( file.path like "/var/www/*" or file.path like "/var/tmp/*" or file.path like "/tmp/*" or file.path like "/dev/shm/*" ) | STATS file_count = COUNT(file.path), host_count = COUNT(host.name) by file.path, host.name, process.name, user.name // Alter this threshold to make sense for your environment | WHERE file_count <= 5 | SORT file_count asc | LIMIT 100 //12. Segmentation Fault & Potential Buffer Overflow Hunting //Helpful when answering questions on extractions with GROK FROM logs-* | WHERE host.os.type == "linux" and process.name == "kernel" and message like "*segfault*" | GROK message "\\[%{NUMBER:timestamp}\\] %{WORD:process}\\[%{NUMBER:pid}\\]: segfault at %{BASE16NUM:segfault_address} ip %{BASE16NUM:instruction_pointer} sp %{BASE16NUM:stack_pointer} error %{NUMBER:error_code} in %{DATA:so_file}\\[%{BASE16NUM:so_base_address}\\+%{BASE16NUM:so_offset}\\]" | KEEP timestamp, process, pid, so_file, segfault_address, instruction_pointer, stack_pointer, error_code, so_base_address, so_offset //13. 
Persistence via Systemd (timers) //Helpful when answering questions on using the CASE command (conditional statements), searching lists using the IN command, wildcard searches with the LIKE command and aggregations FROM logs-* | WHERE host.os.type == "linux" and event.type in ("creation", "change") and ( // System-wide/user-specific services/timers (root permissions required) file.path like "/run/systemd/system/*" or file.path like "/etc/systemd/system/*" or file.path like "/etc/systemd/user/*" or file.path like "/usr/local/lib/systemd/system/*" or file.path like "/lib/systemd/system/*" or file.path like "/usr/lib/systemd/system/*" or file.path like "/usr/lib/systemd/user/*" or // user-specific services/timers (user permissions required) file.path like "/home/*/.config/systemd/user/*" or file.path like "/home/*/.local/share/systemd/user/*" or // System-wide generators (root permissions required) file.path like "/etc/systemd/system-generators/*" or file.path like "/usr/local/lib/systemd/system-generators/*" or file.path like "/lib/systemd/system-generators/*" or file.path like "/etc/systemd/user-generators/*" or file.path like "/usr/local/lib/systemd/user-generators/*" or file.path like "/usr/lib/systemd/user-generators/*" ) and not ( process.name in ( "dpkg", "dockerd", "yum", "dnf", "snapd", "pacman", "pamac-daemon", "netplan", "systemd", "generate" ) or process.executable == "/proc/self/exe" or process.executable like "/dev/fd/*" or file.extension in ("dpkg-remove", "swx", "swp") ) | EVAL persistence = CASE( // System-wide/user-specific services/timers (root permissions required) file.path like "/run/systemd/system/*" or file.path like "/etc/systemd/system/*" or file.path like "/etc/systemd/user/*" or file.path like "/usr/local/lib/systemd/system/*" or file.path like "/lib/systemd/system/*" or file.path like "/usr/lib/systemd/system/*" or file.path like "/usr/lib/systemd/user/*" or // user-specific services/timers (user permissions required) file.path like 
"/home/*/.config/systemd/user/*" or file.path like "/home/*/.local/share/systemd/user/*" or // System-wide generators (root permissions required) file.path like "/etc/systemd/system-generators/*" or file.path like "/usr/local/lib/systemd/system-generators/*" or file.path like "/lib/systemd/system-generators/*" or file.path like "/etc/systemd/user-generators/*" or file.path like "/usr/local/lib/systemd/user-generators/*" or file.path like "/usr/lib/systemd/user-generators/*", process.name, null ) | STATS cc = COUNT(*), pers_count = COUNT(persistence), agent_count = COUNT(agent.id) by process.executable, file.path, host.name, user.name | WHERE pers_count > 0 and pers_count <= 20 and agent_count <= 3 | SORT cc asc | LIMIT 100 //14. Low Frequency AWS EC2 Admin Password Retrieval Attempts from Unusual ARNs //Helpful when answering questions on extracting fields with the dissect command and aggregations. Also an example for hunting for cloud threats from logs-* | where event.provider == "ec2.amazonaws.com" and event.action == "GetPasswordData" and aws.cloudtrail.error_code == "Client.UnauthorizedOperation" and aws.cloudtrail.user_identity.type == "AssumedRole" | dissect aws.cloudtrail.request_parameters "{%{key}=%{instance_id}}" | dissect aws.cloudtrail.user_identity.session_context.session_issuer.arn "%{?keyword1}:%{?keyword2}:%{?keyword3}::%{account_id}:%{keyword4}/%{arn_name}" | dissect user.id "%{principal_id}:%{session_name}" | keep aws.cloudtrail.user_identity.session_context.session_issuer.principal_id, instance_id, account_id, arn_name, source.ip, principal_id, session_name, user.name | stats instance_counts = count_distinct(arn_name) by instance_id, user.name, source.ip, session_name | where instance_counts < 5 | sort instance_counts desc ``` </p> </details> (cherry picked from commit 6137f81)
⏳ Build in-progress, with failures
cc @spong
LGTM! Thank you Garrett for leading this milestone of the KB automation to the 🚀
💔 All backports failed
Manual backport

To create the backport manually run:

Questions? Please refer to the Backport tool documentation
…ngGraph code paths for `8.15` (elastic#188168)

## Summary

This PR enables the automatic setup of the Knowledge Base and LangGraph code paths for the `8.15` release. These features were behind the `assistantKnowledgeBaseByDefault` feature flag, which will remain as a gate for upcoming Knowledge Base features that were not ready for this release.

As part of these changes, we now only support the new LangGraph code path, and so were able to clean up the non-kb and non-RAGonAlerts code paths. All paths within the `post_actions_executor` route funnel to the LangGraph implementation.

> [!NOTE]
> We were planning to do the switch to the new [`chat/completions`](https://github.com/elastic/kibana/pull/184485/files) public API; however, this would have required additional refactoring since the APIs slightly differ. We will make this change and delete the `post_actions_executor` route for the next release.

### Checklist

Delete any items that are not applicable to this PR.

- [X] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials
  - Working with docs team to ensure updates before merging, cc @benironside
- [X] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

---------

Co-authored-by: Steph Milovic <stephanie.milovic@elastic.co>

(cherry picked from commit 661c251)

# Conflicts:
#	.buildkite/ftr_configs.yml
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI.

Questions? Please refer to the Backport tool documentation
… and LangGraph code paths for `8.15` (#188168) (#188605)

# Backport

This will backport the following commits from `main` to `8.15`:

- [[Security Assistant] Enables automatic setup of Knowledge Base and LangGraph code paths for `8.15` (#188168)](#188168)

### Questions ?

Please refer to the [Backport tool documentation](https://github.com/sqren/backport)
## Summary

In #188168 we cleaned up some of our API tests, but missed these other references and so have failures on the [periodic test pipeline](https://buildkite.com/elastic/kibana-serverless-security-solution-quality-gate-gen-ai/builds/854). This PR updates these configs to remove the test commands that no longer exist.
# Backport

This will backport the following commits from `main` to `8.15`:

- [[Security Assistant] Cleanup MKI test configs (#188665)](#188665)

### Questions ?

Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

Co-authored-by: Garrett Spong <spong@users.noreply.github.com>
## Summary

This PR enables the automatic setup of the Knowledge Base and LangGraph code paths for the `8.15` release. These features were behind the `assistantKnowledgeBaseByDefault` feature flag, which will remain as a gate for upcoming Knowledge Base features that were not ready for this release.

As part of these changes, we now only support the new LangGraph code path, and so were able to clean up the non-kb and non-RAGonAlerts code paths. All paths within the `post_actions_executor` route funnel to the LangGraph implementation.

> [!NOTE]
> We were planning to do the switch to the new `chat/completions` public API; however, this would have required additional refactoring since the APIs slightly differ. We will make this change and delete the `post_actions_executor` route for the next release.

### Checklist

Delete any items that are not applicable to this PR.
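The summary describes two decisions: every request now funnels to the single LangGraph code path, while the `assistantKnowledgeBaseByDefault` flag gates only the Knowledge Base features that were not ready for this release. A minimal sketch of that routing shape — the function, type, and field names here are hypothetical, not Kibana's actual implementation:

```typescript
// Hypothetical sketch: with the legacy non-KB / non-RAG-on-alerts paths
// removed, every request resolves to the LangGraph executor; the remaining
// feature flag only toggles not-yet-released Knowledge Base features.
// Names are illustrative, not Kibana's actual API.

interface AssistantFlags {
  // Remains as a gate for upcoming Knowledge Base features only.
  assistantKnowledgeBaseByDefault: boolean;
}

type ExecutorPath = 'langGraph';

interface ExecutorConfig {
  path: ExecutorPath;
  setupKnowledgeBase: boolean;   // automatic KB setup is now the default
  enableExperimentalKb: boolean; // still flag-gated
}

function resolveExecutor(flags: AssistantFlags): ExecutorConfig {
  return {
    path: 'langGraph',           // the only remaining code path
    setupKnowledgeBase: true,    // no longer behind the flag
    enableExperimentalKb: flags.assistantKnowledgeBaseByDefault,
  };
}

const config = resolveExecutor({ assistantKnowledgeBaseByDefault: false });
console.log(config.path, config.setupKnowledgeBase, config.enableExperimentalKb);
```

The point of the shape is that turning the flag off no longer selects a different executor; it only withholds the unfinished Knowledge Base features.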