[Security Assistant] Updates ESQL example queries used in ESQL Query Generation #188492
Conversation
…assistantKnowledgeBaseByDefault feature flag
A documentation preview will be available soon. Request a new doc build by commenting
If your PR continues to fail for an unknown reason, the doc build pipeline may be broken. Elastic employees can check the pipeline status here.

cc @spong
Thank you @spong 🙇
LGTM
…Generation (elastic#188492)

## Summary

This PR updates the pre-packaged ESQL examples used by the ESQL Query Generation tool as provided by @jamesspi. The number of examples has stayed the same, as have the file names -- so I've only updated the raw content here.

> [!NOTE]
> Since we're enabling the new `kbDataClient` with elastic#188168 for `8.15`, there is no need for a delete/re-install for pre-existing deployments to use these new example queries, as the Knowledge Base will be rebuilt on an upgrade to `8.15`.

Token length changes as calculated using the [GPT-4 Tokenizer](https://platform.openai.com/tokenizer) (see the sketch after the collapsed lists for reproducing these counts):

<details><summary>Existing Example Queries / Tokens: 1,108 / Characters: 4151</summary> <p> ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM logs-* | WHERE NOT CIDR_MATCH(destination.ip, "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16") | STATS destcount = COUNT(destination.ip) by user.name, host.name | ENRICH ldap_lookup_new ON user.name | WHERE group.name IS NOT NULL | EVAL follow_up = CASE( destcount >= 100, "true", "false") | SORT destcount desc | KEEP destcount, host.name, user.name, group.name, follow_up ``` [[esql-example-queries]] The following is an example ES|QL query: ``` from logs-* | grok dns.question.name "%{DATA}\\.%{GREEDYDATA:dns.question.registered_domain:string}" | stats unique_queries = count_distinct(dns.question.name) by dns.question.registered_domain, process.name | where unique_queries > 5 | sort unique_queries desc ``` [[esql-example-queries]] The following is an example ES|QL query: ``` from logs-* | where event.code is not null | stats event_code_count = count(event.code) by event.code,host.name | enrich win_events on event.code with EVENT_DESCRIPTION | where EVENT_DESCRIPTION is not null and host.name is not null | rename EVENT_DESCRIPTION as event.description | sort event_code_count desc | keep event_code_count,event.code,host.name,event.description ``` [[esql-example-queries]] The following is an example ES|QL query: ``` from logs-* | where event.category == "file" and event.action == "creation" | stats filecount = count(file.name) by process.name,host.name | dissect process.name "%{process}.%{extension}" | eval proclength = length(process.name) | where proclength > 10 | sort filecount,proclength desc | limit 10 | keep host.name,process.name,filecount,process,extension,fullproc,proclength ``` [[esql-example-queries]] The following is an example ES|QL query: ``` from logs-* | where process.name == "curl.exe" | stats bytes = sum(destination.bytes) by destination.address | eval kb = bytes/1024 | sort kb desc | limit 10 | keep kb,destination.address ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM metrics-apm* | WHERE metricset.name == "transaction" AND metricset.interval == "1m" | EVAL bucket = AUTO_BUCKET(transaction.duration.histogram, 50, <start-date>, <end-date>) | STATS avg_duration = AVG(transaction.duration.histogram) BY bucket ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM packetbeat-* | STATS doc_count = COUNT(destination.domain) BY destination.domain | SORT doc_count DESC | LIMIT 10 ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM employees | EVAL hire_date_formatted = DATE_FORMAT(hire_date, "MMMM yyyy") | SORT hire_date | KEEP emp_no, hire_date_formatted | LIMIT 5 ``` [[esql-example-queries]] The following is NOT an example of an ES|QL query: ``` Pagination is not supported ``` [[esql-example-queries]] The following is an
example ES|QL query: ``` FROM logs-* | WHERE @timestamp >= NOW() - 15 minutes | EVAL bucket = DATE_TRUNC(1 minute, @timestamp) | STATS avg_cpu = AVG(system.cpu.total.norm.pct) BY bucket, host.name | LIMIT 10 ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM traces-apm* | WHERE @timestamp >= NOW() - 24 hours | EVAL successful = CASE(event.outcome == "success", 1, 0), failed = CASE(event.outcome == "failure", 1, 0) | STATS success_rate = AVG(successful), avg_duration = AVG(transaction.duration), total_requests = COUNT(transaction.id) BY service.name ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM metricbeat* | EVAL cpu_pct_normalized = (system.cpu.user.pct + system.cpu.system.pct) / system.cpu.cores | STATS AVG(cpu_pct_normalized) BY host.name ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM postgres-logs | DISSECT message "%{} duration: %{query_duration} ms" | EVAL query_duration_num = TO_DOUBLE(query_duration) | STATS avg_duration = AVG(query_duration_num) ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM nyc_taxis | WHERE DATE_EXTRACT(drop_off_time, "hour") >= 6 AND DATE_EXTRACT(drop_off_time, "hour") < 10 | LIMIT 10 ``` ``` </p> </details> <details><summary>8.15 Example Queries / Tokens: 4,847 / Characters:16671</summary> <p> ``` // 1. regex to extract from dns.question.registered_domain // Helpful when asking how to use GROK to extract values via REGEX from logs-* | where dns.question.name like "?*" | grok dns.question.name """(?<dns_registered_domain>[a-zA-Z0-9]+\.[a-z-A-Z]{2,3}$)""" | keep dns_registered_domain | limit 10 // 2. hunting scheduled task with suspicious actions via registry.data.bytes // Helpful when answering questions on regex based searches and replacements (RLIKE and REPLACE), base64 conversions, and dealing with case sensitivity from logs-* | where host.os.type == "windows" and event.category == "registry" and event.action == "modification" and registry.path like """HKLM\\SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\Schedule\\TaskCache\\Tasks\\*Actions*""" | eval scheduled_task_action = replace(TO_LOWER(FROM_BASE64(registry.data.bytes)), """\u0000""", "") | eval scheduled_task_action = replace(scheduled_task_action, """(\u0003\fauthorfff|\u0003\fauthorff\u000e)""", "") | where scheduled_task_action rlike """.*(users\\public\\|\\appdata\\roaming|programdata|powershell|rundll32|regsvr32|mshta.exe|cscript.exe|wscript.exe|cmd.exe|forfiles|msiexec).*""" and not scheduled_task_action like "localsystem*" | keep scheduled_task_action, registry.path, agent.id | stats count_agents = count_distinct(agent.id) by scheduled_task_action | where count_agents == 1 // 3. 
suspicious powershell cmds from base64 encoded cmdline // Helpful when answering questions on regex based searches and replacements, base64 conversions, and dealing with case sensitivity (TO_LOWER and TO_UPPER commands) from logs-* | where host.os.type == "windows" and event.category == "process" and event.action == "start" and TO_LOWER(process.name) == "powershell.exe" and process.command_line rlike ".+ -(e|E).*" | keep agent.id, process.command_line | grok process.command_line """(?<base64_data>([A-Za-z0-9+/]+={1,2}$|[A-Za-z0-9+/]{100,}))""" | where base64_data is not null | eval decoded_base64_cmdline = replace(TO_LOWER(FROM_BASE64(base64_data)), """\u0000""", "") | where decoded_base64_cmdline rlike """.*(http|webclient|download|mppreference|sockets|bxor|.replace|reflection|assembly|load|bits|start-proc|iwr|frombase64).*""" | keep agent.id, process.command_line, decoded_base64_cmdline //4. Detect masquerading attempts as native Windows binaries //MITRE Tactics: "Defense Evasion" from logs-* | where event.type == "start" and event.action == "start" and host.os.name == "Windows" and not starts_with(process.executable, "C:\\Program Files\\WindowsApps\\") and not starts_with(process.executable, "C:\\Windows\\System32\\DriverStore\\") and process.name != "setup.exe" | keep process.name.caseless, process.executable.caseless, process.code_signature.subject_name, process.code_signature.trusted, process.code_signature.exists, host.id | eval system_bin = case(starts_with(process.executable.caseless, "c:\\windows\\system32") and starts_with(process.code_signature.subject_name, "Microsoft") and process.code_signature.trusted == true, process.name.caseless, null), non_system_bin = case(process.code_signature.exists == false or process.code_signature.trusted != true or not starts_with(process.code_signature.subject_name, "Microsoft"), process.name.caseless, null) | stats count_system_bin = count(system_bin), count_non_system_bin = count(non_system_bin) by process.name.caseless, host.id | where count_system_bin >= 1 and count_non_system_bin >= 1 //5. Detect DLL Hijack via Masquerading as Microsoft Native Libraries // Helpful when asking how to use ENRICH query results with enrich policies from logs-* | where host.os.family == "windows" and event.action == "load" and process.code_signature.status == "trusted" and dll.code_signature.status != "trusted" and not dll.path rlike """[c-fC-F]:\\(Windows|windows|WINDOWS)\\(System32|SysWOW64|system32|syswow64)\\[a-zA-Z0-9_]+.dll""" | keep dll.name, dll.path, dll.hash.sha256, process.executable, host.id | ENRICH libs-policy-defend | where native == "yes" and not starts_with(dll.path, "C:\\Windows\\assembly\\NativeImages") | eval process_path = replace(process.executable, """([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}|ns[a-z][A-Z0-9]{3,4}\.tmp|DX[A-Z0-9]{3,4}\.tmp|7z[A-Z0-9]{3,5}\.tmp|[0-9\.\-\_]{3,})""", ""), dll_path = replace(dll.path, """([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}|ns[a-z][A-Z0-9]{3,4}\.tmp|DX[A-Z0-9]{3,4}\.tmp|7z[A-Z0-9]{3,5}\.tmp|[0-9\.\-\_]{3,})""", "") | stats host_count = count_distinct(host.id) by dll.name, dll_path, process_path, dll.hash.sha256 | sort host_count asc //6. 
Potential Exfiltration by process total egress bytes // Helpful when asking how to filter/search on IP address (CIDR_MATCH) fields and aggregating/grouping //MITRE Tactics: "Command and Control", "Exfiltration" from logs-* | where host.os.family == "windows" and event.category == "network" and event.action == "disconnect_received" and not CIDR_MATCH(destination.ip, "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.0.0/29", "192.0.0.8/32", "192.0.0.9/32", "192.0.0.10/32", "192.0.0.170/32", "192.0.0.171/32", "192.0.2.0/24", "192.31.196.0/24", "192.52.193.0/24", "192.168.0.0/16", "192.88.99.0/24", "224.0.0.0/4", "100.64.0.0/10", "192.175.48.0/24","198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24", "240.0.0.0/4", "::1","FE80::/10", "FF00::/8") | keep source.bytes, destination.address, process.executable, process.entity_id | stats total_bytes_out = sum(source.bytes) by process.entity_id, destination.address, process.executable /* more than 1GB out by same process.pid in 8 hours */ | where total_bytes_out >= 1073741824 //7. Windows logon activity by source IP // Helpful when answering questions about the CASE command (as well as conditional outputs/if statements) //MITRE Tactics: "Credential Access" from logs-* | where host.os.family == "windows" and event.category == "authentication" and event.action in ("logon-failed", "logged-in") and winlog.logon.type == "Network" and source.ip is not null and /* noisy failure status codes often associated to authentication misconfiguration */ not (event.action == "logon-failed" and winlog.event_data.Status in ("0xC000015B", "0XC000005E", "0XC0000133", "0XC0000192")) | eval failed = case(event.action == "logon-failed", source.ip, null), success = case(event.action == "logged-in", source.ip, null) | stats count_failed = count(failed), count_success = count(success), count_user = count_distinct(winlog.event_data.TargetUserName) by source.ip /* below threshold should be adjusted to your env logon patterns */ | where count_failed >= 100 and count_success <= 10 and count_user >= 20 //8. 
High count of network connection over extended period by process //Helpful when answering questions about IP searches/filters, field converstions(to_double, to_int), and running multiple aggregations //MITRE Tactics: "Command and Control" from logs-* | where host.os.family == "windows" and event.category == "network" and network.direction == "egress" and (process.executable like "C:\\\\Windows\\\\System32*" or process.executable like "C:\\\\Windows\\\\SysWOW64\\\\*") and not user.id in ("S-1-5-19", "S-1-5-20") and /* multiple Windows svchost services perform long term connection to MS ASN, can be covered in a dedicated hunt */ not (process.name == "svchost.exe" and user.id == "S-1-5-18") and /* excluding private IP ranges */ not CIDR_MATCH(destination.ip, "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.0.0/29", "192.0.0.8/32", "192.0.0.9/32", "192.0.0.10/32", "192.0.0.170/32", "192.0.0.171/32", "192.0.2.0/24", "192.31.196.0/24", "192.52.193.0/24", "192.168.0.0/16", "192.88.99.0/24", "224.0.0.0/4", "100.64.0.0/10", "192.175.48.0/24","198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24", "240.0.0.0/4", "::1","FE80::/10", "FF00::/8") | keep source.bytes, destination.address, process.name, process.entity_id, @timestamp /* calc total duration , total MB out and the number of connections per hour */ | stats total_bytes_out = sum(source.bytes), count_connections = count(*), start_time = min(@timestamp), end_time = max(@timestamp) by process.entity_id, destination.address, process.name | eval dur = TO_DOUBLE(end_time)-TO_DOUBLE(start_time), duration_hours=TO_INT(dur/3600000), MB_out=TO_DOUBLE(total_bytes_out) / (1024*1024), number_of_con_per_hour = (count_connections / duration_hours) | keep process.entity_id, process.name, duration_hours, destination.address, MB_out, count_connections, number_of_con_per_hour /* threshold is set to 120 connections per minute , you can adjust it to your env/FP rate */ | where duration_hours >= 1 and number_of_con_per_hour >= 120 //9. Persistence via Suspicious Launch Agent or Launch Daemon with low occurrence //Helpful when answering questions on concatenating fields, dealing with time based searches //MITRE Tactics: "Persistence" from logs-* | where @timestamp > now() - 7 day | where host.os.family == "macos" and event.category == "file" and event.action == "launch_daemon" and (Persistence.runatload == true or Persistence.keepalive == true) and process.executable is not null | eval args = MV_CONCAT(Persistence.args, ",") /* normalizing users home profile */ | eval args = replace(args, """/Users/[a-zA-Z0-9ñ\.\-\_\$~ ]+/""", "/Users/user/") | stats agents = count_distinct(host.id), total = count(*) by process.name, Persistence.name, args | where starts_with(args, "/") and agents == 1 and total == 1 //10. 
Suspicious Network Connections by unsigned macO //Helpful when answering questions on IP filtering, calculating the time difference between timestamps, aggregations, and field conversions //MITRE Tactics: "Command and Control" from logs-* | where host.os.family == "macos" and event.category == "network" and (process.code_signature.exists == false or process.code_signature.trusted != true) and /* excluding private IP ranges */ not CIDR_MATCH(destination.ip, "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.0.0/29", "192.0.0.8/32", "192.0.0.9/32", "192.0.0.10/32", "192.0.0.170/32", "192.0.0.171/32", "192.0.2.0/24", "192.31.196.0/24", "192.52.193.0/24", "192.168.0.0/16", "192.88.99.0/24", "224.0.0.0/4", "100.64.0.0/10", "192.175.48.0/24","198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24", "240.0.0.0/4", "::1","FE80::/10", "FF00::/8") | keep source.bytes, destination.address, process.name, process.entity_id, @timestamp /* calc total duration , total MB out and the number of connections per hour */ | stats total_bytes_out = sum(source.bytes), count_connections = count(*), start_time = min(@timestamp), end_time = max(@timestamp) by process.entity_id, destination.address, process.name | eval dur = TO_DOUBLE(end_time)-TO_DOUBLE(start_time), duration_hours=TO_INT(dur/3600000), MB_out=TO_DOUBLE(total_bytes_out) / (1024*1024), number_of_con_per_hour = (count_connections / duration_hours) | keep process.entity_id, process.name, duration_hours, destination.address, MB_out, count_connections, number_of_con_per_hour /* threshold is set to 120 connections per minute , you can adjust it to your env/FP rate */ | where duration_hours >= 8 and number_of_con_per_hour >= 120 //11. Unusual file creations by web server user //Helpful when answering questions on using the LIKE command (wildcard searches) and aggregations FROM logs-* | WHERE @timestamp > NOW() - 50 day | WHERE host.os.type == "linux" and event.type == "creation" and user.name in ("www-data", "apache", "nginx", "httpd", "tomcat", "lighttpd", "glassfish", "weblogic") and ( file.path like "/var/www/*" or file.path like "/var/tmp/*" or file.path like "/tmp/*" or file.path like "/dev/shm/*" ) | STATS file_count = COUNT(file.path), host_count = COUNT(host.name) by file.path, host.name, process.name, user.name // Alter this threshold to make sense for your environment | WHERE file_count <= 5 | SORT file_count asc | LIMIT 100 //12. Segmentation Fault & Potential Buffer Overflow Hunting //Helpful when answering questions on extractions with GROK FROM logs-* | WHERE host.os.type == "linux" and process.name == "kernel" and message like "*segfault*" | GROK message "\\[%{NUMBER:timestamp}\\] %{WORD:process}\\[%{NUMBER:pid}\\]: segfault at %{BASE16NUM:segfault_address} ip %{BASE16NUM:instruction_pointer} sp %{BASE16NUM:stack_pointer} error %{NUMBER:error_code} in %{DATA:so_file}\\[%{BASE16NUM:so_base_address}\\+%{BASE16NUM:so_offset}\\]" | KEEP timestamp, process, pid, so_file, segfault_address, instruction_pointer, stack_pointer, error_code, so_base_address, so_offset //13. 
Persistence via Systemd (timers) //Helpful when answering questions on using the CASE command (conditional statements), searching lists using the IN command, wildcard searches with the LIKE command and aggregations FROM logs-* | WHERE host.os.type == "linux" and event.type in ("creation", "change") and ( // System-wide/user-specific services/timers (root permissions required) file.path like "/run/systemd/system/*" or file.path like "/etc/systemd/system/*" or file.path like "/etc/systemd/user/*" or file.path like "/usr/local/lib/systemd/system/*" or file.path like "/lib/systemd/system/*" or file.path like "/usr/lib/systemd/system/*" or file.path like "/usr/lib/systemd/user/*" or // user-specific services/timers (user permissions required) file.path like "/home/*/.config/systemd/user/*" or file.path like "/home/*/.local/share/systemd/user/*" or // System-wide generators (root permissions required) file.path like "/etc/systemd/system-generators/*" or file.path like "/usr/local/lib/systemd/system-generators/*" or file.path like "/lib/systemd/system-generators/*" or file.path like "/etc/systemd/user-generators/*" or file.path like "/usr/local/lib/systemd/user-generators/*" or file.path like "/usr/lib/systemd/user-generators/*" ) and not ( process.name in ( "dpkg", "dockerd", "yum", "dnf", "snapd", "pacman", "pamac-daemon", "netplan", "systemd", "generate" ) or process.executable == "/proc/self/exe" or process.executable like "/dev/fd/*" or file.extension in ("dpkg-remove", "swx", "swp") ) | EVAL persistence = CASE( // System-wide/user-specific services/timers (root permissions required) file.path like "/run/systemd/system/*" or file.path like "/etc/systemd/system/*" or file.path like "/etc/systemd/user/*" or file.path like "/usr/local/lib/systemd/system/*" or file.path like "/lib/systemd/system/*" or file.path like "/usr/lib/systemd/system/*" or file.path like "/usr/lib/systemd/user/*" or // user-specific services/timers (user permissions required) file.path like "/home/*/.config/systemd/user/*" or file.path like "/home/*/.local/share/systemd/user/*" or // System-wide generators (root permissions required) file.path like "/etc/systemd/system-generators/*" or file.path like "/usr/local/lib/systemd/system-generators/*" or file.path like "/lib/systemd/system-generators/*" or file.path like "/etc/systemd/user-generators/*" or file.path like "/usr/local/lib/systemd/user-generators/*" or file.path like "/usr/lib/systemd/user-generators/*", process.name, null ) | STATS cc = COUNT(*), pers_count = COUNT(persistence), agent_count = COUNT(agent.id) by process.executable, file.path, host.name, user.name | WHERE pers_count > 0 and pers_count <= 20 and agent_count <= 3 | SORT cc asc | LIMIT 100 //14. Low Frequency AWS EC2 Admin Password Retrieval Attempts from Unusual ARNs //Helpful when answering questions on extracting fields with the dissect command and aggregations. 
Also an example for hunting for cloud threats from logs-* | where event.provider == "ec2.amazonaws.com" and event.action == "GetPasswordData" and aws.cloudtrail.error_code == "Client.UnauthorizedOperation" and aws.cloudtrail.user_identity.type == "AssumedRole" | dissect aws.cloudtrail.request_parameters "{%{key}=%{instance_id}}" | dissect aws.cloudtrail.user_identity.session_context.session_issuer.arn "%{?keyword1}:%{?keyword2}:%{?keyword3}::%{account_id}:%{keyword4}/%{arn_name}" | dissect user.id "%{principal_id}:%{session_name}" | keep aws.cloudtrail.user_identity.session_context.session_issuer.principal_id, instance_id, account_id, arn_name, source.ip, principal_id, session_name, user.name | stats instance_counts = count_distinct(arn_name) by instance_id, user.name, source.ip, session_name | where instance_counts < 5 | sort instance_counts desc ``` </p> </details> (cherry picked from commit 6137f81)
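The token and character counts quoted in the collapsed sections above come from the GPT-4 tokenizer. Here is a minimal sketch of how those numbers could be reproduced locally; the file path is hypothetical, and it assumes the example-query text has been concatenated into a single file. The `tiktoken` package provides the same `cl100k_base` encoding used for GPT-4 by the linked web tokenizer.

```python
# Sketch: recount tokens/characters for the bundled ES|QL example queries.
# The input path below is illustrative only.
import tiktoken


def count_tokens_and_chars(text: str) -> tuple[int, int]:
    """Return (GPT-4 token count, character count) for the given text."""
    encoding = tiktoken.encoding_for_model("gpt-4")  # resolves to cl100k_base
    return len(encoding.encode(text)), len(text)


if __name__ == "__main__":
    with open("esql_example_queries.txt", encoding="utf-8") as f:  # hypothetical path
        tokens, chars = count_tokens_and_chars(f.read())
    print(f"Tokens: {tokens:,} / Characters: {chars:,}")
```

For context on how a generated query is ultimately exercised (not something this PR changes), the sketch below submits one of the example queries to the ES|QL REST endpoint, `POST /_query`. The `ES_URL` and `ES_API_KEY` environment variables are assumptions; the query itself is taken verbatim from the examples above.

```python
# Sketch: run one of the example ES|QL queries against a cluster via the
# ES|QL query REST API. ES_URL / ES_API_KEY are assumed environment variables.
import os

import requests

ES_URL = os.environ.get("ES_URL", "http://localhost:9200")
ES_API_KEY = os.environ["ES_API_KEY"]

query = (
    "FROM packetbeat-* "
    "| STATS doc_count = COUNT(destination.domain) BY destination.domain "
    "| SORT doc_count DESC "
    "| LIMIT 10"
)

response = requests.post(
    f"{ES_URL}/_query",
    headers={"Authorization": f"ApiKey {ES_API_KEY}"},
    json={"query": query},
    timeout=30,
)
response.raise_for_status()
result = response.json()
print([column["name"] for column in result["columns"]])  # column headers
print(result["values"])  # result rows
```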
💔 All backports failed
Manual backport
To create the backport manually run:

Questions?
Please refer to the Backport tool documentation
Also an example for hunting for cloud threats from logs-* | where event.provider == "ec2.amazonaws.com" and event.action == "GetPasswordData" and aws.cloudtrail.error_code == "Client.UnauthorizedOperation" and aws.cloudtrail.user_identity.type == "AssumedRole" | dissect aws.cloudtrail.request_parameters "{%{key}=%{instance_id}}" | dissect aws.cloudtrail.user_identity.session_context.session_issuer.arn "%{?keyword1}:%{?keyword2}:%{?keyword3}::%{account_id}:%{keyword4}/%{arn_name}" | dissect user.id "%{principal_id}:%{session_name}" | keep aws.cloudtrail.user_identity.session_context.session_issuer.principal_id, instance_id, account_id, arn_name, source.ip, principal_id, session_name, user.name | stats instance_counts = count_distinct(arn_name) by instance_id, user.name, source.ip, session_name | where instance_counts < 5 | sort instance_counts desc ``` </p> </details> (cherry picked from commit 6137f81)
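Most of the updated hunts above follow the same basic shape: filter the events of interest, aggregate per entity, then apply a tunable threshold. A minimal sketch of that pattern is below; the index pattern, field names, and threshold values are illustrative only and are not taken from the shipped example files. It uses only commands and functions that appear in the queries above (WHERE, CIDR_MATCH, STATS, EVAL with CASE, SORT, KEEP, LIMIT).

```
// Hypothetical hunt skeleton: filter events, aggregate per entity, then threshold.
FROM logs-*
// keep only egress-style network events to non-private address space
| WHERE event.category == "network" AND NOT CIDR_MATCH(destination.ip, "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
// aggregate bytes and distinct destinations per process on each host
| STATS total_bytes_out = SUM(source.bytes), dest_count = COUNT_DISTINCT(destination.ip) BY process.name, host.name
// flag entities crossing an arbitrary example threshold (tune for your environment)
| EVAL follow_up = CASE(total_bytes_out >= 1073741824, "true", "false")
| WHERE dest_count >= 10
| SORT total_bytes_out DESC
| KEEP host.name, process.name, dest_count, total_bytes_out, follow_up
| LIMIT 100
```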
Manual backport to |
Friendly reminder: Looks like this PR hasn’t been backported yet. |
Manually backported and merged into |
Summary
This PR updates the pre-packaged ESQL examples used by the ESQL Query Generation tool as provided by @jamesspi. The number of examples has stayed the same, as have the file names -- so I've only updated the raw content here.
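For context on what "raw content" means here, each updated entry pairs a few comment lines (title, when the example is helpful, MITRE tactics) with a runnable ES|QL query. The sketch below is a hypothetical illustration of that shape, not one of the shipped entries:

```
//N. Example hunt title
//Helpful when answering questions about aggregations and thresholds
//MITRE Tactics: "Command and Control"
from logs-*
| where event.category == "network"
| stats connection_count = count(*) by host.name, process.name
| where connection_count >= 100
| sort connection_count desc
| limit 10
```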
Note
Since we're enabling the new `kbDataClient` with #188168 for `8.15`, there is no need for a delete/re-install for pre-existing deployments to use these new example queries, as the Knowledge Base will be rebuilt on an upgrade to `8.15`.