[Security Assistant] Updates ESQL example queries used in ESQL Query Generation #188492
Conversation
…assistantKnowledgeBaseByDefault feature flag
A documentation preview will be available soon. Request a new doc build by commenting
If your PR continues to fail for an unknown reason, the doc build pipeline may be broken. Elastic employees can check the pipeline status here.

cc @spong
Thank you @spong 🙇
LGTM
…Generation (elastic#188492)

## Summary

This PR updates the pre-packaged ESQL examples used by the ESQL Query Generation tool as provided by @jamesspi. The number of examples has stayed the same, as have the file names -- so I've only updated the raw content here.

> [!NOTE]
> Since we're enabling the new `kbDataClient` with elastic#188168 for `8.15`, there is no need for a delete/re-install for pre-existing deployments to use these new example queries, as the Knowledge Base will be rebuilt on an upgrade to `8.15`.

Token length changes as calculated using the [GPT-4 Tokenizer](https://platform.openai.com/tokenizer) (see the sketch after the collapsed lists for reproducing these counts):

<details><summary>Existing Example Queries / Tokens: 1,108 / Characters: 4151</summary> <p> ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM logs-* | WHERE NOT CIDR_MATCH(destination.ip, "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16") | STATS destcount = COUNT(destination.ip) by user.name, host.name | ENRICH ldap_lookup_new ON user.name | WHERE group.name IS NOT NULL | EVAL follow_up = CASE( destcount >= 100, "true", "false") | SORT destcount desc | KEEP destcount, host.name, user.name, group.name, follow_up ``` [[esql-example-queries]] The following is an example ES|QL query: ``` from logs-* | grok dns.question.name "%{DATA}\\.%{GREEDYDATA:dns.question.registered_domain:string}" | stats unique_queries = count_distinct(dns.question.name) by dns.question.registered_domain, process.name | where unique_queries > 5 | sort unique_queries desc ``` [[esql-example-queries]] The following is an example ES|QL query: ``` from logs-* | where event.code is not null | stats event_code_count = count(event.code) by event.code,host.name | enrich win_events on event.code with EVENT_DESCRIPTION | where EVENT_DESCRIPTION is not null and host.name is not null | rename EVENT_DESCRIPTION as event.description | sort event_code_count desc | keep event_code_count,event.code,host.name,event.description ``` [[esql-example-queries]] The following is an example ES|QL query: ``` from logs-* | where event.category == "file" and event.action == "creation" | stats filecount = count(file.name) by process.name,host.name | dissect process.name "%{process}.%{extension}" | eval proclength = length(process.name) | where proclength > 10 | sort filecount,proclength desc | limit 10 | keep host.name,process.name,filecount,process,extension,fullproc,proclength ``` [[esql-example-queries]] The following is an example ES|QL query: ``` from logs-* | where process.name == "curl.exe" | stats bytes = sum(destination.bytes) by destination.address | eval kb = bytes/1024 | sort kb desc | limit 10 | keep kb,destination.address ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM metrics-apm* | WHERE metricset.name == "transaction" AND metricset.interval == "1m" | EVAL bucket = AUTO_BUCKET(transaction.duration.histogram, 50, <start-date>, <end-date>) | STATS avg_duration = AVG(transaction.duration.histogram) BY bucket ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM packetbeat-* | STATS doc_count = COUNT(destination.domain) BY destination.domain | SORT doc_count DESC | LIMIT 10 ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM employees | EVAL hire_date_formatted = DATE_FORMAT(hire_date, "MMMM yyyy") | SORT hire_date | KEEP emp_no, hire_date_formatted | LIMIT 5 ``` [[esql-example-queries]] The following is NOT an example of an ES|QL query: ``` Pagination is not supported ``` [[esql-example-queries]] The following is an
example ES|QL query: ``` FROM logs-* | WHERE @timestamp >= NOW() - 15 minutes | EVAL bucket = DATE_TRUNC(1 minute, @timestamp) | STATS avg_cpu = AVG(system.cpu.total.norm.pct) BY bucket, host.name | LIMIT 10 ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM traces-apm* | WHERE @timestamp >= NOW() - 24 hours | EVAL successful = CASE(event.outcome == "success", 1, 0), failed = CASE(event.outcome == "failure", 1, 0) | STATS success_rate = AVG(successful), avg_duration = AVG(transaction.duration), total_requests = COUNT(transaction.id) BY service.name ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM metricbeat* | EVAL cpu_pct_normalized = (system.cpu.user.pct + system.cpu.system.pct) / system.cpu.cores | STATS AVG(cpu_pct_normalized) BY host.name ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM postgres-logs | DISSECT message "%{} duration: %{query_duration} ms" | EVAL query_duration_num = TO_DOUBLE(query_duration) | STATS avg_duration = AVG(query_duration_num) ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM nyc_taxis | WHERE DATE_EXTRACT(drop_off_time, "hour") >= 6 AND DATE_EXTRACT(drop_off_time, "hour") < 10 | LIMIT 10 ``` ``` </p> </details> <details><summary>8.15 Example Queries / Tokens: 4,847 / Characters:16671</summary> <p> ``` // 1. regex to extract from dns.question.registered_domain // Helpful when asking how to use GROK to extract values via REGEX from logs-* | where dns.question.name like "?*" | grok dns.question.name """(?<dns_registered_domain>[a-zA-Z0-9]+\.[a-z-A-Z]{2,3}$)""" | keep dns_registered_domain | limit 10 // 2. hunting scheduled task with suspicious actions via registry.data.bytes // Helpful when answering questions on regex based searches and replacements (RLIKE and REPLACE), base64 conversions, and dealing with case sensitivity from logs-* | where host.os.type == "windows" and event.category == "registry" and event.action == "modification" and registry.path like """HKLM\\SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\Schedule\\TaskCache\\Tasks\\*Actions*""" | eval scheduled_task_action = replace(TO_LOWER(FROM_BASE64(registry.data.bytes)), """\u0000""", "") | eval scheduled_task_action = replace(scheduled_task_action, """(\u0003\fauthorfff|\u0003\fauthorff\u000e)""", "") | where scheduled_task_action rlike """.*(users\\public\\|\\appdata\\roaming|programdata|powershell|rundll32|regsvr32|mshta.exe|cscript.exe|wscript.exe|cmd.exe|forfiles|msiexec).*""" and not scheduled_task_action like "localsystem*" | keep scheduled_task_action, registry.path, agent.id | stats count_agents = count_distinct(agent.id) by scheduled_task_action | where count_agents == 1 // 3. 
suspicious powershell cmds from base64 encoded cmdline // Helpful when answering questions on regex based searches and replacements, base64 conversions, and dealing with case sensitivity (TO_LOWER and TO_UPPER commands) from logs-* | where host.os.type == "windows" and event.category == "process" and event.action == "start" and TO_LOWER(process.name) == "powershell.exe" and process.command_line rlike ".+ -(e|E).*" | keep agent.id, process.command_line | grok process.command_line """(?<base64_data>([A-Za-z0-9+/]+={1,2}$|[A-Za-z0-9+/]{100,}))""" | where base64_data is not null | eval decoded_base64_cmdline = replace(TO_LOWER(FROM_BASE64(base64_data)), """\u0000""", "") | where decoded_base64_cmdline rlike """.*(http|webclient|download|mppreference|sockets|bxor|.replace|reflection|assembly|load|bits|start-proc|iwr|frombase64).*""" | keep agent.id, process.command_line, decoded_base64_cmdline //4. Detect masquerading attempts as native Windows binaries //MITRE Tactics: "Defense Evasion" from logs-* | where event.type == "start" and event.action == "start" and host.os.name == "Windows" and not starts_with(process.executable, "C:\\Program Files\\WindowsApps\\") and not starts_with(process.executable, "C:\\Windows\\System32\\DriverStore\\") and process.name != "setup.exe" | keep process.name.caseless, process.executable.caseless, process.code_signature.subject_name, process.code_signature.trusted, process.code_signature.exists, host.id | eval system_bin = case(starts_with(process.executable.caseless, "c:\\windows\\system32") and starts_with(process.code_signature.subject_name, "Microsoft") and process.code_signature.trusted == true, process.name.caseless, null), non_system_bin = case(process.code_signature.exists == false or process.code_signature.trusted != true or not starts_with(process.code_signature.subject_name, "Microsoft"), process.name.caseless, null) | stats count_system_bin = count(system_bin), count_non_system_bin = count(non_system_bin) by process.name.caseless, host.id | where count_system_bin >= 1 and count_non_system_bin >= 1 //5. Detect DLL Hijack via Masquerading as Microsoft Native Libraries // Helpful when asking how to use ENRICH query results with enrich policies from logs-* | where host.os.family == "windows" and event.action == "load" and process.code_signature.status == "trusted" and dll.code_signature.status != "trusted" and not dll.path rlike """[c-fC-F]:\\(Windows|windows|WINDOWS)\\(System32|SysWOW64|system32|syswow64)\\[a-zA-Z0-9_]+.dll""" | keep dll.name, dll.path, dll.hash.sha256, process.executable, host.id | ENRICH libs-policy-defend | where native == "yes" and not starts_with(dll.path, "C:\\Windows\\assembly\\NativeImages") | eval process_path = replace(process.executable, """([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}|ns[a-z][A-Z0-9]{3,4}\.tmp|DX[A-Z0-9]{3,4}\.tmp|7z[A-Z0-9]{3,5}\.tmp|[0-9\.\-\_]{3,})""", ""), dll_path = replace(dll.path, """([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}|ns[a-z][A-Z0-9]{3,4}\.tmp|DX[A-Z0-9]{3,4}\.tmp|7z[A-Z0-9]{3,5}\.tmp|[0-9\.\-\_]{3,})""", "") | stats host_count = count_distinct(host.id) by dll.name, dll_path, process_path, dll.hash.sha256 | sort host_count asc //6. 
Potential Exfiltration by process total egress bytes // Helpful when asking how to filter/search on IP address (CIDR_MATCH) fields and aggregating/grouping //MITRE Tactics: "Command and Control", "Exfiltration" from logs-* | where host.os.family == "windows" and event.category == "network" and event.action == "disconnect_received" and not CIDR_MATCH(destination.ip, "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.0.0/29", "192.0.0.8/32", "192.0.0.9/32", "192.0.0.10/32", "192.0.0.170/32", "192.0.0.171/32", "192.0.2.0/24", "192.31.196.0/24", "192.52.193.0/24", "192.168.0.0/16", "192.88.99.0/24", "224.0.0.0/4", "100.64.0.0/10", "192.175.48.0/24","198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24", "240.0.0.0/4", "::1","FE80::/10", "FF00::/8") | keep source.bytes, destination.address, process.executable, process.entity_id | stats total_bytes_out = sum(source.bytes) by process.entity_id, destination.address, process.executable /* more than 1GB out by same process.pid in 8 hours */ | where total_bytes_out >= 1073741824 //7. Windows logon activity by source IP // Helpful when answering questions about the CASE command (as well as conditional outputs/if statements) //MITRE Tactics: "Credential Access" from logs-* | where host.os.family == "windows" and event.category == "authentication" and event.action in ("logon-failed", "logged-in") and winlog.logon.type == "Network" and source.ip is not null and /* noisy failure status codes often associated to authentication misconfiguration */ not (event.action == "logon-failed" and winlog.event_data.Status in ("0xC000015B", "0XC000005E", "0XC0000133", "0XC0000192")) | eval failed = case(event.action == "logon-failed", source.ip, null), success = case(event.action == "logged-in", source.ip, null) | stats count_failed = count(failed), count_success = count(success), count_user = count_distinct(winlog.event_data.TargetUserName) by source.ip /* below threshold should be adjusted to your env logon patterns */ | where count_failed >= 100 and count_success <= 10 and count_user >= 20 //8. 
High count of network connection over extended period by process //Helpful when answering questions about IP searches/filters, field converstions(to_double, to_int), and running multiple aggregations //MITRE Tactics: "Command and Control" from logs-* | where host.os.family == "windows" and event.category == "network" and network.direction == "egress" and (process.executable like "C:\\\\Windows\\\\System32*" or process.executable like "C:\\\\Windows\\\\SysWOW64\\\\*") and not user.id in ("S-1-5-19", "S-1-5-20") and /* multiple Windows svchost services perform long term connection to MS ASN, can be covered in a dedicated hunt */ not (process.name == "svchost.exe" and user.id == "S-1-5-18") and /* excluding private IP ranges */ not CIDR_MATCH(destination.ip, "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.0.0/29", "192.0.0.8/32", "192.0.0.9/32", "192.0.0.10/32", "192.0.0.170/32", "192.0.0.171/32", "192.0.2.0/24", "192.31.196.0/24", "192.52.193.0/24", "192.168.0.0/16", "192.88.99.0/24", "224.0.0.0/4", "100.64.0.0/10", "192.175.48.0/24","198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24", "240.0.0.0/4", "::1","FE80::/10", "FF00::/8") | keep source.bytes, destination.address, process.name, process.entity_id, @timestamp /* calc total duration , total MB out and the number of connections per hour */ | stats total_bytes_out = sum(source.bytes), count_connections = count(*), start_time = min(@timestamp), end_time = max(@timestamp) by process.entity_id, destination.address, process.name | eval dur = TO_DOUBLE(end_time)-TO_DOUBLE(start_time), duration_hours=TO_INT(dur/3600000), MB_out=TO_DOUBLE(total_bytes_out) / (1024*1024), number_of_con_per_hour = (count_connections / duration_hours) | keep process.entity_id, process.name, duration_hours, destination.address, MB_out, count_connections, number_of_con_per_hour /* threshold is set to 120 connections per minute , you can adjust it to your env/FP rate */ | where duration_hours >= 1 and number_of_con_per_hour >= 120 //9. Persistence via Suspicious Launch Agent or Launch Daemon with low occurrence //Helpful when answering questions on concatenating fields, dealing with time based searches //MITRE Tactics: "Persistence" from logs-* | where @timestamp > now() - 7 day | where host.os.family == "macos" and event.category == "file" and event.action == "launch_daemon" and (Persistence.runatload == true or Persistence.keepalive == true) and process.executable is not null | eval args = MV_CONCAT(Persistence.args, ",") /* normalizing users home profile */ | eval args = replace(args, """/Users/[a-zA-Z0-9ñ\.\-\_\$~ ]+/""", "/Users/user/") | stats agents = count_distinct(host.id), total = count(*) by process.name, Persistence.name, args | where starts_with(args, "/") and agents == 1 and total == 1 //10. 
Suspicious Network Connections by unsigned macO //Helpful when answering questions on IP filtering, calculating the time difference between timestamps, aggregations, and field conversions //MITRE Tactics: "Command and Control" from logs-* | where host.os.family == "macos" and event.category == "network" and (process.code_signature.exists == false or process.code_signature.trusted != true) and /* excluding private IP ranges */ not CIDR_MATCH(destination.ip, "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.0.0/29", "192.0.0.8/32", "192.0.0.9/32", "192.0.0.10/32", "192.0.0.170/32", "192.0.0.171/32", "192.0.2.0/24", "192.31.196.0/24", "192.52.193.0/24", "192.168.0.0/16", "192.88.99.0/24", "224.0.0.0/4", "100.64.0.0/10", "192.175.48.0/24","198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24", "240.0.0.0/4", "::1","FE80::/10", "FF00::/8") | keep source.bytes, destination.address, process.name, process.entity_id, @timestamp /* calc total duration , total MB out and the number of connections per hour */ | stats total_bytes_out = sum(source.bytes), count_connections = count(*), start_time = min(@timestamp), end_time = max(@timestamp) by process.entity_id, destination.address, process.name | eval dur = TO_DOUBLE(end_time)-TO_DOUBLE(start_time), duration_hours=TO_INT(dur/3600000), MB_out=TO_DOUBLE(total_bytes_out) / (1024*1024), number_of_con_per_hour = (count_connections / duration_hours) | keep process.entity_id, process.name, duration_hours, destination.address, MB_out, count_connections, number_of_con_per_hour /* threshold is set to 120 connections per minute , you can adjust it to your env/FP rate */ | where duration_hours >= 8 and number_of_con_per_hour >= 120 //11. Unusual file creations by web server user //Helpful when answering questions on using the LIKE command (wildcard searches) and aggregations FROM logs-* | WHERE @timestamp > NOW() - 50 day | WHERE host.os.type == "linux" and event.type == "creation" and user.name in ("www-data", "apache", "nginx", "httpd", "tomcat", "lighttpd", "glassfish", "weblogic") and ( file.path like "/var/www/*" or file.path like "/var/tmp/*" or file.path like "/tmp/*" or file.path like "/dev/shm/*" ) | STATS file_count = COUNT(file.path), host_count = COUNT(host.name) by file.path, host.name, process.name, user.name // Alter this threshold to make sense for your environment | WHERE file_count <= 5 | SORT file_count asc | LIMIT 100 //12. Segmentation Fault & Potential Buffer Overflow Hunting //Helpful when answering questions on extractions with GROK FROM logs-* | WHERE host.os.type == "linux" and process.name == "kernel" and message like "*segfault*" | GROK message "\\[%{NUMBER:timestamp}\\] %{WORD:process}\\[%{NUMBER:pid}\\]: segfault at %{BASE16NUM:segfault_address} ip %{BASE16NUM:instruction_pointer} sp %{BASE16NUM:stack_pointer} error %{NUMBER:error_code} in %{DATA:so_file}\\[%{BASE16NUM:so_base_address}\\+%{BASE16NUM:so_offset}\\]" | KEEP timestamp, process, pid, so_file, segfault_address, instruction_pointer, stack_pointer, error_code, so_base_address, so_offset //13. 
Persistence via Systemd (timers) //Helpful when answering questions on using the CASE command (conditional statements), searching lists using the IN command, wildcard searches with the LIKE command and aggregations FROM logs-* | WHERE host.os.type == "linux" and event.type in ("creation", "change") and ( // System-wide/user-specific services/timers (root permissions required) file.path like "/run/systemd/system/*" or file.path like "/etc/systemd/system/*" or file.path like "/etc/systemd/user/*" or file.path like "/usr/local/lib/systemd/system/*" or file.path like "/lib/systemd/system/*" or file.path like "/usr/lib/systemd/system/*" or file.path like "/usr/lib/systemd/user/*" or // user-specific services/timers (user permissions required) file.path like "/home/*/.config/systemd/user/*" or file.path like "/home/*/.local/share/systemd/user/*" or // System-wide generators (root permissions required) file.path like "/etc/systemd/system-generators/*" or file.path like "/usr/local/lib/systemd/system-generators/*" or file.path like "/lib/systemd/system-generators/*" or file.path like "/etc/systemd/user-generators/*" or file.path like "/usr/local/lib/systemd/user-generators/*" or file.path like "/usr/lib/systemd/user-generators/*" ) and not ( process.name in ( "dpkg", "dockerd", "yum", "dnf", "snapd", "pacman", "pamac-daemon", "netplan", "systemd", "generate" ) or process.executable == "/proc/self/exe" or process.executable like "/dev/fd/*" or file.extension in ("dpkg-remove", "swx", "swp") ) | EVAL persistence = CASE( // System-wide/user-specific services/timers (root permissions required) file.path like "/run/systemd/system/*" or file.path like "/etc/systemd/system/*" or file.path like "/etc/systemd/user/*" or file.path like "/usr/local/lib/systemd/system/*" or file.path like "/lib/systemd/system/*" or file.path like "/usr/lib/systemd/system/*" or file.path like "/usr/lib/systemd/user/*" or // user-specific services/timers (user permissions required) file.path like "/home/*/.config/systemd/user/*" or file.path like "/home/*/.local/share/systemd/user/*" or // System-wide generators (root permissions required) file.path like "/etc/systemd/system-generators/*" or file.path like "/usr/local/lib/systemd/system-generators/*" or file.path like "/lib/systemd/system-generators/*" or file.path like "/etc/systemd/user-generators/*" or file.path like "/usr/local/lib/systemd/user-generators/*" or file.path like "/usr/lib/systemd/user-generators/*", process.name, null ) | STATS cc = COUNT(*), pers_count = COUNT(persistence), agent_count = COUNT(agent.id) by process.executable, file.path, host.name, user.name | WHERE pers_count > 0 and pers_count <= 20 and agent_count <= 3 | SORT cc asc | LIMIT 100 //14. Low Frequency AWS EC2 Admin Password Retrieval Attempts from Unusual ARNs //Helpful when answering questions on extracting fields with the dissect command and aggregations. 
Also an example for hunting for cloud threats from logs-* | where event.provider == "ec2.amazonaws.com" and event.action == "GetPasswordData" and aws.cloudtrail.error_code == "Client.UnauthorizedOperation" and aws.cloudtrail.user_identity.type == "AssumedRole" | dissect aws.cloudtrail.request_parameters "{%{key}=%{instance_id}}" | dissect aws.cloudtrail.user_identity.session_context.session_issuer.arn "%{?keyword1}:%{?keyword2}:%{?keyword3}::%{account_id}:%{keyword4}/%{arn_name}" | dissect user.id "%{principal_id}:%{session_name}" | keep aws.cloudtrail.user_identity.session_context.session_issuer.principal_id, instance_id, account_id, arn_name, source.ip, principal_id, session_name, user.name | stats instance_counts = count_distinct(arn_name) by instance_id, user.name, source.ip, session_name | where instance_counts < 5 | sort instance_counts desc ``` </p> </details> (cherry picked from commit 6137f81)
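The token and character counts quoted in the collapsed sections above come from the GPT-4 tokenizer. Here is a minimal sketch of how those numbers could be reproduced locally; the file path is hypothetical, and it assumes the example-query text has been concatenated into a single file. The `tiktoken` package provides the same `cl100k_base` encoding used for GPT-4 by the linked web tokenizer.

```python
# Sketch: recount tokens/characters for the bundled ES|QL example queries.
# The input path below is illustrative only.
import tiktoken


def count_tokens_and_chars(text: str) -> tuple[int, int]:
    """Return (GPT-4 token count, character count) for the given text."""
    encoding = tiktoken.encoding_for_model("gpt-4")  # resolves to cl100k_base
    return len(encoding.encode(text)), len(text)


if __name__ == "__main__":
    with open("esql_example_queries.txt", encoding="utf-8") as f:  # hypothetical path
        tokens, chars = count_tokens_and_chars(f.read())
    print(f"Tokens: {tokens:,} / Characters: {chars:,}")
```

For context on how a generated query is ultimately exercised (not something this PR changes), the sketch below submits one of the example queries to the ES|QL REST endpoint, `POST /_query`. The `ES_URL` and `ES_API_KEY` environment variables are assumptions; the query itself is taken verbatim from the examples above.

```python
# Sketch: run one of the example ES|QL queries against a cluster via the
# ES|QL query REST API. ES_URL / ES_API_KEY are assumed environment variables.
import os

import requests

ES_URL = os.environ.get("ES_URL", "http://localhost:9200")
ES_API_KEY = os.environ["ES_API_KEY"]

query = (
    "FROM packetbeat-* "
    "| STATS doc_count = COUNT(destination.domain) BY destination.domain "
    "| SORT doc_count DESC "
    "| LIMIT 10"
)

response = requests.post(
    f"{ES_URL}/_query",
    headers={"Authorization": f"ApiKey {ES_API_KEY}"},
    json={"query": query},
    timeout=30,
)
response.raise_for_status()
result = response.json()
print([column["name"] for column in result["columns"]])  # column headers
print(result["values"])  # result rows
```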
💔 All backports failed
Manual backport
To create the backport manually run:

Questions?
Please refer to the Backport tool documentation
Also an example for hunting for cloud threats from logs-* | where event.provider == "ec2.amazonaws.com" and event.action == "GetPasswordData" and aws.cloudtrail.error_code == "Client.UnauthorizedOperation" and aws.cloudtrail.user_identity.type == "AssumedRole" | dissect aws.cloudtrail.request_parameters "{%{key}=%{instance_id}}" | dissect aws.cloudtrail.user_identity.session_context.session_issuer.arn "%{?keyword1}:%{?keyword2}:%{?keyword3}::%{account_id}:%{keyword4}/%{arn_name}" | dissect user.id "%{principal_id}:%{session_name}" | keep aws.cloudtrail.user_identity.session_context.session_issuer.principal_id, instance_id, account_id, arn_name, source.ip, principal_id, session_name, user.name | stats instance_counts = count_distinct(arn_name) by instance_id, user.name, source.ip, session_name | where instance_counts < 5 | sort instance_counts desc ``` </p> </details> (cherry picked from commit 6137f81)
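Most of the updated hunts above follow the same basic shape: filter the events of interest, aggregate per entity, then apply a tunable threshold. A minimal sketch of that pattern is below; the index pattern, field names, and threshold values are illustrative only and are not taken from the shipped example files. It uses only commands and functions that appear in the queries above (WHERE, CIDR_MATCH, STATS, EVAL with CASE, SORT, KEEP, LIMIT).

```
// Hypothetical hunt skeleton: filter events, aggregate per entity, then threshold.
FROM logs-*
// keep only egress-style network events to non-private address space
| WHERE event.category == "network" AND NOT CIDR_MATCH(destination.ip, "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
// aggregate bytes and distinct destinations per process on each host
| STATS total_bytes_out = SUM(source.bytes), dest_count = COUNT_DISTINCT(destination.ip) BY process.name, host.name
// flag entities crossing an arbitrary example threshold (tune for your environment)
| EVAL follow_up = CASE(total_bytes_out >= 1073741824, "true", "false")
| WHERE dest_count >= 10
| SORT total_bytes_out DESC
| KEEP host.name, process.name, dest_count, total_bytes_out, follow_up
| LIMIT 100
```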
Manual backport to |
Friendly reminder: Looks like this PR hasn’t been backported yet. |
Manually backported and merged into |
Summary
This PR updates the pre-packaged ESQL examples used by the ESQL Query Generation tool as provided by @jamesspi. The number of examples has stayed the same, as have the file names -- so I've only updated the raw content here.
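For context on what "raw content" means here, each updated entry pairs a few comment lines (title, when the example is helpful, MITRE tactics) with a runnable ES|QL query. The sketch below is a hypothetical illustration of that shape, not one of the shipped entries:

```
//N. Example hunt title
//Helpful when answering questions about aggregations and thresholds
//MITRE Tactics: "Command and Control"
from logs-*
| where event.category == "network"
| stats connection_count = count(*) by host.name, process.name
| where connection_count >= 100
| sort connection_count desc
| limit 10
```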
Note
Since we're enabling the new `kbDataClient` with #188168 for `8.15`, there is no need for a delete/re-install for pre-existing deployments to use these new example queries, as the Knowledge Base will be rebuilt on an upgrade to `8.15`.