[Security Assistant] Enables automatic setup of Knowledge Base and LangGraph code paths for 8.15 #188168

Merged
merged 26 commits on Jul 17, 2024

Conversation

@spong spong (Member) commented Jul 11, 2024

Summary

This PR enables the automatic setup of the Knowledge Base and LangGraph code paths for the 8.15 release. These features were behind the assistantKnowledgeBaseByDefault feature flag, which will remain as a gate for upcoming Knowledge Base features that were not ready for this release.
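
To make the gating concrete, here is a minimal TypeScript sketch of what the remaining flag check could look like; the type and function names are illustrative assumptions, not the actual Kibana code.

```
// Minimal sketch, assuming a simple experimental-features map; names are
// hypothetical and do not mirror the real implementation.
interface AssistantExperimentalFeatures {
  assistantKnowledgeBaseByDefault: boolean;
}

// In 8.15, automatic Knowledge Base setup and the LangGraph path no longer
// consult the flag; only the not-yet-ready Knowledge Base features still do.
function isUpcomingKbFeatureEnabled(
  features: AssistantExperimentalFeatures
): boolean {
  return features.assistantKnowledgeBaseByDefault;
}
```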

As part of these changes, we now support only the new LangGraph code path, so we were able to clean up the non-KB and non-RAG-on-Alerts code paths. All paths within the post_actions_executor route now funnel to the LangGraph implementation.
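
For intuition, a hedged TypeScript sketch of the consolidation described above; the handler and helper names are hypothetical, not the real route code.

```
// Hypothetical sketch only; types and helpers are illustrative.
interface ExecuteParams {
  connectorId: string;
  messages: Array<{ role: string; content: string }>;
}

// Stand-in for the LangGraph-based call chain (stubbed for illustration).
async function callAssistantGraph(params: ExecuteParams): Promise<string> {
  return `LangGraph handled ${params.messages.length} message(s)`;
}

// The handler used to branch between KB/non-KB and RAG-on-Alerts variants;
// now every request takes the single LangGraph path.
async function postActionsExecutorHandler(params: ExecuteParams) {
  return callAssistantGraph(params);
}
```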

Note

We were planning to switch to the new chat/completions public API; however, this would have required additional refactoring since the APIs differ slightly. We will make this change and delete the post_actions_executor route in the next release.

Checklist

Delete any items that are not applicable to this PR.

@spong spong added the release_note:skip (Skip the PR/issue when compiling release notes), Feature:Security Assistant, Team:Security Generative AI, v8.15.0, and v8.16.0 labels Jul 11, 2024
@spong spong self-assigned this Jul 11, 2024
@spong spong requested review from a team as code owners July 11, 2024 23:26
@spong spong requested a review from a team as a code owner July 11, 2024 23:28
@spong spong added the ci:cloud-redeploy (Always create a new Cloud deployment) label Jul 11, 2024
spong added a commit that referenced this pull request Jul 17, 2024
…Generation (#188492)

## Summary

This PR updates the pre-packaged ESQL examples used by the ESQL Query
Generation tool as provided by @jamesspi. The number of examples has
stayed the same, as have the file names -- so I've only updated the raw
content here.

> [!NOTE]
> Since we're enabling the new `kbDataClient` with #188168 for `8.15`, there is no
> need for a delete/re-install for pre-existing deployments to use these
> new example queries, as the Knowledge Base will be rebuilt on an upgrade
> to `8.15`.




Token length changes as calculated using the [GPT-4
Tokenizer](https://platform.openai.com/tokenizer):
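
As a reference point, a minimal TypeScript sketch of reproducing these counts locally, assuming the `js-tiktoken` npm package is installed; the input file name below is a hypothetical placeholder.

```
// Minimal sketch: count GPT-4 tokens and characters for a local file.
// Assumes `npm install js-tiktoken`; the file name is a placeholder.
import { readFileSync } from "node:fs";
import { encodingForModel } from "js-tiktoken";

const text = readFileSync("esql_example_queries.asciidoc", "utf8");
const enc = encodingForModel("gpt-4"); // cl100k_base, as used by the web tokenizer
console.log(`tokens: ${enc.encode(text).length}, characters: ${text.length}`);
```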


<details><summary>Existing Example Queries / Tokens: 1,108 / Characters:
4151</summary>
<p>

``` 
[[esql-example-queries]]

The following is an example ES|QL query:

```
FROM logs-*
| WHERE NOT CIDR_MATCH(destination.ip, "10.0.0.0/8", "172.16.0.0/12",
"192.168.0.0/16")
| STATS destcount = COUNT(destination.ip) by user.name, host.name
| ENRICH ldap_lookup_new ON user.name
| WHERE group.name IS NOT NULL
| EVAL follow_up = CASE(
    destcount >= 100, "true",
     "false")
| SORT destcount desc
| KEEP destcount, host.name, user.name, group.name, follow_up
```

[[esql-example-queries]]

The following is an example ES|QL query:

```
from logs-*
| grok dns.question.name
"%{DATA}\\.%{GREEDYDATA:dns.question.registered_domain:string}"
| stats unique_queries = count_distinct(dns.question.name) by
dns.question.registered_domain, process.name
| where unique_queries > 5
| sort unique_queries desc
```

[[esql-example-queries]]

The following is an example ES|QL query:

```
from logs-*
| where event.code is not null
| stats event_code_count = count(event.code) by event.code,host.name
| enrich win_events on event.code with EVENT_DESCRIPTION
| where EVENT_DESCRIPTION is not null and host.name is not null
| rename EVENT_DESCRIPTION as event.description
| sort event_code_count desc
| keep event_code_count,event.code,host.name,event.description
```

[[esql-example-queries]]

The following is an example ES|QL query:

```
from logs-*
| where event.category == "file" and event.action == "creation"
| stats filecount = count(file.name) by process.name,host.name
| dissect process.name "%{process}.%{extension}"
| eval proclength = length(process.name)
| where proclength > 10
| sort filecount,proclength desc
| limit 10
| keep
host.name,process.name,filecount,process,extension,fullproc,proclength
```

[[esql-example-queries]]

The following is an example ES|QL query:

```
from logs-*
| where process.name == "curl.exe"
| stats bytes = sum(destination.bytes) by destination.address
| eval kb =  bytes/1024
| sort kb desc
| limit 10
| keep kb,destination.address
```

[[esql-example-queries]]

The following is an example ES|QL query:

```
FROM metrics-apm*
| WHERE metricset.name == "transaction" AND metricset.interval == "1m"
| EVAL bucket = AUTO_BUCKET(transaction.duration.histogram, 50,
<start-date>, <end-date>)
| STATS avg_duration = AVG(transaction.duration.histogram) BY bucket
```

[[esql-example-queries]]

The following is an example ES|QL query:

```
FROM packetbeat-*
| STATS doc_count = COUNT(destination.domain) BY destination.domain
| SORT doc_count DESC
| LIMIT 10
```

[[esql-example-queries]]

The following is an example ES|QL query:

```
FROM employees
| EVAL hire_date_formatted = DATE_FORMAT(hire_date, "MMMM yyyy")
| SORT hire_date
| KEEP emp_no, hire_date_formatted
| LIMIT 5
```

[[esql-example-queries]]

The following is NOT an example of an ES|QL query:

```
Pagination is not supported
```

[[esql-example-queries]]

The following is an example ES|QL query:

```
FROM logs-*
| WHERE @timestamp >= NOW() - 15 minutes
| EVAL bucket = DATE_TRUNC(1 minute, @timestamp)
| STATS avg_cpu = AVG(system.cpu.total.norm.pct) BY bucket, host.name
| LIMIT 10
```

[[esql-example-queries]]

The following is an example ES|QL query:

```
FROM traces-apm*
| WHERE @timestamp >= NOW() - 24 hours
| EVAL successful = CASE(event.outcome == "success", 1, 0),
  failed = CASE(event.outcome == "failure", 1, 0)
| STATS success_rate = AVG(successful),
  avg_duration = AVG(transaction.duration),
  total_requests = COUNT(transaction.id) BY service.name
```

[[esql-example-queries]]

The following is an example ES|QL query:

```
FROM metricbeat*
| EVAL cpu_pct_normalized = (system.cpu.user.pct +
system.cpu.system.pct) / system.cpu.cores
| STATS AVG(cpu_pct_normalized) BY host.name
```

[[esql-example-queries]]

The following is an example ES|QL query:

```
FROM postgres-logs
| DISSECT message "%{} duration: %{query_duration} ms"
| EVAL query_duration_num = TO_DOUBLE(query_duration)
| STATS avg_duration = AVG(query_duration_num)
```

[[esql-example-queries]]

The following is an example ES|QL query:

```
FROM nyc_taxis
| WHERE DATE_EXTRACT(drop_off_time, "hour") >= 6 AND
DATE_EXTRACT(drop_off_time, "hour") < 10
| LIMIT 10
```

```
</p>
</details> 

<details><summary>8.15 Example Queries / Tokens: 4,847 /
Characters:16671</summary>
<p>

``` 
// 1. regex to extract from dns.question.registered_domain
// Helpful when asking how to use GROK to extract values via REGEX
from logs-*
| where dns.question.name like "?*"
| grok dns.question.name """(?<dns_registered_domain>[a-zA-Z0-9]+\.[a-z-A-Z]{2,3}$)""" 
| keep dns_registered_domain
| limit 10 

// 2. hunting scheduled task with suspicious actions via registry.data.bytes
// Helpful when answering questions on regex based searches and replacements (RLIKE and REPLACE), base64 conversions, and dealing with case sensitivity
from logs-* 
| where host.os.type == "windows" and event.category == "registry" and event.action == "modification" and 
  registry.path like """HKLM\\SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\Schedule\\TaskCache\\Tasks\\*Actions*"""
| eval scheduled_task_action = replace(TO_LOWER(FROM_BASE64(registry.data.bytes)), """\u0000""", "")
| eval scheduled_task_action = replace(scheduled_task_action, """(\u0003\fauthorfff|\u0003\fauthorff\u000e)""", "")
| where scheduled_task_action rlike """.*(users\\public\\|\\appdata\\roaming|programdata|powershell|rundll32|regsvr32|mshta.exe|cscript.exe|wscript.exe|cmd.exe|forfiles|msiexec).*""" and not scheduled_task_action like "localsystem*"
| keep scheduled_task_action, registry.path, agent.id
| stats count_agents = count_distinct(agent.id) by scheduled_task_action | where count_agents == 1 

// 3. suspicious powershell cmds from base64 encoded cmdline
// Helpful when answering questions on regex based searches and replacements, base64 conversions, and dealing with case sensitivity (TO_LOWER and TO_UPPER commands)
from logs-*
| where host.os.type == "windows" and event.category == "process" and event.action == "start" and TO_LOWER(process.name) == "powershell.exe" and process.command_line rlike ".+ -(e|E).*"
| keep agent.id, process.command_line
| grok process.command_line """(?<base64_data>([A-Za-z0-9+/]+={1,2}$|[A-Za-z0-9+/]{100,}))"""
| where base64_data is not null
| eval decoded_base64_cmdline = replace(TO_LOWER(FROM_BASE64(base64_data)), """\u0000""", "")
| where decoded_base64_cmdline rlike """.*(http|webclient|download|mppreference|sockets|bxor|.replace|reflection|assembly|load|bits|start-proc|iwr|frombase64).*""" 
| keep agent.id, process.command_line, decoded_base64_cmdline

//4. Detect masquerading attempts as native Windows binaries
//MITRE Tactics: "Defense Evasion"
from logs-*
| where event.type == "start" and event.action == "start" and host.os.name == "Windows" and not starts_with(process.executable, "C:\\Program Files\\WindowsApps\\") and not starts_with(process.executable, "C:\\Windows\\System32\\DriverStore\\") and process.name != "setup.exe"
| keep process.name.caseless, process.executable.caseless, process.code_signature.subject_name, process.code_signature.trusted, process.code_signature.exists, host.id
| eval system_bin = case(starts_with(process.executable.caseless, "c:\\windows\\system32") and starts_with(process.code_signature.subject_name, "Microsoft") and process.code_signature.trusted == true, process.name.caseless, null), non_system_bin = case(process.code_signature.exists == false or process.code_signature.trusted != true or not starts_with(process.code_signature.subject_name, "Microsoft"), process.name.caseless, null)
| stats count_system_bin = count(system_bin), count_non_system_bin = count(non_system_bin) by process.name.caseless, host.id 
| where count_system_bin >= 1 and count_non_system_bin >= 1

//5. Detect DLL Hijack via Masquerading as Microsoft Native Libraries
// Helpful when asking how to use ENRICH query results with enrich policies
from logs-* 
| where host.os.family == "windows" and event.action == "load" and process.code_signature.status == "trusted" and dll.code_signature.status != "trusted" and 
 not dll.path rlike """[c-fC-F]:\\(Windows|windows|WINDOWS)\\(System32|SysWOW64|system32|syswow64)\\[a-zA-Z0-9_]+.dll""" 
| keep dll.name, dll.path, dll.hash.sha256, process.executable, host.id
| ENRICH libs-policy-defend 
| where native == "yes" and not starts_with(dll.path, "C:\\Windows\\assembly\\NativeImages") 
| eval process_path = replace(process.executable, """([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}|ns[a-z][A-Z0-9]{3,4}\.tmp|DX[A-Z0-9]{3,4}\.tmp|7z[A-Z0-9]{3,5}\.tmp|[0-9\.\-\_]{3,})""", ""), 
  dll_path = replace(dll.path, """([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}|ns[a-z][A-Z0-9]{3,4}\.tmp|DX[A-Z0-9]{3,4}\.tmp|7z[A-Z0-9]{3,5}\.tmp|[0-9\.\-\_]{3,})""", "") 
| stats host_count = count_distinct(host.id) by dll.name, dll_path, process_path, dll.hash.sha256 
| sort host_count asc

//6. Potential Exfiltration by process total egress bytes
// Helpful when asking how to filter/search on IP address (CIDR_MATCH) fields and aggregating/grouping
//MITRE Tactics: "Command and Control", "Exfiltration"
from logs-*
| where host.os.family == "windows" and event.category == "network" and 
  event.action == "disconnect_received" and 
  not CIDR_MATCH(destination.ip, "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.0.0/29", "192.0.0.8/32", "192.0.0.9/32", "192.0.0.10/32", "192.0.0.170/32", "192.0.0.171/32", "192.0.2.0/24", "192.31.196.0/24", "192.52.193.0/24", "192.168.0.0/16", "192.88.99.0/24", "224.0.0.0/4", "100.64.0.0/10", "192.175.48.0/24","198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24", "240.0.0.0/4", "::1","FE80::/10", "FF00::/8")
| keep source.bytes, destination.address, process.executable, process.entity_id
| stats total_bytes_out = sum(source.bytes) by process.entity_id, destination.address, process.executable
 /* more than 1GB out by same process.pid in 8 hours */
| where total_bytes_out >= 1073741824

//7. Windows logon activity by source IP
// Helpful when answering questions about the CASE command (as well as conditional outputs/if statements)
//MITRE Tactics: "Credential Access"
from logs-*
| where host.os.family == "windows" and 
  event.category == "authentication" and event.action in ("logon-failed", "logged-in") and winlog.logon.type == "Network" and 
  source.ip is not null and 
  /* noisy failure status codes often associated to authentication misconfiguration */ 
  not (event.action == "logon-failed" and winlog.event_data.Status in ("0xC000015B", "0XC000005E", "0XC0000133", "0XC0000192"))
| eval failed = case(event.action == "logon-failed", source.ip, null), success = case(event.action == "logged-in", source.ip, null)
| stats count_failed = count(failed), count_success = count(success), count_user = count_distinct(winlog.event_data.TargetUserName) by source.ip
 /* below threshold should be adjusted to your env logon patterns */
| where count_failed >= 100 and count_success <= 10 and count_user >= 20

//8. High count of network connection over extended period by process
//Helpful when answering questions about IP searches/filters, field conversions (to_double, to_int), and running multiple aggregations
//MITRE Tactics:  "Command and Control"
from logs-* 
| where host.os.family == "windows" and event.category == "network" and 
  network.direction == "egress" and (process.executable like "C:\\\\Windows\\\\System32*" or process.executable like "C:\\\\Windows\\\\SysWOW64\\\\*")  and not user.id in ("S-1-5-19", "S-1-5-20") and 
/* multiple Windows svchost services perform long term connection to MS ASN, can be covered in a dedicated hunt */ 
not (process.name == "svchost.exe" and user.id == "S-1-5-18") and 
/* excluding private IP ranges */
  not CIDR_MATCH(destination.ip, "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.0.0/29", "192.0.0.8/32", "192.0.0.9/32", "192.0.0.10/32", "192.0.0.170/32", "192.0.0.171/32", "192.0.2.0/24", "192.31.196.0/24", "192.52.193.0/24", "192.168.0.0/16", "192.88.99.0/24", "224.0.0.0/4", "100.64.0.0/10", "192.175.48.0/24","198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24", "240.0.0.0/4", "::1","FE80::/10", "FF00::/8")
| keep source.bytes, destination.address, process.name, process.entity_id, @timestamp
 /* calc total duration , total MB out and the number of connections per hour */
| stats total_bytes_out = sum(source.bytes), count_connections = count(*), start_time = min(@timestamp), end_time = max(@timestamp) by process.entity_id, destination.address, process.name
| eval dur = TO_DOUBLE(end_time)-TO_DOUBLE(start_time), duration_hours=TO_INT(dur/3600000), MB_out=TO_DOUBLE(total_bytes_out) / (1024*1024), number_of_con_per_hour = (count_connections / duration_hours)
| keep process.entity_id, process.name, duration_hours, destination.address, MB_out, count_connections, number_of_con_per_hour
/* threshold is set to 120 connections per minute , you can adjust it to your env/FP rate */
| where duration_hours >= 1 and number_of_con_per_hour >= 120

//9. Persistence via Suspicious Launch Agent or Launch Daemon with low occurrence
//Helpful when answering questions on concatenating fields, dealing with time based searches
//MITRE Tactics: "Persistence"
from logs-*
| where  @timestamp > now() - 7 day
| where host.os.family == "macos" and event.category == "file" and event.action == "launch_daemon" and 
  (Persistence.runatload == true or Persistence.keepalive == true) and process.executable is not null
| eval args = MV_CONCAT(Persistence.args, ",")
 /* normalizing users home profile */
| eval args = replace(args, """/Users/[a-zA-Z0-9ñ\.\-\_\$~ ]+/""", "/Users/user/")
| stats agents = count_distinct(host.id), total = count(*) by process.name, Persistence.name, args
| where starts_with(args, "/") and agents == 1 and total == 1

//10. Suspicious Network Connections by unsigned macOS
//Helpful when answering questions on IP filtering, calculating the time difference between timestamps, aggregations, and field conversions
//MITRE Tactics: "Command and Control"
from logs-*
| where host.os.family == "macos" and event.category == "network" and 
  (process.code_signature.exists == false or process.code_signature.trusted != true) and 
  /* excluding private IP ranges */
  not CIDR_MATCH(destination.ip, "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.0.0/29", "192.0.0.8/32", "192.0.0.9/32", "192.0.0.10/32", "192.0.0.170/32", "192.0.0.171/32", "192.0.2.0/24", "192.31.196.0/24", "192.52.193.0/24", "192.168.0.0/16", "192.88.99.0/24", "224.0.0.0/4", "100.64.0.0/10", "192.175.48.0/24","198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24", "240.0.0.0/4", "::1","FE80::/10", "FF00::/8")
| keep source.bytes, destination.address, process.name, process.entity_id, @timestamp
 /* calc total duration , total MB out and the number of connections per hour */
| stats total_bytes_out = sum(source.bytes), count_connections = count(*), start_time = min(@timestamp), end_time = max(@timestamp) by process.entity_id, destination.address, process.name
| eval dur = TO_DOUBLE(end_time)-TO_DOUBLE(start_time), duration_hours=TO_INT(dur/3600000), MB_out=TO_DOUBLE(total_bytes_out) / (1024*1024), number_of_con_per_hour = (count_connections / duration_hours)
| keep process.entity_id, process.name, duration_hours, destination.address, MB_out, count_connections, number_of_con_per_hour
/* threshold is set to 120 connections per minute , you can adjust it to your env/FP rate */
| where duration_hours >= 8 and number_of_con_per_hour >= 120

//11. Unusual file creations by web server user
//Helpful when answering questions on using the LIKE command (wildcard searches) and aggregations
FROM logs-*
| WHERE @timestamp > NOW() - 50 day
| WHERE host.os.type == "linux" and event.type == "creation" and user.name in ("www-data", "apache", "nginx", "httpd", "tomcat", "lighttpd", "glassfish", "weblogic") and (
  file.path like "/var/www/*" or
  file.path like "/var/tmp/*" or
  file.path like "/tmp/*" or
  file.path like "/dev/shm/*"
)
| STATS file_count = COUNT(file.path), host_count = COUNT(host.name) by file.path, host.name, process.name, user.name
// Alter this threshold to make sense for your environment 
| WHERE file_count <= 5
| SORT file_count asc
| LIMIT 100


//12. Segmentation Fault & Potential Buffer Overflow Hunting
//Helpful when answering questions on extractions with GROK
FROM logs-*
| WHERE host.os.type == "linux" and process.name == "kernel" and message like "*segfault*"
| GROK message "\\[%{NUMBER:timestamp}\\] %{WORD:process}\\[%{NUMBER:pid}\\]: segfault at %{BASE16NUM:segfault_address} ip %{BASE16NUM:instruction_pointer} sp %{BASE16NUM:stack_pointer} error %{NUMBER:error_code} in %{DATA:so_file}\\[%{BASE16NUM:so_base_address}\\+%{BASE16NUM:so_offset}\\]"
| KEEP timestamp, process, pid, so_file, segfault_address, instruction_pointer, stack_pointer, error_code, so_base_address, so_offset


//13. Persistence via Systemd (timers)
//Helpful when answering questions on using the CASE command (conditional statements), searching lists using the IN command, wildcard searches with the LIKE command and aggregations
FROM logs-*
| WHERE host.os.type == "linux" and event.type in ("creation", "change") and (

    // System-wide/user-specific services/timers (root permissions required)
    file.path like "/run/systemd/system/*" or
    file.path like "/etc/systemd/system/*" or
    file.path like "/etc/systemd/user/*" or
    file.path like "/usr/local/lib/systemd/system/*" or
    file.path like "/lib/systemd/system/*" or
    file.path like "/usr/lib/systemd/system/*" or
    file.path like "/usr/lib/systemd/user/*" or

    // user-specific services/timers (user permissions required)
    file.path like "/home/*/.config/systemd/user/*" or
    file.path like "/home/*/.local/share/systemd/user/*" or

    // System-wide generators (root permissions required)
    file.path like "/etc/systemd/system-generators/*" or
    file.path like "/usr/local/lib/systemd/system-generators/*" or
    file.path like "/lib/systemd/system-generators/*" or
    file.path like "/etc/systemd/user-generators/*" or
    file.path like "/usr/local/lib/systemd/user-generators/*" or
    file.path like "/usr/lib/systemd/user-generators/*"

) and not (
    process.name in (
      "dpkg", "dockerd", "yum", "dnf", "snapd", "pacman", "pamac-daemon",
      "netplan", "systemd", "generate"
    ) or
    process.executable == "/proc/self/exe" or
    process.executable like "/dev/fd/*" or
    file.extension in ("dpkg-remove", "swx", "swp")
)
| EVAL persistence = CASE(

    // System-wide/user-specific services/timers (root permissions required)
    file.path like "/run/systemd/system/*" or
    file.path like "/etc/systemd/system/*" or
    file.path like "/etc/systemd/user/*" or
    file.path like "/usr/local/lib/systemd/system/*" or
    file.path like "/lib/systemd/system/*" or
    file.path like "/usr/lib/systemd/system/*" or
    file.path like "/usr/lib/systemd/user/*" or

    // user-specific services/timers (user permissions required)
    file.path like "/home/*/.config/systemd/user/*" or
    file.path like "/home/*/.local/share/systemd/user/*" or

    // System-wide generators (root permissions required)
    file.path like "/etc/systemd/system-generators/*" or
    file.path like "/usr/local/lib/systemd/system-generators/*" or
    file.path like "/lib/systemd/system-generators/*" or
    file.path like "/etc/systemd/user-generators/*" or
    file.path like "/usr/local/lib/systemd/user-generators/*" or
    file.path like "/usr/lib/systemd/user-generators/*",
    process.name,
    null
)
| STATS cc = COUNT(*), pers_count = COUNT(persistence), agent_count = COUNT(agent.id) by process.executable, file.path, host.name, user.name
| WHERE pers_count > 0 and pers_count <= 20 and agent_count <= 3
| SORT cc asc
| LIMIT 100

//14. Low Frequency AWS EC2 Admin Password Retrieval Attempts from Unusual ARNs
//Helpful when answering questions on extracting fields with the dissect command and aggregations. Also an example for hunting for cloud threats
from logs-*
| where event.provider == "ec2.amazonaws.com" and event.action == "GetPasswordData"
and aws.cloudtrail.error_code == "Client.UnauthorizedOperation" and aws.cloudtrail.user_identity.type == "AssumedRole"
| dissect aws.cloudtrail.request_parameters "{%{key}=%{instance_id}}"
| dissect aws.cloudtrail.user_identity.session_context.session_issuer.arn "%{?keyword1}:%{?keyword2}:%{?keyword3}::%{account_id}:%{keyword4}/%{arn_name}"
| dissect user.id "%{principal_id}:%{session_name}"
| keep aws.cloudtrail.user_identity.session_context.session_issuer.principal_id, instance_id, account_id, arn_name, source.ip, principal_id, session_name, user.name
| stats instance_counts = count_distinct(arn_name) by instance_id, user.name, source.ip, session_name
| where instance_counts < 5
| sort instance_counts desc
```
</p>
</details>
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Jul 17, 2024
…Generation (elastic#188492)


(cherry picked from commit 6137f81)
spong added a commit to spong/kibana that referenced this pull request Jul 17, 2024
…Generation (elastic#188492)

## Summary

This PR updates the pre-packaged ESQL examples used by the ESQL Query
Generation tool as provided by @jamesspi. The number of examples have
stayed the same, as have the file names -- so I've only updated the raw
content here.

> [!NOTE]
> Since we're enabling the new `kbDataClient` with
elastic#188168 for `8.15`, there is no
need for a delete/re-install for pre-existing deployments to use these
new example queries, as the Knowledge Base will be rebuilt on an upgrade
to `8.15`.

Token length changes as calculated using the [GPT-4
Tokenizer](https://platform.openai.com/tokenizer):

<details><summary>Existing Example Queries / Tokens: 1,108 / Characters:
4151</summary>
<p>

```
[[esql-example-queries]]

The following is an example ES|QL query:

```
FROM logs-*
| WHERE NOT CIDR_MATCH(destination.ip, "10.0.0.0/8", "172.16.0.0/12",
"192.168.0.0/16")
| STATS destcount = COUNT(destination.ip) by user.name, host.name
| ENRICH ldap_lookup_new ON user.name
| WHERE group.name IS NOT NULL
| EVAL follow_up = CASE(
    destcount >= 100, "true",
     "false")
| SORT destcount desc
| KEEP destcount, host.name, user.name, group.name, follow_up
```

[[esql-example-queries]]

The following is an example ES|QL query:

```
from logs-*
| grok dns.question.name
"%{DATA}\\.%{GREEDYDATA:dns.question.registered_domain:string}"
| stats unique_queries = count_distinct(dns.question.name) by
dns.question.registered_domain, process.name
| where unique_queries > 5
| sort unique_queries desc
```

[[esql-example-queries]]

The following is an example ES|QL query:

```
from logs-*
| where event.code is not null
| stats event_code_count = count(event.code) by event.code,host.name
| enrich win_events on event.code with EVENT_DESCRIPTION
| where EVENT_DESCRIPTION is not null and host.name is not null
| rename EVENT_DESCRIPTION as event.description
| sort event_code_count desc
| keep event_code_count,event.code,host.name,event.description
```

[[esql-example-queries]]

The following is an example ES|QL query:

```
from logs-*
| where event.category == "file" and event.action == "creation"
| stats filecount = count(file.name) by process.name,host.name
| dissect process.name "%{process}.%{extension}"
| eval proclength = length(process.name)
| where proclength > 10
| sort filecount,proclength desc
| limit 10
| keep
host.name,process.name,filecount,process,extension,fullproc,proclength
```

[[esql-example-queries]]

The following is an example ES|QL query:

```
from logs-*
| where process.name == "curl.exe"
| stats bytes = sum(destination.bytes) by destination.address
| eval kb =  bytes/1024
| sort kb desc
| limit 10
| keep kb,destination.address
```

[[esql-example-queries]]

The following is an example ES|QL query:

```
FROM metrics-apm*
| WHERE metricset.name == "transaction" AND metricset.interval == "1m"
| EVAL bucket = AUTO_BUCKET(transaction.duration.histogram, 50,
<start-date>, <end-date>)
| STATS avg_duration = AVG(transaction.duration.histogram) BY bucket
```

[[esql-example-queries]]

The following is an example ES|QL query:

```
FROM packetbeat-*
| STATS doc_count = COUNT(destination.domain) BY destination.domain
| SORT doc_count DESC
| LIMIT 10
```

[[esql-example-queries]]

The following is an example ES|QL query:

```
FROM employees
| EVAL hire_date_formatted = DATE_FORMAT(hire_date, "MMMM yyyy")
| SORT hire_date
| KEEP emp_no, hire_date_formatted
| LIMIT 5
```

[[esql-example-queries]]

The following is NOT an example of an ES|QL query:

```
Pagination is not supported
```

[[esql-example-queries]]

The following is an example ES|QL query:

```
FROM logs-*
| WHERE @timestamp >= NOW() - 15 minutes
| EVAL bucket = DATE_TRUNC(1 minute, @timestamp)
| STATS avg_cpu = AVG(system.cpu.total.norm.pct) BY bucket, host.name
| LIMIT 10
```

[[esql-example-queries]]

The following is an example ES|QL query:

```
FROM traces-apm*
| WHERE @timestamp >= NOW() - 24 hours
| EVAL successful = CASE(event.outcome == "success", 1, 0),
  failed = CASE(event.outcome == "failure", 1, 0)
| STATS success_rate = AVG(successful),
  avg_duration = AVG(transaction.duration),
  total_requests = COUNT(transaction.id) BY service.name
```

[[esql-example-queries]]

The following is an example ES|QL query:

```
FROM metricbeat*
| EVAL cpu_pct_normalized = (system.cpu.user.pct +
system.cpu.system.pct) / system.cpu.cores
| STATS AVG(cpu_pct_normalized) BY host.name
```

[[esql-example-queries]]

The following is an example ES|QL query:

```
FROM postgres-logs
| DISSECT message "%{} duration: %{query_duration} ms"
| EVAL query_duration_num = TO_DOUBLE(query_duration)
| STATS avg_duration = AVG(query_duration_num)
```

[[esql-example-queries]]

The following is an example ES|QL query:

```
FROM nyc_taxis
| WHERE DATE_EXTRACT(drop_off_time, "hour") >= 6 AND
DATE_EXTRACT(drop_off_time, "hour") < 10
| LIMIT 10
```

```
</p>
</details>

<details><summary>8.15 Example Queries / Tokens: 4,847 /
Characters:16671</summary>
<p>

```
// 1. regex to extract from dns.question.registered_domain
// Helpful when asking how to use GROK to extract values via REGEX
from logs-*
| where dns.question.name like "?*"
| grok dns.question.name """(?<dns_registered_domain>[a-zA-Z0-9]+\.[a-z-A-Z]{2,3}$)"""
| keep dns_registered_domain
| limit 10

// 2. hunting scheduled task with suspicious actions via registry.data.bytes
// Helpful when answering questions on regex based searches and replacements (RLIKE and REPLACE), base64 conversions, and dealing with case sensitivity
from logs-*
| where host.os.type == "windows" and event.category == "registry" and event.action == "modification" and
  registry.path like """HKLM\\SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\Schedule\\TaskCache\\Tasks\\*Actions*"""
| eval scheduled_task_action = replace(TO_LOWER(FROM_BASE64(registry.data.bytes)), """\u0000""", "")
| eval scheduled_task_action = replace(scheduled_task_action, """(\u0003\fauthorfff|\u0003\fauthorff\u000e)""", "")
| where scheduled_task_action rlike """.*(users\\public\\|\\appdata\\roaming|programdata|powershell|rundll32|regsvr32|mshta.exe|cscript.exe|wscript.exe|cmd.exe|forfiles|msiexec).*""" and not scheduled_task_action like "localsystem*"
| keep scheduled_task_action, registry.path, agent.id
| stats count_agents = count_distinct(agent.id) by scheduled_task_action | where count_agents == 1

// 3. suspicious powershell cmds from base64 encoded cmdline
// Helpful when answering questions on regex based searches and replacements, base64 conversions, and dealing with case sensitivity (TO_LOWER and TO_UPPER commands)
from logs-*
| where host.os.type == "windows" and event.category == "process" and event.action == "start" and TO_LOWER(process.name) == "powershell.exe" and process.command_line rlike ".+ -(e|E).*"
| keep agent.id, process.command_line
| grok process.command_line """(?<base64_data>([A-Za-z0-9+/]+={1,2}$|[A-Za-z0-9+/]{100,}))"""
| where base64_data is not null
| eval decoded_base64_cmdline = replace(TO_LOWER(FROM_BASE64(base64_data)), """\u0000""", "")
| where decoded_base64_cmdline rlike """.*(http|webclient|download|mppreference|sockets|bxor|.replace|reflection|assembly|load|bits|start-proc|iwr|frombase64).*"""
| keep agent.id, process.command_line, decoded_base64_cmdline

//4. Detect masquerading attempts as native Windows binaries
//MITRE Tactics: "Defense Evasion"
from logs-*
| where event.type == "start" and event.action == "start" and host.os.name == "Windows" and not starts_with(process.executable, "C:\\Program Files\\WindowsApps\\") and not starts_with(process.executable, "C:\\Windows\\System32\\DriverStore\\") and process.name != "setup.exe"
| keep process.name.caseless, process.executable.caseless, process.code_signature.subject_name, process.code_signature.trusted, process.code_signature.exists, host.id
| eval system_bin = case(starts_with(process.executable.caseless, "c:\\windows\\system32") and starts_with(process.code_signature.subject_name, "Microsoft") and process.code_signature.trusted == true, process.name.caseless, null), non_system_bin = case(process.code_signature.exists == false or process.code_signature.trusted != true or not starts_with(process.code_signature.subject_name, "Microsoft"), process.name.caseless, null)
| stats count_system_bin = count(system_bin), count_non_system_bin = count(non_system_bin) by process.name.caseless, host.id
| where count_system_bin >= 1 and count_non_system_bin >= 1

//5. Detect DLL Hijack via Masquerading as Microsoft Native Libraries
// Helpful when asking how to use ENRICH query results with enrich policies
from logs-*
| where host.os.family == "windows" and event.action == "load" and process.code_signature.status == "trusted" and dll.code_signature.status != "trusted" and
 not dll.path rlike """[c-fC-F]:\\(Windows|windows|WINDOWS)\\(System32|SysWOW64|system32|syswow64)\\[a-zA-Z0-9_]+.dll"""
| keep dll.name, dll.path, dll.hash.sha256, process.executable, host.id
| ENRICH libs-policy-defend
| where native == "yes" and not starts_with(dll.path, "C:\\Windows\\assembly\\NativeImages")
| eval process_path = replace(process.executable, """([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}|ns[a-z][A-Z0-9]{3,4}\.tmp|DX[A-Z0-9]{3,4}\.tmp|7z[A-Z0-9]{3,5}\.tmp|[0-9\.\-\_]{3,})""", ""),
  dll_path = replace(dll.path, """([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}|ns[a-z][A-Z0-9]{3,4}\.tmp|DX[A-Z0-9]{3,4}\.tmp|7z[A-Z0-9]{3,5}\.tmp|[0-9\.\-\_]{3,})""", "")
| stats host_count = count_distinct(host.id) by dll.name, dll_path, process_path, dll.hash.sha256
| sort host_count asc

//6. Potential Exfiltration by process total egress bytes
// Helpful when asking how to filter/search on IP address (CIDR_MATCH) fields and aggregating/grouping
//MITRE Tactics: "Command and Control", "Exfiltration"
from logs-*
| where host.os.family == "windows" and event.category == "network" and
  event.action == "disconnect_received" and
  not CIDR_MATCH(destination.ip, "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.0.0/29", "192.0.0.8/32", "192.0.0.9/32", "192.0.0.10/32", "192.0.0.170/32", "192.0.0.171/32", "192.0.2.0/24", "192.31.196.0/24", "192.52.193.0/24", "192.168.0.0/16", "192.88.99.0/24", "224.0.0.0/4", "100.64.0.0/10", "192.175.48.0/24","198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24", "240.0.0.0/4", "::1","FE80::/10", "FF00::/8")
| keep source.bytes, destination.address, process.executable, process.entity_id
| stats total_bytes_out = sum(source.bytes) by process.entity_id, destination.address, process.executable
 /* more than 1GB out by same process.pid in 8 hours */
| where total_bytes_out >= 1073741824

//7. Windows logon activity by source IP
// Helpful when answering questions about the CASE command (as well as conditional outputs/if statements)
//MITRE Tactics: "Credential Access"
from logs-*
| where host.os.family == "windows" and
  event.category == "authentication" and event.action in ("logon-failed", "logged-in") and winlog.logon.type == "Network" and
  source.ip is not null and
  /* noisy failure status codes often associated to authentication misconfiguration */
  not (event.action == "logon-failed" and winlog.event_data.Status in ("0xC000015B", "0XC000005E", "0XC0000133", "0XC0000192"))
| eval failed = case(event.action == "logon-failed", source.ip, null), success = case(event.action == "logged-in", source.ip, null)
| stats count_failed = count(failed), count_success = count(success), count_user = count_distinct(winlog.event_data.TargetUserName) by source.ip
 /* below threshold should be adjusted to your env logon patterns */
| where count_failed >= 100 and count_success <= 10 and count_user >= 20

//8. High count of network connection over extended period by process
//Helpful when answering questions about IP searches/filters, field conversions (to_double, to_int), and running multiple aggregations
//MITRE Tactics:  "Command and Control"
from logs-*
| where host.os.family == "windows" and event.category == "network" and
  network.direction == "egress" and (process.executable like "C:\\\\Windows\\\\System32*" or process.executable like "C:\\\\Windows\\\\SysWOW64\\\\*")  and not user.id in ("S-1-5-19", "S-1-5-20") and
/* multiple Windows svchost services perform long term connection to MS ASN, can be covered in a dedicated hunt */
not (process.name == "svchost.exe" and user.id == "S-1-5-18") and
/* excluding private IP ranges */
  not CIDR_MATCH(destination.ip, "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.0.0/29", "192.0.0.8/32", "192.0.0.9/32", "192.0.0.10/32", "192.0.0.170/32", "192.0.0.171/32", "192.0.2.0/24", "192.31.196.0/24", "192.52.193.0/24", "192.168.0.0/16", "192.88.99.0/24", "224.0.0.0/4", "100.64.0.0/10", "192.175.48.0/24","198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24", "240.0.0.0/4", "::1","FE80::/10", "FF00::/8")
| keep source.bytes, destination.address, process.name, process.entity_id, @timestamp
 /* calc total duration , total MB out and the number of connections per hour */
| stats total_bytes_out = sum(source.bytes), count_connections = count(*), start_time = min(@timestamp), end_time = max(@timestamp) by process.entity_id, destination.address, process.name
| eval dur = TO_DOUBLE(end_time)-TO_DOUBLE(start_time), duration_hours=TO_INT(dur/3600000), MB_out=TO_DOUBLE(total_bytes_out) / (1024*1024), number_of_con_per_hour = (count_connections / duration_hours)
| keep process.entity_id, process.name, duration_hours, destination.address, MB_out, count_connections, number_of_con_per_hour
/* threshold is set to 120 connections per hour, you can adjust it to your env/FP rate */
| where duration_hours >= 1 and number_of_con_per_hour >= 120

//9. Persistence via Suspicious Launch Agent or Launch Daemon with low occurrence
//Helpful when answering questions on concatenating fields and dealing with time-based searches
//MITRE Tactics: "Persistence"
from logs-*
| where  @timestamp > now() - 7 day
| where host.os.family == "macos" and event.category == "file" and event.action == "launch_daemon" and
  (Persistence.runatload == true or Persistence.keepalive == true) and process.executable is not null
| eval args = MV_CONCAT(Persistence.args, ",")
 /* normalizing users home profile */
| eval args = replace(args, """/Users/[a-zA-Z0-9ñ\.\-\_\$~ ]+/""", "/Users/user/")
| stats agents = count_distinct(host.id), total = count(*) by process.name, Persistence.name, args
| where starts_with(args, "/") and agents == 1 and total == 1

//10. Suspicious Network Connections by unsigned macOS binaries
//Helpful when answering questions on IP filtering, calculating the time difference between timestamps, aggregations, and field conversions
//MITRE Tactics: "Command and Control"
from logs-*
| where host.os.family == "macos" and event.category == "network" and
  (process.code_signature.exists == false or process.code_signature.trusted != true) and
  /* excluding private IP ranges */
  not CIDR_MATCH(destination.ip, "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.0.0/29", "192.0.0.8/32", "192.0.0.9/32", "192.0.0.10/32", "192.0.0.170/32", "192.0.0.171/32", "192.0.2.0/24", "192.31.196.0/24", "192.52.193.0/24", "192.168.0.0/16", "192.88.99.0/24", "224.0.0.0/4", "100.64.0.0/10", "192.175.48.0/24","198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24", "240.0.0.0/4", "::1","FE80::/10", "FF00::/8")
| keep source.bytes, destination.address, process.name, process.entity_id, @timestamp
 /* calc total duration , total MB out and the number of connections per hour */
| stats total_bytes_out = sum(source.bytes), count_connections = count(*), start_time = min(@timestamp), end_time = max(@timestamp) by process.entity_id, destination.address, process.name
| eval dur = TO_DOUBLE(end_time)-TO_DOUBLE(start_time), duration_hours=TO_INT(dur/3600000), MB_out=TO_DOUBLE(total_bytes_out) / (1024*1024), number_of_con_per_hour = (count_connections / duration_hours)
| keep process.entity_id, process.name, duration_hours, destination.address, MB_out, count_connections, number_of_con_per_hour
/* threshold is set to 120 connections per minute , you can adjust it to your env/FP rate */
| where duration_hours >= 8 and number_of_con_per_hour >= 120

//11. Unusual file creations by web server user
//Helpful when answering questions on using the LIKE command (wildcard searches) and aggregations
FROM logs-*
| WHERE @timestamp > NOW() - 50 day
| WHERE host.os.type == "linux" and event.type == "creation" and user.name in ("www-data", "apache", "nginx", "httpd", "tomcat", "lighttpd", "glassfish", "weblogic") and (
  file.path like "/var/www/*" or
  file.path like "/var/tmp/*" or
  file.path like "/tmp/*" or
  file.path like "/dev/shm/*"
)
| STATS file_count = COUNT(file.path), host_count = COUNT(host.name) by file.path, host.name, process.name, user.name
// Alter this threshold to make sense for your environment
| WHERE file_count <= 5
| SORT file_count asc
| LIMIT 100

//12. Segmentation Fault & Potential Buffer Overflow Hunting
//Helpful when answering questions on extractions with GROK
FROM logs-*
| WHERE host.os.type == "linux" and process.name == "kernel" and message like "*segfault*"
| GROK message "\\[%{NUMBER:timestamp}\\] %{WORD:process}\\[%{NUMBER:pid}\\]: segfault at %{BASE16NUM:segfault_address} ip %{BASE16NUM:instruction_pointer} sp %{BASE16NUM:stack_pointer} error %{NUMBER:error_code} in %{DATA:so_file}\\[%{BASE16NUM:so_base_address}\\+%{BASE16NUM:so_offset}\\]"
| KEEP timestamp, process, pid, so_file, segfault_address, instruction_pointer, stack_pointer, error_code, so_base_address, so_offset

//13. Persistence via Systemd (timers)
//Helpful when answering questions on using the CASE command (conditional statements), searching lists using the IN command, wildcard searches with the LIKE command and aggregations
FROM logs-*
| WHERE host.os.type == "linux" and event.type in ("creation", "change") and (

    // System-wide/user-specific services/timers (root permissions required)
    file.path like "/run/systemd/system/*" or
    file.path like "/etc/systemd/system/*" or
    file.path like "/etc/systemd/user/*" or
    file.path like "/usr/local/lib/systemd/system/*" or
    file.path like "/lib/systemd/system/*" or
    file.path like "/usr/lib/systemd/system/*" or
    file.path like "/usr/lib/systemd/user/*" or

    // user-specific services/timers (user permissions required)
    file.path like "/home/*/.config/systemd/user/*" or
    file.path like "/home/*/.local/share/systemd/user/*" or

    // System-wide generators (root permissions required)
    file.path like "/etc/systemd/system-generators/*" or
    file.path like "/usr/local/lib/systemd/system-generators/*" or
    file.path like "/lib/systemd/system-generators/*" or
    file.path like "/etc/systemd/user-generators/*" or
    file.path like "/usr/local/lib/systemd/user-generators/*" or
    file.path like "/usr/lib/systemd/user-generators/*"

) and not (
    process.name in (
      "dpkg", "dockerd", "yum", "dnf", "snapd", "pacman", "pamac-daemon",
      "netplan", "systemd", "generate"
    ) or
    process.executable == "/proc/self/exe" or
    process.executable like "/dev/fd/*" or
    file.extension in ("dpkg-remove", "swx", "swp")
)
| EVAL persistence = CASE(

    // System-wide/user-specific services/timers (root permissions required)
    file.path like "/run/systemd/system/*" or
    file.path like "/etc/systemd/system/*" or
    file.path like "/etc/systemd/user/*" or
    file.path like "/usr/local/lib/systemd/system/*" or
    file.path like "/lib/systemd/system/*" or
    file.path like "/usr/lib/systemd/system/*" or
    file.path like "/usr/lib/systemd/user/*" or

    // user-specific services/timers (user permissions required)
    file.path like "/home/*/.config/systemd/user/*" or
    file.path like "/home/*/.local/share/systemd/user/*" or

    // System-wide generators (root permissions required)
    file.path like "/etc/systemd/system-generators/*" or
    file.path like "/usr/local/lib/systemd/system-generators/*" or
    file.path like "/lib/systemd/system-generators/*" or
    file.path like "/etc/systemd/user-generators/*" or
    file.path like "/usr/local/lib/systemd/user-generators/*" or
    file.path like "/usr/lib/systemd/user-generators/*",
    process.name,
    null
)
| STATS cc = COUNT(*), pers_count = COUNT(persistence), agent_count = COUNT(agent.id) by process.executable, file.path, host.name, user.name
| WHERE pers_count > 0 and pers_count <= 20 and agent_count <= 3
| SORT cc asc
| LIMIT 100

//14. Low Frequency AWS EC2 Admin Password Retrieval Attempts from Unusual ARNs
//Helpful when answering questions on extracting fields with the dissect command and aggregations. Also an example for hunting for cloud threats
from logs-*
| where event.provider == "ec2.amazonaws.com" and event.action == "GetPasswordData"
and aws.cloudtrail.error_code == "Client.UnauthorizedOperation" and aws.cloudtrail.user_identity.type == "AssumedRole"
| dissect aws.cloudtrail.request_parameters "{%{key}=%{instance_id}}"
| dissect aws.cloudtrail.user_identity.session_context.session_issuer.arn "%{?keyword1}:%{?keyword2}:%{?keyword3}::%{account_id}:%{keyword4}/%{arn_name}"
| dissect user.id "%{principal_id}:%{session_name}"
| keep aws.cloudtrail.user_identity.session_context.session_issuer.principal_id, instance_id, account_id, arn_name, source.ip, principal_id, session_name, user.name
| stats instance_counts = count_distinct(arn_name) by instance_id, user.name, source.ip, session_name
| where instance_counts < 5
| sort instance_counts desc
```
</p>
</details>

(cherry picked from commit 6137f81)
mistic pushed a commit to mistic/kibana that referenced this pull request Jul 17, 2024
…Generation (elastic#188492)

(cherry picked from commit 6137f81)
@elasticmachine

⏳ Build in-progress, with failures

Failed CI Steps

History

cc @spong

@spong spong removed the ci:cloud-redeploy Always create a new Cloud deployment label Jul 17, 2024
@spong spong enabled auto-merge (squash) July 17, 2024 21:33

@YulNaumenko YulNaumenko left a comment

LGTM! Thank you Garrett for leading this milestone of the KB automation to the 🚀

@spong spong merged commit 661c251 into elastic:main Jul 17, 2024
40 of 41 checks passed
@spong spong deleted the bye-bye-feature-flag branch July 17, 2024 21:45
@kibanamachine

💔 All backports failed

Status Branch Result
8.15 Backport failed because of merge conflicts

Manual backport

To create the backport manually run:

node scripts/backport --pr 188168

Questions ?

Please refer to the Backport tool documentation

spong added a commit to spong/kibana that referenced this pull request Jul 17, 2024
…ngGraph code paths for `8.15` (elastic#188168)

Co-authored-by: Steph Milovic <stephanie.milovic@elastic.co>
(cherry picked from commit 661c251)

# Conflicts:
#	.buildkite/ftr_configs.yml
@spong

spong commented Jul 17, 2024

💚 All backports created successfully

Status Branch Result
8.15

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

spong added a commit that referenced this pull request Jul 17, 2024
… and LangGraph code paths for `8.15` (#188168) (#188605)

# Backport

This will backport the following commits from `main` to `8.15`:
- [[Security Assistant] Enables automatic setup of Knowledge Base and LangGraph code paths for `8.15` (#188168)](#188168)

### Questions ?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

spong added a commit that referenced this pull request Jul 18, 2024
## Summary

In #188168 we cleaned up some of our API tests, but missed these other references and so have failures on the [periodic test pipeline](https://buildkite.com/elastic/kibana-serverless-security-solution-quality-gate-gen-ai/builds/854). This PR updates these configs to remove the test commands that no longer exist.
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Jul 18, 2024
(cherry picked from commit 40b966c)
kibanamachine added a commit that referenced this pull request Jul 18, 2024
# Backport

This will backport the following commits from `main` to `8.15`:
- [[Security Assistant] Cleanup MKI test configs (#188665)](#188665)

### Questions ?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

Co-authored-by: Garrett Spong <spong@users.noreply.github.com>