✨ A compilation of suggested tools/services for each component in a detection and response pipeline, along with real-world examples. The purpose is to create a reference hub for designing effective threat detection and response pipelines. 👷 🏗

0x4D31/detection-and-response-pipeline

Detection and Response Pipeline

✨ A compilation of suggested tools for each component in a detection and response pipeline, along with real-world examples. The purpose is to create a reference hub for designing effective threat detection and response pipelines. 👷 🏗

Join us, explore the curated content, and contribute to this collaborative effort.

Contents

Main Components of a Detection & Response Pipeline:

  1. 📦 Detection-as-Code Pipeline
  2. 🪵 Data Pipeline
  3. ⚠️ Detection and Correlation Engine
  4. ⚙️ Response Orchestration and Automation
  5. 🔍 Investigation and Case Management

💡 Real-world Examples

📑 Additional Resources

Detection-as-Code Pipeline

| Tool / Service | Purpose |
| --- | --- |
| GitHub | Detection content development |
| GitLab | Detection content development |
| Gitea | Detection content development |
| AWS CodeCommit | Detection content development |
| GitHub Actions | CI/CD pipeline |
| GitLab Runner | CI/CD pipeline |
| Drone | CI/CD pipeline |
| AWS CodePipeline | CI/CD pipeline |

Resources

  • Automating Detection-as-Code: An example reference that uses GitHub for detection content development, GitHub Actions for CI/CD, Elastic as SIEM, GitHub Issues for alert management, and Tines for alert and response handling.
  • Practical Detection-as-Code: An example Detection-as-Code pipeline implementation using Sigma rules, GitLab CI/CD, and Splunk.
  • CI/CD Detection Engineering (part 1, part 2, part 3, part 4): An example CI/CD detection engineering workflow in a Splunk environment.
  • From soup to nuts: Building a Detection-as-Code pipeline (Part 1, Part 2)
  • Getting Started with Detection-as-Code and Chronicle Security Operations (Part 1, Part 2)
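
The pipelines in these resources typically run automated checks on detection content before it is deployed. As a rough illustration only, the sketch below shows what such a CI check might look like for rules that have already been parsed into dictionaries. The field names are loosely modelled on Sigma, but the required-field set, the `validate_rule` helper, and the error messages are assumptions made for this example and are not taken from any of the linked posts.

```python
"""Minimal CI-style validation for detection rules kept in version control."""

# Assumed schema for this sketch: every rule must carry these top-level fields.
REQUIRED_FIELDS = {"title", "logsource", "detection", "level"}
ALLOWED_LEVELS = {"informational", "low", "medium", "high", "critical"}


def validate_rule(rule: dict) -> list[str]:
    """Return a list of problems found in one rule; an empty list means it passes."""
    problems = []

    # Report missing fields in a stable (sorted) order so CI output is reproducible.
    for field in sorted(REQUIRED_FIELDS - rule.keys()):
        problems.append(f"missing required field: {field}")

    level = rule.get("level")
    if level is not None and level not in ALLOWED_LEVELS:
        problems.append(f"unknown level: {level}")

    # A detection block without a condition cannot be evaluated.
    if "detection" in rule:
        detection = rule["detection"]
        if not isinstance(detection, dict) or "condition" not in detection:
            problems.append("detection block has no condition")

    return problems


def validate_all(rules: dict[str, dict]) -> dict[str, list[str]]:
    """Validate many rules keyed by file name; return only the failing ones."""
    failures = {}
    for name, rule in rules.items():
        problems = validate_rule(rule)
        if problems:
            failures[name] = problems
    return failures
```

In a CI job, a small wrapper could load each rule file, call `validate_all`, print the failures, and exit with a non-zero status if the returned dictionary is not empty, which would block the merge.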

Data Pipeline

| Tool / Service | Purpose | Deployment |
| --- | --- | --- |
| Substation | Data movement and transformation | Self-hosted (Open source) |
| Vector | Data movement and transformation | Self-hosted (Open source) |
| Tenzir | Data movement and transformation | Self-hosted (Open source) |
| Fluent Bit | Data movement and transformation | Self-hosted (Open source) |
| Logstash | Data movement and transformation | Self-hosted (Open source) |
| Airbyte | Data movement and transformation | Self-hosted (Open source) and Cloud |
| Cribl Stream | Data movement and transformation | Self-hosted (Free), Hybrid and Cloud |
| Tarsal | Data movement and transformation | Cloud |
| Kafka | Stream processing | Self-hosted (Open source) and Cloud (Confluent) |
| Amazon Kinesis Data Streams | Stream processing | Cloud |
| Apache Spark | Stream and batch processing | Self-hosted (Open source) |
| Databricks | Stream and batch processing | Cloud |
| Google Cloud Dataflow | Stream and batch processing | Cloud |
| Apache Flink | Stream and batch processing | Self-hosted (Open source) |
| Apache NiFi | Stream and batch processing | Self-hosted (Open source) |
| Apache Beam | Stream and batch processing | Open source; Self-hosted or cloud-based runner |
| Faust | Stream and batch processing | Self-hosted (Open source) |

Detection and Correlation Engine

In addition to the stream and batch processing tools mentioned in the data pipeline section, the following tools can be used for data analysis and detection.

| Tool / Service | Description |
| --- | --- |
| Elasticsearch with ElastAlert2 or Elastic Security ($) | |
| OpenSearch with ElastAlert2 or OpenSearch Security Analytics | |
| Amazon Kinesis Data Analytics | Streaming data analysis in real time using Apache Flink |
| Matano | Open source security lake platform for AWS |
| ksqlDB | SQL-based streaming for Kafka |
| StreamAlert | Real-time data analysis and alerting framework |
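
To make the idea of a correlation rule concrete, here is a minimal sketch of the kind of logic these engines evaluate: a sliding-window threshold over a stream of events. It is a toy written for this document, not the API of any tool listed above. The `Event` type, the `FailedLoginRule` class, the `login_failed` action name, and the default thresholds are all hypothetical, and the code assumes events arrive in timestamp order.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds, assumed to be non-decreasing across the stream
    user: str
    action: str


class FailedLoginRule:
    """Fire when one user has `threshold` failed logins within `window_seconds`."""

    def __init__(self, threshold: int = 3, window_seconds: float = 60.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        # user -> timestamps of that user's recent failed logins
        self._recent = defaultdict(deque)

    def process(self, event: Event):
        """Feed one event; return an alert dict if the rule fires, else None."""
        if event.action != "login_failed":
            return None

        window = self._recent[event.user]
        window.append(event.timestamp)

        # Drop failures that have fallen out of the sliding window.
        while window and event.timestamp - window[0] > self.window_seconds:
            window.popleft()

        if len(window) >= self.threshold:
            count = len(window)
            window.clear()  # reset so a single burst produces a single alert
            return {"rule": "failed_login_burst", "user": event.user, "count": count}
        return None
```

Keeping per-user state in a deque keeps each update cheap. A production engine would also need to handle out-of-order events and persist state across restarts, which this sketch ignores.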

Response Orchestration and Automation

| Tool / Service | Description | Deployment |
| --- | --- | --- |
| n8n | A free and source-available workflow automation tool | Self-hosted (Source available) and Cloud |
| Shuffle | A general purpose security automation platform | Self-hosted (Open source) and Cloud |
| Tines | No-code automation for security workflows | Self-hosted ($) and Cloud |
| Torq | No-code hyperautomation for security workflows | Cloud |

Investigation and Case Management

| Tool / Service | Description | Deployment |
| --- | --- | --- |
| DFIR IRIS | Open source collaborative incident response platform | Self-hosted (Open source) |
| TheHive | Open source and free security incident response platform | Self-hosted (Open source) |
| GitHub | GitHub Issues can be used for case management. Check out the video in the Resources section. | Cloud |
| Jira Service Management | IT service management platform with incident management features | Cloud |
| Tines Cases | | Cloud |
| Torq Case Management | | Cloud |

Resources:

Real-world Examples

Please note that this information is extracted from public blog posts and conference talks, and may not be comprehensive or reflect the current state of the companies' pipelines. Some examples may focus on specific components, such as the correlation engine, rather than covering the entire pipeline. These examples are intended as starting points, so please view them as informative rather than definitive solutions.

If you have additional information or insights about any of the examples included here and have permission to share them, we encourage you to contribute by sending a pull request to enhance or add more details.

# Technologies / Components Note References
Example 0
Technologies / Components:
  • Databricks
  • Apache Spark
  • Delta Lake
  • Scala
Note:
"Apple must detect a wide variety of security threats, and rises to the challenge using Apache Spark across a diverse pool of telemetry. Some of the home-grown solutions we’ve built to address complications of scale:
  1. Notebook-based testing CI – Previously we had a hybrid development model for Structured Streaming jobs wherein most code would be written and tested inside of notebooks, but unit tests required export of the notebook into a user’s IDE along with JSON sample files to be executed by a local SparkSession. We’ve deployed a novel CI solution leveraging the Databricks Jobs API that executes the notebooks on a real cluster using sample files in DBFS. When coupled with our new test-generation library, we’ve seen 2/3 reduction in the amount of time required for testing and 85% less LoC.
  2. Self-Tuning Alerts – Apple has a team of security analysts triaging the alerts generated by our detection rules. They annotate them as either ‘False Positive’ or ‘True Positive’ following the results of their analysis. We’ve incorporated this feedback into our Structured Streaming pipeline, so the system automatically learns from consensus and adjusts future behavior. This helps us amplify the signal from the rest of the noise.
  3. Automated Investigations – There are some standard questions an analyst might ask when triaging an alert, like: what does this system usually do, where is it, and who uses it? Using ODBC and the Workspace API, we’ve been able to templatize many investigations and in some cases automate the entire process up to and including incident containment.
  4. DetectionKit – We’ve written a custom SDK to formalize the configuration and testing of jobs, including some interesting features such as modular pre/post processor transform functions, and a stream-compatible exclusion mechanism using foreachBatch."
References:
  1. Scaling Security Threat Detection with Apache Spark and Databricks by Josh Gillner (Apple Detection Engineering)
  2. Threat Detection and Response at Scale by Dominique Brezinski (Apple)

Example 1
Technologies / Components:
  • Kafka
  • Apache Spark
  • Apache Hive
  • Elasticsearch
  • GraphQL
  • Amazon S3
  • Slack
  • PagerDuty
References:
  1. A SOCless Detection Team at Netflix by Alex Maestretti (Netflix)

Example 2
Technologies / Components:
  • Kafka
  • Apache Samza
  • Microsoft Sentinel?
  • KQL
  • Azure Pipelines and Repos for CI/CD pipeline
  • Jira
  • ServiceNow
  • Serverless functions
Note:
[Figures: high-level strategy; simplified data collection pipeline]
References:
  1. (Re)building Threat Detection and Incident Response at LinkedIn by Sagar Shah and Jeff Bollinger (LinkedIn)

Example 3
Technologies / Components:
  • go-audit
  • Elasticsearch
  • ElastAlert[0]
Note:
"We send the events to an Elasticsearch cluster. From there we use ElastAlert to query our incoming data continuously for alert generation and general monitoring."
References:
  1. Syscall Auditing at Scale by Ryan Huber (Slack)

Example 4
Technologies / Components:
  • Kafka
  • Jupyter notebook
  • Python
  • osquery, Santa, and OpenBSM/Audit for macOS monitoring
Note:
"Alertbox was the first project we built to start cutting down on our triage time. The goal was to move our alert response runbooks into code, and have them execute before we even begin the triage process.
Think of Forerunner as the glue between Alertbox and Covenant. When an alert fires, Alertbox calls out a RPC service called Forerunner. This service returns a Jupyter notebook corresponding to the alert. Alertbox then embeds the URL of this Jupyter notebook into the alert ticket. In the background, Forerunner also runs this alert notebook asynchronously."
References:
  1. How Dropbox Security builds tools for threat detection and incident response by Dropbox DART
  2. MacOS monitoring the open source way by Michael George (Dropbox)
  3. [OLD] Meet Securitybot: Open Sourcing Automated Security at Scale by Alex Bertsch (Dropbox) and Distributed Security Alerting by Ryan Huber (Slack)

Example 5
Technologies / Components:
  • StreamAlert
  • BinaryAlert
Note:
  • "StreamAlert is a serverless, real-time data analysis framework which empowers you to ingest, analyze, and alert on data from any environment, using data sources and alerting logic you define. Computer security teams use StreamAlert to scan terabytes of log data every day for incident detection and response."
  • "BinaryAlert is an open-source serverless AWS pipeline where any file uploaded to an S3 bucket is immediately scanned with a configurable set of YARA rules. An alert will fire as soon as any match is found, giving an incident response team the ability to quickly contain the threat before it spreads."
References:
  1. StreamAlert: Real-time Data Analysis and Alerting by Airbnb Eng
  2. BinaryAlert: Real-time Serverless Malware Detection by Austin Byers (Airbnb)

Example 6
Technologies / Components:
  • ELK stack
  • Kafka
  • KSQL
  • ES-Hadoop
  • ElastAlert[0]
  • Apache Spark
  • Jupyter notebook
  • GraphFrames
Note:
"The Hunting ELK or simply the HELK is one of the first open source hunt platforms with advanced analytics capabilities such as SQL declarative language, graphing, structured streaming, and even machine learning via Jupyter notebooks and Apache Spark over an ELK stack. This project was developed primarily for research, but due to its flexible design and core components, it can be deployed in larger environments with the right configurations and scalable infrastructure."
References:
  1. The Hunting ELK project by Roberto Rodriguez

Example 7
Technologies / Components:
  • AWS Kinesis Firehose
  • AWS Kinesis Data Analytics Application
  • AWS Lambda
  • AWS S3
  • AWS Athena
  • AWS Simple Notification Service
Note:
"In this example, various AWS serverless application services are used together to create a detection pipeline that is capable of near-realtime detection. The pipeline requires no administrative overhead of servers or container infrastructure, enabling a detection and response team to focus on threat detection capabilities."
References:
  1. Building a Serverless Detection Platform in AWS Pt. I: Endpoint Detection by Brendan Chamberlain

[0] ElastAlert is no longer maintained. You can use ElastAlert2 instead.
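
The "Self-Tuning Alerts" note in the Apple entry (number 0) describes feeding analyst verdicts back into the pipeline so that future behavior adjusts to consensus. The referenced talks do not spell out the mechanism here, so the following is only a toy sketch of one way such feedback could work: suppress an alert key once enough analysts have marked it a false positive. The `FeedbackTuner` class, the alert key format, and both thresholds are hypothetical and are not Apple's implementation.

```python
from collections import Counter


class FeedbackTuner:
    """Suppress alert keys that analysts have consistently marked as false positives."""

    def __init__(self, min_verdicts: int = 5, fp_ratio: float = 0.9):
        self.min_verdicts = min_verdicts  # verdicts needed before tuning applies
        self.fp_ratio = fp_ratio          # share of FPs that counts as consensus
        self._verdicts = {}               # alert key -> Counter of "TP" / "FP"

    def record(self, key: str, verdict: str) -> None:
        """Store one analyst verdict for an alert key (for example, rule plus host)."""
        if verdict not in ("TP", "FP"):
            raise ValueError(f"verdict must be 'TP' or 'FP', got {verdict!r}")
        self._verdicts.setdefault(key, Counter())[verdict] += 1

    def should_suppress(self, key: str) -> bool:
        """Return True once there is enough false-positive consensus for this key."""
        counts = self._verdicts.get(key)
        if counts is None:
            return False
        total = counts["TP"] + counts["FP"]
        if total < self.min_verdicts:
            return False  # not enough evidence yet, keep alerting
        return counts["FP"] / total >= self.fp_ratio
```

Requiring a minimum number of verdicts before suppressing is a deliberate choice in this sketch: it keeps a single mistaken annotation from silencing a rule, and a later true positive can pull the ratio back below the threshold.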

Additional Resources

License

CC0

To the extent possible under law, Adel "0x4D31" Karimi has waived all copyright and related or neighboring rights to this work.
