[TEST PR][DO NOT MERGE] Improve security detection rules documentation for DevOps engineers #3944

nastasha-solomon · 2025-11-14T19:35:42Z

⚠️ This PR was created by AI and has not been edited yet. I'm only using it for testing purposes and will not be merging it. ⚠️

This commit implements critical improvements to the 'Create a detection rule' documentation based on a comprehensive usability review from the DevOps engineer persona perspective.

Key improvements:

Resource Planning and Performance (Critical)

Added 'Resource planning and performance considerations' section with:
- Detailed resource requirements by rule type (execution time, memory)
- Capacity planning guidance for running multiple rules
- Circuit breaker prevention and troubleshooting
- Guidance on staggering rule activation to prevent thundering herd

Rule Type Decision Support (Critical)

Added 'Understanding rule types' comparison table
Clear guidance on when to use each rule type
Query language quick reference (KQL vs Lucene vs EQL vs ES|QL)
Recommendation: 90% of use cases use Custom Query + KQL

Infrastructure-Focused Examples (Critical)

Added practical examples for DevOps use cases:
- Detect failed SSH login attempts
- Detect unusual outbound network connections
Each example includes prerequisites, testing steps, expected behavior, and tuning tips

Enhanced Scheduling Guidance (Critical)

Reframed 'Additional look-back time' as CRITICAL not optional
Explained three failure scenarios (execution delay, ingestion delay, Kibana restarts)
Added scheduling strategy for multiple rules with load distribution
Performance-based interval recommendations by rule type

Improved Threshold Rule Documentation (Critical)

Added specific cardinality definitions (low/medium/high risk levels)
Provided diagnostic query to check cardinality before creating rule
Explained circuit breaker error messages with exact text users will see
Step-by-step resolution procedures

Enhanced ML Rule Warnings (Critical)

Added comprehensive warning about ML job startup (30-60s delay)
Resource requirements (2GB RAM per job)
Baseline period explanation (7-14 days for learning)
Production deployment best practices

Max Alerts Per Run Clarification (Critical)

Explained that rule STOPS processing when limit reached (not just warning)
Added detection methods for when limit is hit
Performance impact data (100ms per 100 alerts)
Decision framework for appropriate values

Improved Rule Actions/Notifications (Major)

Clear licensing requirements (Gold+ for Stack, included in Serverless)
Common notification patterns (severity-based routing, on-call integration)
Action reliability and failure handling (3 retries, then dropped)
How to diagnose failed notifications

Strengthened Response Actions Warning (Critical)

Elevated to comprehensive warning with real-world failure scenarios
Three-phase safe deployment process (notifications → manual → limited automation)
'Never automate response for' list (prod databases, k8s masters, CI/CD)
Required safeguards and emergency rollback procedure

Integrated Troubleshooting Section (Critical)

Added 'Common issues after creating rules' section at end of document
Six common problems with diagnosis and solutions:
- Rule shows Warning status
- Rule creates zero alerts
- Too many alerts
- Gaps in rule execution
- Actions not sending
- Performance degradation over time
Links to additional troubleshooting resources

These changes address the top 7 critical issues and 5 major issues identified in the usability review, significantly improving the documentation for operations teams managing security detection rules at scale.

This commit implements critical improvements to the 'Create a detection rule' documentation based on a comprehensive usability review from the DevOps engineer persona perspective. Key improvements: ## Resource Planning and Performance (Critical) - Added 'Resource planning and performance considerations' section with: - Detailed resource requirements by rule type (execution time, memory) - Capacity planning guidance for running multiple rules - Circuit breaker prevention and troubleshooting - Guidance on staggering rule activation to prevent thundering herd ## Rule Type Decision Support (Critical) - Added 'Understanding rule types' comparison table - Clear guidance on when to use each rule type - Query language quick reference (KQL vs Lucene vs EQL vs ES|QL) - Recommendation: 90% of use cases use Custom Query + KQL ## Infrastructure-Focused Examples (Critical) - Added practical examples for DevOps use cases: - Detect failed SSH login attempts - Detect unusual outbound network connections - Each example includes prerequisites, testing steps, expected behavior, and tuning tips ## Enhanced Scheduling Guidance (Critical) - Reframed 'Additional look-back time' as CRITICAL not optional - Explained three failure scenarios (execution delay, ingestion delay, Kibana restarts) - Added scheduling strategy for multiple rules with load distribution - Performance-based interval recommendations by rule type ## Improved Threshold Rule Documentation (Critical) - Added specific cardinality definitions (low/medium/high risk levels) - Provided diagnostic query to check cardinality before creating rule - Explained circuit breaker error messages with exact text users will see - Step-by-step resolution procedures ## Enhanced ML Rule Warnings (Critical) - Added comprehensive warning about ML job startup (30-60s delay) - Resource requirements (2GB RAM per job) - Baseline period explanation (7-14 days for learning) - Production deployment best practices ## Max Alerts Per Run Clarification (Critical) - Explained that rule STOPS processing when limit reached (not just warning) - Added detection methods for when limit is hit - Performance impact data (100ms per 100 alerts) - Decision framework for appropriate values ## Improved Rule Actions/Notifications (Major) - Clear licensing requirements (Gold+ for Stack, included in Serverless) - Common notification patterns (severity-based routing, on-call integration) - Action reliability and failure handling (3 retries, then dropped) - How to diagnose failed notifications ## Strengthened Response Actions Warning (Critical) - Elevated to comprehensive warning with real-world failure scenarios - Three-phase safe deployment process (notifications → manual → limited automation) - 'Never automate response for' list (prod databases, k8s masters, CI/CD) - Required safeguards and emergency rollback procedure ## Integrated Troubleshooting Section (Critical) - Added 'Common issues after creating rules' section at end of document - Six common problems with diagnosis and solutions: - Rule shows Warning status - Rule creates zero alerts - Too many alerts - Gaps in rule execution - Actions not sending - Performance degradation over time - Links to additional troubleshooting resources These changes address the top 7 critical issues and 5 major issues identified in the usability review, significantly improving the documentation for operations teams managing security detection rules at scale.

solutions/security/detect-and-alert/create-detection-rule.md

github-actions · 2025-11-14T19:53:36Z

🔍 Preview links for changed docs

More links …

This commit restructures the rule type documentation for better maintainability and navigation: ## Changes **New files created in rule-types/ directory:** - custom-query.md: Detailed instructions for creating custom query rules - Includes two practical infrastructure examples (SSH login failures, unusual outbound connections) - Performance guidance and use case recommendations - machine-learning.md: Complete ML rule creation guide - ML job startup considerations and resource requirements - Baseline period guidance and production best practices - threshold.md: Comprehensive threshold rule documentation - Detailed cardinality analysis and risk levels - Circuit breaker error troubleshooting with specific error messages - Diagnostic query examples **Updated main create-detection-rule.md:** - Added 'Create rules by type' navigation section with links to all rule types - Added 'Quick reference: When to use each rule type' with practical examples - Kept summaries of custom query, ML, and threshold rules with links to detailed pages - Full content remains for event correlation, indicator match, new terms, and ES|QL rules (these will be extracted in future commits) **Updated solutions/toc.yml:** - Added rule-types/custom-query.md - Added rule-types/machine-learning.md - Added rule-types/threshold.md - Positioned as children of create-detection-rule.md ## Benefits - **Better organization**: Each rule type now has its own focused documentation - **Easier maintenance**: Changes to one rule type don't affect others - **Improved navigation**: Users can directly access the rule type they need - **Preserved content**: All existing content maintained, just reorganized - **Scalable structure**: Easy to add more rule-specific guidance in the future

…, new-terms, and esql This commit completes the refactoring by extracting the remaining four rule types into separate dedicated files for better organization and maintainability. ## New files created in rule-types/ directory: **event-correlation.md:** - Complete guide for EQL-based event correlation rules - Instructions for detecting sequences of related events - EQL settings configuration (event category field, tiebreaker, timestamp) - Use case: Detect attack patterns involving multiple steps - Performance: Medium (~200-400ms per execution) **indicator-match.md:** - Comprehensive indicator match rule documentation - Threat intelligence feed integration guidance - Detailed threat mapping configuration (MATCHES/DOES NOT MATCH) - Section on using value lists with indicator match rules - Performance: High (~500ms-2s), limit to 15-minute intervals - Recommendation: Keep indicator count under 10,000 **new-terms.md:** - New terms rule creation guide - First-time occurrence detection documentation - History window size configuration - Multi-field combination support (up to 3 fields) - Important note about field array cardinality (100 combination limit) - Performance: Medium (~300ms per execution) **esql.md:** - Complete ES|QL rule documentation - Detailed coverage of aggregating vs. non-aggregating queries - Alert deduplication configuration (METADATA fields) - Query design considerations (LIMIT, STATS...BY, sorting) - Rule limitations and workarounds - Custom highlighted fields guidance - Performance: Variable based on query complexity ## Updated create-detection-rule.md: - Updated 'Create rules by type' navigation to link to all seven rule types - Added summary notes for each rule type pointing to dedicated pages - Updated internal links for ES|QL query types, design considerations, and limitations - Fixed alert deduplication link to point to new esql.md file - Maintained backward compatibility with existing content ## Updated solutions/toc.yml: - Added all four new rule type files as children of create-detection-rule.md - Complete list now includes: - custom-query.md - machine-learning.md - threshold.md - event-correlation.md - indicator-match.md - new-terms.md - esql.md ## Benefits: ✅ **Complete separation**: All rule types now have dedicated, focused documentation ✅ **Better discoverability**: Users can navigate directly to the rule type they need ✅ **Easier maintenance**: Rule type updates are isolated and don't affect other types ✅ **Scalable structure**: Easy to add rule-specific examples, troubleshooting, and best practices ✅ **Preserved content**: All existing content maintained, just better organized ✅ **Improved navigation**: Clear hierarchy in TOC for all detection rule types

This commit removes detailed rule creation instructions from create-detection-rule.md, now that all rule types have been extracted into separate dedicated files. ## What was removed: **Custom query rule:** - Detailed step-by-step instructions - Two infrastructure examples (SSH login failures, unusual outbound connections) - Testing and tuning guidance - Now: Simple pointer to dedicated page with summary bullet points **Machine learning rule:** - Full configuration steps - ML job startup considerations and warnings - Resource requirements and baseline period details - Alert suppression guidance - Now: Simple pointer to dedicated page with key topics listed **Threshold rule:** - Complete configuration instructions - Extensive cardinality analysis guidance - Circuit breaker troubleshooting - Group by and Count field explanations - Threshold alert structure details - Now: Simple pointer to dedicated page with key topics listed **Event correlation rule:** - Full EQL query configuration steps - Example EQL query with explanation - EQL settings configuration (event category, tiebreaker, timestamp) - Now: Simple pointer to dedicated page with key topics listed **Indicator match rule:** - Detailed configuration with threat mapping - Source and indicator index setup - MATCHES/DOES NOT MATCH conditions - Timeline templates guidance - Complete "Use value lists with indicator match rules" subsection - Now: Simple pointer to dedicated page with key topics listed **New terms rule:** - Step-by-step configuration - Fields menu selection guidance - Multi-field combination warnings - History window size explanation - Now: Simple pointer to dedicated page with key topics listed **ES|QL rule:** - Removed ALL subsections: - Query types (aggregating vs. non-aggregating) - Aggregating query example and explanation - Non-aggregating query example and explanation - Alert deduplication section with examples - Query design considerations - Rule limitations - Highlight fields guidance - Now: Simple pointer to dedicated page with key topics listed ## Result: The main create-detection-rule.md file is now significantly cleaner and more maintainable: - Reduced from ~1413 lines to ~987 lines (426 lines removed) - Maintains navigation structure with clear pointers to dedicated pages - Each rule type section now has 4-6 bullet points summarizing what's covered - All detailed content preserved in separate dedicated files - Easier to maintain and update individual rule types - Better user experience with focused, single-purpose pages

solutions/security/detect-and-alert/rule-types/machine-learning.md

…g.md

solutions/security/detect-and-alert/create-manage-value-lists.md

solutions/security/detect-and-alert/about-detection-rules.md

solutions/security/detect-and-alert/create-manage-value-lists.md

solutions/security/detect-and-alert/rule-types/custom-query.md

Fixing more ref errors

nastasha-solomon self-assigned this Nov 14, 2025

github-actions bot had a problem deploying to docs-preview November 14, 2025 19:36 Failure

nastasha-solomon commented Nov 14, 2025

View reviewed changes

solutions/security/detect-and-alert/create-detection-rule.md Outdated Show resolved Hide resolved

nastasha-solomon commented Nov 14, 2025

View reviewed changes

solutions/security/detect-and-alert/create-detection-rule.md Outdated Show resolved Hide resolved

Update solutions/security/detect-and-alert/create-detection-rule.md

0dbefe2

github-actions bot had a problem deploying to docs-preview November 14, 2025 19:46 Failure

Update solutions/security/detect-and-alert/create-detection-rule.md

3c32ec7

github-actions bot deployed to docs-preview November 14, 2025 19:51 View deployment

nastasha-solomon added 4 commits November 14, 2025 15:01

Removed duplicated content

be3182a

github-actions bot had a problem deploying to docs-preview November 14, 2025 20:26 Failure

Update references

65d6a9c

github-actions bot had a problem deploying to docs-preview November 14, 2025 20:33 Failure

Re-adding to pass checks

3aacaf3

github-actions bot had a problem deploying to docs-preview November 14, 2025 20:38 Failure

Fixed refs to indicator value lists

d4421f3

github-actions bot had a problem deploying to docs-preview November 14, 2025 20:44 Failure

nastasha-solomon commented Nov 14, 2025

View reviewed changes

solutions/security/detect-and-alert/rule-types/machine-learning.md Outdated Show resolved Hide resolved

Update solutions/security/detect-and-alert/rule-types/machine-learnin…

280dccb

…g.md

github-actions bot had a problem deploying to docs-preview November 14, 2025 20:48 Failure

nastasha-solomon commented Nov 14, 2025

View reviewed changes

solutions/security/detect-and-alert/create-manage-value-lists.md Outdated Show resolved Hide resolved

Update solutions/security/detect-and-alert/create-manage-value-lists.md

e432deb

github-actions bot had a problem deploying to docs-preview November 14, 2025 20:50 Failure

nastasha-solomon commented Nov 14, 2025

View reviewed changes

solutions/security/detect-and-alert/about-detection-rules.md Outdated Show resolved Hide resolved

nastasha-solomon commented Nov 14, 2025

View reviewed changes

solutions/security/detect-and-alert/create-manage-value-lists.md Outdated Show resolved Hide resolved

nastasha-solomon commented Nov 14, 2025

View reviewed changes

solutions/security/detect-and-alert/rule-types/custom-query.md Outdated Show resolved Hide resolved

Apply suggestions from code review

be22d7d

Fixing more ref errors

github-actions bot deployed to docs-preview November 14, 2025 20:54 View deployment

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[TEST PR][DO NOT MERGE] Improve security detection rules documentation for DevOps engineers #3944

[TEST PR][DO NOT MERGE] Improve security detection rules documentation for DevOps engineers #3944

Uh oh!

nastasha-solomon commented Nov 14, 2025

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Nov 14, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

[TEST PR][DO NOT MERGE] Improve security detection rules documentation for DevOps engineers #3944

Are you sure you want to change the base?

[TEST PR][DO NOT MERGE] Improve security detection rules documentation for DevOps engineers #3944

Uh oh!

Conversation

nastasha-solomon commented Nov 14, 2025

Resource Planning and Performance (Critical)

Rule Type Decision Support (Critical)

Infrastructure-Focused Examples (Critical)

Enhanced Scheduling Guidance (Critical)

Improved Threshold Rule Documentation (Critical)

Enhanced ML Rule Warnings (Critical)

Max Alerts Per Run Clarification (Critical)

Improved Rule Actions/Notifications (Major)

Strengthened Response Actions Warning (Critical)

Integrated Troubleshooting Section (Critical)

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Nov 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔍 Preview links for changed docs

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions bot commented Nov 14, 2025 •

edited

Loading