-
Notifications
You must be signed in to change notification settings - Fork 176
[TEST PR][DO NOT MERGE] Improve security detection rules documentation for DevOps engineers #3944
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
nastasha-solomon
wants to merge
13
commits into
main
Choose a base branch
from
docs-improve-security-rules-devops-persona
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
[TEST PR][DO NOT MERGE] Improve security detection rules documentation for DevOps engineers #3944
nastasha-solomon
wants to merge
13
commits into
main
from
docs-improve-security-rules-devops-persona
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This commit implements critical improvements to the 'Create a detection rule' documentation based on a comprehensive usability review from the DevOps engineer persona perspective. Key improvements: ## Resource Planning and Performance (Critical) - Added 'Resource planning and performance considerations' section with: - Detailed resource requirements by rule type (execution time, memory) - Capacity planning guidance for running multiple rules - Circuit breaker prevention and troubleshooting - Guidance on staggering rule activation to prevent thundering herd ## Rule Type Decision Support (Critical) - Added 'Understanding rule types' comparison table - Clear guidance on when to use each rule type - Query language quick reference (KQL vs Lucene vs EQL vs ES|QL) - Recommendation: 90% of use cases use Custom Query + KQL ## Infrastructure-Focused Examples (Critical) - Added practical examples for DevOps use cases: - Detect failed SSH login attempts - Detect unusual outbound network connections - Each example includes prerequisites, testing steps, expected behavior, and tuning tips ## Enhanced Scheduling Guidance (Critical) - Reframed 'Additional look-back time' as CRITICAL not optional - Explained three failure scenarios (execution delay, ingestion delay, Kibana restarts) - Added scheduling strategy for multiple rules with load distribution - Performance-based interval recommendations by rule type ## Improved Threshold Rule Documentation (Critical) - Added specific cardinality definitions (low/medium/high risk levels) - Provided diagnostic query to check cardinality before creating rule - Explained circuit breaker error messages with exact text users will see - Step-by-step resolution procedures ## Enhanced ML Rule Warnings (Critical) - Added comprehensive warning about ML job startup (30-60s delay) - Resource requirements (2GB RAM per job) - Baseline period explanation (7-14 days for learning) - Production deployment best practices ## Max Alerts Per Run Clarification (Critical) - Explained that rule STOPS processing when limit reached (not just warning) - Added detection methods for when limit is hit - Performance impact data (100ms per 100 alerts) - Decision framework for appropriate values ## Improved Rule Actions/Notifications (Major) - Clear licensing requirements (Gold+ for Stack, included in Serverless) - Common notification patterns (severity-based routing, on-call integration) - Action reliability and failure handling (3 retries, then dropped) - How to diagnose failed notifications ## Strengthened Response Actions Warning (Critical) - Elevated to comprehensive warning with real-world failure scenarios - Three-phase safe deployment process (notifications → manual → limited automation) - 'Never automate response for' list (prod databases, k8s masters, CI/CD) - Required safeguards and emergency rollback procedure ## Integrated Troubleshooting Section (Critical) - Added 'Common issues after creating rules' section at end of document - Six common problems with diagnosis and solutions: - Rule shows Warning status - Rule creates zero alerts - Too many alerts - Gaps in rule execution - Actions not sending - Performance degradation over time - Links to additional troubleshooting resources These changes address the top 7 critical issues and 5 major issues identified in the usability review, significantly improving the documentation for operations teams managing security detection rules at scale.
This commit restructures the rule type documentation for better maintainability and navigation: ## Changes **New files created in rule-types/ directory:** - custom-query.md: Detailed instructions for creating custom query rules - Includes two practical infrastructure examples (SSH login failures, unusual outbound connections) - Performance guidance and use case recommendations - machine-learning.md: Complete ML rule creation guide - ML job startup considerations and resource requirements - Baseline period guidance and production best practices - threshold.md: Comprehensive threshold rule documentation - Detailed cardinality analysis and risk levels - Circuit breaker error troubleshooting with specific error messages - Diagnostic query examples **Updated main create-detection-rule.md:** - Added 'Create rules by type' navigation section with links to all rule types - Added 'Quick reference: When to use each rule type' with practical examples - Kept summaries of custom query, ML, and threshold rules with links to detailed pages - Full content remains for event correlation, indicator match, new terms, and ES|QL rules (these will be extracted in future commits) **Updated solutions/toc.yml:** - Added rule-types/custom-query.md - Added rule-types/machine-learning.md - Added rule-types/threshold.md - Positioned as children of create-detection-rule.md ## Benefits - **Better organization**: Each rule type now has its own focused documentation - **Easier maintenance**: Changes to one rule type don't affect others - **Improved navigation**: Users can directly access the rule type they need - **Preserved content**: All existing content maintained, just reorganized - **Scalable structure**: Easy to add more rule-specific guidance in the future
…, new-terms, and esql This commit completes the refactoring by extracting the remaining four rule types into separate dedicated files for better organization and maintainability. ## New files created in rule-types/ directory: **event-correlation.md:** - Complete guide for EQL-based event correlation rules - Instructions for detecting sequences of related events - EQL settings configuration (event category field, tiebreaker, timestamp) - Use case: Detect attack patterns involving multiple steps - Performance: Medium (~200-400ms per execution) **indicator-match.md:** - Comprehensive indicator match rule documentation - Threat intelligence feed integration guidance - Detailed threat mapping configuration (MATCHES/DOES NOT MATCH) - Section on using value lists with indicator match rules - Performance: High (~500ms-2s), limit to 15-minute intervals - Recommendation: Keep indicator count under 10,000 **new-terms.md:** - New terms rule creation guide - First-time occurrence detection documentation - History window size configuration - Multi-field combination support (up to 3 fields) - Important note about field array cardinality (100 combination limit) - Performance: Medium (~300ms per execution) **esql.md:** - Complete ES|QL rule documentation - Detailed coverage of aggregating vs. non-aggregating queries - Alert deduplication configuration (METADATA fields) - Query design considerations (LIMIT, STATS...BY, sorting) - Rule limitations and workarounds - Custom highlighted fields guidance - Performance: Variable based on query complexity ## Updated create-detection-rule.md: - Updated 'Create rules by type' navigation to link to all seven rule types - Added summary notes for each rule type pointing to dedicated pages - Updated internal links for ES|QL query types, design considerations, and limitations - Fixed alert deduplication link to point to new esql.md file - Maintained backward compatibility with existing content ## Updated solutions/toc.yml: - Added all four new rule type files as children of create-detection-rule.md - Complete list now includes: - custom-query.md - machine-learning.md - threshold.md - event-correlation.md - indicator-match.md - new-terms.md - esql.md ## Benefits: ✅ **Complete separation**: All rule types now have dedicated, focused documentation ✅ **Better discoverability**: Users can navigate directly to the rule type they need ✅ **Easier maintenance**: Rule type updates are isolated and don't affect other types ✅ **Scalable structure**: Easy to add rule-specific examples, troubleshooting, and best practices ✅ **Preserved content**: All existing content maintained, just better organized ✅ **Improved navigation**: Clear hierarchy in TOC for all detection rule types
This commit removes detailed rule creation instructions from create-detection-rule.md, now that all rule types have been extracted into separate dedicated files. ## What was removed: **Custom query rule:** - Detailed step-by-step instructions - Two infrastructure examples (SSH login failures, unusual outbound connections) - Testing and tuning guidance - Now: Simple pointer to dedicated page with summary bullet points **Machine learning rule:** - Full configuration steps - ML job startup considerations and warnings - Resource requirements and baseline period details - Alert suppression guidance - Now: Simple pointer to dedicated page with key topics listed **Threshold rule:** - Complete configuration instructions - Extensive cardinality analysis guidance - Circuit breaker troubleshooting - Group by and Count field explanations - Threshold alert structure details - Now: Simple pointer to dedicated page with key topics listed **Event correlation rule:** - Full EQL query configuration steps - Example EQL query with explanation - EQL settings configuration (event category, tiebreaker, timestamp) - Now: Simple pointer to dedicated page with key topics listed **Indicator match rule:** - Detailed configuration with threat mapping - Source and indicator index setup - MATCHES/DOES NOT MATCH conditions - Timeline templates guidance - Complete "Use value lists with indicator match rules" subsection - Now: Simple pointer to dedicated page with key topics listed **New terms rule:** - Step-by-step configuration - Fields menu selection guidance - Multi-field combination warnings - History window size explanation - Now: Simple pointer to dedicated page with key topics listed **ES|QL rule:** - Removed ALL subsections: - Query types (aggregating vs. non-aggregating) - Aggregating query example and explanation - Non-aggregating query example and explanation - Alert deduplication section with examples - Query design considerations - Rule limitations - Highlight fields guidance - Now: Simple pointer to dedicated page with key topics listed ## Result: The main create-detection-rule.md file is now significantly cleaner and more maintainable: - Reduced from ~1413 lines to ~987 lines (426 lines removed) - Maintains navigation structure with clear pointers to dedicated pages - Each rule type section now has 4-6 bullet points summarizing what's covered - All detailed content preserved in separate dedicated files - Easier to maintain and update individual rule types - Better user experience with focused, single-purpose pages
solutions/security/detect-and-alert/rule-types/machine-learning.md
Outdated
Show resolved
Hide resolved
solutions/security/detect-and-alert/create-manage-value-lists.md
Outdated
Show resolved
Hide resolved
solutions/security/detect-and-alert/create-manage-value-lists.md
Outdated
Show resolved
Hide resolved
Fixing more ref errors
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This commit implements critical improvements to the 'Create a detection rule' documentation based on a comprehensive usability review from the DevOps engineer persona perspective.
Key improvements:
Resource Planning and Performance (Critical)
Rule Type Decision Support (Critical)
Infrastructure-Focused Examples (Critical)
Enhanced Scheduling Guidance (Critical)
Improved Threshold Rule Documentation (Critical)
Enhanced ML Rule Warnings (Critical)
Max Alerts Per Run Clarification (Critical)
Improved Rule Actions/Notifications (Major)
Strengthened Response Actions Warning (Critical)
Integrated Troubleshooting Section (Critical)
These changes address the top 7 critical issues and 5 major issues identified in the usability review, significantly improving the documentation for operations teams managing security detection rules at scale.