Description
Problem Statement
Currently, OpenSearch PPL commands lack support for a default field concept, which creates limitations when implementing text processing and analysis commands. Many PPL commands being developed require the ability to operate on a default field when no explicit field is specified, but this functionality is not available in the current implementation.
This limitation affects multiple commands and makes PPL less intuitive for log analysis, where operating on the primary event data without naming a field is a common pattern.
Current State
Affected Commands
- rex/regex commands: Need to extract patterns from raw events without an explicit field specification
- Other text processing commands in development
Current Behavior
# This doesn't work - no default field to operate on
source=logs | rex "(?<error>ERROR: .*)"
# Must explicitly specify field every time
source=logs | rex field=message "(?<error>ERROR: .*)"
User Impact
- Verbose query syntax requiring repeated field specifications
- Unable to process raw log data directly
Proposed Solution
1. Introduce Default Field Concept
Implement a configurable default field (e.g., _source or message) that commands can use when no field is explicitly specified.
2. Command Support
Update commands to check for and use the default field when no field parameter is provided:
# When no field specified, use default field
source=logs | rex "(?<level>ERROR|WARN|INFO)"
# Explicit field still works
source=logs | rex field=custom_field "(?<level>ERROR|WARN|INFO)"
3. Configuration Options
- Cluster-level / Index-level configuration for default field name
- Fallback chain (e.g., try _raw, then message, then first text field)
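The fallback chain could be sketched roughly as follows. This is only an illustration in Python, not the plugin's implementation: `pick_default_field`, `DEFAULT_CANDIDATES`, and the representation of the index mapping as a plain field-name-to-type dictionary are all assumptions made for the example.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical candidate order, taken from the example in the list above.
DEFAULT_CANDIDATES = ("_raw", "message")


def pick_default_field(
    mapping: Mapping[str, str],
    candidates: Sequence[str] = DEFAULT_CANDIDATES,
) -> Optional[str]:
    """Return the first candidate found in the mapping, else the first text field."""
    for name in candidates:
        if name in mapping:
            return name
    # No candidate exists: fall back to the first field typed as "text",
    # in the order the mapping lists its fields.
    for name, field_type in mapping.items():
        if field_type == "text":
            return name
    # Nothing usable; the caller decides how to report this.
    return None
```

With this sketch, a mapping such as `{"host": "keyword", "message": "text"}` would resolve to `message`, while a mapping with no candidate and no text field yields `None` so the command can raise a clear error.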
Use Cases
Log Analysis
# Extract log level from raw events
source=application_logs | rex "(?<level>\\w+):\\s+(?<msg>.*)"
# Parse structured logs without field specification
source=apache_logs | rex "(?<ip>\\d+\\.\\d+\\.\\d+\\.\\d+).*\\[(?<timestamp>[^\\]]+)\\]"
Security Analysis
# Extract security events from raw logs
source=security_logs | regex "failed.*authentication" | rex "user\\s+(?<user>\\w+)"
Technical Considerations
1. Field Resolution Strategy
- Check if field parameter is provided
- If not, look for configured default field
- If default field doesn't exist, return appropriate error
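The three resolution steps above might look like this in outline. It is a minimal Python sketch under stated assumptions: `resolve_field` and `FieldResolutionError` are hypothetical names, and the set of available fields is assumed to be known when the query is analyzed.

```python
from typing import Collection, Optional


class FieldResolutionError(ValueError):
    """Hypothetical error raised when no usable field can be resolved."""


def resolve_field(
    explicit_field: Optional[str],
    default_field: Optional[str],
    available_fields: Collection[str],
) -> str:
    # Step 1: an explicit field parameter always wins. Validating it is left
    # to whatever checks the command already performs today.
    if explicit_field is not None:
        return explicit_field
    # Step 2: otherwise look for a configured default field.
    if default_field is None:
        raise FieldResolutionError(
            "no field specified and no default field is configured"
        )
    # Step 3: the configured default must exist in the index.
    if default_field not in available_fields:
        raise FieldResolutionError(
            f"default field '{default_field}' does not exist in the index"
        )
    return default_field
```

Because the explicit parameter is checked first, existing queries that already pass `field=...` take the same path as before, which matches the backward-compatibility requirement.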
2. Backward Compatibility
- Existing queries with explicit fields must continue working
- Default behavior should not break current implementations
Benefits
- Improved Usability: Simpler, more intuitive query syntax
- Reduced Verbosity: Cleaner queries for common use cases
- Consistency: Uniform behavior across text processing commands
Risks and Mitigation
Risk 1: Ambiguous Field Resolution
Mitigation: Clear precedence rules and error messages
Risk 2: Breaking Changes
Mitigation: Optional feature with explicit opt-in
Risk 3: Performance Overhead
Mitigation: Compile-time resolution, no runtime cost
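One way to read "compile-time resolution" is that the target field is chosen once while the query is planned, so per-row work is the same as with an explicit field. The Python sketch below illustrates that idea only; `plan_rex` is a hypothetical helper, rows are modeled as plain dictionaries, and the pattern uses Python's `(?P<name>...)` named-group syntax rather than the `(?<name>...)` form used in the PPL examples.

```python
import re
from typing import Callable, Dict, Optional

Row = Dict[str, str]


def plan_rex(
    pattern: str, field: Optional[str], default_field: str
) -> Callable[[Row], Row]:
    # Both decisions happen once, at planning time: which field to read
    # and how the pattern compiles.
    target = field if field is not None else default_field
    compiled = re.compile(pattern)

    def apply(row: Row) -> Row:
        # Per-row work is just the regex search on the already-chosen field.
        match = compiled.search(row.get(target, ""))
        extracted = match.groupdict() if match else {}
        return {**row, **extracted}

    return apply


# Usage: the default field is resolved before any row is processed.
step = plan_rex(r"(?P<level>ERROR|WARN|INFO)", None, "message")
result = step({"message": "ERROR: disk full"})
```

In this sketch the returned closure never consults the configuration again, which is the property the mitigation relies on.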
Success Criteria
- Default field configuration available in PPL settings
- Rex/regex commands work without explicit field parameter
- Parse command supports default field
- No performance regression in existing queries
- Documentation updated with examples
- Migration guide available