RFC: Create OpenSearch Direct Query Plugin
Overview
This RFC proposes a new OpenSearch plugin repository, direct-query, that enables OpenSearch to interact with external data sources beyond native OpenSearch indices. The plugin will provide a unified interface for querying data and for managing resources in data sources such as Prometheus and Amazon S3, and it will be extensible to custom data source implementations. It will support full CRUD (Create, Read, Update, Delete) operations on datasource-specific resources such as alerts, metrics, and configurations.
Motivation
Currently, the OpenSearch SQL plugin contains mixed responsibilities: SQL query engine functionality and data source connectivity. This coupling creates several challenges:
- Tight Coupling: Data source implementations are tightly coupled with the SQL engine, making it difficult to maintain and extend
- Limited Extensibility: Adding new data sources requires modifying the core SQL plugin
- Code Complexity: The SQL plugin has grown large and complex with multiple concerns
- Reusability: Data source connectivity cannot be easily reused by other OpenSearch components
- Limited Resource Management: Current implementation focuses only on querying data, not managing datasource resources like alerts, rules, or configurations
- No Unified API: Each datasource requires custom implementations for resource operations
Proposed Solution
Summary
Solution Overview: This RFC proposes creating a new OpenSearch direct-query plugin by extracting data source connectivity from the SQL plugin, implementing a handler-based architecture, and establishing a complete migration plan to transform OpenSearch's external data interaction capabilities.
Proposed changes:
- New repository: Create opensearch-project/direct-query repository with extracted modules (direct-query-core, async-query, datasources, connectors)
- Handler-based architecture: Three specialized interfaces - QueryHandler (data access), ReadResourcesHandler (resource reading), WriteResourcesHandler (resource management)
- Comprehensive functionality: Full CRUD operations on both data queries AND external resources (Prometheus alerts, S3 bucket policies, database configurations)
- Clean separation: SQL plugin focuses solely on SQL/PPL parsing/execution, delegates all external operations to direct-query plugin
- Extensible connector system: Simple API for third-party developers to create connectors with unified query and resource management capabilities
- REST API framework: Complete API endpoints for data source management, query execution, and resource operations
Migration Plan (5 Phases):
- Repository Setup: Create new repo, CI/CD pipelines, build system
- Code Migration: Extract direct-query, async-query, and datasources modules from SQL plugin with backward compatibility
- Refactoring: Implement handler interfaces, create connector abstraction layer, establish plugin integration
- Integration: Update SQL plugin to use direct-query plugin, comprehensive testing and validation
- Release: Beta release, community feedback, GA release
Key Benefits:
- Separation of Concerns: Clear boundary between query engine and data source connectivity
- Unified Resource Management: Single API for managing resources across all data sources
- Extensibility: Easy addition of new data sources and resource types without modifying core plugins
- Maintainability: Smaller, focused codebases easier to maintain and evolve
- Reusability: Direct query and resource management capabilities available to other OpenSearch components
- Community Contributions: Lower barrier for contributing new connectors with both query and resource support
- Performance: Optimized execution for both queries and resource operations specific to each data source
- Operational Efficiency: Manage external resources directly from OpenSearch without switching tools
Repository Structure
Create a new opensearch-project/direct-query repository that will:
- Extract from SQL Plugin:
  - Move direct-query and direct-query-core modules from the SQL plugin
  - Migrate async-query and async-query-core modules
  - Transfer datasources module for data source management
  - Include recent Prometheus integration work (PRs #3440 and #3441)
- Core Components:
direct-query/
├── direct-query-core/    # Core query interfaces and abstractions
├── async-query-core/     # Async query execution framework
├── async-query/          # OpenSearch-specific async implementations
├── datasources/          # Data source management
├── connectors/           # Built-in connectors
│   ├── prometheus/
│   ├── s3/
│   └── ...
└── plugin/               # OpenSearch plugin integration
Architecture
Core Interfaces
- DataSourceEngine Interface:
public interface DataSourceEngine {
    // Connect to external data source
    DataSourceConnection connect(DataSourceConfig config);

    // Get handler for query operations
    QueryHandler getQueryHandler();

    // Get handler for read resource operations
    ReadResourcesHandler getReadResourcesHandler();

    // Get handler for write resource operations
    WriteResourcesHandler getWriteResourcesHandler();

    // Schema discovery
    Schema discoverSchema(DataSourceConfig config);

    // Health check
    HealthStatus checkHealth();
}
- QueryHandler Interface:
public interface QueryHandler {
    // Execute synchronous query
    QueryResult executeQuery(Query query, DataSourceConnection connection);

    // Execute asynchronous query
    CompletableFuture<QueryResult> executeAsyncQuery(Query query, DataSourceConnection connection);

    // Validate query syntax
    ValidationResult validateQuery(Query query);

    // Get query capabilities
    QueryCapabilities getCapabilities();
}
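One way executeAsyncQuery could be layered on top of executeQuery is to run the synchronous call on an executor. The following is a minimal sketch under stated assumptions: SimpleQuery, SimpleResult, and SketchQueryHandler are hypothetical stand-ins rather than plugin classes, the DataSourceConnection parameter is omitted, and the "echo" result replaces a real call to an external data source.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

// Hypothetical stand-ins for the RFC's Query and QueryResult types.
record SimpleQuery(String queryString) {}
record SimpleResult(List<String> rows) {}

// Assumed shape: a handler whose async method delegates to the sync one.
class SketchQueryHandler {
    private final ExecutorService executor;

    SketchQueryHandler(ExecutorService executor) {
        this.executor = executor;
    }

    // Synchronous path: a real handler would call the external data source here.
    SimpleResult executeQuery(SimpleQuery query) {
        if (query.queryString() == null || query.queryString().isBlank()) {
            throw new IllegalArgumentException("query string must not be empty");
        }
        return new SimpleResult(List.of("echo: " + query.queryString()));
    }

    // Asynchronous path: runs the synchronous call on the supplied executor.
    // An exception thrown by executeQuery completes the future exceptionally.
    CompletableFuture<SimpleResult> executeAsyncQuery(SimpleQuery query) {
        return CompletableFuture.supplyAsync(() -> executeQuery(query), executor);
    }
}
```

Passing the executor in, rather than relying on the common pool, keeps long-running external queries from competing with unrelated work. Which thread pool the plugin would actually use is not specified in this RFC.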
- ReadResourcesHandler Interface:
public interface ReadResourcesHandler {
    // List available resource types
    Set<ResourceType> getSupportedResourceTypes();

    // List resources of a specific type
    ResourceList listResources(ResourceType type, ResourceFilter filter, DataSourceConnection connection);

    // Get a specific resource
    Resource getResource(ResourceType type, String resourceId, DataSourceConnection connection);

    // Search resources with advanced filters
    ResourceSearchResult searchResources(ResourceType type, SearchQuery query, DataSourceConnection connection);

    // Get resource metadata
    ResourceMetadata getResourceMetadata(ResourceType type, String resourceId, DataSourceConnection connection);
}
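To illustrate how listResources and getResource might behave, here is a small in-memory sketch. All names in it (SketchResourceType, SketchResource, InMemoryReadHandler) are hypothetical simplifications. ResourceFilter is replaced by a plain Predicate, and the connection parameter is dropped, since the RFC does not define those types in detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical, trimmed-down versions of the RFC's ResourceType and Resource.
enum SketchResourceType { ALERT_RULE, RECORDING_RULE, SILENCE }
record SketchResource(SketchResourceType type, String id, String name) {}

// Assumed in-memory reader; a real handler would call the data source's API.
class InMemoryReadHandler {
    private final List<SketchResource> resources;

    InMemoryReadHandler(List<SketchResource> resources) {
        this.resources = List.copyOf(resources);
    }

    // Mirrors listResources: select by type, then apply a caller-supplied filter.
    List<SketchResource> listResources(SketchResourceType type, Predicate<SketchResource> filter) {
        List<SketchResource> matches = new ArrayList<>();
        for (SketchResource resource : resources) {
            if (resource.type() == type && filter.test(resource)) {
                matches.add(resource);
            }
        }
        return matches;
    }

    // Mirrors getResource: Optional makes the "not found" case explicit.
    Optional<SketchResource> getResource(SketchResourceType type, String resourceId) {
        return resources.stream()
            .filter(r -> r.type() == type && r.id().equals(resourceId))
            .findFirst();
    }
}
```

Whether the real getResource should return null, throw, or use Optional for a missing resource is a design decision this RFC leaves open.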
- WriteResourcesHandler Interface:
public interface WriteResourcesHandler {
    // Create a new resource
    Resource createResource(ResourceType type, ResourceDefinition definition, DataSourceConnection connection);

    // Update an existing resource
    Resource updateResource(ResourceType type, String resourceId, ResourceDefinition definition, DataSourceConnection connection);

    // Delete a resource
    void deleteResource(ResourceType type, String resourceId, DataSourceConnection connection);

    // Bulk operations
    BulkOperationResult bulkCreate(ResourceType type, List<ResourceDefinition> definitions, DataSourceConnection connection);
    BulkOperationResult bulkUpdate(ResourceType type, Map<String, ResourceDefinition> updates, DataSourceConnection connection);
    BulkOperationResult bulkDelete(ResourceType type, List<String> resourceIds, DataSourceConnection connection);

    // Validate resource definition before write
    ValidationResult validateResourceDefinition(ResourceType type, ResourceDefinition definition);
}
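The interplay between validateResourceDefinition, createResource, and bulkCreate can be sketched as follows. This is an illustration under assumptions, not the proposed implementation: SketchDefinition, SketchBulkResult, and InMemoryWriteHandler are hypothetical, the store is an in-memory map, and the sketch assumes partial-success semantics for bulk operations. The RFC does not say whether bulk operations are atomic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for ResourceDefinition and BulkOperationResult.
record SketchDefinition(String name, String expression) {}
record SketchBulkResult(List<String> createdIds, Map<String, String> failures) {}

class InMemoryWriteHandler {
    private final Map<String, SketchDefinition> store = new LinkedHashMap<>();
    private int nextId = 1;

    // Mirrors validateResourceDefinition: returns an error message, or null if valid.
    String validate(SketchDefinition definition) {
        if (definition.name() == null || definition.name().isBlank()) {
            return "name is required";
        }
        if (definition.expression() == null || definition.expression().isBlank()) {
            return "expression is required";
        }
        return null;
    }

    // Mirrors createResource: validates first, then stores under a generated id.
    String createResource(SketchDefinition definition) {
        String error = validate(definition);
        if (error != null) {
            throw new IllegalArgumentException(error);
        }
        String id = "rule-" + nextId++;
        store.put(id, definition);
        return id;
    }

    // Mirrors bulkCreate with assumed partial-success semantics:
    // an invalid item is recorded as a failure and the batch continues.
    SketchBulkResult bulkCreate(List<SketchDefinition> definitions) {
        List<String> created = new ArrayList<>();
        Map<String, String> failures = new LinkedHashMap<>();
        for (int i = 0; i < definitions.size(); i++) {
            try {
                created.add(createResource(definitions.get(i)));
            } catch (IllegalArgumentException e) {
                failures.put("item[" + i + "]", e.getMessage());
            }
        }
        return new SketchBulkResult(created, failures);
    }

    int size() {
        return store.size();
    }
}
```

Reporting failures per item lets a caller retry only the rejected definitions. An all-or-nothing variant would instead validate every definition before writing any of them.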
- ResourceType Enum (Examples):
public enum ResourceType {
    // Prometheus resources
    ALERT_RULE,
    RECORDING_RULE,
    SILENCE,

    // S3 resources
    BUCKET_POLICY,
    LIFECYCLE_RULE,

    // Generic
    CONFIGURATION,
    PERMISSION,
    CUSTOM
}
- Query Interface:
public interface Query {
    String getQueryString();
    Map<String, Object> getParameters();
    QueryType getType(); // SQL, PPL, NATIVE, etc.
    TimeRange getTimeRange();
}
- Extensibility API:
public interface DataSourceConnector {
    // Unique identifier for the connector
    String getType();

    // Create engine instance
    DataSourceEngine createEngine(ConnectorConfig config);

    // Supported query types
    Set<QueryType> getSupportedQueryTypes();

    // Supported resource operations
    Set<ResourceType> getSupportedResourceTypes();

    // Configuration schema
    ConfigSchema getConfigurationSchema();
}
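Because getType() is described as a unique identifier, the plugin will need some way to look connectors up by type. A minimal sketch of such a lookup is shown here, assuming a simple in-process registry. SketchConnector, ConnectorRegistry, and PrometheusSketchConnector are hypothetical names, query types are plain strings rather than the QueryType enum, and the discovery mechanism for third-party connectors is not defined by this RFC.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, reduced form of the RFC's DataSourceConnector.
interface SketchConnector {
    String getType();
    Set<String> getSupportedQueryTypes();
}

// Assumed registry keyed by connector type.
class ConnectorRegistry {
    private final Map<String, SketchConnector> byType = new ConcurrentHashMap<>();

    // Rejects a second connector that claims an already registered type,
    // so the type string stays a unique identifier.
    void register(SketchConnector connector) {
        SketchConnector previous = byType.putIfAbsent(connector.getType(), connector);
        if (previous != null) {
            throw new IllegalStateException("connector type already registered: " + connector.getType());
        }
    }

    Optional<SketchConnector> find(String type) {
        return Optional.ofNullable(byType.get(type));
    }
}

// Example connector; the "prometheus" and "NATIVE" values are illustrative.
class PrometheusSketchConnector implements SketchConnector {
    @Override
    public String getType() {
        return "prometheus";
    }

    @Override
    public Set<String> getSupportedQueryTypes() {
        return Set.of("NATIVE");
    }
}
```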
Key Features
- Plugin Architecture:
  - Independent OpenSearch plugin deployable alongside SQL plugin
  - RESTful APIs for data source management, query execution, and resource operations
  - Integration with OpenSearch security and access control
  - Unified interface for both data access and resource management
- Built-in Connectors with Resource Management:
  - Prometheus: Time-series queries (PromQL) + Alert rules, recording rules, and silences management
  - Amazon S3: Query structured data (Parquet, JSON, CSV) + Bucket policies and lifecycle rules
  - JDBC: Generic database connectivity with configuration management (future)
- Comprehensive Resource Operations:
  - Full CRUD operations on datasource-specific resources
  - Bulk operations for efficient resource management
  - Resource filtering and search capabilities
  - Resource versioning and change tracking
- Async Query Support:
  - Long-running query execution
  - Result caching and pagination
  - Query status tracking and cancellation
  - Background resource synchronization
- Developer Experience:
  - Simple connector development API with resource management interfaces
  - Maven/Gradle artifacts for third-party connector development
  - Comprehensive documentation with examples for both queries and resources
  - Type-safe resource definitions and operations
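The "query status tracking and cancellation" feature listed above could be modeled roughly as below. This is only a sketch: AsyncQueryTracker and SketchQueryState are hypothetical names, results are plain strings, and state is held in memory, whereas the plugin would presumably persist query state.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical set of states; the RFC does not define the actual state model.
enum SketchQueryState { RUNNING, SUCCEEDED, FAILED, CANCELLED, UNKNOWN }

class AsyncQueryTracker {
    private final Map<String, CompletableFuture<String>> queries = new ConcurrentHashMap<>();

    // Registers a running query and returns the id a client would poll with.
    String submit(CompletableFuture<String> future) {
        String id = UUID.randomUUID().toString();
        queries.put(id, future);
        return id;
    }

    // Cancellation is checked first because a cancelled future
    // also reports isCompletedExceptionally().
    SketchQueryState status(String queryId) {
        CompletableFuture<String> future = queries.get(queryId);
        if (future == null) {
            return SketchQueryState.UNKNOWN;
        }
        if (future.isCancelled()) {
            return SketchQueryState.CANCELLED;
        }
        if (future.isCompletedExceptionally()) {
            return SketchQueryState.FAILED;
        }
        return future.isDone() ? SketchQueryState.SUCCEEDED : SketchQueryState.RUNNING;
    }

    // Note: CompletableFuture.cancel does not interrupt work already running;
    // a real implementation would also need to stop the remote query.
    boolean cancel(String queryId) {
        CompletableFuture<String> future = queries.get(queryId);
        return future != null && future.cancel(true);
    }
}
```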
API Design
REST Endpoints
# Data source management
PUT /_plugins/_direct_query/datasource/{name}
GET /_plugins/_direct_query/datasource/{name}
DELETE /_plugins/_direct_query/datasource/{name}
GET /_plugins/_direct_query/datasource
# Query execution
POST /_plugins/_direct_query/_execute
{
  "datasource": "my-prometheus",
  "query": "rate(http_requests_total[5m])",
  "format": "json"
}
# Async query
POST /_plugins/_direct_query/_async_execute
GET /_plugins/_direct_query/_async_query/{query_id}
DELETE /_plugins/_direct_query/_async_query/{query_id}
# Schema discovery
GET /_plugins/_direct_query/datasource/{name}/_schema
# Resource management operations
GET /_plugins/_direct_query/datasource/{name}/resources/{type}
POST /_plugins/_direct_query/datasource/{name}/resources/{type}/_search
GET /_plugins/_direct_query/datasource/{name}/resources/{type}/{id}
PUT /_plugins/_direct_query/datasource/{name}/resources/{type}/{id}
POST /_plugins/_direct_query/datasource/{name}/resources/{type}
DELETE /_plugins/_direct_query/datasource/{name}/resources/{type}/{id}
# Bulk resource operations
POST /_plugins/_direct_query/datasource/{name}/resources/{type}/_bulk
{
  "operations": [
    {"action": "create", "definition": {...}},
    {"action": "update", "id": "resource1", "definition": {...}},
    {"action": "delete", "id": "resource2"}
  ]
}
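As an illustration of how a client might call the proposed query execution endpoint, the sketch below builds (but does not send) the POST request with the JDK's java.net.http API. The DirectQueryRequests class and the base URL are assumptions for the example, and the endpoint path and body fields are taken from the proposal above and may change. The hand-rolled escaping covers only quotes and backslashes, so a real client should use a JSON library.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical client-side helper; not part of the proposed plugin.
class DirectQueryRequests {

    // Builds the JSON body shown in the query execution example.
    static String executeBody(String datasource, String query) {
        return "{\"datasource\": \"" + escape(datasource)
            + "\", \"query\": \"" + escape(query)
            + "\", \"format\": \"json\"}";
    }

    // Builds a POST request for the proposed _execute endpoint.
    static HttpRequest executeRequest(String baseUrl, String datasource, String query) {
        return HttpRequest.newBuilder()
            .uri(URI.create(baseUrl + "/_plugins/_direct_query/_execute"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(executeBody(datasource, query)))
            .build();
    }

    // Minimal escaping for backslashes and double quotes only.
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

The request could then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. Authentication is omitted here and would depend on the cluster's security configuration.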
Integration with SQL Plugin
The SQL plugin will be refactored to:
- Remove data source-specific code
- Depend on direct-query plugin for external data source queries
- Focus on SQL/PPL parsing, planning, and execution
- Delegate external queries to direct-query plugin via well-defined interfaces
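// SQL Plugin integration
public class DirectQueryStorageEngine implements StorageEngine {
    private final DirectQueryClient client;

    // The client is injected so the final field is always initialized.
    public DirectQueryStorageEngine(DirectQueryClient client) {
        this.client = client;
    }

    @Override
    public Table getTable(DataSourceSchemaName dataSourceSchemaName, String tableName) {
        return client.getTable(dataSourceSchemaName, tableName);
    }
}
Migration Plan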
Phase 1: Repository Setup
- Create new repository opensearch-project/direct-query
- Set up CI/CD pipelines
- Establish code structure and build system
Phase 2: Code Migration
- Extract direct-query modules from SQL plugin
- Move async-query modules
- Migrate datasources module
- Ensure backward compatibility
Phase 3: Refactoring
- Define clean interfaces and APIs
- Refactor existing code to new architecture
- Create connector abstraction layer
- Implement plugin integration
Phase 4: Integration
- Update SQL plugin to use direct-query plugin
- Testing and validation
- Documentation updates
- Performance optimization
Phase 5: Release
- Beta release with core functionality
- Gather feedback from community
- Address issues and improvements
- GA release
Risks and Mitigation
| Risk | Mitigation |
|---|---|
| Breaking changes for existing users | Maintain backward compatibility layer during transition |
| Increased deployment complexity | Provide clear migration guides and tooling |
| Performance overhead from plugin communication | Optimize inter-plugin communication, consider native integration |
| Connector quality variance | Establish certification program and quality standards |
Open Questions
- Should the direct-query plugin be required for SQL plugin operation or optional?
- How to handle version compatibility between SQL and direct-query plugins?
- Should we support federation queries across multiple data sources?
- What level of SQL/PPL support should each connector provide?
References
- OpenSearch SQL Plugin
- Async Query Design
- Prometheus Integration PR (to be linked)
- Apache Calcite - Query optimization framework
Conclusion
The proposed OpenSearch Direct Query Plugin represents a significant architectural improvement that will transform how OpenSearch interacts with external data sources. By introducing a handler-based architecture with separate QueryHandler, ReadResourcesHandler, and WriteResourcesHandler interfaces, this plugin will provide a unified, extensible framework for both data querying and comprehensive resource management across diverse data sources.
This separation of concerns will not only simplify the SQL plugin's architecture but also create new opportunities for the OpenSearch ecosystem. The plugin will enable developers to build rich connectors that go beyond simple data access to provide full lifecycle management of external resources like Prometheus alerts and S3 policies. The comprehensive example implementations demonstrate how this architecture can be practically applied to real-world data sources.
The direct-query plugin will serve as a foundation for OpenSearch's evolution into a unified data platform that can seamlessly integrate with and manage resources across the entire data infrastructure landscape, while maintaining the simplicity and extensibility that developers expect from the OpenSearch ecosystem.
This RFC is open for community feedback. Please comment with your thoughts, concerns, and suggestions.