Problem Statement
OpenSearch PPL is being passed around in more environments, but wasn't originally designed to be used outside of an OpenSearch cluster.
It's currently being copied between:
- Multiple OpenSearch versions. See the current 2.19-dev backporting mess, and the managed service releases overall.
- Multiple environments. We now see PPL support on Serverless and in CloudWatch Log Insights.
- Multiple execution engines. We support OpenSearch Spark, a new CLI that runs the plugin independently of a cluster, and OpenSearch core is floating around ideas of reusing parts of our code for integration with other execution engines (ref: [RFC] OpenSearch Execution Engine OpenSearch#18416).
The maintenance costs of supporting all these environments are severe. Many updates can take days or weeks (as @RyanL1997 can confirm) to port to these alternative systems. It also raises a large barrier to testing, as many of these tests are tightly coupled to the properties of the cluster we're running them on.
In addition, this tight integration with OpenSearch means there's limited hope of supporting PPL for any alternative databases.
As such: we should decouple the internals of SQL/PPL planning and execution engine from OpenSearch.
Current State
I don't think this is as far off today as it was a year ago. We already did some work on publishing internal modules for reuse by others with #3763. Experience with the above migration efforts has already taught us some of the major areas we need to make generic. Extracting out the async-query and data source internals (ref: #4229) is removing a lot of our messier integration quirks.
The remaining issue is that the SQL plugin itself still links directly into these modules instead of depending on the unified library. This means we don't have a strong guarantee that the library is usable from the outside, and it also means that our plugin version is still tightly coupled to the current version of the language internals.
We also still have the unified PPL depending directly on OpenSearch, which is a gnarly dependency to keep in a jar we want to redistribute. We should try to break out usage of OpenSearch internals and supply our own interfaces, and let the Plugin do that mediation work.
Long-term Goals
- Split the PPL language version from the OpenSearch SQL plugin version. Except for actual API changes or new physical execution operators, updating the PPL version for an OpenSearch release should be as easy as updating a version number in build.gradle.
- Simplify language testing. If the language library has dedicated testing outside of an OpenSearch context, we no longer have to redo all this testing effort for each new deployment environment, potentially porting hundreds of tests. The only tests on the plugin should be I/O related and cover things that can't be reasonably separated from OpenSearch.
- Simplify building applications with PPL. The current migrations of PPL to other environments have historically required hacking at deep internals. The SQL CLI took weeks of effort to figure out how to integrate, to say nothing of many internal projects.
- Reduce the maintenance overhead for individual language changes in backporting.
- Enable running PPL in more environments outside of OpenSearch. For example, hosting it on a stateless web server which can query many clusters. This would be arbitrarily scalable, compared to the current OpenSearch cluster model. It would also make it easier to foresee using PPL for large-scale cross-platform analytics.
- Support multiple execution modes. We've discussed previously implementing PPL directly on Lucene without going through an OpenSearch client. This would have clear performance benefits, but would also break the other use cases like Serverless and the CLI. Splitting out the OpenSearch integration would allow us to supply alternative I/O clients for different environments, and pick the best one on a case-by-case basis.
- This might also support a future where PPL is built in to OpenSearch core.
Proposal
Turn the language internals into a library, which doesn't explicitly depend on OpenSearch. Rewrite the plugin to only depend on this library, and supply any OpenSearch context from the outside via the plugin.
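As a rough illustration of the dependency direction proposed here, the sketch below has the library own a small interface while the plugin supplies the implementation. Every name in it (`IndexClient`, `PplLibrary`, `ClusterBackedClient`) is hypothetical and not taken from the existing codebase, and the "query parsing" is a deliberate stand-in that only understands `source=<index>`.

```java
import java.util.List;
import java.util.Map;

final class BoundarySketch {

    /** Library side: the only view of storage the language library sees (hypothetical). */
    interface IndexClient {
        List<Map<String, Object>> scan(String index);
    }

    /** Library side: depends on IndexClient only, never on OpenSearch classes. */
    static final class PplLibrary {
        private static final String SOURCE_PREFIX = "source=";
        private final IndexClient client;

        PplLibrary(IndexClient client) {
            this.client = client;
        }

        /** Stand-in for real parsing and planning: accepts only "source=<index>". */
        List<Map<String, Object>> execute(String query) {
            String trimmed = query.trim();
            if (!trimmed.startsWith(SOURCE_PREFIX)) {
                throw new IllegalArgumentException("unsupported query in this sketch: " + query);
            }
            return client.scan(trimmed.substring(SOURCE_PREFIX.length()).trim());
        }
    }

    /** Plugin side: adapts cluster access to IndexClient. A Map stands in for the cluster. */
    static final class ClusterBackedClient implements IndexClient {
        private final Map<String, List<Map<String, Object>>> indices;

        ClusterBackedClient(Map<String, List<Map<String, Object>>> indices) {
            this.indices = indices;
        }

        @Override
        public List<Map<String, Object>> scan(String index) {
            return indices.getOrDefault(index, List.of());
        }
    }

    public static void main(String[] args) {
        List<Map<String, Object>> logs = List.of(Map.of("status", 200), Map.of("status", 500));
        // The plugin constructs the client and hands it to the library from the outside.
        IndexClient pluginClient = new ClusterBackedClient(Map.of("logs", logs));
        PplLibrary library = new PplLibrary(pluginClient);
        System.out.println(library.execute("source=logs"));
    }
}
```

The point of the shape is that `PplLibrary` compiles without any OpenSearch classes on its classpath; only the plugin-side adapter would need them.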
Approach
Essentially:
1. Take all of the OpenSearch API access out of the language library. Have it be supplied through an externally-implemented interface. Currently, the best candidate I see is OpenSearchClient, but I think this could be simplified or extended.
   - We would also want to separate out OpenSearch node-internal concepts such as thread pool executors, and supply those as settings as well.
   - We need to document what the API boundaries are and how to satisfy them, else steps 2 and 4 will be painful.
2. Take all of the library internals access out of the OpenSearch plugin. Only depend on the unified library as a distribution.
3. Update our testing to better respect this boundary. We should be able to preserve most of our current integration tests by supplying the plugin's Client and calling the library with it directly. This would make future migration work boil down to "as long as our Client is correct, the language will work."
4. Migrate all these other systems to use the library.
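The list above can be sketched roughly as follows, with the caveat that this is a minimal illustration and not a proposed API. `DataClient`, `EngineSettings`, `LanguageEngine`, and `InMemoryClient` are hypothetical names; in particular, `DataClient` is not the existing OpenSearchClient, whose real shape may differ. The sketch shows the executor being supplied as a setting and an in-memory client standing in for a cluster during language tests.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ApproachSketch {

    /** Hypothetical library-owned interface; every data access goes through it. */
    interface DataClient {
        List<Map<String, Object>> search(String index, int limit);
    }

    /** Node-internal concerns (here, the executor) are passed in as settings. */
    static final class EngineSettings {
        final Executor queryExecutor;
        final int defaultLimit;

        EngineSettings(Executor queryExecutor, int defaultLimit) {
            this.queryExecutor = queryExecutor;
            this.defaultLimit = defaultLimit;
        }
    }

    /** Library entry point: knows only DataClient and EngineSettings. */
    static final class LanguageEngine {
        private final DataClient client;
        private final EngineSettings settings;

        LanguageEngine(DataClient client, EngineSettings settings) {
            this.client = client;
            this.settings = settings;
        }

        /** Runs the lookup on whatever executor the host environment supplied. */
        CompletableFuture<List<Map<String, Object>>> fetch(String index) {
            return CompletableFuture.supplyAsync(
                () -> client.search(index, settings.defaultLimit), settings.queryExecutor);
        }
    }

    /** Test double: lets language tests run without any cluster. */
    static final class InMemoryClient implements DataClient {
        private final Map<String, List<Map<String, Object>>> data;

        InMemoryClient(Map<String, List<Map<String, Object>>> data) {
            this.data = data;
        }

        @Override
        public List<Map<String, Object>> search(String index, int limit) {
            List<Map<String, Object>> rows = data.getOrDefault(index, List.of());
            return rows.subList(0, Math.min(limit, rows.size()));
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Map<String, Object>> logs = List.of(Map.of("id", 1), Map.of("id", 2));
            DataClient client = new InMemoryClient(Map.of("logs", logs));
            LanguageEngine engine = new LanguageEngine(client, new EngineSettings(pool, 10));
            System.out.println(engine.fetch("logs").join());
        } finally {
            pool.shutdown();
        }
    }
}
```

Under this shape, a plugin would pass its own client and a node thread pool, a CLI could pass a plain `ExecutorService`, and a test could pass `Runnable::run` to execute inline.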
Alternative
I don't think continuing to manually port all of this between versions is sustainable long-term.
Implementation Discussion
Todo (publishing this for initial feedback)