Design Doc: Separation of Control and Data planes #397

Closed
Tracked by #396
morgsmccauley opened this issue Nov 15, 2023 · 0 comments

morgsmccauley commented Nov 15, 2023

This issue describes some potential problems within the current QueryAPI architecture and introduces a re-envisioned architecture to rectify them. The focal point of this proposal is the blurred distinction between the Data and Control Planes in the existing system. The document discusses how we can achieve a clear separation between these planes and underscores its significance.

By creating this separation at an early stage, we mitigate potential complications and avoid the larger rework that continuing to maintain the current design might eventually require.

To guide the transition, both an 'intermediate' and 'final' design have been proposed. The intermediate design offers a quick path towards realising some immediate benefits while still aligning with the final version, which provides all described benefits.

Control Plane vs. Data Plane

The control plane is responsible for decision-making in the system, determining how data should flow, and managing the system's overall configuration and behaviour. Its primary functions include configuration management, resource orchestration, and monitoring.

The data plane is responsible for processing and transporting data through the system. It deals with the actual execution of the operations that the control plane decides upon. Its key characteristics include data processing and filtering, data transportation, and an emphasis on efficiency and speed.

In essence, while the control plane decides "what" to do and "how" to do it, the data plane actually carries out those decisions by processing and transporting data.

Benefits of Distinct Planes

  1. Scalability: By decoupling decision-making from data processing, each plane can be scaled independently. As data demands increase, the data plane can be scaled out without impacting the control plane's performance.
  2. Flexibility: Changes or upgrades can be made to one plane without necessarily affecting the other. This makes it easier to introduce new features, optimisations, or patches.
  3. Optimised Performance: Each plane can be optimised for its specific set of responsibilities. The control plane can be streamlined for rapid decision-making, while the data plane can be enhanced for fast data processing and transmission.
  4. Increased Reliability: Failures in one plane can be managed without affecting the other. For instance, a data processing node in the data plane can fail without impacting the decision-making abilities of the control plane.
  5. Single Responsibility: A clear separation provides a more organised framework, making it easier to diagnose issues, manage resources, and monitor system health.
  6. Resource Efficiency: Resources can be allocated more effectively when there's a clear distinction between control and data functions. This prevents the over-provisioning of resources and ensures optimal utilisation.

Current Architecture

image

In our current architecture, Coordinator primarily functions as an Indexer. However, aspects of Control and in-memory registry maintenance have been added almost as secondary features. Core logic is integrated into stream handling, creating a dependency throughout the system. Messages, which carry both data and commands, flow in a unidirectional manner. This design necessitates that Runner also makes control decisions, given the Coordinator's limited visibility into this segment of the system.

The blending of responsibilities between the Control and Data Planes throughout our system leads to the following drawbacks:

  1. Indexer Coupling: Within Coordinator, control functionality is built on top of block handling, effectively binding it to the speed of indexing. Any interruptions or slowdowns in indexing cascade into control-related delays.
  2. Command Coupling: Commands are sent via the Data Plane, making their execution dependent on its performance. Congestion in the data plane translates to lag in command execution.
  3. Open-loop Control: Coordinator has no awareness of the current state of the system. It cannot tell whether a state change is required, or whether a dispatched command was executed.
  4. Dual Control: Decision-making and control in the system are split between Runner and Coordinator. This can lead to overlapping directives and conflicting decisions.
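
To make the Command Coupling drawback concrete, the sketch below models a single FIFO stream that carries both block data and commands, and compares it with a dedicated control channel. The message shape and names (`Message`, `messages_processed_before_command`, `STOP`) are hypothetical and are not taken from the QueryAPI codebase.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    kind: str      # "block" (data) or "command" (control); hypothetical shape
    payload: str


def messages_processed_before_command(stream: deque) -> int:
    """Count how many messages are handled before the first command is reached."""
    handled = 0
    while stream:
        msg = stream.popleft()
        if msg.kind == "command":
            return handled
        handled += 1
    return handled


# Shared stream: a command enqueued behind a backlog of blocks is only
# seen once the whole backlog has been processed.
shared = deque(Message("block", str(height)) for height in range(1000))
shared.append(Message("command", "STOP"))
shared_delay = messages_processed_before_command(shared)

# Dedicated control channel: the command does not wait behind any data.
control = deque([Message("command", "STOP")])
control_delay = messages_processed_before_command(control)

print(shared_delay, control_delay)
```

In the shared case the command's latency grows with the size of the data backlog, while on the dedicated channel it is independent of it.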

Proposed Intermediate Architecture

This proposed architecture serves as a foundation for the final design, outlined below. While the positioning of each function largely remains the same, the distinction between Control and Data responsibilities is refined.

image

Compared to our current setup, the significant changes in each component are as follows:

Coordinator: Control of the system now takes precedence as its primary function. The real-time indexing task shifts to a secondary thread, so that it can be monitored and managed like every other component in the system. The current registry is periodically fetched and compared with the system's actual state, and any differences are resolved by issuing commands to the relevant components.

Registry: Constructing and maintaining an in-memory registry is no longer required, mainly because indexing is offloaded to a separate process. The Registry can either be accessed via RPC or, alternatively, a dedicated indexer combined with an HTTP API can be created to expose the registry with minimal latency. Entries within the registry are version-stamped, simplifying the comparison of the registry against the system's actual state.
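
The periodic comparison described for Coordinator and Registry could look roughly like the sketch below. It assumes both sides can be reduced to a map from indexer name to registry version. The `Command` type, the action names, and the example indexer names are hypothetical, and one pass of the loop is shown rather than the scheduling around it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    action: str    # "start", "restart" or "stop" (hypothetical names)
    indexer: str


def reconcile(desired: dict[str, int], actual: dict[str, int]) -> list[Command]:
    """Diff the registry (desired) against the running system (actual).

    Both maps go from indexer name to the registry version it is at.
    """
    commands = []
    for name, version in sorted(desired.items()):
        if name not in actual:
            commands.append(Command("start", name))
        elif actual[name] != version:
            # Version stamps make a stale indexer detectable with one comparison.
            commands.append(Command("restart", name))
    for name in sorted(actual.keys() - desired.keys()):
        commands.append(Command("stop", name))
    return commands


registry = {"alice/feed": 3, "bob/nfts": 1}
running = {"alice/feed": 2, "carol/old": 7}
plan = reconcile(registry, running)
for command in plan:
    print(command)
```

When the two maps are equal, `reconcile` returns an empty list, so the loop is safe to run repeatedly.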

Runner: HTTP endpoints close the loop between Runner and Coordinator. This provides direct control over Executors and Provisioning, allowing Coordinator to spin processes up/down, and provision indexers as needed. This direct control eliminates the immediate requirement to port the functionality to Coordinator.
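
As an illustration of what such a control surface might offer, the sketch below models request dispatch in memory rather than over a real socket. The routes (`/executors`), status codes, and payload fields are assumptions made for the example. The issue does not specify Runner's actual endpoints.

```python
import json


class RunnerControl:
    """In-memory stand-in for a Runner control surface (hypothetical routes)."""

    def __init__(self) -> None:
        self.executors: dict[str, str] = {}   # indexer name -> status

    def handle(self, method: str, path: str, body: str = "") -> tuple[int, str]:
        """Dispatch a request and return (status code, JSON body)."""
        if method == "GET" and path == "/executors":
            # Lets Coordinator observe actual state, closing the loop.
            return 200, json.dumps(self.executors, sort_keys=True)
        if method == "POST" and path == "/executors":
            name = json.loads(body)["indexer"]
            self.executors[name] = "running"
            return 201, json.dumps({"indexer": name, "status": "running"})
        if method == "DELETE" and path.startswith("/executors/"):
            name = path[len("/executors/"):]
            if self.executors.pop(name, None) is None:
                return 404, json.dumps({"error": "unknown executor"})
            return 204, ""
        return 404, json.dumps({"error": "no such route"})


runner = RunnerControl()
print(runner.handle("POST", "/executors", json.dumps({"indexer": "alice/feed"})))
print(runner.handle("GET", "/executors"))
print(runner.handle("DELETE", "/executors/alice/feed"))
print(runner.handle("GET", "/executors"))
```

The read endpoint matters as much as the write endpoints here: without a way to query running Executors, Coordinator would still be issuing commands blind.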

Redis: Now exclusively aligned within the Data Plane, focusing solely on data transport between Historical/Real-time Streams and their corresponding Executors.

Transitioning to this architecture provides several advantages:

  1. Closed-loop Control: Coordinator gains full awareness of the system. It is therefore able to execute commands immediately and respond to deviations from desired states.
  2. Decoupled Data/Control Plane: Control is no longer dependent on the data plane, allowing for independent responsiveness.
  3. Throttling: With greater awareness of the entire system, Coordinator can implement throttling as needed. This is especially useful with provisioning, which tends to fail during cold starts.
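
One possible shape for provisioning throttling is sketched below: a concurrency cap plus retries with exponential backoff. The limits, the delay values, and the `FlakyProvisioner` test double (which fails the first call per indexer to imitate a cold start) are all assumptions for illustration, not behaviour described in this issue.

```python
import asyncio


async def provision_all(names, provision, max_concurrent=2, max_attempts=3, base_delay=0.01):
    """Provision every indexer, at most `max_concurrent` at a time, retrying failures.

    Returns a map from indexer name to the attempt that succeeded, or None.
    """
    gate = asyncio.Semaphore(max_concurrent)
    results = {}

    async def one(name):
        async with gate:
            for attempt in range(1, max_attempts + 1):
                try:
                    await provision(name)
                    results[name] = attempt
                    return
                except RuntimeError:
                    if attempt < max_attempts:
                        # Exponential backoff before the next attempt.
                        await asyncio.sleep(base_delay * 2 ** (attempt - 1))
            results[name] = None  # gave up after max_attempts

    await asyncio.gather(*(one(name) for name in names))
    return results


class FlakyProvisioner:
    """Test double: the first call per indexer fails, as a cold start might."""

    def __init__(self):
        self.seen = set()
        self.in_flight = 0
        self.peak = 0

    async def __call__(self, name):
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        try:
            await asyncio.sleep(0)
            if name not in self.seen:
                self.seen.add(name)
                raise RuntimeError("cold start")
        finally:
            self.in_flight -= 1


provisioner = FlakyProvisioner()
outcome = asyncio.run(provision_all(["a", "b", "c", "d"], provisioner))
print(outcome, provisioner.peak)
```

Holding the semaphore across retries keeps a failing indexer from releasing its slot to a fresh request mid-backoff. Whether that is the right trade-off depends on how provisioning behaves in practice.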

Proposed Final Architecture

In this final architecture, responsibilities are divided into distinct, cohesive components, and the boundaries between the Control and Data Planes are further refined.

image

Compared to the intermediate architecture, the following changes have been made:

Coordinator: Central hub for control-related tasks. To increase cohesiveness, Provisioning has been ported from Runner to Coordinator. This focussed role ensures that control mechanisms are not bogged down by other tasks.

Streamers: Responsible for managing real-time and historical data streams, ensuring correct block data is directed to the relevant locations. Responsibilities include fetching index files, filtering, and more. Exposes an HTTP API, providing Coordinator with direct control.

Runner: Manages the Executor processes that execute users' code. No longer responsible for provisioning. An HTTP API is exposed to provide direct control to Coordinator.

This proposed architecture provides the following benefits:

  1. Language Agnostic: As long as the same API is exposed, Runner can be implemented in any language. This provides an easy transition towards executing Rust-based indexers.
  2. Scalability: Each component can be scaled independently, without impacting the rest of the system. Each can be scaled along the dimension it is constrained by and given only the resources it needs.
  3. Single Responsibility: Individual components work towards a cohesive goal. This creates a system which is easier to reason about and debug.
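
The Language Agnostic point amounts to Coordinator depending on a contract rather than on an implementation. The sketch below expresses that with a structural interface. The method names and both stand-in implementations are hypothetical, and in the proposed design the contract would be an HTTP API rather than an in-process interface.

```python
from typing import Protocol


class RunnerApi(Protocol):
    """The control contract Coordinator depends on (hypothetical method names)."""

    def start_executor(self, indexer: str) -> None: ...
    def stop_executor(self, indexer: str) -> None: ...
    def list_executors(self) -> list[str]: ...


class SetBackedRunner:
    """Stand-in for one Runner implementation."""

    def __init__(self) -> None:
        self._running: set[str] = set()

    def start_executor(self, indexer: str) -> None:
        self._running.add(indexer)

    def stop_executor(self, indexer: str) -> None:
        self._running.discard(indexer)

    def list_executors(self) -> list[str]:
        return sorted(self._running)


class DictBackedRunner:
    """Stand-in for a second implementation with different internals."""

    def __init__(self) -> None:
        self._status: dict[str, str] = {}

    def start_executor(self, indexer: str) -> None:
        self._status[indexer] = "running"

    def stop_executor(self, indexer: str) -> None:
        self._status.pop(indexer, None)

    def list_executors(self) -> list[str]:
        return sorted(self._status)


def ensure_running(runner: RunnerApi, wanted: list[str]) -> None:
    """Coordinator-side logic written only against the contract."""
    current = set(runner.list_executors())
    for name in wanted:
        if name not in current:
            runner.start_executor(name)
    for name in current - set(wanted):
        runner.stop_executor(name)


for runner in (SetBackedRunner(), DictBackedRunner()):
    runner.start_executor("carol/old")
    ensure_running(runner, ["alice/feed", "bob/nfts"])
    print(type(runner).__name__, runner.list_executors())
```

`ensure_running` never inspects which implementation it was given, which is the property that would let a Runner in another language be swapped in behind the same API.
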
@morgsmccauley morgsmccauley changed the title Separation of Control and Data plane Separation of Control and Data planes Nov 15, 2023
@morgsmccauley morgsmccauley self-assigned this Nov 20, 2023
@pkudinov pkudinov changed the title Separation of Control and Data planes Reference: Separation of Control and Data planes Dec 12, 2023
@morgsmccauley morgsmccauley changed the title Reference: Separation of Control and Data planes Design Doc: Separation of Control and Data planes Dec 12, 2023