forked from doocs/advanced-java
Design a simplified pipeline framework #5
Draft
312268112 wants to merge 9 commits into main from cursor/design-a-simplified-pipeline-framework-a65f
Co-authored-by: 13585811473 <13585811473@163.com>
Co-authored-by: 13585811473 <13585811473@163.com>
- Replace ASCII art diagrams with Mermaid syntax for better compatibility
- Convert architecture layering diagram to Mermaid graph
- Convert StreamGraph and JobGraph to Mermaid flowcharts
- Convert backpressure execution flow to Mermaid flowchart
- Keep sequence diagrams in Mermaid format (already converted)
- Use monochrome color scheme (black and white) for all diagrams
- Ensure diagrams render correctly on GitHub, GitLab, and Wiki platforms
Major Changes:
- Adopt Spring Project Reactor as the core reactive framework
- Native backpressure support through Reactor
- Simplify all diagrams for better clarity
- Add database schema design (job definition, execution, checkpoint)
- Add Job state machine diagram
- Add detailed module descriptions
- Restructure document with more focused sections
- Remove complex sequence diagrams
- Add practical code examples with Reactor

Technical Stack:
- Spring Boot
- Project Reactor (reactive streams + backpressure)
- Spring Data JPA
- RocksDB (state backend)
- Apache Calcite (optional, for SQL)

Key Features:
- Reactive programming model
- Built-in backpressure control
- Unified API for streaming and batch
- Simple and professional diagrams
- Clear module responsibilities
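The "native backpressure" item above refers to the Reactive Streams demand protocol that Reactor implements: a subscriber asks for `n` items and the publisher emits no more than that. The sketch below illustrates the protocol with the JDK's `java.util.concurrent.Flow` interfaces (Java 9+) rather than Reactor itself, so it runs without dependencies. `RangePublisher` and `BatchingSubscriber` are hypothetical names made up for this illustration and are not taken from the design document; the code is single-threaded and skips the error signalling a spec-compliant publisher would need.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Flow;

final class BackpressureSketch {

    /** Hypothetical publisher: emits 0..count-1, never more than the outstanding demand. */
    static final class RangePublisher implements Flow.Publisher<Integer> {
        private final int count;

        RangePublisher(int count) { this.count = count; }

        @Override
        public void subscribe(Flow.Subscriber<? super Integer> subscriber) {
            subscriber.onSubscribe(new Flow.Subscription() {
                private int next = 0;
                private long demand = 0;
                private boolean emitting = false;
                private boolean done = false;

                @Override
                public void request(long n) {
                    if (done || n <= 0) return;   // simplification: non-positive demand is ignored
                    demand += n;
                    if (emitting) return;         // re-entrant call from onNext; the loop below drains it
                    emitting = true;
                    while (demand > 0 && next < count && !done) {
                        demand--;
                        subscriber.onNext(next++);
                    }
                    emitting = false;
                    if (next == count && !done) {
                        done = true;
                        subscriber.onComplete();
                    }
                }

                @Override
                public void cancel() { done = true; }
            });
        }
    }

    /** Hypothetical subscriber: requests a fixed batch, then asks again once the batch is consumed. */
    static final class BatchingSubscriber implements Flow.Subscriber<Integer> {
        final List<Integer> received = new ArrayList<>();
        int requestCalls = 0;
        boolean completed = false;
        Throwable error;
        private final int batch;
        private int inBatch = 0;
        private Flow.Subscription subscription;

        BatchingSubscriber(int batch) { this.batch = batch; }

        @Override
        public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            requestCalls++;
            s.request(batch);
        }

        @Override
        public void onNext(Integer item) {
            received.add(item);
            if (++inBatch == batch) {             // batch consumed: signal more demand
                inBatch = 0;
                requestCalls++;
                subscription.request(batch);
            }
        }

        @Override
        public void onError(Throwable t) { error = t; }

        @Override
        public void onComplete() { completed = true; }
    }

    public static void main(String[] args) {
        BatchingSubscriber sub = new BatchingSubscriber(2);
        new RangePublisher(5).subscribe(sub);
        System.out.println("received=" + sub.received
                + " requestCalls=" + sub.requestCalls
                + " completed=" + sub.completed);
    }
}
```

With five items and a batch size of two, the subscriber issues three `request(2)` calls. The producer's pace is set by the consumer, which is the behaviour the commit expects Reactor to provide out of the box.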
Major Improvements:
1. Add comprehensive module documentation
   - Each module has clear responsibilities
   - Detailed core interfaces and implementation points
   - Code examples for each module
2. Add complete Checkpoint mechanism design
   - Checkpoint architecture and workflow
   - Three execution mode checkpoint strategies
   - Recovery process and cleanup policies
   - Detailed sequence diagrams
3. Add complete database schema design
   - 5 core tables with detailed field descriptions
   - Sample JSON data structures
   - ER diagram showing relationships
   - Index design for performance
4. Add developer guide
   - How to develop custom Source/Sink connectors
   - How to add custom operators
   - Local development environment setup
   - Step-by-step examples
5. Improved diagrams
   - Simpler and more focused diagrams
   - Clear module boundaries
   - Professional state machines
   - Deployment architecture
6. Better document structure
   - Table of contents for easy navigation
   - Quick start section for new developers
   - Clear core concepts explanation
   - Deployment and operations guide

The document is now ready for new developers to understand and start coding immediately.
Major Changes:
1. Add complete module overview section
   - List all 10 core modules upfront
   - Each module includes: function, significance, and challenges
   - Detailed explanation of key difficulties
2. Remove all emoji symbols
   - Clean professional documentation
   - Use text descriptions only
3. Improve diagram design
   - More complex and information-rich diagrams
   - Better use of canvas space
   - More branching, less linear chains
   - Diagrams only for complex concepts
4. Enhanced diagrams include:
   - Trigger mechanism with multiple branches
   - Complete state machine with all transitions
   - Backpressure strategy comparison
   - Checkpoint consistency coordination
   - Full execution flow with error handling
5. Better structure
   - Core modules overview at beginning
   - Detailed design for critical modules
   - Complete database schema with comments
   - Practical development guide

Document size: 843 KB
Modules covered: 10 core modules
Diagrams: 12 focused diagrams
Code examples: 15+ practical examples
Major Changes:
1. Redesign core concepts following Flink patterns
   - Source: Data source abstraction with checkpoint support
   - Operator: Stream operator abstraction (stateful/stateless)
   - Sink: Data sink abstraction with 2PC support
   - Job: Complete data pipeline definition
   - StreamGraph: Logical execution graph
   - JobGraph: Physical execution graph with operator chaining
   - State: State management with RocksDB backend
   - Checkpoint: Barrier-based checkpoint mechanism
2. Each concept includes:
   - Definition and purpose
   - Design challenges and solutions
   - Complete interface design
   - Implementation examples
   - Reactor integration approach
3. Core features:
   - Three source patterns: Streaming/Roller/SQL
   - Operator types: Stateless/KeyedState/OperatorState
   - Operator chain optimization
   - Two-phase commit sink for Exactly-Once
   - Barrier-based checkpoint coordination
   - RocksDB state backend
4. Reactor integration:
   - Source to Flux conversion
   - Operator to Flux transformation
   - Sink to Mono subscription
   - Complete pipeline assembly

Design philosophy: Flink abstractions + Reactor implementation
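As a rough illustration of the "Reactor integration" items in this commit, the sketch below shows the Source, Operator, Sink shape and how the three pieces are assembled into one pipeline. To stay dependency-free it uses `java.util.stream.Stream` as a stand-in where the design document would use `Flux` (and `Mono` for the sink). The interface names and signatures here are assumptions made for this example, not the interfaces defined in the document, and `Stream` does not model Reactor's `request(n)` backpressure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

final class PipelineAssemblySketch {

    /** Hypothetical source: opens a stream of records (a Flux<T> in a Reactor version). */
    interface Source<T> {
        Stream<T> open();
    }

    /** Hypothetical operator: transforms one stream into another (Flux<I> to Flux<O>). */
    interface Operator<I, O> {
        Stream<O> apply(Stream<I> input);
    }

    /** Hypothetical sink: consumes the stream to completion (a Mono<Void> in a Reactor version). */
    interface Sink<T> {
        void write(Stream<T> input);
    }

    /** Assembles source, operator, and sink; nothing flows until the sink consumes. */
    static <A, B> void run(Source<A> source, Operator<A, B> operator, Sink<B> sink) {
        try (Stream<A> in = source.open()) {
            sink.write(operator.apply(in));
        }
    }

    public static void main(String[] args) {
        Source<Integer> numbers = () -> Stream.of(1, 2, 3, 4);
        Operator<Integer, String> evens =
                in -> in.filter(n -> n % 2 == 0).map(n -> "even:" + n);
        List<String> out = new ArrayList<>();
        Sink<String> collector = in -> in.forEach(out::add);

        run(numbers, evens, collector);
        System.out.println(out); // prints [even:2, even:4]
    }
}
```

Keeping each stage as a stream-to-stream function is what makes operator chaining cheap: two operators can be fused by composing their `apply` methods before the sink subscribes.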
Co-authored-by: 13585811473 <13585811473@163.com>
Co-authored-by: 13585811473 <13585811473@163.com>
Introduces a comprehensive design document for a single-instance pipeline framework.
This document details the architecture, core modules, and a callback-based, asynchronous backpressure mechanism across three distinct execution modes: STREAMING, BATCH_ROLLER (paginated batch), and SQL_TASK (direct SQL execution). It addresses the requirement for Jobs to be the minimal execution unit within a single instance and defines SQL tasks as a standalone execution flow, not a source-flow-sink pipeline.
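One way to picture the three modes is the sketch below. Only the mode names STREAMING, BATCH_ROLLER, and SQL_TASK come from the description above. The `isPipeline` and `rollPages` helpers, and the offset/page-size paging contract, are assumptions made for illustration and may differ from what the design document specifies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

final class ExecutionModeSketch {

    enum ExecutionMode { STREAMING, BATCH_ROLLER, SQL_TASK }

    /** STREAMING and BATCH_ROLLER run as source-flow-sink pipelines; SQL_TASK is a standalone flow. */
    static boolean isPipeline(ExecutionMode mode) {
        switch (mode) {
            case STREAMING:
            case BATCH_ROLLER:
                return true;
            case SQL_TASK:
                return false;
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    /**
     * Hypothetical BATCH_ROLLER loop: fetch fixed-size pages by (offset, pageSize)
     * until a page comes back shorter than pageSize.
     */
    static <T> List<T> rollPages(BiFunction<Integer, Integer, List<T>> fetchPage, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<T> all = new ArrayList<>();
        int offset = 0;
        while (true) {
            List<T> page = fetchPage.apply(offset, pageSize);
            all.addAll(page);
            if (page.size() < pageSize) {
                break;                 // short or empty page: no more data
            }
            offset += page.size();
        }
        return all;
    }

    public static void main(String[] args) {
        List<String> table = List.of("a", "b", "c", "d", "e");
        // In-memory stand-in for a paginated query.
        BiFunction<Integer, Integer, List<String>> fetch = (offset, size) ->
                table.subList(Math.min(offset, table.size()),
                              Math.min(offset + size, table.size()));

        System.out.println(rollPages(fetch, 2));                  // prints [a, b, c, d, e]
        System.out.println(isPipeline(ExecutionMode.SQL_TASK));   // prints false
    }
}
```

Treating SQL_TASK as a non-pipeline branch at dispatch time keeps the Job the single unit of scheduling while letting the SQL path skip source and sink wiring entirely.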