Skip to content

Conversation

@312268112
Copy link
Owner

Introduces a comprehensive design document for a single-instance pipeline framework.

This document details the architecture, core modules, and a callback-based, asynchronous backpressure mechanism across three distinct execution modes: STREAMING, BATCH_ROLLER (paginated batch), and SQL_TASK (direct SQL execution). It addresses the requirement for Jobs to be the minimal execution unit within a single instance and defines SQL tasks as a standalone execution flow, not a source-flow-sink pipeline.


Open in Cursor Open in Web

Co-authored-by: 13585811473 <13585811473@163.com>
@cursor
Copy link

cursor bot commented Nov 7, 2025

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.
Learn more about Cursor Agents

cursoragent and others added 8 commits November 7, 2025 07:24
Co-authored-by: 13585811473 <13585811473@163.com>
- Replace ASCII art diagrams with Mermaid syntax for better compatibility
- Convert architecture layering diagram to Mermaid graph
- Convert StreamGraph and JobGraph to Mermaid flowcharts
- Convert backpressure execution flow to Mermaid flowchart
- Keep sequence diagrams in Mermaid format (already converted)
- Use monochrome color scheme (black and white) for all diagrams
- Ensure diagrams render correctly on GitHub, GitLab, and Wiki platforms
Major Changes:
- Adopt Spring Project Reactor as the core reactive framework
- Native backpressure support through Reactor
- Simplify all diagrams for better clarity
- Add database schema design (job definition, execution, checkpoint)
- Add Job state machine diagram
- Add detailed module descriptions
- Restructure document with more focused sections
- Remove complex sequence diagrams
- Add practical code examples with Reactor

Technical Stack:
- Spring Boot
- Project Reactor (reactive streams + backpressure)
- Spring Data JPA
- RocksDB (state backend)
- Apache Calcite (optional, for SQL)

Key Features:
- Reactive programming model
- Built-in backpressure control
- Unified API for streaming and batch
- Simple and professional diagrams
- Clear module responsibilities
Major Improvements:
1. Add comprehensive module documentation
   - Each module has clear responsibilities
   - Detailed core interfaces and implementation points
   - Code examples for each module

2. Add complete Checkpoint mechanism design
   - Checkpoint architecture and workflow
   - Three execution mode checkpoint strategies
   - Recovery process and cleanup policies
   - Detailed sequence diagrams

3. Add complete database schema design
   - 5 core tables with detailed field descriptions
   - Sample JSON data structures
   - ER diagram showing relationships
   - Index design for performance

4. Add developer guide
   - How to develop custom Source/Sink connectors
   - How to add custom operators
   - Local development environment setup
   - Step-by-step examples

5. Improved diagrams
   - Simpler and more focused diagrams
   - Clear module boundaries
   - Professional state machines
   - Deployment architecture

6. Better document structure
   - Table of contents for easy navigation
   - Quick start section for new developers
   - Clear core concepts explanation
   - Deployment and operations guide

The document is now ready for new developers to understand and start coding immediately.
Major Changes:
1. Add complete module overview section
   - List all 10 core modules upfront
   - Each module includes: function, significance, and challenges
   - Detailed explanation of key difficulties

2. Remove all emoji symbols
   - Clean professional documentation
   - Use text descriptions only

3. Improve diagram design
   - More complex and information-rich diagrams
   - Better use of canvas space
   - More branching, less linear chains
   - Diagrams only for complex concepts

4. Enhanced diagrams include:
   - Trigger mechanism with multiple branches
   - Complete state machine with all transitions
   - Backpressure strategy comparison
   - Checkpoint consistency coordination
   - Full execution flow with error handling

5. Better structure
   - Core modules overview at beginning
   - Detailed design for critical modules
   - Complete database schema with comments
   - Practical development guide

Document size: 843 KB
Modules covered: 10 core modules
Diagrams: 12 focused diagrams
Code examples: 15+ practical examples
Major Changes:
1. Redesign core concepts following Flink patterns
   - Source: Data source abstraction with checkpoint support
   - Operator: Stream operator abstraction (stateful/stateless)
   - Sink: Data sink abstraction with 2PC support
   - Job: Complete data pipeline definition
   - StreamGraph: Logical execution graph
   - JobGraph: Physical execution graph with operator chaining
   - State: State management with RocksDB backend
   - Checkpoint: Barrier-based checkpoint mechanism

2. Each concept includes:
   - Definition and purpose
   - Design challenges and solutions
   - Complete interface design
   - Implementation examples
   - Reactor integration approach

3. Core features:
   - Three source patterns: Streaming/Roller/SQL
   - Operator types: Stateless/KeyedState/OperatorState
   - Operator chain optimization
   - Two-phase commit sink for Exactly-Once
   - Barrier-based checkpoint coordination
   - RocksDB state backend

4. Reactor integration:
   - Source to Flux conversion
   - Operator to Flux transformation
   - Sink to Mono subscription
   - Complete pipeline assembly

Design philosophy: Flink abstractions + Reactor implementation
Co-authored-by: 13585811473 <13585811473@163.com>
Co-authored-by: 13585811473 <13585811473@163.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants