Design a simplified pipeline framework #5

312268112 · 2025-11-07T06:23:18Z

Introduces a comprehensive design document for a single-instance pipeline framework.

This document details the architecture, core modules, and a callback-based, asynchronous backpressure mechanism across three distinct execution modes: STREAMING, BATCH_ROLLER (paginated batch), and SQL_TASK (direct SQL execution). It addresses the requirement for Jobs to be the minimal execution unit within a single instance and defines SQL tasks as a standalone execution flow, not a source-flow-sink pipeline.

Co-authored-by: 13585811473 <13585811473@163.com>

cursor · 2025-11-07T06:23:20Z

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.
_{Learn more about Cursor Agents}

Co-authored-by: 13585811473 <13585811473@163.com>

- Replace ASCII art diagrams with Mermaid syntax for better compatibility - Convert architecture layering diagram to Mermaid graph - Convert StreamGraph and JobGraph to Mermaid flowcharts - Convert backpressure execution flow to Mermaid flowchart - Keep sequence diagrams in Mermaid format (already converted) - Use monochrome color scheme (black and white) for all diagrams - Ensure diagrams render correctly on GitHub, GitLab, and Wiki platforms

Major Changes: - Adopt Spring Project Reactor as the core reactive framework - Native backpressure support through Reactor - Simplify all diagrams for better clarity - Add database schema design (job definition, execution, checkpoint) - Add Job state machine diagram - Add detailed module descriptions - Restructure document with more focused sections - Remove complex sequence diagrams - Add practical code examples with Reactor Technical Stack: - Spring Boot - Project Reactor (reactive streams + backpressure) - Spring Data JPA - RocksDB (state backend) - Apache Calcite (optional, for SQL) Key Features: - Reactive programming model - Built-in backpressure control - Unified API for streaming and batch - Simple and professional diagrams - Clear module responsibilities

Major Improvements: 1. Add comprehensive module documentation - Each module has clear responsibilities - Detailed core interfaces and implementation points - Code examples for each module 2. Add complete Checkpoint mechanism design - Checkpoint architecture and workflow - Three execution mode checkpoint strategies - Recovery process and cleanup policies - Detailed sequence diagrams 3. Add complete database schema design - 5 core tables with detailed field descriptions - Sample JSON data structures - ER diagram showing relationships - Index design for performance 4. Add developer guide - How to develop custom Source/Sink connectors - How to add custom operators - Local development environment setup - Step-by-step examples 5. Improved diagrams - Simpler and more focused diagrams - Clear module boundaries - Professional state machines - Deployment architecture 6. Better document structure - Table of contents for easy navigation - Quick start section for new developers - Clear core concepts explanation - Deployment and operations guide The document is now ready for new developers to understand and start coding immediately.

Major Changes: 1. Add complete module overview section - List all 10 core modules upfront - Each module includes: function, significance, and challenges - Detailed explanation of key difficulties 2. Remove all emoji symbols - Clean professional documentation - Use text descriptions only 3. Improve diagram design - More complex and information-rich diagrams - Better use of canvas space - More branching, less linear chains - Diagrams only for complex concepts 4. Enhanced diagrams include: - Trigger mechanism with multiple branches - Complete state machine with all transitions - Backpressure strategy comparison - Checkpoint consistency coordination - Full execution flow with error handling 5. Better structure - Core modules overview at beginning - Detailed design for critical modules - Complete database schema with comments - Practical development guide Document size: 843 KB Modules covered: 10 core modules Diagrams: 12 focused diagrams Code examples: 15+ practical examples

Major Changes: 1. Redesign core concepts following Flink patterns - Source: Data source abstraction with checkpoint support - Operator: Stream operator abstraction (stateful/stateless) - Sink: Data sink abstraction with 2PC support - Job: Complete data pipeline definition - StreamGraph: Logical execution graph - JobGraph: Physical execution graph with operator chaining - State: State management with RocksDB backend - Checkpoint: Barrier-based checkpoint mechanism 2. Each concept includes: - Definition and purpose - Design challenges and solutions - Complete interface design - Implementation examples - Reactor integration approach 3. Core features: - Three source patterns: Streaming/Roller/SQL - Operator types: Stateless/KeyedState/OperatorState - Operator chain optimization - Two-phase commit sink for Exactly-Once - Barrier-based checkpoint coordination - RocksDB state backend 4. Reactor integration: - Source to Flux conversion - Operator to Flux transformation - Sink to Mono subscription - Complete pipeline assembly Design philosophy: Flink abstractions + Reactor implementation

Co-authored-by: 13585811473 <13585811473@163.com>

feat: Add Pipeline Framework design document

5ba73cc

Co-authored-by: 13585811473 <13585811473@163.com>

cursoragent and others added 8 commits November 7, 2025 07:24

Checkpoint before follow-up message

39f0761

Co-authored-by: 13585811473 <13585811473@163.com>

Refactor: Update pipeline framework design document

5e84d4b

Co-authored-by: 13585811473 <13585811473@163.com>

Refactor: Update pipeline framework design document

66f9cac

Co-authored-by: 13585811473 <13585811473@163.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Design a simplified pipeline framework #5

Design a simplified pipeline framework #5

Uh oh!

312268112 commented Nov 7, 2025

Uh oh!

cursor bot commented Nov 7, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Design a simplified pipeline framework #5

Are you sure you want to change the base?

Design a simplified pipeline framework #5

Uh oh!

Conversation

312268112 commented Nov 7, 2025

Uh oh!

cursor bot commented Nov 7, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants