The Block Builder and Block Scheduler are separate components that together build storage formats from data ingested through Kafka. The Block Scheduler distributes jobs to multiple Block Builder instances using a pull-based architecture. This decouples read and write operations, so each side can be scaled independently and operated more simply. This document describes the architecture and how the components interact.

## Package Structure

The Block Builder system is organized into three main packages:

### pkg/blockbuilder/types

- Contains shared type definitions and interfaces
- Defines core data structures like `Job` and `Offsets`
- Provides interface definitions for:
  - `Worker`: Interface for processing jobs and reporting status
  - `Scheduler`: Interface for job scheduling and worker management
  - `Transport`: Interface for communication between components

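
To make the roles of these three interfaces concrete, here is a minimal sketch of how they could be shaped. The method names and signatures (`Process`, `NextJob`, `Complete`, `RequestJob`, `ReportResult`) and the toy `fixedScheduler` are assumptions made for illustration; the actual definitions in `pkg/blockbuilder/types` may differ.

```go
package main

import "fmt"

// Offsets is a half-open range [Min, Max) of Kafka offsets.
type Offsets struct{ Min, Max int64 }

// Job is a unit of work: a partition and the offsets to read from it.
type Job struct {
	Partition int32
	Offsets   Offsets
}

// Worker processes one job and reports whether it succeeded.
// (Hypothetical method set.)
type Worker interface {
	Process(job Job) error
}

// Scheduler hands out jobs on request and records their outcome.
// (Hypothetical method set.)
type Scheduler interface {
	NextJob(workerID string) (job Job, ok bool)
	Complete(workerID string, job Job, err error)
}

// Transport is what a worker uses to reach a scheduler, whether the
// scheduler lives in the same process or behind a network call.
// (Hypothetical method set.)
type Transport interface {
	RequestJob(workerID string) (Job, bool)
	ReportResult(workerID string, job Job, err error)
}

// fixedScheduler is a toy Scheduler that serves a preset list of jobs.
type fixedScheduler struct {
	jobs []Job
	done int // number of jobs reported as successful
}

func (f *fixedScheduler) NextJob(string) (Job, bool) {
	if len(f.jobs) == 0 {
		return Job{}, false
	}
	j := f.jobs[0]
	f.jobs = f.jobs[1:]
	return j, true
}

func (f *fixedScheduler) Complete(_ string, _ Job, err error) {
	if err == nil {
		f.done++
	}
}

// Compile-time check that fixedScheduler satisfies Scheduler.
var _ Scheduler = (*fixedScheduler)(nil)

func main() {
	s := &fixedScheduler{jobs: []Job{{Partition: 0, Offsets: Offsets{Min: 0, Max: 100}}}}
	job, ok := s.NextJob("builder-1")
	fmt.Println(job, ok)
}
```

Keeping these definitions in a shared package lets the scheduler and builder packages depend on the interfaces rather than on each other.
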
### pkg/blockbuilder/scheduler

- Implements the job queue and scheduling logic
- Manages job distribution to block builders
- Tracks job progress and ensures exactly-once processing
- Handles job state management and offset tracking

### pkg/blockbuilder/builder

- Implements the block builder worker functionality
- Processes assigned jobs and builds storage formats
- Manages transport layer communication
- Handles data processing and object storage interactions

## Component Diagram

```mermaid
graph TB
    subgraph Kafka
        KP[Kafka Partitions]
    end

    subgraph Block Scheduler
        S[Scheduler]
        Q[Job Queue]
        PC[Partition Controller]

        subgraph Transport Layer
            T[gRPC/Transport Interface]
        end
    end

    subgraph Block Builders
        BB1[Block Builder 1]
        BB2[Block Builder 2]
        BB3[Block Builder N]
    end

    subgraph Storage
        OS[Object Storage]
    end

    KP --> PC
    PC --> S
    S <--> Q
    S <--> T
    T <--> BB1
    T <--> BB2
    T <--> BB3
    BB1 --> OS
    BB2 --> OS
    BB3 --> OS
```

## Job Processing Sequence

```mermaid
sequenceDiagram
    participant PC as Partition Controller
    participant S as Block Scheduler
    participant Q as Queue
    participant T as Transport
    participant BB as Block Builder
    participant OS as Object Storage

    loop Monitor Partitions
        PC->>PC: Check for new offsets
        PC->>S: Create Job (partition, offset range)
        S->>Q: Enqueue Job
    end

    BB->>T: Request Job
    T->>S: Forward Request
    S->>Q: Dequeue Job
    Q-->>S: Return Job (or empty)
    alt Has Job
        S->>T: Send Job
        T->>BB: Forward Job
        BB->>OS: Process & Write Data
        BB->>T: Report Success
        T->>S: Forward Status
        S->>PC: Commit Offset
    else No Job
        S->>T: Send No Job Available
        T->>BB: Forward Response
    end
```

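
The sequence above can be sketched as a single-process simulation. This is a simplified illustration, not the real scheduler: the names `Scheduler`, `Enqueue`, `Dequeue`, `Complete`, and `RunBuilder` are hypothetical, the transport hop is collapsed into direct method calls, and there is no locking, so it is only safe from a single goroutine.

```go
package main

import "fmt"

// Offsets is a half-open range [Min, Max) of Kafka offsets.
type Offsets struct{ Min, Max int64 }

type Job struct {
	Partition int32
	Offsets   Offsets
}

// Scheduler holds the job queue and the committed offset per partition.
type Scheduler struct {
	queue     []Job
	committed map[int32]int64
}

func NewScheduler() *Scheduler {
	return &Scheduler{committed: map[int32]int64{}}
}

// Enqueue is what the partition controller calls when it sees new offsets.
func (s *Scheduler) Enqueue(j Job) { s.queue = append(s.queue, j) }

// Dequeue answers a builder's request; ok is false when the queue is empty.
func (s *Scheduler) Dequeue() (Job, bool) {
	if len(s.queue) == 0 {
		return Job{}, false
	}
	j := s.queue[0]
	s.queue = s.queue[1:]
	return j, true
}

// Complete commits the job's end offset after a successful report.
func (s *Scheduler) Complete(j Job) { s.committed[j.Partition] = j.Offsets.Max }

// RunBuilder pulls jobs until none are left and returns how many it
// processed. write stands in for "process and write to object storage".
func RunBuilder(s *Scheduler, write func(Job) error) (int, error) {
	n := 0
	for {
		j, ok := s.Dequeue()
		if !ok {
			return n, nil // the "No Job Available" branch
		}
		if err := write(j); err != nil {
			return n, fmt.Errorf("partition %d: %w", j.Partition, err)
		}
		s.Complete(j)
		n++
	}
}

func main() {
	s := NewScheduler()
	s.Enqueue(Job{Partition: 0, Offsets: Offsets{Min: 0, Max: 100}})
	s.Enqueue(Job{Partition: 0, Offsets: Offsets{Min: 100, Max: 250}})
	n, err := RunBuilder(s, func(Job) error { return nil })
	fmt.Println(n, err, s.committed[0])
}
```

Because the builder asks for work instead of having work pushed to it, adding another builder in this model is just another caller of `Dequeue`; the scheduler does not need to know how many builders exist.
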
## Core Components

### Job and Offsets

- `Job`: Represents a unit of work for processing Kafka data
  - Contains a partition ID and an offset range
  - Immutable data structure that can be safely passed between components
- `Offsets`: Defines a half-open range [min,max) of Kafka offsets to process
  - Used to track progress and ensure exactly-once processing

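
The half-open convention is easiest to see with a small example. The sketch below is illustrative only: the `Len`, `Contains`, and `Split` helpers are hypothetical and are not claimed to exist in the real `Offsets` type. It shows why [min,max) is convenient: consecutive ranges share a boundary value without overlapping, so every offset belongs to exactly one job.

```go
package main

import "fmt"

// Offsets is a half-open range [Min, Max): Min is included, Max is not.
type Offsets struct{ Min, Max int64 }

// Len is the number of offsets in the range.
func (o Offsets) Len() int64 {
	if o.Max <= o.Min {
		return 0
	}
	return o.Max - o.Min
}

// Contains reports whether offset falls inside [Min, Max).
func (o Offsets) Contains(offset int64) bool {
	return offset >= o.Min && offset < o.Max
}

// Split cuts [Min, Max) into consecutive ranges of at most size offsets.
// Each range starts exactly where the previous one ends.
func Split(o Offsets, size int64) []Offsets {
	var out []Offsets
	if size <= 0 {
		return out
	}
	for lo := o.Min; lo < o.Max; lo += size {
		hi := lo + size
		if hi > o.Max {
			hi = o.Max
		}
		out = append(out, Offsets{Min: lo, Max: hi})
	}
	return out
}

func main() {
	for _, r := range Split(Offsets{Min: 0, Max: 250}, 100) {
		fmt.Printf("[%d,%d) len=%d\n", r.Min, r.Max, r.Len())
	}
}
```

Offset 100 belongs to the second range only, which is the property that lets a scheduler commit `Max` of one job and use it directly as `Min` of the next.
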
### Block Scheduler

- Central component responsible for:
  - Managing the job queue
  - Coordinating Block Builder assignments
  - Tracking job progress
- Implements a pull-based model where Block Builders request jobs
- Decoupled from specific transport mechanisms through the Transport interface

### Block Builder

- Processes jobs assigned by the Block Scheduler
- Responsible for:
  - Building storage formats from Kafka data
  - Writing completed blocks to object storage
  - Reporting job status back to the scheduler
- Implements the Worker interface for job processing

### Transport Layer

- Provides communication between Block Builders and the Scheduler
- Abstracts the transport mechanism (currently in-memory and gRPC)
- Defines message types for:
  - Job requests
  - Job completion notifications
  - Job synchronization

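
As a rough illustration of the three message kinds, the sketch below defines one request type per interaction and an in-memory implementation. All type, field, and method names here are hypothetical. Two behaviours are assumptions rather than facts from this document: "synchronization" is taken to mean a builder re-announcing a job it is still working on, and a failed job is put back on the queue.

```go
package main

import "fmt"

type Offsets struct{ Min, Max int64 } // half-open [Min, Max)

type Job struct {
	ID        string
	Partition int32
	Offsets   Offsets
}

// One message type per interaction listed above.
type GetJobRequest struct{ BuilderID string }

type GetJobResponse struct {
	Job Job
	OK  bool // false means "no job available"
}

type CompleteJobRequest struct {
	BuilderID string
	Job       Job
	Success   bool
}

type SyncJobRequest struct {
	BuilderID string
	Job       Job
}

// Transport is the builder-facing API. A gRPC version would put the same
// messages on the wire; the in-memory version below uses method calls.
type Transport interface {
	GetJob(GetJobRequest) (GetJobResponse, error)
	CompleteJob(CompleteJobRequest) error
	SyncJob(SyncJobRequest) error
}

// memoryScheduler satisfies Transport directly, with no network involved.
type memoryScheduler struct {
	pending  []Job
	inFlight map[string]string // job ID -> builder ID
}

func newMemoryScheduler(jobs ...Job) *memoryScheduler {
	return &memoryScheduler{pending: jobs, inFlight: map[string]string{}}
}

func (m *memoryScheduler) GetJob(r GetJobRequest) (GetJobResponse, error) {
	if len(m.pending) == 0 {
		return GetJobResponse{}, nil
	}
	j := m.pending[0]
	m.pending = m.pending[1:]
	m.inFlight[j.ID] = r.BuilderID
	return GetJobResponse{Job: j, OK: true}, nil
}

func (m *memoryScheduler) CompleteJob(r CompleteJobRequest) error {
	owner, ok := m.inFlight[r.Job.ID]
	if !ok || owner != r.BuilderID {
		return fmt.Errorf("job %q is not assigned to %q", r.Job.ID, r.BuilderID)
	}
	delete(m.inFlight, r.Job.ID)
	if !r.Success {
		m.pending = append(m.pending, r.Job) // assumption: failed jobs are retried
	}
	return nil
}

// SyncJob records that a builder is still working on a job (assumed meaning).
func (m *memoryScheduler) SyncJob(r SyncJobRequest) error {
	m.inFlight[r.Job.ID] = r.BuilderID
	return nil
}

func main() {
	var t Transport = newMemoryScheduler(Job{ID: "p0-0", Offsets: Offsets{Min: 0, Max: 100}})
	resp, _ := t.GetJob(GetJobRequest{BuilderID: "builder-1"})
	err := t.CompleteJob(CompleteJobRequest{BuilderID: "builder-1", Job: resp.Job, Success: true})
	fmt.Println(resp.OK, err)
}
```
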
## Design Principles

### Decoupled I/O

- Business logic is separated from I/O operations
- Transport interface allows for different communication mechanisms
- Enables easier testing through mock implementations
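
A small sketch of what this separation buys in practice: when the orchestration logic only sees interfaces, it can be exercised with in-memory fakes and no Kafka, gRPC, or object storage. The names `JobSource`, `BlockStore`, `BuildOne`, `fakeSource`, and `fakeStore` are hypothetical and exist only for this example.

```go
package main

import (
	"errors"
	"fmt"
)

type Job struct {
	Partition int32
	Min, Max  int64 // half-open [Min, Max)
}

// JobSource and BlockStore are the only I/O the business logic sees.
type JobSource interface {
	Next() (Job, bool)
	Report(job Job, err error)
}

type BlockStore interface {
	Put(key string, data []byte) error
}

// BuildOne is pure orchestration: fetch a job, write a block, report the
// result. It returns false when there was no job to process.
func BuildOne(src JobSource, store BlockStore) bool {
	job, ok := src.Next()
	if !ok {
		return false
	}
	key := fmt.Sprintf("partition-%d/%d-%d", job.Partition, job.Min, job.Max)
	err := store.Put(key, []byte("block")) // placeholder payload
	src.Report(job, err)
	return true
}

// fakeSource serves preset jobs and records every report it receives.
type fakeSource struct {
	jobs    []Job
	reports []error
}

func (f *fakeSource) Next() (Job, bool) {
	if len(f.jobs) == 0 {
		return Job{}, false
	}
	j := f.jobs[0]
	f.jobs = f.jobs[1:]
	return j, true
}

func (f *fakeSource) Report(_ Job, err error) { f.reports = append(f.reports, err) }

// fakeStore records written keys and can be told to fail.
type fakeStore struct {
	keys []string
	fail bool
}

func (f *fakeStore) Put(key string, _ []byte) error {
	if f.fail {
		return errors.New("storage unavailable")
	}
	f.keys = append(f.keys, key)
	return nil
}

func main() {
	src := &fakeSource{jobs: []Job{{Partition: 2, Min: 0, Max: 50}}}
	store := &fakeStore{}
	fmt.Println(BuildOne(src, store), store.keys, src.reports)
}
```

Setting `fail` on the fake store makes it possible to check that a storage error is reported back to the job source, a path that is awkward to trigger against real infrastructure.
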