Closed
Labels: area:serialization, area:task-execution-interface-aip72 (AIP-72: Task Execution Interface (TEI) aka Task SDK)
Description
Currently the serialization and de-serialization logic lives in airflow/serialization in the Core. With Airflow 3 and the separation of Task SDK, we will need to make serialization and its versioning much stricter.
We should bump the current DAG serialization version to 2.
Approach
The serialization code should live closer to each language-specific Task SDK, as the SDK knows best how to serialize objects in its language to a JSON-formatted string.
The Core/scheduler will contain the de-serialization code. It does not need to be language specific, as it contains only the info needed by the scheduler.
The contract between those two is the schema.json file that defines the serialized format. Both the client and server could support multiple versions at a time.
Architecture
Task SDK (Serialization)    Schema Contract       Server (Deserialization)
┌─────────────────────┐     ┌─────────────────┐   ┌────────────────────────┐
│ Language-specific   │────▶│ schema.json     │◀──│ Language-agnostic      │
│ DAG → JSON          │     │ (versioned)     │   │ JSON → SerializedDAG   │
│                     │     │                 │   │                        │
│ - Python SDK        │     │ Version 2.0     │   │ - Scheduler            │
│ - Go SDK            │     │ Version 2.1     │   │ - API-Server           │
│ - Future SDKs       │     │ Version 2.2     │   │                        │
└─────────────────────┘     └─────────────────┘   └────────────────────────┘
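The split in the diagram can be sketched roughly as follows. This is a minimal illustration, not the actual Airflow implementation: the `__version` envelope key, the field layouts for each version, and the names `serialize_dag`, `deserialize_dag`, `SchedulerDag`, and `READERS` are all assumptions made up for this example.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable


# --- SDK side (language-specific): objects -> versioned JSON string ---
def serialize_dag(dag_id: str, task_ids: list[str], version: int = 2) -> str:
    """Emit a JSON document tagged with the schema version it follows.

    The envelope {"__version": ..., "dag": ...} is an assumed layout.
    """
    payload = {
        "__version": version,
        "dag": {"dag_id": dag_id, "tasks": [{"task_id": t} for t in task_ids]},
    }
    return json.dumps(payload, sort_keys=True)


# --- Server side (language-agnostic): JSON string -> scheduler view ---
@dataclass(frozen=True)
class SchedulerDag:
    """Only the fields the scheduler needs (hypothetical subset)."""

    dag_id: str
    task_ids: tuple[str, ...]


def _read_v1(dag: dict[str, Any]) -> SchedulerDag:
    # Hypothetical v1 layout: "tasks" is a plain list of task id strings.
    return SchedulerDag(dag["dag_id"], tuple(dag["tasks"]))


def _read_v2(dag: dict[str, Any]) -> SchedulerDag:
    # Hypothetical v2 layout: "tasks" is a list of {"task_id": ...} objects.
    return SchedulerDag(dag["dag_id"], tuple(t["task_id"] for t in dag["tasks"]))


# One reader per supported schema version; the server can hold several at once.
READERS: dict[int, Callable[[dict[str, Any]], SchedulerDag]] = {
    1: _read_v1,
    2: _read_v2,
}


def deserialize_dag(raw: str) -> SchedulerDag:
    doc = json.loads(raw)
    version = doc.get("__version")
    reader = READERS.get(version)
    if reader is None:
        raise ValueError(f"unsupported serialization version: {version!r}")
    return reader(doc["dag"])
```

In this sketch neither half imports the other; the only thing they share is the JSON layout, which is the role schema.json plays in the proposal.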
Alternative Options to Compare (Option 1 is the approach described above)
Option 2: Shared Serialization in airflow-protocols
- Approach: Both serialization and deserialization live in airflow-protocols package
- Pros: Single source of truth, shared implementation, easier maintenance
- Cons: Both server and SDK depend on same package, potential coupling
- Package location: airflow-protocols
Option 3: Symmetric Implementation
- Approach: Both SDK and server can serialize/deserialize
- Pros: Flexibility, testing capabilities, debugging support
- Cons: Code duplication, potential drift between implementations
Key Questions to Answer
- Where should serialization live? airflow-commons/airflow-protocols vs separate packages vs both?
- Should serialization and deserialization be separate? Or symmetric implementation?
- How to handle versioning?
- Backward compatibility strategy?
Success Criteria
- SDK can serialize DAGs without importing server components and should be able to deserialize multiple versions
- Server can deserialize without importing SDK components
- Multiple schema versions supported simultaneously
- Existing serialized DAGs can be migrated
- Clear path for future language SDKs
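The "existing serialized DAGs can be migrated" criterion could be met with a chain of per-version upgrade steps. The sketch below is only one possible shape: the `__version` envelope key, the v1 and v2 layouts, and the names `MIGRATIONS`, `LATEST_VERSION`, and `migrate` are hypothetical and not taken from Airflow.

```python
import json
from typing import Any, Callable

Doc = dict[str, Any]

LATEST_VERSION = 2  # assumed target version, matching the proposed bump


def _v1_to_v2(doc: Doc) -> Doc:
    # Hypothetical change between versions: each task id string in v1
    # becomes a {"task_id": ...} object in v2.
    dag = dict(doc["dag"])
    dag["tasks"] = [{"task_id": t} for t in dag["tasks"]]
    return {"__version": 2, "dag": dag}


# Maps a source version to the step that upgrades it by one version.
MIGRATIONS: dict[int, Callable[[Doc], Doc]] = {1: _v1_to_v2}


def migrate(raw: str) -> str:
    """Upgrade a stored document step by step until it is at LATEST_VERSION."""
    doc = json.loads(raw)
    version = doc.get("__version")
    if not isinstance(version, int) or version > LATEST_VERSION:
        raise ValueError(f"cannot migrate version: {version!r}")
    while doc["__version"] < LATEST_VERSION:
        step = MIGRATIONS.get(doc["__version"])
        if step is None:
            raise ValueError(f"no migration from version {doc['__version']}")
        doc = step(doc)
    return json.dumps(doc, sort_keys=True)
```

Documents already at the latest version pass through unchanged, so the function can be run repeatedly over stored rows without harm.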