Decouple Serialization and Deserialization Code for tasks #54569
Merged
kaxil
commented
Aug 20, 2025
8baf823 to 0334138
8248e4a to c126079
89a6807 to c0be635
kaxil
commented
Aug 27, 2025
6351823 to 3674170
7850b64 to 93c2642
93c2642 to 3bfc590
abdulrahman305 bot pushed a commit to abdulrahman305/airflow that referenced this pull request on Oct 3, 2025
Remove Task SDK dependencies from airflow-core deserialization by establishing a schema-based contract between client and server components. This change enables independent deployment and upgrades while laying the foundation for multi-language SDK support.

Key Decoupling Achievements:
- Replace dynamic get_serialized_fields() calls with hardcoded class methods
- Add schema-driven default resolution with get_operator_defaults_from_schema()
- Remove OPERATOR_DEFAULTS import dependency from airflow-core
- Implement SerializedBaseOperator class attributes for all operator defaults
- Update _is_excluded() logic to use schema defaults for efficient serialization

Serialization Optimizations:
- Unified partial_kwargs optimization supporting both encoded/non-encoded formats
- Intelligent default exclusion reducing storage redundancy
- MappedOperator.operator_class memory optimization (~90-95% reduction)
- Comprehensive client_defaults system with hierarchical resolution

Compatibility & Performance:
- Significant size reduction for typical DAGs with mapped operators
- Minimal overhead for client_defaults section (excellent efficiency)
- All existing serialized DAGs continue to work unchanged

Technical Implementation:
- Add generate_client_defaults() with LRU caching for optimal performance
- Implement _deserialize_partial_kwargs() supporting dual formats
- Centralized field deserialization eliminating code duplication
- Consolidated preprocessing logic in _preprocess_encoded_operator()
- Callback field preprocessing for backward compatibility

Testing & Validation:
- Added TestMappedOperatorSerializationAndClientDefaults with 9 comprehensive tests
- Parameterized tests for multiple serialization formats
- End-to-end validation of serialization/deserialization workflows
- Backward compatibility validation for callback field migration

This decoupling enables independent deployment/upgrades and provides the foundation for multi-language SDK ecosystem alongside the Task Execution API.

Part of apache#45428
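For readers who want a feel for the hierarchical default resolution described in the commit message, here is a minimal sketch. It is not the code from this PR: the schema fragment, `schema_defaults()`, `serialize_task()`, and `deserialize_task()` are hypothetical names invented for illustration, and the real functions in airflow-core may behave differently. The assumed precedence is schema defaults, then `client_defaults`, then values stored on the task.

```python
from functools import lru_cache
from types import MappingProxyType
from typing import Any, Mapping

# Hypothetical schema fragment (an assumption, not the real Airflow schema).
SCHEMA_PROPERTIES = {
    "task_id": {"type": "string"},  # no default: always stored
    "retries": {"type": "integer", "default": 0},
    "pool": {"type": "string", "default": "default_pool"},
    "depends_on_past": {"type": "boolean", "default": False},
}


@lru_cache(maxsize=1)
def schema_defaults() -> Mapping[str, Any]:
    """Collect the defaults declared in the schema; computed once, then cached."""
    return MappingProxyType(
        {name: spec["default"] for name, spec in SCHEMA_PROPERTIES.items() if "default" in spec}
    )


def serialize_task(fields: Mapping[str, Any], client_defaults: Mapping[str, Any]) -> dict[str, Any]:
    """Store only the values that differ from the effective default."""
    baseline = {**schema_defaults(), **client_defaults}
    return {k: v for k, v in fields.items() if k not in baseline or baseline[k] != v}


def deserialize_task(encoded: Mapping[str, Any], client_defaults: Mapping[str, Any]) -> dict[str, Any]:
    """Resolve in order: schema defaults, then client defaults, then stored values."""
    return {**schema_defaults(), **client_defaults, **encoded}


client_defaults = {"retries": 2}  # hypothetical client-side override
task = {"task_id": "t1", "retries": 2, "pool": "default_pool", "depends_on_past": True}

encoded = serialize_task(task, client_defaults)
restored = deserialize_task(encoded, client_defaults)
```

In this sketch the server side needs only the schema and the serialized `client_defaults` section to rebuild the full task, which is the kind of contract that would let the client and server be upgraded independently.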
abdulrahman305 bot pushed a commit to abdulrahman305/airflow that referenced this pull request on Oct 3, 2025
…#55849) This change reduces serialized DAG size by automatically excluding fields that match their schema default values, similar to how operator serialization works. Fields like `catchup=False`, `max_active_runs=16`, and `fail_fast=False` are no longer stored when they have default values. Follow-up of apache#54569
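The DAG-level exclusion described in this follow-up can be sketched as below. This is an illustration under stated assumptions rather than the code from #55849: `DAG_SCHEMA_DEFAULTS`, `strip_dag_defaults()`, and `restore_dag_defaults()` are hypothetical names, and the default values are simply the ones quoted in the commit message.

```python
import json
from typing import Any, Mapping

# Defaults quoted in the commit message; the dict itself is a hypothetical stand-in
# for whatever the real schema declares.
DAG_SCHEMA_DEFAULTS = {"catchup": False, "max_active_runs": 16, "fail_fast": False}


def matches_default(name: str, value: Any) -> bool:
    """True when the field has a schema default and the value equals it.

    The type check avoids treating 0 as equal to False (in Python, 0 == False).
    """
    if name not in DAG_SCHEMA_DEFAULTS:
        return False
    default = DAG_SCHEMA_DEFAULTS[name]
    return type(value) is type(default) and value == default


def strip_dag_defaults(dag_fields: Mapping[str, Any]) -> dict[str, Any]:
    """Drop every field whose value matches its schema default."""
    return {k: v for k, v in dag_fields.items() if not matches_default(k, v)}


def restore_dag_defaults(encoded: Mapping[str, Any]) -> dict[str, Any]:
    """Fill omitted fields back in from the schema defaults."""
    return {**DAG_SCHEMA_DEFAULTS, **encoded}


dag = {"dag_id": "example", "catchup": False, "max_active_runs": 4, "fail_fast": False}
encoded_dag = strip_dag_defaults(dag)
restored_dag = restore_dag_defaults(encoded_dag)

# The encoded form is smaller because the two default-valued fields are omitted.
saved_bytes = len(json.dumps(dag)) - len(json.dumps(encoded_dag))
```

Only `dag_id` and the non-default `max_active_runs=4` survive encoding here, and the round trip restores the original dictionary.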
abdulrahman305 bot
pushed a commit
to abdulrahman305/airflow
that referenced
this pull request
Oct 4, 2025
Remove Task SDK dependencies from airflow-core deserialization by establishing a schema-based contract between client and server components. This change enables independent deployment and upgrades while laying the foundation for multi-language SDK support. Key Decoupling Achievements: - Replace dynamic get_serialized_fields() calls with hardcoded class methods - Add schema-driven default resolution with get_operator_defaults_from_schema() - Remove OPERATOR_DEFAULTS import dependency from airflow-core - Implement SerializedBaseOperator class attributes for all operator defaults - Update _is_excluded() logic to use schema defaults for efficient serialization Serialization Optimizations: - Unified partial_kwargs optimization supporting both encoded/non-encoded formats - Intelligent default exclusion reducing storage redundancy - MappedOperator.operator_class memory optimization (~90-95% reduction) - Comprehensive client_defaults system with hierarchical resolution Compatibility & Performance: - Significant size reduction for typical DAGs with mapped operators - Minimal overhead for client_defaults section (excellent efficiency) - All existing serialized DAGs continue to work unchanged Technical Implementation: - Add generate_client_defaults() with LRU caching for optimal performance - Implement _deserialize_partial_kwargs() supporting dual formats - Centralized field deserialization eliminating code duplication - Consolidated preprocessing logic in _preprocess_encoded_operator() - Callback field preprocessing for backward compatibility Testing & Validation: - Added TestMappedOperatorSerializationAndClientDefaults with 9 comprehensive tests - Parameterized tests for multiple serialization formats - End-to-end validation of serialization/deserialization workflows - Backward compatibility validation for callback field migration This decoupling enables independent deployment/upgrades and provides the foundation for multi-language SDK ecosystem alongside the Task Execution 
API. Part of apache#45428
abdulrahman305 bot
pushed a commit
to abdulrahman305/airflow
that referenced
this pull request
Oct 4, 2025
…#55849) This change reduces serialized DAG size by automatically excluding fields that match their schema default values, similar to how operator serialization works. Fields like `catchup=False`, `max_active_runs=16`, and `fail_fast=False` are no longer stored when they have default values. Follow-up of apache#54569
abdulrahman305 bot
pushed a commit
to abdulrahman305/airflow
that referenced
this pull request
Oct 5, 2025
Remove Task SDK dependencies from airflow-core deserialization by establishing a schema-based contract between client and server components. This change enables independent deployment and upgrades while laying the foundation for multi-language SDK support. Key Decoupling Achievements: - Replace dynamic get_serialized_fields() calls with hardcoded class methods - Add schema-driven default resolution with get_operator_defaults_from_schema() - Remove OPERATOR_DEFAULTS import dependency from airflow-core - Implement SerializedBaseOperator class attributes for all operator defaults - Update _is_excluded() logic to use schema defaults for efficient serialization Serialization Optimizations: - Unified partial_kwargs optimization supporting both encoded/non-encoded formats - Intelligent default exclusion reducing storage redundancy - MappedOperator.operator_class memory optimization (~90-95% reduction) - Comprehensive client_defaults system with hierarchical resolution Compatibility & Performance: - Significant size reduction for typical DAGs with mapped operators - Minimal overhead for client_defaults section (excellent efficiency) - All existing serialized DAGs continue to work unchanged Technical Implementation: - Add generate_client_defaults() with LRU caching for optimal performance - Implement _deserialize_partial_kwargs() supporting dual formats - Centralized field deserialization eliminating code duplication - Consolidated preprocessing logic in _preprocess_encoded_operator() - Callback field preprocessing for backward compatibility Testing & Validation: - Added TestMappedOperatorSerializationAndClientDefaults with 9 comprehensive tests - Parameterized tests for multiple serialization formats - End-to-end validation of serialization/deserialization workflows - Backward compatibility validation for callback field migration This decoupling enables independent deployment/upgrades and provides the foundation for multi-language SDK ecosystem alongside the Task Execution 
API. Part of apache#45428
abdulrahman305 bot
pushed a commit
to abdulrahman305/airflow
that referenced
this pull request
Oct 5, 2025
…#55849) This change reduces serialized DAG size by automatically excluding fields that match their schema default values, similar to how operator serialization works. Fields like `catchup=False`, `max_active_runs=16`, and `fail_fast=False` are no longer stored when they have default values. Follow-up of apache#54569
abdulrahman305 bot
pushed a commit
to abdulrahman305/airflow
that referenced
this pull request
Oct 5, 2025
Remove Task SDK dependencies from airflow-core deserialization by establishing a schema-based contract between client and server components. This change enables independent deployment and upgrades while laying the foundation for multi-language SDK support. Key Decoupling Achievements: - Replace dynamic get_serialized_fields() calls with hardcoded class methods - Add schema-driven default resolution with get_operator_defaults_from_schema() - Remove OPERATOR_DEFAULTS import dependency from airflow-core - Implement SerializedBaseOperator class attributes for all operator defaults - Update _is_excluded() logic to use schema defaults for efficient serialization Serialization Optimizations: - Unified partial_kwargs optimization supporting both encoded/non-encoded formats - Intelligent default exclusion reducing storage redundancy - MappedOperator.operator_class memory optimization (~90-95% reduction) - Comprehensive client_defaults system with hierarchical resolution Compatibility & Performance: - Significant size reduction for typical DAGs with mapped operators - Minimal overhead for client_defaults section (excellent efficiency) - All existing serialized DAGs continue to work unchanged Technical Implementation: - Add generate_client_defaults() with LRU caching for optimal performance - Implement _deserialize_partial_kwargs() supporting dual formats - Centralized field deserialization eliminating code duplication - Consolidated preprocessing logic in _preprocess_encoded_operator() - Callback field preprocessing for backward compatibility Testing & Validation: - Added TestMappedOperatorSerializationAndClientDefaults with 9 comprehensive tests - Parameterized tests for multiple serialization formats - End-to-end validation of serialization/deserialization workflows - Backward compatibility validation for callback field migration This decoupling enables independent deployment/upgrades and provides the foundation for multi-language SDK ecosystem alongside the Task Execution 
API. Part of apache#45428
abdulrahman305 bot
pushed a commit
to abdulrahman305/airflow
that referenced
this pull request
Oct 5, 2025
…#55849) This change reduces serialized DAG size by automatically excluding fields that match their schema default values, similar to how operator serialization works. Fields like `catchup=False`, `max_active_runs=16`, and `fail_fast=False` are no longer stored when they have default values. Follow-up of apache#54569
abdulrahman305 bot
pushed a commit
to abdulrahman305/airflow
that referenced
this pull request
Oct 7, 2025
Remove Task SDK dependencies from airflow-core deserialization by establishing a schema-based contract between client and server components. This change enables independent deployment and upgrades while laying the foundation for multi-language SDK support. Key Decoupling Achievements: - Replace dynamic get_serialized_fields() calls with hardcoded class methods - Add schema-driven default resolution with get_operator_defaults_from_schema() - Remove OPERATOR_DEFAULTS import dependency from airflow-core - Implement SerializedBaseOperator class attributes for all operator defaults - Update _is_excluded() logic to use schema defaults for efficient serialization Serialization Optimizations: - Unified partial_kwargs optimization supporting both encoded/non-encoded formats - Intelligent default exclusion reducing storage redundancy - MappedOperator.operator_class memory optimization (~90-95% reduction) - Comprehensive client_defaults system with hierarchical resolution Compatibility & Performance: - Significant size reduction for typical DAGs with mapped operators - Minimal overhead for client_defaults section (excellent efficiency) - All existing serialized DAGs continue to work unchanged Technical Implementation: - Add generate_client_defaults() with LRU caching for optimal performance - Implement _deserialize_partial_kwargs() supporting dual formats - Centralized field deserialization eliminating code duplication - Consolidated preprocessing logic in _preprocess_encoded_operator() - Callback field preprocessing for backward compatibility Testing & Validation: - Added TestMappedOperatorSerializationAndClientDefaults with 9 comprehensive tests - Parameterized tests for multiple serialization formats - End-to-end validation of serialization/deserialization workflows - Backward compatibility validation for callback field migration This decoupling enables independent deployment/upgrades and provides the foundation for multi-language SDK ecosystem alongside the Task Execution 
API. Part of apache#45428
abdulrahman305 bot
pushed a commit
to abdulrahman305/airflow
that referenced
this pull request
Oct 7, 2025
…#55849) This change reduces serialized DAG size by automatically excluding fields that match their schema default values, similar to how operator serialization works. Fields like `catchup=False`, `max_active_runs=16`, and `fail_fast=False` are no longer stored when they have default values. Follow-up of apache#54569
abdulrahman305 bot
pushed a commit
to abdulrahman305/airflow
that referenced
this pull request
Oct 8, 2025
Remove Task SDK dependencies from airflow-core deserialization by establishing a schema-based contract between client and server components. This change enables independent deployment and upgrades while laying the foundation for multi-language SDK support. Key Decoupling Achievements: - Replace dynamic get_serialized_fields() calls with hardcoded class methods - Add schema-driven default resolution with get_operator_defaults_from_schema() - Remove OPERATOR_DEFAULTS import dependency from airflow-core - Implement SerializedBaseOperator class attributes for all operator defaults - Update _is_excluded() logic to use schema defaults for efficient serialization Serialization Optimizations: - Unified partial_kwargs optimization supporting both encoded/non-encoded formats - Intelligent default exclusion reducing storage redundancy - MappedOperator.operator_class memory optimization (~90-95% reduction) - Comprehensive client_defaults system with hierarchical resolution Compatibility & Performance: - Significant size reduction for typical DAGs with mapped operators - Minimal overhead for client_defaults section (excellent efficiency) - All existing serialized DAGs continue to work unchanged Technical Implementation: - Add generate_client_defaults() with LRU caching for optimal performance - Implement _deserialize_partial_kwargs() supporting dual formats - Centralized field deserialization eliminating code duplication - Consolidated preprocessing logic in _preprocess_encoded_operator() - Callback field preprocessing for backward compatibility Testing & Validation: - Added TestMappedOperatorSerializationAndClientDefaults with 9 comprehensive tests - Parameterized tests for multiple serialization formats - End-to-end validation of serialization/deserialization workflows - Backward compatibility validation for callback field migration This decoupling enables independent deployment/upgrades and provides the foundation for multi-language SDK ecosystem alongside the Task Execution 
API. Part of apache#45428
abdulrahman305 bot
pushed a commit
to abdulrahman305/airflow
that referenced
this pull request
Oct 8, 2025
…#55849) This change reduces serialized DAG size by automatically excluding fields that match their schema default values, similar to how operator serialization works. Fields like `catchup=False`, `max_active_runs=16`, and `fail_fast=False` are no longer stored when they have default values. Follow-up of apache#54569
abdulrahman305 bot
pushed a commit
to abdulrahman305/airflow
that referenced
this pull request
Oct 9, 2025
Remove Task SDK dependencies from airflow-core deserialization by establishing a schema-based contract between client and server components. This change enables independent deployment and upgrades while laying the foundation for multi-language SDK support. Key Decoupling Achievements: - Replace dynamic get_serialized_fields() calls with hardcoded class methods - Add schema-driven default resolution with get_operator_defaults_from_schema() - Remove OPERATOR_DEFAULTS import dependency from airflow-core - Implement SerializedBaseOperator class attributes for all operator defaults - Update _is_excluded() logic to use schema defaults for efficient serialization Serialization Optimizations: - Unified partial_kwargs optimization supporting both encoded/non-encoded formats - Intelligent default exclusion reducing storage redundancy - MappedOperator.operator_class memory optimization (~90-95% reduction) - Comprehensive client_defaults system with hierarchical resolution Compatibility & Performance: - Significant size reduction for typical DAGs with mapped operators - Minimal overhead for client_defaults section (excellent efficiency) - All existing serialized DAGs continue to work unchanged Technical Implementation: - Add generate_client_defaults() with LRU caching for optimal performance - Implement _deserialize_partial_kwargs() supporting dual formats - Centralized field deserialization eliminating code duplication - Consolidated preprocessing logic in _preprocess_encoded_operator() - Callback field preprocessing for backward compatibility Testing & Validation: - Added TestMappedOperatorSerializationAndClientDefaults with 9 comprehensive tests - Parameterized tests for multiple serialization formats - End-to-end validation of serialization/deserialization workflows - Backward compatibility validation for callback field migration This decoupling enables independent deployment/upgrades and provides the foundation for multi-language SDK ecosystem alongside the Task Execution 
API. Part of apache#45428
abdulrahman305 bot
pushed a commit
to abdulrahman305/airflow
that referenced
this pull request
Oct 9, 2025
…#55849) This change reduces serialized DAG size by automatically excluding fields that match their schema default values, similar to how operator serialization works. Fields like `catchup=False`, `max_active_runs=16`, and `fail_fast=False` are no longer stored when they have default values. Follow-up of apache#54569
abdulrahman305 bot
pushed a commit
to abdulrahman305/airflow
that referenced
this pull request
Oct 10, 2025
Remove Task SDK dependencies from airflow-core deserialization by establishing a schema-based contract between client and server components. This change enables independent deployment and upgrades while laying the foundation for multi-language SDK support. Key Decoupling Achievements: - Replace dynamic get_serialized_fields() calls with hardcoded class methods - Add schema-driven default resolution with get_operator_defaults_from_schema() - Remove OPERATOR_DEFAULTS import dependency from airflow-core - Implement SerializedBaseOperator class attributes for all operator defaults - Update _is_excluded() logic to use schema defaults for efficient serialization Serialization Optimizations: - Unified partial_kwargs optimization supporting both encoded/non-encoded formats - Intelligent default exclusion reducing storage redundancy - MappedOperator.operator_class memory optimization (~90-95% reduction) - Comprehensive client_defaults system with hierarchical resolution Compatibility & Performance: - Significant size reduction for typical DAGs with mapped operators - Minimal overhead for client_defaults section (excellent efficiency) - All existing serialized DAGs continue to work unchanged Technical Implementation: - Add generate_client_defaults() with LRU caching for optimal performance - Implement _deserialize_partial_kwargs() supporting dual formats - Centralized field deserialization eliminating code duplication - Consolidated preprocessing logic in _preprocess_encoded_operator() - Callback field preprocessing for backward compatibility Testing & Validation: - Added TestMappedOperatorSerializationAndClientDefaults with 9 comprehensive tests - Parameterized tests for multiple serialization formats - End-to-end validation of serialization/deserialization workflows - Backward compatibility validation for callback field migration This decoupling enables independent deployment/upgrades and provides the foundation for multi-language SDK ecosystem alongside the Task Execution 
API. Part of apache#45428
abdulrahman305 bot
pushed a commit
to abdulrahman305/airflow
that referenced
this pull request
Oct 10, 2025
…#55849) This change reduces serialized DAG size by automatically excluding fields that match their schema default values, similar to how operator serialization works. Fields like `catchup=False`, `max_active_runs=16`, and `fail_fast=False` are no longer stored when they have default values. Follow-up of apache#54569
abdulrahman305 bot
pushed a commit
to abdulrahman305/airflow
that referenced
this pull request
Oct 11, 2025
Remove Task SDK dependencies from airflow-core deserialization by establishing a schema-based contract between client and server components. This change enables independent deployment and upgrades while laying the foundation for multi-language SDK support. Key Decoupling Achievements: - Replace dynamic get_serialized_fields() calls with hardcoded class methods - Add schema-driven default resolution with get_operator_defaults_from_schema() - Remove OPERATOR_DEFAULTS import dependency from airflow-core - Implement SerializedBaseOperator class attributes for all operator defaults - Update _is_excluded() logic to use schema defaults for efficient serialization Serialization Optimizations: - Unified partial_kwargs optimization supporting both encoded/non-encoded formats - Intelligent default exclusion reducing storage redundancy - MappedOperator.operator_class memory optimization (~90-95% reduction) - Comprehensive client_defaults system with hierarchical resolution Compatibility & Performance: - Significant size reduction for typical DAGs with mapped operators - Minimal overhead for client_defaults section (excellent efficiency) - All existing serialized DAGs continue to work unchanged Technical Implementation: - Add generate_client_defaults() with LRU caching for optimal performance - Implement _deserialize_partial_kwargs() supporting dual formats - Centralized field deserialization eliminating code duplication - Consolidated preprocessing logic in _preprocess_encoded_operator() - Callback field preprocessing for backward compatibility Testing & Validation: - Added TestMappedOperatorSerializationAndClientDefaults with 9 comprehensive tests - Parameterized tests for multiple serialization formats - End-to-end validation of serialization/deserialization workflows - Backward compatibility validation for callback field migration This decoupling enables independent deployment/upgrades and provides the foundation for multi-language SDK ecosystem alongside the Task Execution 
API. Part of apache#45428
abdulrahman305 bot
pushed a commit
to abdulrahman305/airflow
that referenced
this pull request
Oct 11, 2025
…#55849) This change reduces serialized DAG size by automatically excluding fields that match their schema default values, similar to how operator serialization works. Fields like `catchup=False`, `max_active_runs=16`, and `fail_fast=False` are no longer stored when they have default values. Follow-up of apache#54569
abdulrahman305 bot
pushed a commit
to abdulrahman305/airflow
that referenced
this pull request
Oct 12, 2025
Remove Task SDK dependencies from airflow-core deserialization by establishing a schema-based contract between client and server components. This change enables independent deployment and upgrades while laying the foundation for multi-language SDK support. Key Decoupling Achievements: - Replace dynamic get_serialized_fields() calls with hardcoded class methods - Add schema-driven default resolution with get_operator_defaults_from_schema() - Remove OPERATOR_DEFAULTS import dependency from airflow-core - Implement SerializedBaseOperator class attributes for all operator defaults - Update _is_excluded() logic to use schema defaults for efficient serialization Serialization Optimizations: - Unified partial_kwargs optimization supporting both encoded/non-encoded formats - Intelligent default exclusion reducing storage redundancy - MappedOperator.operator_class memory optimization (~90-95% reduction) - Comprehensive client_defaults system with hierarchical resolution Compatibility & Performance: - Significant size reduction for typical DAGs with mapped operators - Minimal overhead for client_defaults section (excellent efficiency) - All existing serialized DAGs continue to work unchanged Technical Implementation: - Add generate_client_defaults() with LRU caching for optimal performance - Implement _deserialize_partial_kwargs() supporting dual formats - Centralized field deserialization eliminating code duplication - Consolidated preprocessing logic in _preprocess_encoded_operator() - Callback field preprocessing for backward compatibility Testing & Validation: - Added TestMappedOperatorSerializationAndClientDefaults with 9 comprehensive tests - Parameterized tests for multiple serialization formats - End-to-end validation of serialization/deserialization workflows - Backward compatibility validation for callback field migration This decoupling enables independent deployment/upgrades and provides the foundation for multi-language SDK ecosystem alongside the Task Execution 
API. Part of apache#45428
abdulrahman305 bot
pushed a commit
to abdulrahman305/airflow
that referenced
this pull request
Oct 12, 2025
…#55849) This change reduces serialized DAG size by automatically excluding fields that match their schema default values, similar to how operator serialization works. Fields like `catchup=False`, `max_active_runs=16`, and `fail_fast=False` are no longer stored when they have default values. Follow-up of apache#54569
abdulrahman305 bot
pushed a commit
to abdulrahman305/airflow
that referenced
this pull request
Oct 14, 2025
Remove Task SDK dependencies from airflow-core deserialization by establishing a schema-based contract between client and server components. This change enables independent deployment and upgrades while laying the foundation for multi-language SDK support. Key Decoupling Achievements: - Replace dynamic get_serialized_fields() calls with hardcoded class methods - Add schema-driven default resolution with get_operator_defaults_from_schema() - Remove OPERATOR_DEFAULTS import dependency from airflow-core - Implement SerializedBaseOperator class attributes for all operator defaults - Update _is_excluded() logic to use schema defaults for efficient serialization Serialization Optimizations: - Unified partial_kwargs optimization supporting both encoded/non-encoded formats - Intelligent default exclusion reducing storage redundancy - MappedOperator.operator_class memory optimization (~90-95% reduction) - Comprehensive client_defaults system with hierarchical resolution Compatibility & Performance: - Significant size reduction for typical DAGs with mapped operators - Minimal overhead for client_defaults section (excellent efficiency) - All existing serialized DAGs continue to work unchanged Technical Implementation: - Add generate_client_defaults() with LRU caching for optimal performance - Implement _deserialize_partial_kwargs() supporting dual formats - Centralized field deserialization eliminating code duplication - Consolidated preprocessing logic in _preprocess_encoded_operator() - Callback field preprocessing for backward compatibility Testing & Validation: - Added TestMappedOperatorSerializationAndClientDefaults with 9 comprehensive tests - Parameterized tests for multiple serialization formats - End-to-end validation of serialization/deserialization workflows - Backward compatibility validation for callback field migration This decoupling enables independent deployment/upgrades and provides the foundation for multi-language SDK ecosystem alongside the Task Execution 
API. Part of apache#45428
abdulrahman305 bot
pushed a commit
to abdulrahman305/airflow
that referenced
this pull request
Oct 14, 2025
…#55849) This change reduces serialized DAG size by automatically excluding fields that match their schema default values, similar to how operator serialization works. Fields like `catchup=False`, `max_active_runs=16`, and `fail_fast=False` are no longer stored when they have default values. Follow-up of apache#54569
abdulrahman305 bot
pushed a commit
to abdulrahman305/airflow
that referenced
this pull request
Oct 15, 2025
Remove Task SDK dependencies from airflow-core deserialization by establishing a schema-based contract between client and server components. This change enables independent deployment and upgrades while laying the foundation for multi-language SDK support. Key Decoupling Achievements: - Replace dynamic get_serialized_fields() calls with hardcoded class methods - Add schema-driven default resolution with get_operator_defaults_from_schema() - Remove OPERATOR_DEFAULTS import dependency from airflow-core - Implement SerializedBaseOperator class attributes for all operator defaults - Update _is_excluded() logic to use schema defaults for efficient serialization Serialization Optimizations: - Unified partial_kwargs optimization supporting both encoded/non-encoded formats - Intelligent default exclusion reducing storage redundancy - MappedOperator.operator_class memory optimization (~90-95% reduction) - Comprehensive client_defaults system with hierarchical resolution Compatibility & Performance: - Significant size reduction for typical DAGs with mapped operators - Minimal overhead for client_defaults section (excellent efficiency) - All existing serialized DAGs continue to work unchanged Technical Implementation: - Add generate_client_defaults() with LRU caching for optimal performance - Implement _deserialize_partial_kwargs() supporting dual formats - Centralized field deserialization eliminating code duplication - Consolidated preprocessing logic in _preprocess_encoded_operator() - Callback field preprocessing for backward compatibility Testing & Validation: - Added TestMappedOperatorSerializationAndClientDefaults with 9 comprehensive tests - Parameterized tests for multiple serialization formats - End-to-end validation of serialization/deserialization workflows - Backward compatibility validation for callback field migration This decoupling enables independent deployment/upgrades and provides the foundation for multi-language SDK ecosystem alongside the Task Execution 
API. Part of apache#45428
abdulrahman305 bot
pushed a commit
to abdulrahman305/airflow
that referenced
this pull request
Oct 15, 2025
…#55849) This change reduces serialized DAG size by automatically excluding fields that match their schema default values, similar to how operator serialization works. Fields like `catchup=False`, `max_active_runs=16`, and `fail_fast=False` are no longer stored when they have default values. Follow-up of apache#54569
abdulrahman305 bot
pushed a commit
to abdulrahman305/airflow
that referenced
this pull request
Oct 17, 2025
Remove Task SDK dependencies from airflow-core deserialization by establishing a schema-based contract between client and server components. This change enables independent deployment and upgrades while laying the foundation for multi-language SDK support.

Key Decoupling Achievements:
- Replace dynamic get_serialized_fields() calls with hardcoded class methods
- Add schema-driven default resolution with get_operator_defaults_from_schema()
- Remove OPERATOR_DEFAULTS import dependency from airflow-core
- Implement SerializedBaseOperator class attributes for all operator defaults
- Update _is_excluded() logic to use schema defaults for efficient serialization

Serialization Optimizations:
- Unified partial_kwargs optimization supporting both encoded/non-encoded formats
- Intelligent default exclusion reducing storage redundancy
- MappedOperator.operator_class memory optimization (~90-95% reduction)
- Comprehensive client_defaults system with hierarchical resolution

Compatibility & Performance:
- Significant size reduction for typical DAGs with mapped operators
- Minimal overhead for client_defaults section (excellent efficiency)
- All existing serialized DAGs continue to work unchanged

Technical Implementation:
- Add generate_client_defaults() with LRU caching for optimal performance
- Implement _deserialize_partial_kwargs() supporting dual formats
- Centralized field deserialization eliminating code duplication
- Consolidated preprocessing logic in _preprocess_encoded_operator()
- Callback field preprocessing for backward compatibility

Testing & Validation:
- Added TestMappedOperatorSerializationAndClientDefaults with 9 comprehensive tests
- Parameterized tests for multiple serialization formats
- End-to-end validation of serialization/deserialization workflows
- Backward compatibility validation for callback field migration

This decoupling enables independent deployment/upgrades and provides the foundation for multi-language SDK ecosystem alongside the Task Execution API.

Part of apache#45428
abdulrahman305 bot
pushed a commit
to abdulrahman305/airflow
that referenced
this pull request
Oct 17, 2025
…#55849) This change reduces serialized DAG size by automatically excluding fields that match their schema default values, similar to how operator serialization works. Fields like `catchup=False`, `max_active_runs=16`, and `fail_fast=False` are no longer stored when they have default values. Follow-up of apache#54569
abdulrahman305 bot
pushed a commit
to abdulrahman305/airflow
that referenced
this pull request
Oct 19, 2025
abdulrahman305 bot
pushed a commit
to abdulrahman305/airflow
that referenced
this pull request
Oct 19, 2025
kosteev
pushed a commit
to GoogleCloudPlatform/composer-airflow
that referenced
this pull request
Oct 24, 2025
This change reduces serialized DAG size by automatically excluding fields that match their schema default values, similar to how operator serialization works. Fields like `catchup=False`, `max_active_runs=16`, and `fail_fast=False` are no longer stored when they have default values. Follow-up of apache/airflow#54569 (cherry picked from commit a582464766d984f34b07d5ac848de2057b43d0ae) GitOrigin-RevId: 67468ef2c5bb40ace29449778b000d71d19e461a
Labels
area:serialization
area:task-sdk
full tests needed
We need to run full set of tests for this PR to merge
🎯 Problem Statement
The Task SDK separation in Airflow 3.1 requires decoupling serialization and deserialization code to eliminate server-side dependencies on client SDK implementations:
- `airflow-core` deserialization currently depends on Task SDK's `BaseOperator` for default values and field lists
🚀 Solution Overview
This PR decouples (to a great extent) serialization and deserialization code by removing Task SDK dependencies from `airflow-core`:
- Replace dynamic `get_serialized_fields()` calls with hardcoded class methods
- Remove `OPERATOR_DEFAULTS` and other Task SDK imports from server-side code
- Use `schema.json` and `client_defaults` instead of Task SDK classes for default resolution
📊 Benchmark
As part of this change, I optimised how defaults are stored and when a field is stored at all: anything that matches a default or is null is now dropped. The change with the biggest impact is that entire callback functions are no longer stored as strings; a boolean is stored instead to indicate whether a callback was set.
The bigger the DAG (more tasks + especially with callbacks), the more savings.
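To make the source of the savings concrete, here is a minimal sketch of the idea, not the actual Airflow serializer. The function name `compact_task`, the `CALLBACK_FIELDS` tuple and the sample task dict are all hypothetical; only the `has_on_*_callback` naming comes from this PR.

```python
from typing import Any

# Hypothetical list of callback fields, for illustration only.
CALLBACK_FIELDS = ("on_success_callback", "on_failure_callback", "on_retry_callback")


def compact_task(task: dict[str, Any], defaults: dict[str, Any]) -> dict[str, Any]:
    """Return a reduced copy of ``task`` suitable for storage (illustrative)."""
    out: dict[str, Any] = {}
    for key, value in task.items():
        if key in CALLBACK_FIELDS:
            # Store only a flag saying that a callback exists, not its source.
            if value:
                out[f"has_{key}"] = True
            continue
        if value is None:
            continue  # nulls carry no information
        if key in defaults and defaults[key] == value:
            continue  # matches the default, so it can be restored later
        out[key] = value
    return out


def notify(context):  # stand-in for a user-defined callback
    pass


task = {
    "task_id": "extract",
    "retries": 0,
    "pool": "default_pool",
    "queue": None,
    "on_failure_callback": notify,
    "on_success_callback": None,
}
compact = compact_task(task, defaults={"retries": 0, "pool": "default_pool"})
print(compact)  # {'task_id': 'extract', 'has_on_failure_callback': True}
```

In this sketch six fields shrink to two, and the saving grows with the number of tasks and callbacks because each callback body is replaced by a single boolean.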
Using actual pre-optimization code:
🔥 Callback Optimization Analysis (100 tasks with 3 callbacks each):
🎯 Key Optimization Impact:
🏗️ Architecture Changes
Task Default Resolution
Implements hierarchical defaults during deserialization:
- `schema.json` - lowest priority
- `client_defaults.tasks` - SDK-specific overrides
- `partial_kwargs` - MappedOperator values
Serialization Exclusion
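A rough sketch of how the layered defaults listed above and default exclusion can round-trip without losing information. This is not the PR's implementation: `resolve_task` and `exclude_defaults` are hypothetical helpers, and treating explicitly stored task fields as the highest-priority layer is an assumption.

```python
from collections import ChainMap
from typing import Any, Optional


def resolve_task(
    encoded: dict[str, Any],
    schema_defaults: dict[str, Any],
    client_defaults: dict[str, Any],
    partial_kwargs: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Fill in missing fields from the default layers (illustrative only)."""
    # ChainMap searches left to right, so the highest-priority layer goes first.
    # Assumption: explicitly stored fields win over every default layer.
    layers = ChainMap(encoded, partial_kwargs or {}, client_defaults, schema_defaults)
    return dict(layers)


def exclude_defaults(
    task: dict[str, Any],
    schema_defaults: dict[str, Any],
    client_defaults: dict[str, Any],
) -> dict[str, Any]:
    """Drop fields whose value equals the effective default (illustrative only)."""
    effective = {**schema_defaults, **client_defaults}
    return {k: v for k, v in task.items() if k not in effective or effective[k] != v}


schema_defaults = {"retries": 0, "pool": "default_pool"}
client_defaults = {"retries": 2}  # SDK-specific override of the schema default
task = {"task_id": "load", "retries": 2, "pool": "default_pool", "queue": "gpu"}

stored = exclude_defaults(task, schema_defaults, client_defaults)
restored = resolve_task(stored, schema_defaults, client_defaults)
print(stored)            # {'task_id': 'load', 'queue': 'gpu'}
print(restored == task)  # True
```

Because exclusion compares against the same effective defaults that resolution later applies, a value that differs from the effective default (for example `retries=0` when the client default is 2) is still stored.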
Fields matching `client_defaults` are automatically excluded from task serialization, reducing redundancy while maintaining full information.
fyi: Following the Task Execution API pattern, I aim to add a versioned schema contract on the Airflow website directly or in the versioned docs soon'ish. Thinking about a URL like: https://airflow.apache.org/schemas/dag-serialization/v2.json
🚦 Migration Path
For Users
Appendix (for my own tracking)
TODOs (some might be done in a future PR):
- `schema.json`
- `on_*_callback` on tasks to use `has_on_*_callback`
- `unmap` method from scheduler-side #54816
- `client_defaults` generation in serialization (Task SDK side)
Future Work:
- `schema.json` in the calver OpenAPI spec for Execution API and/or in airflow versioned docs
- `ui_color` & `ui_fgcolor`
Other points
- `ExtendedJSON` - TypeDecorator used in serialization of the following:
  - `DagRun.context_carrier`
  - `TaskInstance.next_kwargs`
Benchmark script: