-
Notifications
You must be signed in to change notification settings - Fork 16.3k
Move Serialization/Deserialization (serde) to task SDK #58992
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Should not serialzation be moved to |
Yes. There is a reason to do it actually. This is actually splitting serialization properly as it should rightfully be and it separates things rightly towards the future direction. So, serialization is in two parts (somewhat messy, but let me still explain):
So this loading of task-sdk will only happen when we try to deserialise an object relevant to |
airflow-core/src/airflow/api_fastapi/core_api/routes/public/xcom.py
Outdated
Show resolved
Hide resolved
Makes sense. I never liked the idea of having "one serialization to rule them all". Thanks for explanation. |
Also it makes it far better from the security standpoint. And opens the door of having serde implementation connected with providers installed (and likely discovered by Providers Manager) |
Absolutely me too! |
pierrejeambrun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
API side LGTM.
|
Thanks, I will merge it once I am back from holidays (16 Dec), just because the scope of this PR |
|
Alright, let's merge this one today |
|
Broken tests are unrelated. Same failure on main too: https://github.com/apache/airflow/actions/runs/20298266352/job/58300350219 |
|
Merging this PR |
Backport failed to create: v3-1-test. View the failure log Run details
You can attempt to backport this manually by running: cherry_picker 3af4d28 v3-1-testThis should apply the commit to the v3-1-test branch and leave the commit in conflict state marking After you have resolved the conflicts, you can continue the backport process by running: cherry_picker --continue |
|
No backport needed, this is targetted for 3.2.0 |
|
#protm |
closes: #58887
closes: #58885
Motivation
This change is part of the broader effort to achieve client-server separation. The serialization and deserialization utilities (
serde) are execution time utilities used primarily during task execution for:Since these utilities are only needed at execution time (in workers, DAG processors, and triggerers), they belong in the
task-sdkrather thanairflow-core. This allows the core server components (Scheduler, API Server) to operate without requiring the task SDK, and also enabled task SDK to not having to import serde from airflow core, enabling independent deployments and upgrades.Blockers and Solutions
Several blockers were identified and resolved:
Migration File: The migration
0092_3_2_0_replace_deadline_inline_callback_with_fkey.pyusesserde.deserialize()Effort tracked by @ramitkataria in Do not use serde library for database migrations
XCom API Stringification: The core API's xcom endpoint (
/api/v2/dags/{dag_id}/dagRuns/{run_id}/taskInstances/{task_id}/xcomEntries) useddeserialize(full=False)to stringify XCom values for UI display.The solution was to create a dedicated
stringify.pymodule inairflow-corethat provides stringification without any SDK dependencies. This module matches the behavior ofdeserialize(full=False)exactly but is self-contained and independent of any external libraries.XComEncoder/XComDecoder: These JSON encoder/decoder classes in
airflow-coredirectly importedserdefor full serialization/deserialization of XCom values.The fix here was to implement lazy imports that attempt to load
serdefrom the SDK, making the dependency optional. If the SDK is not available, appropriate error messages are shown. The consumers of this anyways were mostly fixed / handled by: Decouple xcom public API from using XcomEncoder #58900, which discovered that the same effect can be achieved without using those encoders. In a follow-up, I might move it to task SDK as well.High Level Changes
Serde module is moved to task sdk now
task-sdk/src/airflow/sdk/serialization/serializers/Stringification Separation
airflow-core/src/airflow/serialization/stringify.pyfor UI stringificationdeserialize(full=False)behavior exactlyDeprecation Path
from airflow.serialization.serde import) continue to workTests
task-sdk/tests/task_sdk/serialization/test_serde.pyairflow-core/tests/unit/serialization/test_stringify.pyDocs
What does this mean for new serializers? (Debatable, check open questions below)
Important: New serializers should now be added to the SDK namespace:
The old
airflow-core/src/airflow/serialization/serializers/directory still exists for backward compatibility but is deprecated. New serializers should not be added there.Open Questions
When should we delete
airflow-core/src/airflow/serialization/serializers/? Currently kept for backward compatibility, but should be removed in a future release, no code reads off of here, shall we just delete it? What for people who have added their serializers in here?Should XcomEncoder + Decoder stay in airflow core or task sdk?
Testing
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in airflow-core/newsfragments.