AIP-72: Handle Custom XCom Backend on Task SDK #47339
ashb left a comment:
Not sure I love where this is implemented. I wonder if it's time for an execution_time/xcom.py file.
Did a final round of regression tests with a few example DAGs (DAG1 to DAG4), with both the XCom DB backend and a custom XCom backend (described in the PR description). These work as expected with both backends.
Huh, finally got a green run here. I am going to merge this since it has already grown too large; I will take up any issues in follow-ups.
Thanks @ashb and @pierrejeambrun for the review.
You mentioned:
There is also situation 3 that mixes both using [...]. Are we covered in this case as well?
@eladkal yep, that case is covered as well, since the threshold logic is handled within the custom XCom backend. No part of that has been tweaked at all.
closes: #45481
What?
XCom backends let us customise where XComs are stored. This is useful because:
The default XCom backend is the BaseXCom class, which stores XComs in the Airflow database. This is fine for small values, but can be problematic for large values, or for large numbers of XComs.
Docs: https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/xcoms.html#custom-xcom-backends
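To illustrate why a custom backend helps with large values, here is a minimal, self-contained sketch. It does not import Airflow: BaseXComSketch stands in for the default backend, and ThresholdFileBackendSketch, its PREFIX, THRESHOLD_BYTES and STORAGE_DIR are all hypothetical names chosen for this example. The idea is that small values are stored inline, while anything above an assumed size threshold is written to a file and only a reference is stored.

```python
import json
import tempfile
import uuid
from pathlib import Path


class BaseXComSketch:
    """Stand-in for the default backend: the serialized value itself is stored."""

    @classmethod
    def serialize_value(cls, value):
        return json.dumps(value)

    @classmethod
    def deserialize_value(cls, stored):
        return json.loads(stored)


class ThresholdFileBackendSketch(BaseXComSketch):
    """Hypothetical custom backend: large payloads go to files, small ones stay inline."""

    PREFIX = "file://"      # assumed marker meaning "this is a reference, not a value"
    THRESHOLD_BYTES = 64    # assumed cut-off, purely illustrative
    STORAGE_DIR = Path(tempfile.gettempdir()) / "xcom_sketch"

    @classmethod
    def serialize_value(cls, value):
        payload = json.dumps(value)
        if len(payload.encode("utf-8")) <= cls.THRESHOLD_BYTES:
            return payload  # small enough: store the value itself
        cls.STORAGE_DIR.mkdir(parents=True, exist_ok=True)
        path = cls.STORAGE_DIR / f"{uuid.uuid4().hex}.json"
        path.write_text(payload, encoding="utf-8")
        return json.dumps(cls.PREFIX + str(path))  # store only the reference

    @classmethod
    def deserialize_value(cls, stored):
        data = json.loads(stored)
        if isinstance(data, str) and data.startswith(cls.PREFIX):
            # Follow the reference and load the real payload from disk.
            file_path = Path(data[len(cls.PREFIX):])
            return json.loads(file_path.read_text(encoding="utf-8"))
        return data
```

A real backend would also need to handle a plain string value that happens to start with the prefix; the sketch ignores that case for brevity.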
Important changes
Models XCOM
- Split XComModel out of BaseXCom.
- XComModel will only have table-related stuff and some utilities like get_many, clear, and set, which stay as they are hard to remove for now.
- BaseXCom is a class that will be moved to the Task SDK under airflow.sdk.execution_time. Thought about it a bit: it's not a definition, so maybe definitions isn't the right place for it, although we can softly argue on this one.
Execution Time XCOMs
a) set - store an XCom value. For BaseXCom, store the value in the DB; for custom XCom backends, store the path in the DB. This part handles it:
b) get_value - retrieve an XCom value, using the new Task SDK.
c) get_one - similar to what we had earlier in models. Uses the Task SDK to get an XCom value, either from the DB or from the custom XCom backend if configured. That part is handled here:
d) serialize_value - for the normal (table) case, we serialise the value for the DB. This is overridden by custom XCom backends, so we needn't worry about that.
e) deserialize_value - similar to the above case.
f) purge - used to purge an XCom entry, mainly for custom XCom backends. Empty for the ORM.
g) delete - used to delete an XCom. Added as a new utility; will explain below why this was added.
resolve_xcom_backend is used to resolve a custom XCom class if one is provided in the conf, or otherwise just return the BaseXCom class, exactly as in the earlier behaviour.
Execution Time changes
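The resolve_xcom_backend behaviour described above (use the configured class if there is one, otherwise fall back to the base class) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: BaseXComSketch and resolve_xcom_backend_sketch are hypothetical names, the configured value is assumed to be a dotted "module.ClassName" path, and the subclass check is an assumption about what a sensible resolver would enforce.

```python
from __future__ import annotations

import importlib


class BaseXComSketch:
    """Stand-in for the default, database-backed XCom class."""


def resolve_xcom_backend_sketch(configured_path: str | None) -> type[BaseXComSketch]:
    """Return the configured backend class, or the default when nothing is configured."""
    if not configured_path:
        return BaseXComSketch  # no custom backend in the conf: keep the default

    # Split "package.module.ClassName" into module path and class name.
    module_name, _, class_name = configured_path.rpartition(".")
    candidate = getattr(importlib.import_module(module_name), class_name)

    # Assumed safety check: a custom backend must extend the base class.
    if not (isinstance(candidate, type) and issubclass(candidate, BaseXComSketch)):
        raise TypeError(f"{configured_path} is not a subclass of BaseXComSketch")
    return candidate
```

Falling back to the base class keeps deployments without a configured backend on the database path, which matches the "exactly as in the earlier behaviour" note above.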
- DeleteXCom introduced.
- DeleteXCom is handled when called.
- xcom_pull and xcom_push now use the utilities in execution_time/xcom.py.
Execution API server changes
Task Instances
- ti_run endpoint now sends xcom_keys_to_clear, which is basically:
Xcoms
- get_xcom and set_xcom now directly query the DB instead of using the xcom.set or xcom.get_many utilities.
- delete xcoms, which will be called by the task runner.
Core api changes
- BaseXcom model.
Other changes
Testing
Situation 1: Using the database for xcoms (no xcom backend)
Using DAG:
Success:

Xcom pushed by task 1:
Task 1 logs:

Sends the SetXcom call:
Xcom pushed by task 2:
Task 2 logs:
Sends the GetXcom call:
Status of the table:
Observe that the data has been stored in the table as a native Python object (JSON compliant): the actual data is stored, not a reference to it.
Situation 2: Using a custom xcom backend.
Using DAG:
Setup
init.sh)
Run the DAG normally
Task 1
XCom pushed from task 1:

Logs:
Logs showing what was actually pushed:
The SetXcom call:
Task 2:

Xcom pushed by task 2:
Important logs from task 2:
Inside the worker, check for the log json file:
Status of the table:
Observe that the path is stored.
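To make the two observations concrete (the value itself is stored in Situation 1, only a path in Situation 2), here is a small self-contained sketch. It does not use Airflow: xcom_table and object_store are in-memory stand-ins for the xcom table and for external storage, and DbXComSketch and PathXComSketch are hypothetical classes. Having delete call purge before dropping the row is an assumption made for illustration.

```python
import json
import uuid

xcom_table = {}    # stand-in for the xcom table: (task_id, key) -> stored value
object_store = {}  # stand-in for external storage: path -> payload


class DbXComSketch:
    """Situation 1: the JSON value itself lands in the table."""

    @classmethod
    def serialize_value(cls, value):
        return json.dumps(value)

    @classmethod
    def deserialize_value(cls, stored):
        return json.loads(stored)

    @classmethod
    def purge(cls, stored):
        """Nothing lives outside the table, so there is nothing to clean up."""

    @classmethod
    def set(cls, task_id, key, value):
        xcom_table[(task_id, key)] = cls.serialize_value(value)

    @classmethod
    def get_one(cls, task_id, key):
        return cls.deserialize_value(xcom_table[(task_id, key)])

    @classmethod
    def delete(cls, task_id, key):
        # Assumed order: let the backend clean up first, then drop the row.
        cls.purge(xcom_table.pop((task_id, key)))


class PathXComSketch(DbXComSketch):
    """Situation 2: the payload goes to external storage, only its path is stored."""

    @classmethod
    def serialize_value(cls, value):
        path = f"xcom/{uuid.uuid4().hex}.json"
        object_store[path] = json.dumps(value)
        return json.dumps(path)

    @classmethod
    def deserialize_value(cls, stored):
        return json.loads(object_store[json.loads(stored)])

    @classmethod
    def purge(cls, stored):
        object_store.pop(json.loads(stored), None)


DbXComSketch.set("task_1", "return_value", {"rows": 3})
PathXComSketch.set("task_2", "return_value", {"rows": 3})
print(xcom_table[("task_1", "return_value")])  # the data itself
print(xcom_table[("task_2", "return_value")])  # a path into object_store
```

Both classes return the same value from get_one; only the content of the table row differs, which is what the two "Status of the table" checks above are looking at.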
TODO / What's next: