Option to deserialize JSON from last log line in BashOperator and DockerOperator before sending to XCom #27079

qcha41 · 2022-10-16T20:14:05Z

Description

To create an XCom value from a BashOperator or a DockerOperator, we can use the do_xcom_push option, which pushes the last line of the command's logs to XCom.

It would be useful to provide an xcom_json option that deserializes this last log line, when it is a JSON string, before sending it as an XCom. This would make it possible to access its attributes later in other tasks with the xcom_pull() method.

Use case/motivation

See my Stack Overflow post: https://stackoverflow.com/questions/74083466/how-to-deserialize-xcom-strings-in-airflow

Consider a DAG containing two tasks: DAG: Task A >> Task B (BashOperators or DockerOperators). They need to communicate through XComs.

Task A outputs its information as one-line JSON on stdout, which can then be retrieved from the logs of Task A, and therefore from its return_value XCom key if do_xcom_push=True. For instance: {"key1":1,"key2":3}
Task B only needs the key2 information from Task A, so we need to deserialize the return_value XCom of Task A to extract only this value and pass it directly to Task B, using the Jinja template {{ xcom_pull('task_a')['key2'] }}. Used as is, this results in jinja2.exceptions.UndefinedError: 'str object' has no attribute 'key2', because return_value is just a string.
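The underlying problem can be reproduced outside Airflow with plain Python. This is a minimal sketch, assuming the XCom value is exactly the string Task A printed; the variable names are illustrative only, and plain Python raises TypeError where Jinja raises UndefinedError:

```python
import json

# What Task A prints on its last stdout line, and hence what is stored as return_value.
raw_xcom = '{"key1":1,"key2":3}'

# Looking up a key on the raw value fails, because it is still a str.
try:
    raw_xcom["key2"]
    lookup_failed = False
except TypeError:
    lookup_failed = True

# After deserializing, the same lookup works on the resulting dict.
parsed_xcom = json.loads(raw_xcom)
key2_value = parsed_xcom["key2"]

print(lookup_failed, key2_value)
```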

For example, Airflow Variables can be deserialized in Jinja templates (e.g. {{ var.json.my_var.path }}). Overall, I would like to do the same thing with XComs.

Current workaround:

We can create a custom Operator (inherited from BashOperator or DockerOperator) and augment the execute method:

execute the original execute method
intercept the last log line of the task
try to json.loads() it into a Python dictionary
finally, return the output (which is now a dictionary, not a string)

The previous Jinja template {{ xcom_pull('task_a')['key2'] }} now works in Task B, since the XCom value is now a Python dictionary.

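import json

from airflow.operators.bash import BashOperator
from airflow.providers.docker.operators.docker import DockerOperator


class BashOperatorExtended(BashOperator):
    def execute(self, context):
        output = super().execute(context)
        try:
            # Deserialize the last log line if it is valid JSON.
            output = json.loads(output)
        except (TypeError, ValueError):
            # Not JSON (or not a string): push the raw value unchanged.
            pass
        return output


class DockerOperatorExtended(DockerOperator):
    def execute(self, context):
        output = super().execute(context)
        try:
            # Deserialize the last log line if it is valid JSON.
            output = json.loads(output)
        except (TypeError, ValueError):
            # Not JSON (or not a string): push the raw value unchanged.
            pass
        return output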
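The two subclasses above repeat the same logic, so it could be factored into a mixin. The following is only a sketch: JsonOutputMixin and FakeOperator are hypothetical names, and FakeOperator is a stand-in for BashOperator or DockerOperator so that the example runs without Airflow installed.

```python
import json


class JsonOutputMixin:
    """Hypothetical mixin: parse the parent's execute() result as JSON when possible."""

    def execute(self, context):
        output = super().execute(context)
        try:
            return json.loads(output)
        except (TypeError, ValueError):
            # Not a JSON string (or not a string at all): keep the original value.
            return output


class FakeOperator:
    """Stand-in for a real operator; execute() returns the 'last log line'."""

    def __init__(self, last_line):
        self.last_line = last_line

    def execute(self, context):
        return self.last_line


class FakeOperatorExtended(JsonOutputMixin, FakeOperator):
    pass


json_result = FakeOperatorExtended('{"key1":1,"key2":3}').execute(context={})
text_result = FakeOperatorExtended("plain text").execute(context={})
none_result = FakeOperatorExtended(None).execute(context={})
```

With Airflow available, the same mixin would presumably be combined with the real classes in the same way, for example class BashOperatorExtended(JsonOutputMixin, BashOperator): pass. The mixin has to come first in the base list so that its execute wraps the operator's.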

But creating a new operator just for that purpose is not really satisfying.

Related issues

No response

Are you willing to submit a PR?

Yes I am willing to submit a PR!

Code of Conduct

I agree to follow this project's Code of Conduct


uranusjr · 2022-10-17T08:34:19Z

But creating a new operator just for that purpose is not really satisfying.

People should be encouraged more to create new, ad-hoc operators, if you ask me. Classes are first-class in Python, and an Airflow DAG is Python, so it’s most productive to use Python. If we try to put every possible customisation in operators, the end result would not be different from, say, using a YAML file, and would defeat the purpose of using Python in the first place.

qcha41 · 2022-10-17T17:24:21Z

I agree, but here we are talking about a feature that already exists implicitly in other operators. For instance, with a PythonOperator, you can push a Python dictionary and use it directly as a dictionary with the xcom_pull() method.

potiuk · 2022-10-24T15:07:57Z

The previous Jinja template {{ xcom_pull('task_a')['key2'] }} now works in Task B, since the XCom value is now a Python dictionary.

Actually, now that I think of it, this could be made into a common "AbstractOperator" feature. We could add a "deserialize_output" parameter so that any operator can use it. I think we should even deserialize it using YAML, because then we will automatically handle both YAML and JSON (YAML is actually a 100% compatible superset of JSON: every proper JSON content is also valid YAML).

WDYT @uranusjr? I think having it as a common "operator" feature (disabled by default) is quite a powerful feature that can make a number of existing operators much easier to work with.

potiuk · 2022-10-31T02:56:59Z

@uranusjr? Any thoughts?

uranusjr · 2022-11-10T23:05:57Z

If the goal is to make Jinja2 templating simpler (there’s no issue if it’s taskflow), the simplest way may be to add a built-in macro for this?

{{ json_loads(xcom_pull('task_a'))['key2'] }}

qcha41 added the kind:feature Feature Requests label Oct 16, 2022

potiuk added the good first issue label Nov 16, 2022

RachitSharma2001 mentioned this issue Jan 13, 2023

Make json and yaml available in templates #28930

Merged

potiuk closed this as completed in #28930 Feb 19, 2023
