
Serialization returns unserialized numpy value #54097

@astro-anand

Description


Apache Airflow version

3.0.3

If "Other Airflow 2 version" selected, which one?

No response

What happened?

`numpy.float64` subclasses Python's built-in `float`, so the `serialize` function returns the object unchanged at its primitive-type check and never passes it to the custom numpy serializer. As a result, when a numpy value is written to the XCom backend it is coerced to a plain `float`, and subsequent tasks read it back as a `float`.
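To illustrate the ordering problem, here is a minimal, Airflow-free sketch. `Float64` is a hypothetical stand-in for `numpy.float64` (in real numpy, `issubclass(numpy.float64, float)` is `True`), and `serialize_primitive_first`, `SERIALIZERS`, and `_qualname` are hypothetical names that model the behavior described above; they are not Airflow's actual code. Because the primitive check runs first, the custom serializer is never reached, and a JSON round trip yields a plain `float`.

```python
import json


class Float64(float):
    """Hypothetical stand-in for numpy.float64, which also subclasses float."""


PRIMITIVES = (int, bool, float, str)


def _qualname(o):
    # Fully qualified class name, used as the registry key.
    return f"{type(o).__module__}.{type(o).__qualname__}"


# Hypothetical registry: maps a class name to a custom serializer.
SERIALIZERS = {
    _qualname(Float64(0)): lambda o: {"type": "Float64", "value": float(o)},
}


def serialize_primitive_first(o):
    """Models the problematic ordering: the primitive check runs first."""
    if isinstance(o, PRIMITIVES):
        # Float64 is a float subclass, so it is returned unchanged here
        # and the custom serializer below is never consulted.
        return o
    qn = _qualname(o)
    if qn in SERIALIZERS:
        return SERIALIZERS[qn](o)
    raise TypeError(f"no serializer registered for {qn}")


value = Float64(142.13412)
stored = json.dumps(serialize_primitive_first(value))  # written as a bare number
restored = json.loads(stored)
print(type(value).__name__, "->", type(restored).__name__)  # Float64 -> float
```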

What you think should happen instead?

Consult the registered (built-in) serializers first, and only then check whether the object is an instance of a primitive type. This ensures that a custom serializer, such as the numpy one, is called before the object is returned unchanged.
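The proposed ordering can be sketched as follows. As before, `Float64` is a hypothetical stand-in for `numpy.float64`, and the registries, helper names, and the `__classname__`/`__data__` envelope keys are hypothetical illustrations rather than Airflow's real implementation. Registered serializers are looked up by exact class name before the primitive check, so the float subclass is wrapped with its type information and can be restored on the reading side.

```python
import json


class Float64(float):
    """Hypothetical stand-in for numpy.float64, which also subclasses float."""


PRIMITIVES = (int, bool, float, str)


def _qualname(o):
    # Fully qualified class name, used as the registry key.
    return f"{type(o).__module__}.{type(o).__qualname__}"


_FLOAT64 = _qualname(Float64(0))
# Hypothetical registries keyed by fully qualified class name.
SERIALIZERS = {_FLOAT64: float}
DESERIALIZERS = {_FLOAT64: Float64}


def serialize(o):
    """Proposed ordering: registered serializers first, primitives second."""
    qn = _qualname(o)
    if qn in SERIALIZERS:
        # Wrap the data with its class name so it can be restored later.
        return {"__classname__": qn, "__data__": SERIALIZERS[qn](o)}
    if isinstance(o, PRIMITIVES):
        return o
    raise TypeError(f"no serializer registered for {qn}")


def deserialize(o):
    # Rebuild wrapped values; pass primitives through untouched.
    if isinstance(o, dict) and "__classname__" in o:
        return DESERIALIZERS[o["__classname__"]](o["__data__"])
    return o


restored = deserialize(json.loads(json.dumps(serialize(Float64(142.13412)))))
print(type(restored).__name__, restored)  # Float64 142.13412
```

Looking up the exact class name, rather than using `isinstance`, keeps plain `float`, `int`, and `bool` values on the primitive path while letting subclasses that have a registered serializer be handled explicitly.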

How to reproduce

import random
from airflow.sdk import dag, task, Asset, AssetAlias

ALIAS = AssetAlias("s3://alias-example")

TEST_ASSET = Asset("s3://my-bucket/data/file.csv")


@dag(tags=["asset_alias_demo"])
def asset_alias_demo():

    @task
    def get_file_ids():
        import numpy as np
        import pandas as pd

        # np.float64 values (a float subclass) alongside a DataFrame.
        return [np.float64(142.13412), np.float64(123.12351242), pd.DataFrame({"a": [1, 2, 3]})]

    @task(outlets=[TEST_ASSET])
    def add_assets_to_alias(i, outlet_events):
        """
        adds assets to alias
        """

        import numpy as np

        random_value = random.randint(1, 100)
        print(random_value, "is the random value for file_id:", i)
        # For the np.float64 items, the bug makes i arrive as a plain float.
        print(type(i), "is the type of i")
        print("it is", isinstance(i, np.float64), "that i is a numpy float64")
        outlet_events[TEST_ASSET].extra = {
            "file_id": i,
            "random_value": random_value,
        }
        print("the outlet events are", outlet_events)

    # Assumed wiring (not shown in the original snippet): map over each returned item.
    add_assets_to_alias.expand(i=get_file_ids())


asset_alias_demo()

Running this DAG shows that the deserialized value of `i` in `add_assets_to_alias` is a plain `float`, not a `numpy.float64`.

Operating System

MacOS

Versions of Apache Airflow Providers

N/A

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct


Labels

area:core, kind:bug, priority:high
