Fix Airflow serialization for namedtuple #37168
I'm picky in this area because of speed and tightness. This gets called a lot. Make sure to check for attributes as much as you can (hasattr > isinstance).
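To illustrate the attribute-based check suggested above, here is a minimal sketch. The helper name `looks_like_namedtuple` and the exact set of attributes tested are assumptions for illustration, not necessarily what the PR uses.

```python
from collections import namedtuple
from typing import Any


def looks_like_namedtuple(o: Any) -> bool:
    """Duck-typed check: True if `o` exposes namedtuple-specific attributes."""
    # Hypothetical helper; the attribute set is an assumption for illustration.
    return (
        hasattr(o, "_asdict")
        and hasattr(o, "_fields")
        and hasattr(o, "_field_defaults")
    )


Row = namedtuple("Row", ["id", "name"])

print(looks_like_namedtuple(Row(1, "a")))  # True
print(looks_like_namedtuple((1, "a")))     # False
```

A plain tuple has none of these attributes, so the check separates namedtuples from ordinary tuples without an `isinstance` call.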
ping @Joffreybvn :-) |
Sorry, busy week. I'm on it ! |
Please keep the changes to a minimum. The only two lines that are required are
    if _is_namedtuple(o):
        qn = "builtins.tuple"
We do need to specify the classname, which should be "builtins.tuple". If we hardcode it to "builtins.tuple", it gets deserialized correctly. |
Hey @joffreyvbn @bolkedebruin - any chance to get it soon ? |
CI-CD is failing because the generated classname is the name of the namedtuple, instead of "builtins.tuple". Otherwise, how do you want namedtuples to be serialized? |
Ah sorry @Joffreybvn , I missed that. If you'd like you can revert to the previous / earlier version. That looks and reads better. Add a comment that you are overriding the class name and it makes things clear |
namedtuple is serialized like 'builtins.tuple'
In the future, if another object very similar to tuple exists, we can also check for `_make` and `_replace` attributes.
The classname of the namedtuple has to be `"builtins.tuple"` instead of the qualname (the qualname refers to the name given to the namedtuple when it was created). I assume that `_serializers[qn].serialize(o)` won't always return a `classname` equal to the qualname. Hence a new `classname` variable, set to "builtins.tuple", ensures the namedtuple is treated as a tuple.
This reverts commit e7be465.
This reverts commit 9015e3a.
Note: we do need to specify the classname, which should be "builtins.tuple": The serialize method (in airflow/serialization/serializers/builtin.py) does qualname() on the namedtuple, which returns "airflow.providers.databricks.hooks.Row" (the namedtuple dynamically created in the hook). If this is used as classname, it will fail to deserialize: there won't be any deserializer for it. If we hardcode it to "builtins.tuple", it gets deserialized correctly.
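The failure mode described in this note can be sketched in isolation. The `qualname` helper and the `DESERIALIZERS` registry below are hypothetical stand-ins, not Airflow's actual code; they only show why a name derived from the namedtuple's class never matches "builtins.tuple".

```python
from collections import namedtuple
from typing import Any, Callable, Dict


def qualname(o: Any) -> str:
    """Hypothetical stand-in: '<module>.<qualname>' of the object's class."""
    cls = type(o)
    return f"{cls.__module__}.{cls.__qualname__}"


# Hypothetical registry: only the built-in tuple has a deserializer.
DESERIALIZERS: Dict[str, Callable[[list], tuple]] = {"builtins.tuple": tuple}

Row = namedtuple("Row", ["id", "name"])
row = Row(1, "a")

derived = qualname(row)  # ends with ".Row", never "builtins.tuple"
print(derived in DESERIALIZERS)           # False: no deserializer registered
print("builtins.tuple" in DESERIALIZERS)  # True

# Hardcoding the classname makes the round trip possible.
restored = DESERIALIZERS["builtins.tuple"](list(row))
print(restored)  # (1, 'a')
```

Note that the restored value is a plain tuple: the field names are lost, which is the trade-off of treating the namedtuple as "builtins.tuple".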
Thanks @bolkedebruin ! I reverted the commit, and added a comment to clarify it. |
Namedtuple is serialized like 'builtins.tuple' The serialize method (in airflow/serialization/serializers/builtin.py) does qualname() on the namedtuple, which returns an arbitrary name. If this is used as classname, it will fail to deserialize: there won't be any deserializer for it. (cherry picked from commit c49f857)
This PR fixes: #36839
In some cases, the namedtuple was not recognized as a tuple and failed to serialize.
Side note:
When we return a tuple from a task, the tuple reaches the XComEncoder and gets serialized as a dict, thanks to this line. However, when we return a list of tuples, the tuples get serialized as lists: the above condition is false, and we fall back to the default JSON encoding behavior. Thus, the way namedtuples and tuples are handled is inconsistent.
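The list-of-tuples fallback can be reproduced with the standard library alone. The sketch below is not Airflow's XComEncoder; `RecordingEncoder` is a hypothetical subclass of `json.JSONEncoder` that records which objects reach `default()`, assuming the fallback behaves like the stock `json` encoder.

```python
import json
from collections import namedtuple


class RecordingEncoder(json.JSONEncoder):
    """Hypothetical encoder that records which objects reach default()."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.seen = []

    def default(self, o):
        # Only called for objects the json module cannot encode natively.
        self.seen.append(type(o).__name__)
        return str(o)


Row = namedtuple("Row", ["id", "name"])

encoder = RecordingEncoder()
encoded = encoder.encode([Row(1, "a"), (2, "b")])

print(encoded)       # [[1, "a"], [2, "b"]]
print(encoder.seen)  # []
```

Because `json` encodes tuples (and tuple subclasses such as namedtuples) natively as arrays, `default()` is never invoked for them, so a custom hook never gets the chance to serialize them differently.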
This causes some queries of the Databricks Hook and Operator to fail right now, depending on how the Hook / Operator are parameterized.
This PR fixes namedtuple serialization for such queries, but it doesn't fix the inconsistency: sometimes a namedtuple will be encoded as a dict, sometimes as a list. IMO a definitive fix involves a strategy other than `qualname()`, because namedtuples have dynamic qualnames, which will never be "builtins.tuple".