Replace dill package to use cloudpickle #38531
This is cool :). But: Certainly we will need to keep a backwards-compatibility option (and I do not think dill should be I think this change needs a bit more:
This way we will:
Thank you @potiuk and @hussein-awala for your comments!
ae1f8be to 48ce74f
It should not. @bolkedebruin -> I believe serde should be good for it, and we should be able to do round-trip serialization of most types.
Hi @bolkedebruin @potiuk !
I think if you have no confirmation from @bolkedebruin on the proposed path, the way to go is to implement a POC and see if it works with the current executor configs @VladaZakharova. There is no better way to confirm the approach.
@potiuk If we are talking about migration and changing the way we serialize, should we consider replacing dill with JSON? Are there some limitations here? By doing this, we can try to avoid the same problem in the future for other Python versions. WDYT? Also, regarding the original issue with incorrect serialization for Python 3.11: is it a problem only with serialization, or with deserialization too? If we use dill on Python 3.11 only for deserialization, and cloudpickle for serialization, will there be a problem? @hussein-awala @Taragolis Can you give us some details here? Thanks!
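On the JSON question, a minimal standard-library sketch of one well-known limitation: `json` only encodes a small set of built-in types, while pickle-style serializers round-trip arbitrary Python objects. The `config` value and the `try_json` helper are made up for illustration and are not taken from Airflow.

```python
import json
import pickle
from datetime import datetime

# Hypothetical config holding values that have no native JSON representation.
config = {"labels": {"team-a", "team-b"}, "created": datetime(2024, 1, 1)}


def try_json(value):
    """Return the JSON text, or the error message if encoding fails."""
    try:
        return json.dumps(value)
    except TypeError as exc:
        return f"TypeError: {exc}"


# json refuses sets and datetimes unless a custom encoder is supplied.
json_result = try_json(config)

# pickle round-trips the same value unchanged.
restored = pickle.loads(pickle.dumps(config))

print(json_result)
print(restored == config)
```

A custom encoder/decoder pair can close part of this gap, but every non-JSON type then needs its own explicit mapping.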
As mentioned before - we need to be able to handle different serializers to handle the K8S configuration problem described above in #38531 (comment). This is the reason we have dill in the first place. If serde will not solve the problem (seems not), then the solution of storing the pickler together with the serialized value seems to be a good direction - provided that migration scenarios are part of the solution, of course.
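A rough sketch of what "storing the pickler together with the serialized value" could look like. None of this is Airflow code: the function names, the `(name, payload)` record shape, and the rule that untagged bytes are legacy values are all assumptions made for illustration.

```python
import importlib
import pickle  # always available; dill and cloudpickle are optional extras
from typing import Any, Tuple, Union

# Pickler modules this sketch is willing to load; each exposes dumps()/loads().
ALLOWED_PICKLERS = ("pickle", "dill", "cloudpickle")

Record = Tuple[str, bytes]


def _load_pickler(name: str):
    """Import the named pickler module, rejecting anything not allow-listed."""
    if name not in ALLOWED_PICKLERS:
        raise ValueError(f"unknown pickler: {name!r}")
    return importlib.import_module(name)


def serialize_tagged(value: Any, pickler: str = "pickle") -> Record:
    """Serialize value and remember which pickler produced the payload."""
    return pickler, _load_pickler(pickler).dumps(value)


def deserialize(record: Union[Record, bytes], legacy_pickler: str = "dill") -> Any:
    """Load a tagged record; bare bytes are treated as pre-migration values.

    Like any unpickling, this must only be used on trusted data.
    """
    if isinstance(record, bytes):
        name, payload = legacy_pickler, record  # migration path: no tag stored
    else:
        name, payload = record
    return _load_pickler(name).loads(payload)


# Usage with the standard-library pickler only, so no extra package is needed.
record = serialize_tagged({"cpu": "500m"})
print(deserialize(record))
print(deserialize(pickle.dumps([1, 2]), legacy_pickler="pickle"))
```

Because the tag travels with the payload, the reader never has to guess which library wrote a value, and old rows without a tag can still be loaded through the legacy default.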
The Python 3.11 issue is really a "test" issue - I am not really sure if this has any effect in production. BTW, I think there is a little misunderstanding here. If the only reason for this change is supporting cloudpickle in Python Virtualenv Operator and not getting rid of dill, then we can likely leave Maybe I was assuming too much of a reason for that change when I was commenting on dill used in the executor. And we can leave that part out altogether - concentrating back on just Python Virtualenv Operator.
6484d25 to 35afc9c
f7c1d8b to f22d1dd
@potiuk ,
f102d7b to 54eee08
Yes, sorry for the confusion - I think indeed we should limit that change to only that - all the complexity of replacing the core executor config might still remain. It might not help with some dependency issues (i.e. dill will still be a core dependency) - but it will give the users a way to handle their pickling for PVO / External Python Operator better.
8361754 to 05e5529
e270a41 to 826f175
826f175 to e2eabbb
@potiuk , hi
yes. Python Operator and related are part of the Airflow core. |
Thank you!
Related issue: #35307
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.