-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature Request]: Upgrade Kryo version of Kryo Java extension #27635
Comments
Thanks @Gadiguibou ! |
Hi @aromanenko-dev! The migration itself seems relatively straightforward. |
While I'm here I could also add support for Java records to |
In Beam, we tend to avoid the breaking changes that may affect users after Beam upgrade, especially, if it may cause potentially data loss or corruption, as much as we can. Since Beam doesn't support "hot" upgrade on the fly while running a pipeline, and Kryo extension is used only as a So, this means that it should be safe. Regarding the adding support for Java records to SchemaCoder - I believe it deserves a dedicated issue/PR. Btw, I'm not sure that we have an active maintainer for this extension but I'll ping @dmvk, who initially worked on this and Euphoria extensions (that heavily uses |
Moved the issue about |
@aromanenko-dev Is this an official policy? Upgrading pipeline in general can be tricky, but I believe we should do our best to enable users to upgrade their pipelines, if possible. This change forces users to do a bootstrap of streaming pipelines which might be expensive and might discourage users from upgrading. I think we could have tried introducing KryoV5Coder and deprecate original KryoCoder for at least several releases, we might have even kept old coder until we drop support for JDK 11 (Kryo 4 does not support JDK 17). Also some more context on this: https://lists.apache.org/thread/5glkdp68b024t8pnwq7ncz5mt9gcr27r I came across this when validating 2.50.0 RC (this is even a code breaking change, beacuse the KryoCoder 'leaks' kryo API). Wondering if we should try to fix it before the release. |
I'd personally prefer to rollback this change for 2.50.0 and introduce KryoV5Coder in 2.51.0, but I don't think this needs to be a blocker for release. Could we do that in case we need to build another RC? |
@je-ik I'm sorry for delay with an answer, I was on vacation. I agree that we have to do our best to avoid breaking changes (I'm one of the person who always propose to deprecate things before) and I mentioned this before, but I don't remember any "official" (or previously agreed) policy about such a support of Beam upgrade on the fly, and particularly, for streaming pipelines. So, do we need to guarantee a Coder compatibility between Beam releases since Beam coders are supposed to be used only "inside" a pipeline?
Did you have a compilation time or runtime issue? Could you provide more details on this? Also, as a side note: |
I think this is part of a broader discussion. IMPOV, speaking about an 'ideal' solution, we should be able to provide way to upgrade any streaming pipeline, at least for the case when the application logic does not change. After all, the streaming pipeline is a long-running software component and thus upgrades are expected to happen. The truth is that Beam is not quite there, yet. And there are fundamental reasons why upgrading a pipeline after changing logic might be hard (if even possible), which means users must be able to restart their pipeline from scratch. In the case of this issue, we probably could have taken path that would not break existing pipelines (introducing different Coder, shading Kryo v5 into different package), so that we would minimize impact of the upgrade. But because the new coder has already been released I think there are no more actions needed to be taken here. I did not see this as enough important to block the release on this. Regarding the 'leakage' of the Kryo API, it is generates compile-time error, because there are some (minor) changes in the Kryo v5 API (e.g. Serializers). I'm not sure we can meaningfully make Kryo provided, because the coder is always bound to a specific (major) version of Kryo. |
@je-ik Thanks for details. I fully agree with your statement about streaming pipelines upgrade that in real world is a very complicated thing to do and, especially, to maintain in the future for all parts of Beam. I also agree that for this current issue we could introduce a new coder and deprecated an old one but, as I explained before, at that time I thought it was not so necessary to do. Regarding a custom user Kryp version - we had the similar issue with Avro (it was even worse since Avro was a citizen of "core") and we finally made it possible to use any supported Avro version for users. |
I meant, yes, we could make Kryo pluggable (I can imagine many ways of doing this), but the biggest pain always comes from incompatibility of the serialized data. The best option would be to include a version marker in the data and give KryoCoder the ability to work with multiple versions of Kryo at the same time. But that would require major reimplementation of the coder. :) |
What would you like to happen?
New Kryo versions support useful features like serialization of Java records. These features are currently unavailable to users of the Kryo Beam extension. Without this, users must fall back to other serialization methods or maintaining custom implementations for
equals()
hashCode()
and friends.Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components
The text was updated successfully, but these errors were encountered: