UUID encoded in CassandraToGCSOperator but not other operators #22846
Comments
Thanks for opening your first issue here! Be sure to follow the issue template!
According to #3483, this was done because
If this description is taken at face value, the format you proposed would not work? Unfortunately, I am not familiar enough with Google services to provide more information.
Thanks! @uranusjr
I wonder if there are any other concerns if I revert the CassandraToGCSOperator part of #3483 so that UUIDs are converted to hex strings instead of the base64-encoded format? Thanks!
You can’t simply revert it since that would introduce a backward incompatibility and break existing usages. Perhaps it’s possible to add a flag on the operator to toggle the format used.
cc @jgao54 in case you know some more details on this.
Yeah. Adding a parameter will be a good solution.
Thanks for your suggestions! I agree that adding a parameter to let users choose whether to encode UUIDs sounds like a good plan. I'll create the PR asap.
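For illustration, the backward-compatible toggle discussed above could look roughly like the sketch below. The helper name `convert_uuid` and the parameter name `encode_uuid` are hypothetical, chosen for this example only; this is not the operator's actual code. The default keeps the existing base64 behavior so current users are not affected.

```python
from base64 import b64encode
from uuid import UUID


def convert_uuid(value: UUID, encode_uuid: bool = True) -> str:
    """Hypothetical helper: serialize a UUID for export.

    encode_uuid=True  -> base64 of the 16 raw bytes (the existing behavior).
    encode_uuid=False -> canonical hyphenated hex string.
    """
    if encode_uuid:
        return b64encode(value.bytes).decode("ascii")
    return str(value)


sample = UUID("000e0000-5719-12a3-0000-000028327d4a")
print(convert_uuid(sample))                     # existing base64 form
print(convert_uuid(sample, encode_uuid=False))  # opt-in hex string form
```

Defaulting the flag to the old behavior is what avoids the backward incompatibility mentioned earlier in the thread.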
Fixed in #23766
Apache Airflow version
2.2.5 (latest released)
What happened
I noticed that UUID is encoded in CassandraToGCSOperator by:
elif isinstance(value, UUID): return b64encode(value.bytes).decode('ascii')
Therefore, for example, the UUID 000e0000-5719-12a3-0000-000028327d4a is represented as "AA4AAFcZEqMAAAAAKDJ9Sg==". However, this seems inconsistent with the other *ToGCSOperators. For example, a UUID in MySQL/Oracle is represented as a UTF-8 string of five hexadecimal groups in the format "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee", so in the previous example, the UUID 000e0000-5719-12a3-0000-000028327d4a would still be represented as "000e0000-5719-12a3-0000-000028327d4a". As a result, MySQLToGCSOperator/OracleToGCSOperator preserve the UUID in this unencoded format.
Thus, I wonder what the main concern behind encoding UUIDs in CassandraToGCSOperator is, and whether it could be changed to not encode UUIDs when loading a Cassandra table to GCS using Airflow. Please let me know your thoughts about this issue. Thanks!
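To make the two representations concrete, here is a small standalone snippet using only the Python standard library. It does not touch Airflow; the variable names are just for this example. It shows that the base64 form is reversible, so no information is lost either way, only readability differs.

```python
from base64 import b64decode, b64encode
from uuid import UUID

value = UUID("000e0000-5719-12a3-0000-000028327d4a")

# Same expression as in the quoted CassandraToGCSOperator line:
# base64 of the 16 raw bytes.
encoded = b64encode(value.bytes).decode("ascii")

# The hyphenated hex string form described for the MySQL/Oracle case.
plain = str(value)

# The base64 form can be turned back into the original UUID.
decoded = UUID(bytes=b64decode(encoded))

print(encoded)  # AA4AAFcZEqMAAAAAKDJ9Sg==
print(plain)    # 000e0000-5719-12a3-0000-000028327d4a
print(decoded == value)
```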
What you think should happen instead
No response
How to reproduce
No response
Operating System
macOS
Versions of Apache Airflow Providers
No response
Deployment
Docker-Compose
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
Code of Conduct