UUID encoded in CassandraToGCSOperator but not other operators #22846

Closed
2 tasks done
fuxiao224 opened this issue Apr 7, 2022 · 9 comments

Comments

@fuxiao224
Contributor

Apache Airflow version

2.2.5 (latest released)

What happened

I noticed that UUID is encoded in CassandraToGCSOperator by:

    elif isinstance(value, UUID):
        return b64encode(value.bytes).decode('ascii')

As a result, UUID 000e0000-5719-12a3-0000-000028327d4a is represented as "AA4AAFcZEqMAAAAAKDJ9Sg==". However, this seems inconsistent with the other *ToGCSOperators. In MySQL/Oracle, for example, a UUID is represented as a UTF-8 string of five hexadecimal groups in the format "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee", so UUID 000e0000-5719-12a3-0000-000028327d4a from the previous example would still be represented as "UUID 000e0000-5719-12a3-0000-000028327d4a". In other words, MySQLToGCSOperator/OracleToGCSOperator preserve the UUID in that format without encoding it.
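The two representations can be reproduced with the Python standard library alone. The following is a minimal sketch for illustration; the helper names `encode_uuid_base64` and `decode_uuid_base64` are hypothetical and are not part of Airflow.

```python
from base64 import b64decode, b64encode
from uuid import UUID


def encode_uuid_base64(value: UUID) -> str:
    """Base64-encode the 16 raw bytes of a UUID (same expression as the line quoted above)."""
    return b64encode(value.bytes).decode("ascii")


def decode_uuid_base64(encoded: str) -> UUID:
    """Recover the UUID from its base64 form (hypothetical helper for downstream consumers)."""
    return UUID(bytes=b64decode(encoded))


example = UUID("000e0000-5719-12a3-0000-000028327d4a")
encoded = encode_uuid_base64(example)

print(encoded)       # AA4AAFcZEqMAAAAAKDJ9Sg==
print(str(example))  # 000e0000-5719-12a3-0000-000028327d4a
print(decode_uuid_base64(encoded) == example)  # True
```

Both forms carry the same 128 bits, so the base64 output can be converted back to the canonical hex string after export if needed.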

So I wonder what the main reason is for encoding UUIDs in CassandraToGCSOperator and, if possible, whether we can change it so that UUIDs are not encoded when loading a Cassandra table to GCS using Airflow. Please let me know your thoughts about this issue. Thanks!

What you think should happen instead

No response

How to reproduce

No response

Operating System

macOS

Versions of Apache Airflow Providers

No response

Deployment

Docker-Compose

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@fuxiao224 added the area:core and kind:bug labels Apr 7, 2022
@boring-cyborg

boring-cyborg bot commented Apr 7, 2022

Thanks for opening your first issue here! Be sure to follow the issue template!

@uranusjr
Member

uranusjr commented Apr 9, 2022

According to #3483, this was done because

issue with UUID type conversion: currently UUID is converted to hex string, but should be converted to base64-encoded as that is the required format in BigQuery for uploading.

If this description is taken at face value, the format you proposed would not work? Unfortunately, I am not familiar enough with Google services to provide more information.

@fuxiao224
Contributor Author

Thanks! @uranusjr
That PR was opened 4 years ago. I'm not sure whether this was a BigQuery rule at the time, but I don't think the base64-encoded UUID format is a requirement for BigQuery at this point.

@fuxiao224
Contributor Author

Are there any other concerns if I revert the CassandraToGCSOperator part of #3483 so that UUIDs are converted to hex strings instead of the base64-encoded format? Thanks!

@uranusjr
Member

You can’t simply revert it since that would introduce a backward incompatibility and break existing usages. Perhaps it’s possible to add a flag on the operator to toggle the format used.
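A backward-compatible toggle of this kind could look roughly like the sketch below. It is an illustration under stated assumptions, not the operator's actual code: the function name `convert_uuid` and the flag name `encode_uuid` are hypothetical, and the default of `True` is chosen so that existing usages keep the base64 output.

```python
from base64 import b64encode
from uuid import UUID


def convert_uuid(value: UUID, encode_uuid: bool = True) -> str:
    """Convert a UUID to a string for export.

    `encode_uuid` is a hypothetical flag name. The default (True) keeps the
    existing base64 behavior; False returns the canonical hex string.
    """
    if encode_uuid:
        return b64encode(value.bytes).decode("ascii")
    return str(value)


value = UUID("000e0000-5719-12a3-0000-000028327d4a")
print(convert_uuid(value))                     # AA4AAFcZEqMAAAAAKDJ9Sg==
print(convert_uuid(value, encode_uuid=False))  # 000e0000-5719-12a3-0000-000028327d4a
```

Because the default preserves the current output, only users who opt in by passing `encode_uuid=False` would see the hex string format.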

@uranusjr
Member

cc @jgao54 in case you know some more details on this.

@potiuk
Member

potiuk commented Apr 13, 2022

Yeah. Adding a parameter will be a good solution.

@fuxiao224
Contributor Author

Thanks for your suggestions! I agree that adding a parameter that lets users choose whether to encode UUIDs sounds like a good plan. I'll create the PR asap.

@eladkal
Contributor

eladkal commented May 27, 2022

Fixed in #23766

@eladkal eladkal closed this as completed May 27, 2022

4 participants