Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace unicodecsv with standard csv library #31693

Merged
merged 7 commits into from
Jun 7, 2023

Conversation

dstandish
Copy link
Contributor

unicodecsv appears to be missing a license which can cause trouble (see jdunck/python-unicodecsv#80)

And it appears that this library may no longer be required

I have not tested this with any of the actual services... just coding it up based on how it's supposed to work AFAICT

Basically, previously you needed to open the file in mode wb and then you set the encoding on the writer object. Then the writer object would accept strings and encode them before passing to the file handler.

And now, you open the file in text mode (optionally you can set an encoding) and then you still pass it a string.

There were some cases (sql to gcs) where the file handler was sometimes used directly (separte from the csv writer object) and there we previously needed to encode to utf-8 first -- that's why i removed those "encode" calls.

There were a couple instances where the handler previously opened in w mode and i'm not sure how that actually worked because it doesn't seem to work for me. Possible those cases were actually broken? (mssql-to-hive, vertica-to-hive, vertica-to-mysql) Also possible i am just missing something on that issue.

@boring-cyborg boring-cyborg bot added area:providers provider:Apache provider:microsoft-azure Azure-related issues provider:google Google (including GCP) related issues labels Jun 2, 2023
@dstandish dstandish requested a review from jedcunningham June 2, 2023 21:34
Copy link
Member

@uranusjr uranusjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of nits, generally looks good to me

airflow/providers/google/cloud/transfers/sql_to_gcs.py Outdated Show resolved Hide resolved
@dstandish dstandish force-pushed the remove-unicode-csv branch 2 times, most recently from df51506 to bdf982b Compare June 7, 2023 08:19
dstandish and others added 7 commits June 7, 2023 09:09
unicodecsv appears to be missing a license which can cause trouble (see jdunck/python-unicodecsv#80)

And it appears that this library may no longer be required
Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
@dstandish dstandish force-pushed the remove-unicode-csv branch from bdf982b to 8a1ce95 Compare June 7, 2023 16:09
@jedcunningham jedcunningham merged commit fbeb01c into apache:main Jun 7, 2023
@jedcunningham jedcunningham deleted the remove-unicode-csv branch June 7, 2023 18:59
@eladkal eladkal added this to the Airflow 2.6.2 milestone Jun 8, 2023
@eladkal eladkal added the type:misc/internal Changelog: Misc changes that should appear in change log label Jun 8, 2023
eladkal pushed a commit that referenced this pull request Jun 8, 2023
unicodecsv appears to be missing a license which can cause trouble (see jdunck/python-unicodecsv#80)

And it appears that this library may no longer be required.

(cherry picked from commit fbeb01c)
potiuk added a commit to potiuk/airflow that referenced this pull request Jun 9, 2023
Removing unicodescv as dependency invites problems when users will
use older hive, google, microsoft providers, because they were
using unicodecsv, but they did not declare it as dependency (it
was a transitive dependency of the "apache-airflow" package).

It has been removed in apache#31693

Unicodecsv has very low footprint so this is not a problem to
keep it.

The dependency misses license in it's package, therefore we
add the licence in our "licences" folder.
potiuk added a commit that referenced this pull request Jun 9, 2023
Removing unicodescv as dependency invites problems when users will
use older hive, google, microsoft providers, because they were
using unicodecsv, but they did not declare it as dependency (it
was a transitive dependency of the "apache-airflow" package).

It has been removed in #31693

Unicodecsv has very low footprint so this is not a problem to
keep it.

The dependency misses license in it's package, therefore we
add the licence in our "licences" folder.
potiuk pushed a commit that referenced this pull request Jun 9, 2023
unicodecsv appears to be missing a license which can cause trouble (see jdunck/python-unicodecsv#80)

And it appears that this library may no longer be required.

(cherry picked from commit fbeb01c)
potiuk added a commit that referenced this pull request Jun 9, 2023
Removing unicodescv as dependency invites problems when users will
use older hive, google, microsoft providers, because they were
using unicodecsv, but they did not declare it as dependency (it
was a transitive dependency of the "apache-airflow" package).

It has been removed in #31693

Unicodecsv has very low footprint so this is not a problem to
keep it.

The dependency misses license in it's package, therefore we
add the licence in our "licences" folder.

(cherry picked from commit a853233)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers provider:google Google (including GCP) related issues provider:microsoft-azure Azure-related issues type:misc/internal Changelog: Misc changes that should appear in change log
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants