Support writing CSV to GCS #22704

bnaul · 2018-09-14T00:08:30Z

Fixes #23094 (Allow writing to GCS paths)
Tests added / passed
Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
whatsnew entry

This seems to work as-is and doesn't break any of the IO tests. As I mentioned in #8508 (comment), getting S3 to work is a little more complicated, but maybe still not bad. Either way, this would be a step in the right direction.

cc @TomAugspurger

pep8speaks · 2018-09-14T00:08:33Z

Hello @bnaul! Thanks for submitting the PR.

There are no PEP8 issues in the file pandas/io/formats/csvs.py !
There are no PEP8 issues in the file pandas/tests/io/test_gcs.py !

WillAyd · 2018-09-15T18:48:45Z

pandas/tests/io/test_gcs.py

+def test_to_csv_gcs(mock):
+    df1 = DataFrame({'int': [1, 3], 'float': [2.0, np.nan], 'str': ['t', 's'],
+                     'dt': date_range('2018-06-18', periods=2)})
+    with mock.patch('gcsfs.GCSFileSystem') as MockFileSystem:


Might be missing the point, but if you are patching this, what is actually getting tested for GCS?

bnaul · 2018-09-15

Yeah, it's kind of the same problem that we discussed in #20729. This does at least test the logic that I touched here. Ultimately, what the mocks assume is that gcsfs.GCSFileSystem can read/write strings; everything else uses the real pandas methods.
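To make the mock's assumption concrete, here is a minimal stdlib-only sketch of the same patching pattern. It is not the test from this PR: `write_rows` and `NonClosingStringIO` are hypothetical names introduced for illustration, and `MockFileSystem` is a plain `MagicMock` standing in for `gcsfs.GCSFileSystem`. The only thing it verifies is that the writer opens the path and writes text to whatever `open` returns.

```python
import csv
from io import StringIO
from unittest import mock


class NonClosingStringIO(StringIO):
    """Hypothetical helper: keeps its contents readable after close()."""

    def close(self):
        pass


def write_rows(fs_class, path, rows):
    """Hypothetical writer: open `path` through a filesystem object, write CSV."""
    fs = fs_class()
    with fs.open(path, "w") as handle:
        csv.writer(handle, lineterminator="\n").writerows(rows)


buffer = NonClosingStringIO()

# Stand-in for the filesystem class; every open() call hands back our buffer.
MockFileSystem = mock.MagicMock()
MockFileSystem.return_value.open.return_value = buffer

write_rows(MockFileSystem, "gs://test/test.csv", [["a", "b"], [1, 2]])

# The mock records how it was called, so the open() arguments can be checked.
MockFileSystem.return_value.open.assert_called_once_with("gs://test/test.csv", "w")
written = buffer.getvalue()
```

Nothing here talks to GCS; as in the discussion above, the real filesystem behaviour is assumed rather than tested.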

WillAyd

Thanks for the clarification on the mock

WillAyd · 2018-09-19T19:52:56Z

pandas/tests/io/test_gcs.py

+        instance = MockFileSystem.return_value
+        instance.open.return_value = s
+
+        df1.to_csv('gs://test/test.csv', index=True)


Any particular reason you are explicitly stating index=True here instead of using the default?

bnaul · 2018-10-11

df.to_csv(f) and pd.read_csv(f) handle the index differently, so I wanted to be extra clear that the index is also being checked in the round trip.
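The asymmetry can be seen with a small in-memory round trip. This is an illustrative sketch using a `StringIO` buffer instead of a `gs://` path, with a simplified frame (the column values are made up for the example, not taken from the PR's test).

```python
from io import StringIO

import pandas as pd

df1 = pd.DataFrame({"int": [1, 3], "float": [2.0, 3.5]})

buffer = StringIO()
df1.to_csv(buffer, index=True)  # index=True is already the default; stated for clarity
csv_text = buffer.getvalue()

# Without index_col, the written index comes back as an ordinary, unnamed column.
naive = pd.read_csv(StringIO(csv_text))

# With index_col=0, the first column is restored as the index.
df2 = pd.read_csv(StringIO(csv_text), index_col=0)

print(list(naive.columns))
print(list(df2.columns))
```

Passing `index_col=0` on the read side is what makes the comparison against the original frame meaningful.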

WillAyd · 2018-09-19T19:53:50Z

pandas/tests/io/test_gcs.py

+        instance.open.return_value = s
+
+        df1.to_csv('gs://test/test.csv', index=True)
+        df2 = read_csv(StringIO(s.getvalue()), parse_dates=['dt'], index_col=0)


Related to above comment

WillAyd · 2018-09-19T19:55:30Z

Could also use a related issue and whatsnew note for v0.24

TomAugspurger · 2018-09-19T20:57:24Z

When reading S3, we have to wrap it in a TextIOWrapper. Do we need to do the same for writing?
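For readers unfamiliar with the wrapping being asked about, the general pattern for writing text through a binary file object looks roughly like this. It is a stdlib sketch only, not the pandas implementation: `io.BytesIO` stands in for a file object opened in binary mode, and the encoding is an assumption.

```python
import csv
import io

# Stand-in for a binary-mode file object, such as one returned by a remote filesystem.
binary_handle = io.BytesIO()

# TextIOWrapper encodes str to bytes; newline="" leaves line endings to csv.writer.
text_handle = io.TextIOWrapper(binary_handle, encoding="utf-8", newline="")
csv.writer(text_handle, lineterminator="\n").writerows([["a", "b"], [1, 2]])

# Push any buffered text through to the underlying binary handle.
text_handle.flush()

payload = binary_handle.getvalue()
```

A filesystem whose `open` already returns a text-mode handle would not need this layer.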

codecov · 2018-10-11T16:53:49Z

Codecov Report

Merging #22704 into master will not change coverage.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master   #22704   +/-   ##
=======================================
  Coverage    92.2%    92.2%           
=======================================
  Files         169      169           
  Lines       50924    50924           
=======================================
  Hits        46952    46952           
  Misses       3972     3972

| Flag | Coverage Δ | |
| --- | --- | --- |
| #multiple | `90.62% <100%> (ø)` | ⬆️ |
| #single | `42.3% <100%> (-0.01%)` | ⬇️ |

| Impacted Files | Coverage Δ | |
| --- | --- | --- |
| pandas/io/formats/csvs.py | `98.21% <100%> (ø)` | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c8ce3d0...b2f97cc.

bnaul · 2018-10-11T16:56:53Z

Hi @WillAyd @TomAugspurger,

Just added a whatsnew entry and a related issue.
The TextIOWrapper wrapping would indeed be needed for s3fs, but for gcsfs it isn't; I'd like to keep this small, so for now I'm going to take care of the easier case first.

If anyone else wants to take a look at this, I would be grateful, since touching anything related to to_csv makes me a bit nervous 😬

TomAugspurger

Feel free to merge if you're comfortable @WillAyd.

WillAyd · 2018-10-12T22:24:04Z

Thanks @bnaul !

5amfung · 2019-08-16T05:56:40Z

@bnaul Looks like there's no documentation on how to use this feature. Hate to see this being implemented but no one is aware of it at all.
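As a rough usage sketch, based only on the call exercised in this PR's test (`df1.to_csv('gs://test/test.csv', index=True)`): writing to GCS amounts to passing a `gs://` URL to `to_csv`. The helper names, bucket and key below are hypothetical, and the sketch assumes `gcsfs` is installed and credentials are already configured.

```python
def gcs_url(bucket, key):
    """Build a gs:// URL from a bucket name and an object key."""
    return "gs://{}/{}".format(bucket, key.lstrip("/"))


def write_frame_to_gcs(df, bucket, key, **to_csv_kwargs):
    """Write `df` as CSV to a GCS object and return the URL that was used.

    Hypothetical helper: it only forwards to DataFrame.to_csv with a gs:// path.
    """
    url = gcs_url(bucket, key)
    df.to_csv(url, **to_csv_kwargs)
    return url
```

For example, `write_frame_to_gcs(df, "my-bucket", "exports/data.csv", index=False)` would write to `gs://my-bucket/exports/data.csv`, where `my-bucket` is a placeholder name.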

TomAugspurger · 2019-08-16T11:23:01Z

@5amfung can you submit a PR?

bnaul changed the title ~~[WIP] Support writing CSV to GCS~~ Support writing CSV to GCS Sep 14, 2018

gfyoung added the Enhancement and IO Data labels Sep 14, 2018

bnaul force-pushed the gcsfs branch from 62e65f1 to 0e0ce70 September 14, 2018 16:22

WillAyd reviewed Sep 15, 2018

WillAyd reviewed Sep 19, 2018

bnaul added 2 commits October 11, 2018 09:36

de1ab9f Support writing CSV to GCS
b2f97cc Update whatsnew for writing to gcsfs

bnaul force-pushed the gcsfs branch from 0e0ce70 to b2f97cc October 11, 2018 16:53

TomAugspurger approved these changes Oct 12, 2018

WillAyd approved these changes Oct 12, 2018

WillAyd merged commit 241bde1 into pandas-dev:master Oct 12, 2018

tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018

ca898e9 Support writing CSV to GCS (pandas-dev#22704)

bnaul mentioned this pull request Jan 17, 2019

Google Bucket File System Support nteract/papermill#213

Closed

bnaul deleted the gcsfs branch January 17, 2019 18:21
