Pandas version 0.23.1 failed to save dataframe onto HDFS with hdfs cli #21560

Closed
jiyuan312986471 opened this issue Jun 20, 2018 · 6 comments
Labels: IO CSV (read_csv, to_csv); IO HDF5 (read_hdf, HDFStore)
@jiyuan312986471

Problem description

I'm using pandas for data transformation, with I/O going to HDFS (via HdfsCLI).

With version 0.23.1, the to_csv function raises an AttributeError:

Traceback (most recent call last):
  File "GEOCODAGE_REP_AGREE_STEP_2_V2.py", line 450, in <module>
    main(args)
  File "GEOCODAGE_REP_AGREE_STEP_2_V2.py", line 402, in main
    encoding=encoding)
  File "GEOCODAGE_REP_AGREE_STEP_2_V2.py", line 367, in save_csv
    header=False)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/frame.py", line 1745, in to_csv
    formatter.save()
  File "/usr/local/lib/python3.5/site-packages/pandas/io/formats/csvs.py", line 167, in save
    f.close()
AttributeError: 'AsyncWriter' object has no attribute 'close'

Here AsyncWriter is a writer class from HdfsCLI.

Here's the code:

with cli_hdfs.write(output_path, encoding=encoding, overwrite=True) as writer:
    df.to_csv(writer, sep=sep, index=False, encoding=encoding, header=False)

Note: When I downgrade my pandas to 0.23.0 the problem is solved.
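
For anyone who cannot change the pandas version, one possible workaround is to avoid handing the writer to pandas at all: render the CSV to a string first and write that string yourself. The sketch below is only an illustration under assumptions. `WriteOnlySink` is a hypothetical stand-in for a writer that has `write()` but no `close()` (it is not the HdfsCLI class), and `save_csv_via_string` is a made-up helper name. It assumes the writer accepts text, as it does when opened with an encoding.

```python
import pandas as pd


class WriteOnlySink:
    """Hypothetical stand-in for a writer that has write() but no close()."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)


def save_csv_via_string(df, writer, sep=",", header=False):
    """Render the frame to CSV text, then write the text to `writer`.

    pandas never sees `writer`, so it cannot try to close it.
    """
    # With no path or buffer, to_csv returns the CSV as a string.
    text = df.to_csv(None, sep=sep, index=False, header=header)
    writer.write(text)


df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
sink = WriteOnlySink()
save_csv_via_string(df, sink, sep=";")
csv_text = "".join(sink.chunks)
```

The trade-off is that the whole CSV is held in memory before it is written, which may not suit very large frames.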

I have googled a lot without finding anything useful, so I'm opening this issue because it looks like a bug to me.

@Liam3851
Contributor

Can you please test against master? This appears to be another form of issue #21471 (bugs in to_csv when writing to a file handle or file-like object), which is fixed in master and slated for 0.23.2. See PR #21478.
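
To illustrate the general failure mode reported here (a routine calling `close()` on a file-like object that does not provide it), here is a simplified simulation. It is not pandas' actual code: `WriteOnlySink`, `save_and_close` and `save_without_closing` are hypothetical names used only for this sketch.

```python
class WriteOnlySink:
    """Hypothetical file-like object exposing write() only."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)


def save_and_close(handle, text):
    """Writes, then closes a handle it did not open (the problematic pattern)."""
    handle.write(text)
    handle.close()  # fails if the caller's object has no close()


def save_without_closing(handle, text):
    """Writes and leaves the handle's lifecycle to the caller."""
    handle.write(text)


sink = WriteOnlySink()
try:
    save_and_close(sink, "a,b\n")
    error = None
except AttributeError as exc:
    # The data was written before the failing close() call.
    error = str(exc)

other_sink = WriteOnlySink()
save_without_closing(other_sink, "a,b\n")
```

The second variant works with any object that has `write()`, because closing stays the responsibility of whoever opened the handle (for example, a surrounding `with` block).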

@gfyoung added the IO CSV (read_csv, to_csv) and IO HDF5 (read_hdf, HDFStore) labels Jun 22, 2018
@rupesh121

rupesh121 commented Jun 22, 2018

Hey, I am facing this issue too. I am using Python 3.6.5.

Did you figure out how to resolve the issue?

And how did you downgrade pandas to 0.23.0? The pandas version I have is 0.23.1.

Please help me.

@jiyuan312986471
Author

@Liam3851 Sorry for my late response. I will try to test it, but that will be a little complicated in my working environment.

@jiyuan312986471
Author

@rupesh121 just do pip install pandas==0.23.0

@jorisvandenbossche added this to the 0.23.2 milestone Jun 26, 2018
@jorisvandenbossche
Member

We are assuming this will be fixed in the 0.23.2 release, but if somebody can actually test it (with the development version of pandas), that would be very welcome.

@jreback modified the milestones: 0.23.2, 0.23.3 Jun 26, 2018
@jreback modified the milestones: 0.23.4, 0.23.5 Aug 2, 2018
@jiyuan312986471
Author

I recently tested in my working environment with the released versions 0.23.2, 0.23.3 and 0.23.4. All of them work like a charm. Thank you guys.
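
Based only on the versions reported in this thread (0.23.0 and 0.23.2 through 0.23.4 work, 0.23.1 fails), a small check like the one below could flag an affected install. It is a sketch: `is_affected` and `AFFECTED_VERSIONS` are hypothetical names, and the set contains just the one version confirmed here.

```python
from importlib import metadata

# Only 0.23.1 is reported as failing in this thread (an assumption
# that no other release is affected).
AFFECTED_VERSIONS = {"0.23.1"}


def is_affected(version):
    """Return True if `version` is a release reported as failing here."""
    return version in AFFECTED_VERSIONS


try:
    installed = metadata.version("pandas")
except metadata.PackageNotFoundError:
    installed = None  # pandas is not installed in this environment

if installed is not None and is_affected(installed):
    print("pandas", installed, "is affected; consider 0.23.0 or 0.23.2+")
```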
