BUG: read_table with full line comment and delim_whitespace=True #8115

Closed
jmkuhn opened this issue Aug 26, 2014 · 9 comments · Fixed by #8122
Labels
Bug IO CSV read_csv, to_csv

@jmkuhn

jmkuhn commented Aug 26, 2014

I expect the following to produce a DataFrame with 3 rows, 5 columns and no NaNs. Instead it produces 5 rows with the comment lines filled with NaNs. Pandas version 0.14.1.

In [2]: !cat test.txt
# comment
 0  1  2  3  4
 5  6  7  8  9
# comment
10 11 12 13 15

In [3]: df = pd.read_table("test.txt", skipinitialspace=True,
   ...: names=["A", "B", "C", "D", "E"], delim_whitespace=True, comment="#")

In [4]: df
Out[4]: 
    A   B   C   D   E
0 NaN NaN NaN NaN NaN
1   0   1   2   3   4
2   5   6   7   8   9
3 NaN NaN NaN NaN NaN
4  10  11  12  13  15
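
For reference, the result I expect can be sketched in plain Python, independent of pandas. This is only an illustration of the intended semantics (full-line comments dropped, fields split on runs of whitespace); `parse_expected` is a hypothetical helper, not a pandas function, and skipping blank lines is an assumption of the sketch.

```python
# Illustrative only: a plain-Python stand-in for the expected parse,
# not pandas' actual implementation.
TEXT = """# comment
 0  1  2  3  4
 5  6  7  8  9
# comment
10 11 12 13 15
"""


def parse_expected(text, comment="#"):
    """Return rows of ints, dropping full-line comments (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines (an assumption of this sketch) and full-line comments.
        if not stripped or stripped.startswith(comment):
            continue
        # Drop any trailing comment, then split on runs of whitespace.
        data = stripped.split(comment, 1)[0]
        rows.append([int(field) for field in data.split()])
    return rows


rows = parse_expected(TEXT)
print(len(rows), "rows,", len(rows[0]), "columns")  # prints "3 rows, 5 columns"
```
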
@jreback
Contributor

jreback commented Aug 26, 2014

iirc this is only implemented in master (not 0.14.1) as it is an API change for a couple of reasons

pls check

@jmkuhn
Author

jmkuhn commented Aug 26, 2014

I just installed from master (0.14.1.dev) and the behavior of my example has not changed.

@jreback
Contributor

jreback commented Aug 26, 2014

not merged yet
see #7470

@jorisvandenbossche
Member

@jreback It's a bit confusing, as we split the comment part of that PR #7470 into a new PR, which is included in 0.14.1: #7582

There seems to be a bug when comment is combined with delim_whitespace; if I use sep instead, it does work:

In [13]: pd.read_csv(StringIO(s), skipinitialspace=True,
    ...:               names=["A", "B", "C", "D", "E"], delim_whitespace=True, comment="#")
Out[13]: 
    A   B   C   D   E
0 NaN NaN NaN NaN NaN
1   0   1   2   3   4
2   5   6   7   8   9
3 NaN NaN NaN NaN NaN
4  10  11  12  13  15

In [15]: pd.read_csv(StringIO(s), sep=' ', skipinitialspace=True,
    ...:               names=["A", "B", "C", "D", "E"], comment="#")
Out[15]: 
    A   B   C   D   E
0   0   1   2   3   4
1   5   6   7   8   9
2  10  11  12  13  15

@jreback
Contributor

jreback commented Aug 27, 2014

the associated PR is not merged yet
I closed it because there is already an issue
you can leave this open, but make sure the PR then closes it

@jorisvandenbossche
Member

Which PR is not merged? #7582 is merged, and following the whatsnew entry in that PR, the above example should work

@jreback
Contributor

jreback commented Aug 27, 2014

#7470 it was split out

@jreback
Contributor

jreback commented Aug 27, 2014

cc @AmrAS1

@mdmueller
Contributor

Oh, I see. This isn't an issue with the PR split; it actually is a bug. The C tokenizer uses a different function for whitespace-delimited reading, and evidently I forgot to change the line comment behavior there. This should be easy to fix; I'll start a new PR to address it.
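
To make the shape of the bug concrete, here is a rough Python sketch of two separate tokenizing paths where only one of them skips full-line comments. It is not the actual C tokenizer code; the function names are hypothetical and the logic is simplified to show why the comment lines come back as empty (all-NaN) rows on the whitespace path.

```python
# Hypothetical illustration of two tokenizer code paths; NOT pandas' C code.
LINES = [
    "# comment",
    " 0  1  2  3  4",
    " 5  6  7  8  9",
    "# comment",
    "10 11 12 13 15",
]


def tokenize_delimited(lines, sep=" ", comment="#"):
    """Delimiter path: a line that starts with the comment char is dropped."""
    rows = []
    for line in lines:
        if line.lstrip().startswith(comment):
            continue  # full-line comment: emit no row at all
        # Discard empty fields, roughly mimicking skipinitialspace=True.
        rows.append([field for field in line.split(sep) if field])
    return rows


def tokenize_whitespace_buggy(lines, comment="#"):
    """Whitespace path without the fix: the comment text is cut off,
    but an empty row is still emitted for the line."""
    rows = []
    for line in lines:
        content = line.split(comment, 1)[0]
        rows.append(content.split())  # [] for a comment-only line
    return rows


def tokenize_whitespace_fixed(lines, comment="#"):
    """Whitespace path with the same full-line comment check applied."""
    rows = []
    for line in lines:
        if line.lstrip().startswith(comment):
            continue
        rows.append(line.split(comment, 1)[0].split())
    return rows


print(len(tokenize_delimited(LINES)))         # 3
print(len(tokenize_whitespace_buggy(LINES)))  # 5
print(len(tokenize_whitespace_fixed(LINES)))  # 3
```

In this sketch the two empty rows from the buggy path correspond to the NaN-filled rows 0 and 3 in the report above.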
