Type inference problem with read_csv #9669
Comments
xref is #3866. Something is throwing the inference engine off. After it's read in, you could do a convert_objects call.
@jreback I just tried the convert_objects call and it works. I've manually inspected the CSV file hunting for hidden characters and have seen nothing. I found a similar problem on Stack Overflow that suggests this problem has been around for a bit. All the evidence so far points to a deeper issue.
@diehl the issue you pointed to is really completely different, though if you DID have actual string-likes then I suppose it could be the same (though it IS really hard to inspect visually for this kind of thing). That's why I suggested you iterate in read_csv using chunksize=1000 or something, and narrow down which chunk gives this error. Or, if none does, we can discuss a bug from there.
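A minimal sketch of the chunked approach suggested above, not code from this thread. It assumes a pandas version newer than the 0.15.2 reported in this issue (pd.to_numeric is not available there), and the helper name find_non_numeric_chunks and the column name "value" are hypothetical. Every value is read as text, and each chunk is checked for entries that do not parse as numbers.

```python
import io

import pandas as pd


def find_non_numeric_chunks(source, column, chunksize=1000):
    """Return a list of (chunk_index, bad_values) for chunks with unparseable values."""
    problems = []
    # dtype=str keeps every field as raw text, so no per-chunk type inference happens.
    reader = pd.read_csv(source, dtype=str, chunksize=chunksize)
    for i, chunk in enumerate(reader):
        values = chunk[column]
        # errors="coerce" turns anything that is not a number into NaN.
        parsed = pd.to_numeric(values, errors="coerce")
        # Keep values that were present in the file but failed to parse.
        bad = values[parsed.isna() & values.notna()]
        if not bad.empty:
            problems.append((i, bad.tolist()))
    return problems


# Hypothetical stand-in for the real file: one stray non-integer in the middle.
sample = io.StringIO("value\n1\n2\n3\nx4\n5\n6\n")
print(find_non_numeric_chunks(sample, "value", chunksize=2))
```

For a file on disk, pass its path instead of the StringIO object. The chunk index times chunksize gives the approximate row where the offending values start.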
@jreback false alarm. after more digging I found the issue with the file. thanks for the feedback.
np
I have a large CSV file that contains a single column of integers. When loading the CSV file with read_csv, nearly 2/3rds of the approximately 1.5 million values are loaded as ints while the remaining values are loaded as strings. I see no obvious problem with the file that would lead to this behavior.
The file I used is available here: https://drive.google.com/file/d/0ByZvgdTf0yfAT2dPdHRvc2hLVkU/view?usp=sharing
It appears that a contiguous block of rows was read as strings - a block in the middle of the file. Very curious.
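One way to confirm where such a block sits is to check the Python type of each loaded value. This is only a sketch under assumptions: it targets Python 3 (on the Python 2.7 setup reported here, string values may be unicode rather than str), and the helper name string_row_span is hypothetical.

```python
import pandas as pd


def string_row_span(series):
    """Return (first_position, last_position, count) of str values, or None if there are none."""
    is_str = series.map(lambda v: isinstance(v, str))
    if not is_str.any():
        return None
    # Positions are 0-based offsets into the series, not index labels.
    positions = [i for i, flag in enumerate(is_str) if flag]
    return positions[0], positions[-1], len(positions)


# Hypothetical mixed column: ints with a block of strings in the middle.
mixed = pd.Series([1, 2, "3", "4", 5], dtype=object)
print(string_row_span(mixed))  # (2, 3, 2)
```

If the count equals last minus first plus one, the string values form a single contiguous block, which matches the behavior described above.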
pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Darwin
OS-release: 13.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.15.2
nose: 1.3.4
Cython: 0.21
numpy: 1.9.2
scipy: 0.15.1
statsmodels: 0.5.0
IPython: 3.0.0
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.4.1
pytz: 2014.9
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.0
openpyxl: 2.1.0
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.7
lxml: 3.4.0
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.7
pymysql: None
psycopg2: None