Skip to content

Empty cells make Padas use float, even if read_csv(dtype={'FOO': str}) is used #17810

@MartinThoma

Description

@MartinThoma

Code Sample, a copy-pastable example if possible

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import pandas as pd

csv_path = 'test.csv'
df = pd.read_csv(csv_path, delimiter=';', quotechar='"',
                 decimal=',', encoding="ISO-8859-1", dtype={'FOO': str})
df.FOO = df.FOO.map(lambda n: n.zfill(6))
print(df)

test.csv:

FOO;BAR
01,23;4,56
1,23;45,6
;987

Problem description

When I use dtype={'FOO': str}, I expect pandas to treat the column as a string. This seems to work, but when an empty cell is present Pandas seems to switch to float.

Expected Output

      FOO     BAR
0  001,23    4.56
1  001,23   45.60
2  000000  987.00

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.10.0-35-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.3
pytest: 3.2.2
pip: 9.0.1
setuptools: 20.7.0
Cython: None
numpy: 1.13.3
scipy: 0.19.0
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0b10
sqlalchemy: 1.1.14
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions