
Inconsistent behaviour on sep/delimiter for pandas.read_csv #21996


Open
dahlbaek opened this issue Jul 20, 2018 · 7 comments
Labels
API - Consistency (Internal Consistency of API/Behavior); Deprecate (Functionality to remove in pandas); IO CSV (read_csv, to_csv)

Comments

@dahlbaek
Contributor

Related: #7662

Code Sample, a copy-pastable example if possible

from io import StringIO

import pandas as pd


CSV = "a|b\n1|2"
print(pd.read_csv(StringIO(CSV), sep=None, engine='python'))
print('is not the same as')
print(pd.read_csv(StringIO(CSV), delimiter=None, engine='python'))
print('\nand no warning is emitted by')
print(pd.read_csv(StringIO(CSV), sep='|', delimiter=' '))

Problem description

According to the documentation,

delimiter : str, default None
    Alternative argument name for sep.

Thus, I would expect sep and delimiter to be interchangeable.

Expected Output

Specifying delimiter=None is equivalent to specifying sep=None, and specifying both sep and delimiter emits a warning or causes an error. Alternatively, either sep or delimiter should be deprecated.
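
To make the expected behaviour concrete, here is a minimal sketch of how the two keywords could be reconciled if they were true aliases. The helper `resolve_separator` and the `_UNSET` sentinel are hypothetical names for illustration, not part of pandas, and raising on a conflict while warning on a redundant pair is just one of the options mentioned above.

```python
import warnings

# Hypothetical sentinel: distinguishes "argument not passed" from an explicit None.
_UNSET = object()


def resolve_separator(sep=_UNSET, delimiter=_UNSET, default=","):
    """Hypothetical helper (not pandas API): treat sep and delimiter as aliases."""
    sep_given = sep is not _UNSET
    delimiter_given = delimiter is not _UNSET

    if sep_given and delimiter_given:
        if sep != delimiter:
            # Conflicting values: refuse to guess which one the caller meant.
            raise ValueError(
                f"sep={sep!r} and delimiter={delimiter!r} disagree; pass only one"
            )
        # Same value twice: harmless, but worth flagging.
        warnings.warn(
            "both sep and delimiter were given; pass only one",
            UserWarning,
            stacklevel=2,
        )
        return sep

    if sep_given:
        return sep
    if delimiter_given:
        return delimiter
    return default
```

Under this sketch, `resolve_separator(sep=None)` and `resolve_separator(delimiter=None)` both return `None`, so either spelling would request the same behaviour, and `resolve_separator(sep='|', delimiter=' ')` raises instead of silently picking one.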

Output of pd.show_versions()

No module named 'dask'

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-45-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL:
LANG: en_DK.UTF-8
LOCALE: en_DK.UTF-8

pandas: 0.24.0.dev0+332.g1f6ddc4
pytest: 3.6.3
pip: 10.0.1
setuptools: 39.0.1
Cython: 0.28.4
numpy: 1.14.5
scipy: 1.1.0
pyarrow: 0.9.0
xarray: 0.10.7
IPython: 6.4.0
sphinx: 1.7.5
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: 1.5.1
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.5
feather: 0.4.0
matplotlib: 2.2.2
openpyxl: 2.5.4
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.5
lxml: 4.2.3
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.9
pymysql: 0.9.2
psycopg2: None
jinja2: 2.10
s3fs: 0.1.5
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None
gcsfs: 0.1.1
@WillAyd WillAyd added IO CSV read_csv, to_csv Deprecate Functionality to remove in pandas labels Jul 20, 2018
@WillAyd
Member

WillAyd commented Jul 20, 2018

I'd be +1 to deprecate delimiter and just keep sep - let's see what others think.

@jreback
Contributor

jreback commented Jul 20, 2018

I believe we had this discussion and kept this because it’s compatible with other languages

there might be an open issue about this

but sure would be ok to deprecate

@WillAyd WillAyd added this to the Contributions Welcome milestone Jul 21, 2018
@dahlbaek
Contributor Author

dahlbaek commented Jul 21, 2018

The only reason I might not be in favor of deprecating delimiter is that it is supported by the Frictionless Data CSV Dialect, as well as csv.Dialect. On the other hand sep seems to be the favored name in pandas, so deprecating sep might also not be a good idea…
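
For reference, this is roughly what the `delimiter` spelling looks like in the standard-library csv module. A minimal sketch: `PipeDialect` is a made-up name for illustration, and the remaining attributes are simply the usual values a `csv.Dialect` subclass needs to validate.

```python
import csv
from io import StringIO


class PipeDialect(csv.Dialect):
    """Illustrative dialect; the csv module names the field separator 'delimiter'."""

    delimiter = "|"
    quotechar = '"'
    doublequote = True
    skipinitialspace = False
    lineterminator = "\r\n"
    quoting = csv.QUOTE_MINIMAL


# Parse the same sample as in the report using the dialect above.
rows = list(csv.reader(StringIO("a|b\n1|2"), dialect=PipeDialect))
```

Here `rows` holds the header row followed by the single data row, each split on the pipe character.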

@WillAyd
Member

WillAyd commented Jul 21, 2018

There's some overlap with the dialect but obviously we don't align on all the keywords and naming conventions thereof. I'd still stick with sep given its usage in pandas and the fact that there's no delimiter in to_csv

@minggli
Contributor

minggli commented Oct 14, 2018

happy to work on this.

@jorisvandenbossche
Member

I think we should decide on deprecating delimiter or not together with the other discussion on aliases with _, as one of the arguments there to keep old names is the consistency with the csv module. If we do that there, I think it also applies to the delimiter keyword.

We can also simply add to the docstring that delimiter does not support the "sniffing feature".

(which is a bit of an esoteric feature anyhow IMO)
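
For context, the "sniffing feature" is what `sep=None` with the python engine triggers: according to the pandas documentation, detection is delegated to Python's `csv.Sniffer`. Below is a standard-library-only sketch of that detection on the sample from the report. The candidate delimiter list is an assumption chosen for illustration, and `csv.Sniffer` is a heuristic, so results on other inputs may differ.

```python
import csv
from io import StringIO

CSV = "a|b\n1|2"

# Ask the sniffer to choose among a few candidate delimiters (illustrative list).
dialect = csv.Sniffer().sniff(CSV, delimiters="|,;\t")

# Parse the sample with whatever dialect the sniffer settled on.
rows = list(csv.reader(StringIO(CSV), dialect))
```

After this runs, `dialect.delimiter` holds the detected separator and `rows` holds the parsed lines.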

@jorisvandenbossche
Member

The other issue I mean is: #22639

@jbrockmendel jbrockmendel added the API - Consistency Internal Consistency of API/Behavior label Sep 21, 2020
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022