Skip to content

ENH: Replace skiprows with skip_rows to begin standardizing underscore usage in keyword arguments #22587

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

palewire
Copy link
Contributor

@palewire palewire commented Sep 4, 2018

By my rough count, the read_csv method has nearly 50 keyword arguments.

Of those, 32 arguments are made up or two or more words. Twenty of those multi-word arguments use an underscore to mark the space between words, like skip_blank_lines and parse_dates. Twelve do not, like chunksize and lineterminator.

It is my opinion this is a small flaw in pandas' API, and the library would benefit by standardizing how spaces are handled. It will make pandas more legible and consistent, and therefore easier for users of all experience levels.

Since the underscore method is more common and more legible, I propose it be adopted.

All existing arguments without an underscore will need to be modified. As a first salvo, I have attempted to change the skiprows kwarg to read_csv and its sibling methods to skip_rows. I have included an experimental deprecation warning that aims to continue to support the old argument for some interval into the future.

Due to my lack of expertise on pandas' internals, I expect this request includes some flaws that would need to be corrected before inclusion. However, I hope that the maintainers of the library will agree with the overall aims of this patch, which is to begin a process of introducing greater consistency to the style of keyword arguments.

If you do agree, I would be pleased to lead an effort to gradually standardize the inputs and put in the work to finish the job.

Thank you for your consideration.

@palewire palewire changed the title ENH: Replace skiprows kwarg with skip_rows to begin standardizing underscore usage in keyword arguments ENH: Replace skiprows with skip_rows to begin standardizing underscore usage in keyword arguments Sep 4, 2018
@WillAyd
Copy link
Member

WillAyd commented Sep 4, 2018

Haven't reviewed the change in detail but am rather neutral on the idea, maybe a slight -1. I'm a huge fan of consistency, though the thought of deprecating 12 parameters in one function alone adds a lot of code and headache from an end user perspective that I am not sure is worth it. Even the built in csv module has arguments like fieldnames, restkey and restval

Let's see what others think. In the future you'd be better served opening an issue first for the discussion to be had before going in and coding, though it sounds like this exercise was partially to familiarize yourself with the code as well so no big deal

@WillAyd WillAyd added IO CSV read_csv, to_csv Deprecate Functionality to remove in pandas Needs Discussion Requires discussion from core team before further action labels Sep 4, 2018
@palewire
Copy link
Contributor Author

palewire commented Sep 4, 2018

I am sympathetic to the desire to maintain backwards compatibility, but the lack of consistency in syntax across pandas has gradually grown into a serious drawback.

I have taught pandas to dozens of newbies across the country and I can testify from experience that small variations in the naming style of commonly used methods introduces unnecessary frustration, and even reduces user confidence in the quality of the overall product.

As a frequent user of pandas, I can also attest that the inconsistencies require me, someone who uses the library daily, to routinely consult the documentation to ensure I use the proper kwarg naming style.

All of this can be avoided by selecting a style and sticking to it, just as authors of professional documents outside of programming do.

The deprecation warnings, if included, could be temporary, and ultimately removed in a future version, much in the way sort_values was introduced.

@asuozzo
Copy link

asuozzo commented Sep 4, 2018

Agreed, @palewire! I know a lot of the common two-word arguments for read_csv, but it's always frustrating to not be able to remember off the top of my head whether they have underscores.

@pallih
Copy link

pallih commented Sep 4, 2018

I think this is a fantastic proposal!

Pandas serves me greatly almost daily, but almost daily I find myself consulting the documentation, some of it may be attributed to my own shortcomings, but a part of it is this lack of standardization in Pandas APIs.

@emamd
Copy link

emamd commented Sep 5, 2018

Agreed with this proposal, I have the same issues as @pallih and @asuozzo. While it would be a headache to add these deprecations, I think increasing consistency would reduce the headache users have in remembering basic keyword arguments and speed up pandas use in general.

@eads
Copy link

eads commented Sep 5, 2018

At the cost of internal complexity, there could be a path forward where both styles are accepted and, in a later release, removed. But I struggle to remember which arguments have underscores and which don't. Personally, I'd happily revise old code that needs to work with newer versions of Pandas if it meant consistency in the keyword arguments over the long haul.

@gfyoung
Copy link
Member

gfyoung commented Sep 5, 2018

@palewire : Thanks for the contribution! Could you open an issue for this first (that you can then close with this PR <-- customary procedure here)? Given the impact, this would actually be much preferred for (re-)evaluating and discussing a proposal like this.

xref: #13349

@palewire
Copy link
Contributor Author

palewire commented Sep 5, 2018

@gfyoung, will do.

@jreback
Copy link
Contributor

jreback commented Sep 7, 2018

@palewire can you list all of the kwargs that you are attempting to change.

I think some of them could be ok to actually deprecate and change (with a suitable deprecation period), but others are just not worth the effort and/or maybe just support both spelllings.

@jreback
Copy link
Contributor

jreback commented Sep 7, 2018

@gfyoung I believe we have an issue for this anyhow?

@gfyoung
Copy link
Member

gfyoung commented Sep 7, 2018

@jreback : No, we don't. The closest was #13349.

@palewire
Copy link
Contributor Author

palewire commented Sep 7, 2018 via email

@WillAyd
Copy link
Member

WillAyd commented Nov 23, 2018

Closing this PR as stale. For any discussion please move to referenced issue

@WillAyd WillAyd closed this Nov 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas IO CSV read_csv, to_csv Needs Discussion Requires discussion from core team before further action
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants