Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: let's deprecate #18262

Closed
26 of 34 tasks
jreback opened this issue Nov 13, 2017 · 43 comments
Closed
26 of 34 tasks

DEPR: let's deprecate #18262

jreback opened this issue Nov 13, 2017 · 43 comments
Labels
Deprecate Functionality to remove in pandas Master Tracker High level tracker for similar issues

Comments

@jreback
Copy link
Contributor

jreback commented Nov 13, 2017

xref #18202

We have some cruft, let's deprecate it (I have noted some which already have an issue associated).

From ndarray

timeseries specific

  • Series.first()
    let head / tail take a timedelta
  • Series.last()

other

  • Series.swapaxes()
    • No reason once panel is gone.

non-controversial

Potentially

@jreback jreback added Deprecate Functionality to remove in pandas Master Tracker High level tracker for similar issues labels Nov 13, 2017
@jreback jreback added this to the 0.22.0 milestone Nov 13, 2017
@jreback
Copy link
Contributor Author

jreback commented Nov 13, 2017

cc @jorisvandenbossche @TomAugspurger if any comments / objections pls note and I will update the top section.

@TomAugspurger
Copy link
Contributor

I think I'm -1 on deprecating xs.

-0 on deprecating ptp

What's the alternative to tshift? I think that's sometimes useful when a shift won't quite work.

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Nov 13, 2017

I was just writing up a similar issue :-) (but only for Series, as by fixing the api docs and docstrings, I bumped into quite some methods unknown to me)

Overview of methods that could be considered for removal (note that this list is very long, and are methods that I would not miss if they are gone, which does not mean that they are not useful to others, it's just for stirring discussion):

  • Related to the original ndarray subclassing:

    • Series.compress()
    • Series.flags
    • Series.imag / Series.real
    • Series.item()
    • Series.itemsize
    • Series.nonzero()
    • Series.put()
    • Series.strides
    • Series.ptp
  • Time series specific ones (the question here is if they are all worth it as method, while very specific in application):

    • Series.at_time()
    • Series.first()
    • Series.last()
    • Series.between_time()
    • Series.tshift()
  • Finance? specific

    • Series.compound()
  • Other:

    • Series.asobject
    • Series.as_matrix
    • Series.between()
    • Series.first_valid_index() / Series.last_valid_index()
    • Series.from_array()
    • Series.slice_shift()
    • Series.swapaxes()
    • Series.truncate()
    • Series.valid()

@jorisvandenbossche
Copy link
Member

What's the alternative to tshift? I think that's sometimes useful when a shift won't quite work.

shift already seems to have a freq keyword as well, and it dispatches to tshift if freq is specified

Note that my above list is very long. The more obvious ones to me that are not yet in the list in the top post are: as_matrix (and maybe swapaxes ?)

@topper-123
Copy link
Contributor

Could .add_prefix and .add_suffix be added to the deprecation list?

The dataframe/Series namespace is huge and cutting down can make the API easier to grasp. I would also think it more logical and idiomatic to operate directly on the columns, rather than on the dataframe.

@jreback
Copy link
Contributor Author

jreback commented Nov 16, 2017

@topper-123 added, feel free to submit PR's for any of these!

@manrajgrover
Copy link
Contributor

@jorisvandenbossche @jreback I would love to submit PRs for this issue. How can I go about in deprecating these APIs?

@jreback
Copy link
Contributor Author

jreback commented Nov 16, 2017

see for example #18258

@jorisvandenbossche
Copy link
Member

@manrajgrover Best first post a comment here with which one you would start doing, as I think not all those listed above are uncontroversial.

@topper-123
Copy link
Contributor

topper-123 commented Nov 17, 2017

I think .compound, while not very useful ATM, could be more useful if it was cumulative, ie. just use .cumprod instead of .prod and return a series:

>>> s = pd.Series([0.2, 0.2, 0.2])
>>> s.compound()  # essentialy the same as (s+1).cumprod() - 1
0    0.2000
1    0.440
2    0.728
dtype: float64

The above would play excellently together with .pct_change, so data.pct_change().compound() would read really well and be very useful in many use cases.

any opinions if .compound could be changed like above rather than deprecated? If .compound returns a scalar as today, I agree it should be deprecated.

@topper-123
Copy link
Contributor

topper-123 commented Nov 17, 2017

I've started a PR for add_prefix and add_postfix.

I will take on Series.asobject and NDFrame.as_matrix next, unless @manrajgrover wants to to them, in which case you'll be welcome.

And yes, I like to remove superfluous methods that start with a, as these are so visible then tab-completing in the REPL :-)

@jorisvandenbossche
Copy link
Member

Although I never use add_prefix / add_suffix, I think they are quite used a bit (looking at the number of stackoverflow questions), so I am not yet fully convinced they are ok to deprecate.
So I would rather already start with the others like asobject, as_matrix, valid, tshift, ..

as_matrix is also used a bit (more than the others mentioned in the list above), but because it is a very confusing name for what it does, I think it would be good to deprecate.

@manrajgrover
Copy link
Contributor

@jorisvandenbossche I can start with Index.summary() for now and pick the next one from the following list:

@topper-123
Copy link
Contributor

@jorisvandenbossche, what about removing a prefix/suffix or doing any other transformation you'd want to do on a index? My point is that .add_prefix/.add_suffix are way too specialized methods, and pandas should have methods that are more generally useful. .rename is great in that respect, and should be the canonical method for changing axis values.

I've already made a proposal for .add_prefix/.add_suffix (#18347), so that can wait to see what the agreement will be on that. I would appreciate input though, if you see anything obvoíous, as that is the first deprerecation PR I've made.

In the same vein, the difference between .as_matrix and .values is miniscule to the point where df[columns].values is the same is df.as_matrix(columns). pandas will be cleaner and leaner with only one way to archieve such a common result (obviously so, IMO...).

I'll make a PR for as_matrix, as there seems to be agreement on that.

@jorisvandenbossche
Copy link
Member

My point is that .add_prefix/.add_suffix are way too specialized methods, and pandas should have methods that are more generally useful. .rename is great in that respect, and should be the canonical method for changing axis values.

I completely agree with this. But, you also have the fact that people are using it and thus a removal will cause inconvenience / break code. So it is always a balance between both.

I am certainly +1 on deprecating as_matrix. As you say this is almost exactly the same as .values, and although I think this method is also used quite a bit (the argument I use for add_prefix ..), it's an awfully confusing name, so that's for me an extra reason to deprecate it.

@tdpetrou
Copy link
Contributor

Here are my deprecation suggestions:

  • I'd like to see read_table deprecated. Its the exact same as read_csv with tab delimiter
  • remove get_dtype_counts/get_ftype_counts - these are just convenience for DataFrame.dtypes.value_counts
  • remove the indexers iat/at. They give a small performance boost for an increase in API complexity
  • Use one of iterrows/itertuples to iterate over rows
  • remove lookup and take - other indexers do the same thing
  • remove combine - never used it and almost no use on SO. Looks to do nothing more than DataFrame.add
  • Probably get rid of applymap - should do the same thing with apply and then map inside of it
  • Use only agg not aggregate
  • Remove clip_upper and clip_lower and keep DataFrame.clip for both
  • Combine add_prefix/add_suffix into one method
  • One of the biggest issues are the methods that work only with DataFrames with a DatetimeIndex - first, last, truncate, at_time, between_time, to_period, to_timestamp - These could be removed or put in an accessor
  • Remove reindex_axis and reindex_like in favor of just reindex
  • remove isna - its an alias to isnull

@jreback
Copy link
Contributor Author

jreback commented Nov 22, 2017

remove the indexers iat/at. They give a small performance boost for an increase in API complexity

these are convenience methods, not sure they add much to API burden

take

this is a very common notion and is a very array-like method

remove isna - its an alias to isnull

this was just added for compat with dropna, fillna, see the pattern :>, so if anything we would remove isnull, but that has been in the API so long that it may well nigh be impossible to actually remove (and more to the point very annoying).

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Nov 23, 2017

Remove reindex_axis and reindex_like in favor of just reindex

reindex_axis is already deprecated.

One of the biggest issues are the methods that work only with DataFrames with a DatetimeIndex - first, last, truncate, at_time, between_time, to_period, to_timestamp - These could be removed or put in an accessor

to_period and to_timestamp on a series do something else than the methods in the .dt accessor. The former work on the index, the latter on the values. So it's not possible to just move them.
But on the other datetime-related I agree, I also find it a bit unfortunate that those exist (certainly first and last are very confusing in naming)

@tdpetrou
Copy link
Contributor

@jreback There are only a total of 13 occurrences of df.take in all of Stack Overflow and in my opinion should never be used.

df.iat/.at are probably too entrenched in legacy code to remove but they provide no extra functionality. Indexing is the most confusing aspect to pandas and the less the better. Maybe a better design would have been to do df.loc(type='scalar')['row', 'col']

I guess there is no going back on isna/isnull but I really dislike having methods that are aliases of one another.

@jorisvandenbossche I wasn't being clear, but all those DataFrame/Series methods that only work on DateTimeIndexes could be put in their own accesor (not .dt) but I don't think even that would be a good idea. Perhaps just deprecating all of them would be best.

@jreback
Copy link
Contributor Author

jreback commented Nov 23, 2017

@jreback There are only a total of 13 occurrences of df.take in all of Stack Overflow and in my opinion should never be used.

.take() is a common name for array-like things.

df.iat/.at are probably too entrenched in legacy code to remove but they provide no extra functionality. Indexing is the most confusing aspect to pandas and the less the better. Maybe a better design would have been to do df.loc(type='scalar')['row', 'col']

your suggestion is much less readable

I guess there is no going back on isna/isnull but I really dislike having methods that are aliases of one another.

sure, but isnull is even more entrenched than anything else.

@max-sixty
Copy link
Contributor

df.iat/.at are probably too entrenched in legacy code to remove but they provide no extra functionality. Indexing is the most confusing aspect to pandas and the less the better. Maybe a better design would have been to do df.loc(type='scalar')['row', 'col']

As indexing is simplified & improved, the speed diff between .loc and .at should fall (a lot of the time it's a very similar function). Then deprecating .at will cause less strife

@topper-123
Copy link
Contributor

DataFrames have a method .boxplot. I would assume this should be deprecated and people should use .plot.box instead?

@jreback
Copy link
Contributor Author

jreback commented Nov 26, 2017

yes i think there is an issue about this

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Nov 26, 2017 via email

@jimmywan
Copy link

Why on earth would you deprecate read_table? That makes no damn sense.
The suggested change is to call read_csv to read things that are not comma-separated? This is 100% backwards.

@st-bender
Copy link

st-bender commented Feb 1, 2019

One could argue why not deprecate read_csv() instead of read_table() since table sounds more flexible.

Edit:
I have to agree with @jimmywan here, and if they are basically the same, why not at least keep it as an alias? One could always wrap it, but people would not be confused or avoid updating.

@ghost
Copy link

ghost commented Jun 26, 2019

DataFrame.where and DataFrame.mask are duals, but their names don't indicate that. perhaps deprecate mask? since mask is just where(~cond), IIUC. alternatively rename to where_not.

@jreback
Copy link
Contributor Author

jreback commented Sep 11, 2019

cc @jbrockmendel here is my original list :->

@WillAyd WillAyd mentioned this issue Sep 18, 2019
@alexiswl
Copy link

Would it be possible to get a rationale to why .item() has been deprecated or a suggested alternative? I really appreciated this feature as it would assume one and only one match to a query/column combo. i.e pd.query("Country=='%s' % country)['CapitalCity'].unique().item().

Using the suggest next(iter, None)) feature as suggested in the link below is very readable, but breaks the assumption that there are not multiple matches to a query and returns only the first value, meaning I'd need to do add a check-query-length==1 prior to extracting the value.

https://stackoverflow.com/questions/57390363/pandas-item-has-been-deprecated

@jreback
Copy link
Contributor Author

jreback commented Oct 27, 2019

.unique() currently returns an ndarray so .item() would still be valid

this is for .item() on Series (and Index)

@alexiswl
Copy link

Thanks! I'll put in unique() to all pd.query("row_name=='var')[col_name] as a midfix to resolve the deprecation warning.

@erfannariman
Copy link
Member

Do we want to deprecate iat / at? These are under "potentially" now. I would be +1, iloc and loc can be used.

@mroeschke
Copy link
Member

Appears that most of the functions described in the OP have been tackled. Further discussion on other specific APIs that should be deprecated would be better suited to individual issues. Closing for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas Master Tracker High level tracker for similar issues
Projects
None yet
Development

No branches or pull requests