ENH: Support returning the same dtype as the caller for window ops (including extension dtypes) #11446

Open
sandhujasmine opened this issue Oct 27, 2015 · 2 comments
Labels
Apply Apply, Aggregate, Transform, Map Dtype Conversions Unexpected or buggy dtype conversions Enhancement Window rolling, ewma, expanding

Comments

@sandhujasmine

In the sample code below, rolling_apply takes an argument 'ix', which is a NumPy array of dtype 'int64'. By the time this array reaches the get_type() function, its dtype has changed to 'float64'. I can make an explicit call in get_type() to change it back (ix = ix.astype('int64')), but I was curious why it gets changed.

Example below. I'm on version '0.17.0':

import numpy as np
import pandas as pd


def get_type(ix, df, hours):
    # invoked by rolling_apply to illustrate the problem
    # of rolling_apply changing the dtype of 'ix' array from
    # int64 to float64

    print ix.dtype

    # need to convert index dtype back to int64
    #ix = ix.astype('int64')

    ixv = ix[ix > -1]
    print ixv.dtype

    # the data in ix must be int64 else following fails with 
    # IndexError: arrays used as indices must be of integer (or boolean) type
    h = hours[ixv] - hours[ixv[0]]
    df.iloc[ix[-1]] = h[0]
    return 0.0


# we start out with ix.dtype = int64 but rolling_apply changes this to float64
ix = np.arange(0, 10)
hours = np.random.randint(0, 10, len(ix))
df = pd.DataFrame(np.random.randn(10, 1), columns=['h'])

pd.rolling_apply(ix, window=3, func=get_type, args=(df, hours,))

I also stepped through the code and believe I've identified the source of the problem. I thought I'd report it and see whether others consider this an issue before trying to fix it. Doing an explicit type change inside the get_type function, as in this example, also works.
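The explicit cast mentioned above can be sketched against the newer `.rolling()` interface, for readers on a pandas version where `pd.rolling_apply` is no longer available. This is only an illustration under assumptions: the callback name `first_to_last_delta` and the `hours` lookup table are hypothetical and are not taken from the report.

```python
import numpy as np
import pandas as pd

seen_dtypes = []  # dtypes that the callback actually receives


def first_to_last_delta(window, hours):
    # Hypothetical callback: uses the window values as integer positions.
    seen_dtypes.append(window.dtype)
    # The window arrives as float64, so cast back before indexing.
    positions = window.astype("int64")
    return float(hours[positions[-1]] - hours[positions[0]])


# Integer input, as in the report.
ix = pd.Series(np.arange(10, dtype="int64"))
# Hypothetical lookup table indexed by the values of ix.
hours = np.arange(10, dtype="int64") * 2

# raw=True passes a plain ndarray to the callback instead of a Series.
result = ix.rolling(window=3).apply(first_to_last_delta, raw=True, args=(hours,))

print(set(seen_dtypes))  # the callback never sees int64
print(result.dtype)      # the result is float64 as well
```

Without the `astype("int64")` line, indexing `hours` with the float window would fail in the same way as described in the report.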

The _process_data_structure() function turns this into a float.

Here's the logic that is explicitly changing the dtype to a float the first time. This can be omitted and the check updated to include 'float':

    if kill_inf and values.dtype == float:
        values = values.copy()
        values[np.isinf(values)] = np.NaN

However, the Cython code that (I assume) performs the rolling window computation also expects float64. In that case, one option might be to restore the dtype after the call_cython function returns.
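The idea of restoring the dtype after the float computation could look roughly like the sketch below. `restore_dtype` is a hypothetical helper, not a pandas function, and it only casts back when no value would change; results that contain NaN padding or fractional values stay float64.

```python
import numpy as np
import pandas as pd


def restore_dtype(result, original_dtype):
    """Hypothetical helper: cast back only when the cast is lossless."""
    values = result.to_numpy()
    # NaN padding (or inf) cannot be represented in a NumPy integer dtype.
    if not np.isfinite(values).all():
        return result
    candidate = result.astype(original_dtype)
    # Keep the cast only if every value survives the round trip unchanged.
    if np.array_equal(candidate.to_numpy(), values):
        return candidate
    return result


s = pd.Series(np.arange(10, dtype="int64"))

padded = s.rolling(3).max()                    # leading NaNs from padding
full = s.rolling(3, min_periods=1).max()       # no NaNs, whole numbers
averaged = s.rolling(3, min_periods=1).mean()  # contains fractional values

print(restore_dtype(padded, s.dtype).dtype)    # stays float64
print(restore_dtype(full, s.dtype).dtype)      # cast back to int64
print(restore_dtype(averaged, s.dtype).dtype)  # stays float64
```

The extra comparison pass is one reason a cast back is not free: it touches every element of the result once more.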

@jreback
Contributor

jreback commented Oct 27, 2015

The rolling processors are implemented for float dtypes for simplicity. Having ones for integers is not very useful, as most rolling operations do some sort of computation that ends up as floats anyhow. Sure, there are ones that don't (but in general we also need padding, meaning NaNs, for these rolls).

You could cast back, but it's not entirely free, though it can be done safely (e.g. see here).

So if you are interested in a PR for this, we would take it.

Note that the rolling functions are not super friendly (see #8659), so we would love some contributions here!

@jreback jreback added Dtype Conversions Unexpected or buggy dtype conversions Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff labels Oct 27, 2015
@jreback jreback added this to the Next Major Release milestone Oct 27, 2015
@sandhujasmine
Author

Thanks Jeff - I browsed through the issues in #8659 and submitted a PR for #4964. I need a little help completing the documentation since this is my first time contributing to pandas - I'll look for your comments on the PR to complete it. Thanks!

@mroeschke mroeschke added Apply Apply, Aggregate, Transform, Map Window rolling, ewma, expanding and removed Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff labels Oct 27, 2019
@mroeschke mroeschke changed the title Why does rolling_apply change the dtype of the array it is rolling? ENH: Support returning the same dtype as the caller for window ops (including extension dtypes) Sep 1, 2021
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022