
Improve freq argument in matrix's freq argument #106

Open
wangsen992 opened this issue May 25, 2020 · 7 comments

@wangsen992

This is probably a very simple bug to fix. If nobody tackles it, I will probably do it when I have some free time and submit a pull request. The problem is with the freq argument of msno.matrix(): when it is set, the code in missingno.py (shown below) builds a date_range that starts at the beginning of the first day. When df.index.get_loc(value) cannot find one of those values in the index, the KeyError is caught and re-raised, and the operation halts.

The issue is that time series data often does not begin or end on a full-day boundary, i.e. at 00:00. So maybe simply clipping the generated range to the range of the input df would solve the problem.

if freq:
    ts_list = []

    if type(df.index) == pd.PeriodIndex:
        ts_array = pd.date_range(df.index.to_timestamp().date[0],
                                 df.index.to_timestamp().date[-1],
                                 freq=freq).values

        ts_ticks = pd.date_range(df.index.to_timestamp().date[0],
                                 df.index.to_timestamp().date[-1],
                                 freq=freq).map(lambda t:
                                                t.strftime('%Y-%m-%d'))

    elif type(df.index) == pd.DatetimeIndex:
        ts_array = pd.date_range(df.index.date[0], df.index.date[-1],
                                 freq=freq).values

        ts_ticks = pd.date_range(df.index.date[0], df.index.date[-1],
                                 freq=freq).map(lambda t:
                                                t.strftime('%Y-%m-%d'))
    else:
        raise KeyError('Dataframe index must be PeriodIndex or DatetimeIndex.')
    try:
        for value in ts_array:
            ts_list.append(df.index.get_loc(value))
    except KeyError:
        raise KeyError('Could not divide time index into desired frequency.')
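
For anyone who wants to see the failure without missingno, here is a minimal sketch using only pandas. The sample index and the variable names (candidates, clipped, positions) are made up for illustration, and it iterates over Timestamps rather than the .values array used in the library code. It shows the first midnight tick missing from the index, and then one possible way to clip the ticks to the range of the data.

```python
import pandas as pd

# Illustrative hourly index that starts at 06:00 instead of midnight.
index = pd.date_range("2020-05-25 06:00", periods=61,
                      freq=pd.Timedelta(hours=1))

# Daily ticks built the same way as in the snippet above: from the
# first calendar date to the last one, so the first tick is 00:00.
candidates = pd.date_range(index.date[0], index.date[-1], freq="D")

# The first tick (2020-05-25 00:00) is before the data starts,
# so get_loc raises KeyError for it.
try:
    index.get_loc(candidates[0])
    first_tick_found = True
except KeyError:
    first_tick_found = False

# Possible fix (an assumption, not the library's code): keep only the
# ticks that fall inside the range covered by the data.
clipped = candidates[(candidates >= index[0]) & (candidates <= index[-1])]
positions = [index.get_loc(ts) for ts in clipped]
```

Clipping only helps when the remaining ticks line up with actual rows. If the samples sit at odd offsets (say, 7 minutes past the hour), get_loc would still fail for them.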

PS: Hopefully the format of the issue is clear. This is my first time raising an issue, so any suggestions for improving it are welcome.

And great work with this project!

@wangsen992
Author

Actually, I think simply moving the try-except clause inside the for loop might just work.

import logging

for value in ts_array:
    try:
        ts_list.append(df.index.get_loc(value))
    except KeyError:
        # Skip ticks that are not in the index instead of aborting.
        logging.warning('Could not divide time index into desired frequency.')

Something like that, so that one missing value does not abort the for loop.
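
A self-contained version of that idea, as a rough sketch: locate_ticks is a hypothetical helper name (it is not part of missingno), and the sample index is invented for the example. Ticks that are missing from the index are logged and skipped, and the loop carries on with the rest.

```python
import logging

import pandas as pd


def locate_ticks(index, ticks):
    """Return positions of the ticks found in index; warn about the others."""
    positions = []
    for tick in ticks:
        try:
            positions.append(index.get_loc(tick))
        except KeyError:
            # Hypothetical handling: log and continue instead of raising.
            logging.warning("Tick %s is not in the index; skipping it.", tick)
    return positions


# Illustrative hourly index that starts at 06:00, not at midnight.
index = pd.date_range("2020-05-25 06:00", periods=61,
                      freq=pd.Timedelta(hours=1))
# Daily ticks from midnight of the first day to midnight of the last day.
ticks = pd.date_range(index[0].normalize(), index[-1].normalize(), freq="D")

positions = locate_ticks(index, ticks)
```

Here the tick at 2020-05-25 00:00 is skipped with a warning, and the two later midnights are located as usual.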

@ResidentMario
Owner

If you go ahead and submit a PR I'm happy to take a look at that. :)

@xxl4tomxu98

What is the status of this issue? I just installed the package and obviously this bug is still NOT fixed?

@ResidentMario
Owner

This bug probably still exists. I didn't look at freq the last time I did an OSS maintenance day; I'll try to look at it the next time I have time.

@heyej

heyej commented Oct 14, 2021

If someone has this problem and cannot cut off their time series (for example, because of gaps between days), another solution could be to reindex the time series with a complete range of dates (including hh:mm:ss as necessary) and fill the gaps with NaN.
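
A minimal sketch of that reindexing approach, using only pandas and NumPy. The data, the column name "value" and the 15-minute spacing are assumptions made up for the example. The index is extended back to midnight of the first day, and DataFrame.reindex fills every timestamp that has no row with NaN.

```python
import numpy as np
import pandas as pd

# Illustrative 15-minute samples from 06:00 to 17:45 on one day.
step = pd.Timedelta(minutes=15)
index = pd.date_range("2021-10-14 06:00", periods=48, freq=step)
df = pd.DataFrame({"value": np.arange(48.0)}, index=index)

# Drop a few rows to simulate a gap in the recording.
df = df.drop(df.index[10:14])

# Complete range: from midnight of the first day to the last sample.
full_index = pd.date_range(df.index[0].normalize(), df.index[-1], freq=step)

# Rows missing from df (before 06:00 and inside the gap) become NaN.
padded = df.reindex(full_index)
```

After this, padded starts at 00:00, so a lookup of the first midnight with padded.index.get_loc succeeds, and the added rows simply show up as missing values.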

@maubere-tls

maubere-tls commented Apr 19, 2022

Try removing the .values from the code.


@HemalathaRamanujam2022

Hi,

The index of my dataframe has values in "yyyy-mm-dd hh:mi:ss" format, and the rows are at 15-minute intervals. Can you tell me how to use the frequency parameter on the matrix plot?
