'friedrich_coefficients' and 'max_langevin_fixed_point' do not work with a single record DataFrame #929

Open
momijiame opened this issue Feb 25, 2022 · 3 comments

@momijiame

I am new to tsfresh, so I apologize if I have misunderstood something.

The problem:

I encountered an exception while following this tutorial:

"Rolling/Time series forecasting" https://tsfresh.readthedocs.io/en/latest/text/forecasting.html

To reproduce it, enter the tutorial's snippets at the Python prompt in order:

  1. Launch the Python interpreter
$ python3
  2. Define a DataFrame
>>> import pandas as pd
>>> df = pd.DataFrame({
...    "id": [1, 1, 1, 1, 2, 2],
...    "time": [1, 2, 3, 4, 8, 9],
...    "x": [1, 2, 3, 4, 10, 11],
...    "y": [5, 6, 7, 8, 12, 13],
... })
  3. Create a rolled DataFrame
>>> from tsfresh.utilities.dataframe_functions import roll_time_series
>>> df_rolled = roll_time_series(df, column_id="id", column_sort="time")
  4. Extract features from the rolled DataFrame
>>> from tsfresh import extract_features
>>> df_features = extract_features(df_rolled, column_id="id", column_sort="time")

The following exception is raised at step 4:

Feature Extraction:   0%|                                                                                            | 0/12 [00:01<?, ?it/s]
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tsfresh/utilities/distribution.py", line 43, in _function_with_partly_reduce
    results = list(itertools.chain.from_iterable(results))
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tsfresh/utilities/distribution.py", line 42, in <genexpr>
    results = (map_function(chunk, **kwargs) for chunk in chunk_list)
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tsfresh/feature_extraction/extraction.py", line 386, in _do_extraction_on_chunk
    return list(_f())
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tsfresh/feature_extraction/extraction.py", line 364, in _f
    result = func(x, param=parameter_list)
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tsfresh/feature_extraction/feature_calculators.py", line 2103, in friedrich_coefficients
    calculated[m][r] = _estimate_friedrich_coefficients(x, m, r)
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tsfresh/feature_extraction/feature_calculators.py", line 152, in _estimate_friedrich_coefficients
    df["quantiles"] = pd.qcut(df.signal, r)
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/pandas/core/reshape/tile.py", line 376, in qcut
    bins = np.quantile(x_np, quantiles)
  File "<__array_function__ internals>", line 5, in quantile
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3979, in quantile
    return _quantile_unchecked(
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3986, in _quantile_unchecked
    r, k = _ureduce(a, func=_quantile_ureduce_func, q=q, axis=axis, out=out,
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3564, in _ureduce
    r = func(a, **kwargs)
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/numpy/lib/function_base.py", line 4109, in _quantile_ureduce_func
    x_below = take(ap, indices_below, axis=0)
  File "<__array_function__ internals>", line 5, in take
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 190, in take
    return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode)
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
    return bound(*args, **kwds)
IndexError: cannot do a non-empty take from an empty axes.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tsfresh/feature_extraction/extraction.py", line 164, in extract_features
    result = _do_extraction(
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tsfresh/feature_extraction/extraction.py", line 294, in _do_extraction
    result = distributor.map_reduce(
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tsfresh/utilities/distribution.py", line 241, in map_reduce
    result = list(itertools.chain.from_iterable(result))
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tqdm/std.py", line 1180, in __iter__
    for obj in iterable:
  File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/pool.py", line 870, in next
    raise value
  File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tsfresh/utilities/distribution.py", line 43, in _function_with_partly_reduce
    results = list(itertools.chain.from_iterable(results))
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tsfresh/utilities/distribution.py", line 42, in <genexpr>
    results = (map_function(chunk, **kwargs) for chunk in chunk_list)
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tsfresh/feature_extraction/extraction.py", line 386, in _do_extraction_on_chunk
    return list(_f())
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tsfresh/feature_extraction/extraction.py", line 364, in _f
    result = func(x, param=parameter_list)
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tsfresh/feature_extraction/feature_calculators.py", line 2103, in friedrich_coefficients
    calculated[m][r] = _estimate_friedrich_coefficients(x, m, r)
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tsfresh/feature_extraction/feature_calculators.py", line 152, in _estimate_friedrich_coefficients
    df["quantiles"] = pd.qcut(df.signal, r)
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/pandas/core/reshape/tile.py", line 376, in qcut
    bins = np.quantile(x_np, quantiles)
  File "<__array_function__ internals>", line 5, in quantile
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3979, in quantile
    return _quantile_unchecked(
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3986, in _quantile_unchecked
    r, k = _ureduce(a, func=_quantile_ureduce_func, q=q, axis=axis, out=out,
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3564, in _ureduce
    r = func(a, **kwargs)
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/numpy/lib/function_base.py", line 4109, in _quantile_ureduce_func
    x_below = take(ap, indices_below, axis=0)
  File "<__array_function__ internals>", line 5, in take
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 190, in take
    return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode)
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
    return bound(*args, **kwds)
IndexError: cannot do a non-empty take from an empty axes.
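The `roll_time_series` call in the reproduction turns each id into a set of growing windows. The plain-pandas sketch below imitates that with a hypothetical `expanding_windows` helper (not tsfresh code), assuming the default behavior visible later in this issue: every window is labelled `(id, last time)` and holds all earlier rows of that id. It shows that the first window of each id contains only one row.

```python
from collections import Counter

import pandas as pd


def expanding_windows(df: pd.DataFrame, column_id: str, column_sort: str) -> pd.DataFrame:
    """Hypothetical stand-in for roll_time_series with default settings:
    one window per (id, time), containing all rows of that id up to that time."""
    pieces = []
    for series_id, group in df.sort_values(column_sort).groupby(column_id):
        for end_time in group[column_sort]:
            window = group[group[column_sort] <= end_time].copy()
            # Label the window with a tuple id, like the (id, time) ids shown in this issue
            window[column_id] = pd.Series(
                [(series_id, end_time)] * len(window), index=window.index, dtype=object
            )
            pieces.append(window)
    return pd.concat(pieces, ignore_index=True)


df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2],
    "time": [1, 2, 3, 4, 8, 9],
    "x": [1, 2, 3, 4, 10, 11],
    "y": [5, 6, 7, 8, 12, 13],
})
windows = expanding_windows(df, column_id="id", column_sort="time")

# Count rows per window and list the windows that hold a single record
counts = Counter(windows["id"])
single_row_windows = [window_id for window_id, n in counts.items() if n == 1]
print(single_row_windows)
```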

Anything else we need to know?:

I investigated why this happens and found that the cause is the calculation of 'friedrich_coefficients' and 'max_langevin_fixed_point': if both calculators are removed from the FC settings, the exception is no longer raised.

import pandas as pd
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2],
    "time": [1, 2, 3, 4, 8, 9],
    "x": [1, 2, 3, 4, 10, 11],
    "y": [5, 6, 7, 8, 12, 13],
})
from tsfresh.utilities.dataframe_functions import roll_time_series
df_rolled = roll_time_series(df, column_id="id", column_sort="time")

# drop features 'friedrich_coefficients' and 'max_langevin_fixed_point' from FC settings
from tsfresh.feature_extraction import ComprehensiveFCParameters
settings = ComprehensiveFCParameters()
del settings['friedrich_coefficients']
del settings['max_langevin_fixed_point']

# extract features with FC settings
from tsfresh import extract_features
df_features = extract_features(df_rolled,
                               column_id="id",
                               column_sort="time",
                               default_fc_parameters=settings,
                               )
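Removing the two calculators drops those features for every window. Another option is to drop only the shortest rolled windows before calling `extract_features`, since two-row windows do not trigger the error. The sketch below does this in plain pandas; `drop_short_windows` and the two-row threshold are illustrative assumptions, not tsfresh API, and dropped windows simply get no feature row. (Recent tsfresh versions also accept a `min_timeshift` argument in `roll_time_series` that discards small windows at rolling time; check its documentation for your version.)

```python
import pandas as pd


def drop_short_windows(df_rolled: pd.DataFrame, column_id: str, column_sort: str,
                       min_rows: int = 2) -> pd.DataFrame:
    """Hypothetical helper: keep only rolled windows with at least `min_rows` rows."""
    # transform("size") writes each window's row count back onto every row of that window
    window_sizes = df_rolled.groupby(column_id)[column_sort].transform("size")
    return df_rolled[window_sizes >= min_rows]


# Small hand-made rolled frame with tuple ids, shaped like the rolled output in this issue
df_rolled = pd.DataFrame({
    "id": [(1, 1), (1, 2), (1, 2), (2, 8), (2, 9), (2, 9)],
    "time": [1, 1, 2, 8, 8, 9],
    "x": [1, 1, 2, 10, 10, 11],
})
filtered = drop_short_windows(df_rolled, column_id="id", column_sort="time")
print(filtered)
```

The filtered frame would then be passed to `extract_features(filtered, column_id="id", column_sort="time")` in place of `df_rolled`.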

I also noticed that these calculators do not support a single-record DataFrame.
For example, passing only the first (t=1) rolled window raises the same exception as before:

>>> df_rolled.iloc[0:1]
       id  time  x  y
7  (1, 1)     1  1  5
>>> df_features = extract_features(df_rolled.iloc[0:1],
...                                column_id="id",
...                                column_sort="time",
...                                )

However, no exception is raised for the next (t=2) rolled window, which contains two records:

>>> df_rolled.iloc[1:3]
        id  time  x  y
9   (1, 2)     1  1  5
10  (1, 2)     2  2  6
>>> df_features = extract_features(df_rolled.iloc[1:3],
...                                column_id="id",
...                                column_sort="time",
...                                )

Other feature calculators do not show this behavior.
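A likely explanation, assuming the Friedrich estimator pairs each sample with the next one (a `signal` column holding all samples but the last, and a `delta` column of first differences; confirm this against `_estimate_friedrich_coefficients` in your installed tsfresh): a one-row window leaves nothing to bin, so `pd.qcut` receives an empty Series, while a two-row window yields one value, for which `pd.qcut` raises a `ValueError` (duplicate bin edges) that the existing `except ValueError` clause already handles. The sketch only reproduces the array sizes with NumPy; `friedrich_inputs` is a hypothetical helper and does not call tsfresh.

```python
import numpy as np


def friedrich_inputs(x):
    """Hypothetical helper mirroring the assumed inputs of the Friedrich estimator."""
    x = np.asarray(x, dtype=float)
    signal = x[:-1]     # every sample except the last
    delta = np.diff(x)  # change from each sample to the next one
    return signal, delta


for window in ([1], [1, 2], [1, 2, 3, 4]):
    signal, delta = friedrich_inputs(window)
    print(f"window of {len(window)} row(s): signal has {signal.size} value(s)")
```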

Environment:

  • Python version: 3.9.10
  • Operating System: macOS 12.2.1
  • tsfresh version: 0.19.0
  • Install method (conda, pip, source): pip
@momijiame momijiame added the bug label Feb 25, 2022
@mdhanna

mdhanna commented Aug 16, 2022

I was experiencing the same issue with Python 3.8. I downgraded to Python 3.7 and have been able to execute the same code successfully.

@momijiame
Author

Thank you for the valuable information. Apparently, this problem depends on the pandas version rather than the Python version: downgrading Python to 3.7 also pulls in an older pandas (< 1.4), because pandas 1.4 requires Python 3.8 or later. In other words, Python 3.8 or later also works as long as pandas is older than 1.4.

  1. The following environment does not work:
$ python -V
Python 3.9.13
$ pip list | grep pandas
pandas             1.4.3
$ pip list | grep numpy
numpy              1.21.5
  2. Downgrade pandas:
$ pip install -U "pandas<1.4"
  3. The following environment works:
$ python -V
Python 3.9.13
$ pip list | grep pandas
pandas             1.3.5
$ pip list | grep numpy
numpy              1.21.5
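To tell whether an environment falls into the range where this issue was reported (pandas 1.4 or later), a check on the pandas version string is enough. This is a minimal sketch; the helpers `parse_major_minor` and `pandas_in_affected_range` and their simple regex parsing are assumptions for illustration (the `packaging` library offers a more robust version parser).

```python
import re


def parse_major_minor(version: str) -> tuple:
    """Extract (major, minor) from a version string such as '1.4.3' or '2.0.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def pandas_in_affected_range(version: str) -> bool:
    """Hypothetical check: True for pandas >= 1.4, where this issue was reported."""
    return parse_major_minor(version) >= (1, 4)


# The two pandas versions from the environments above
for version in ("1.4.3", "1.3.5"):
    print(version, pandas_in_affected_range(version))
```

In a live environment, pass `pandas.__version__` to `pandas_in_affected_range`.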

@paulbauriegel

As a workaround, you can catch IndexError alongside the existing ValueError in the _estimate_friedrich_coefficients function:

    try:
        df["quantiles"] = pd.qcut(df.signal, r)
    except (ValueError, IndexError):
        # IndexError: pd.qcut on an empty signal (single-record window) with pandas >= 1.4
        return [np.nan] * (m + 1)
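For experimenting outside tsfresh, here is a self-contained sketch of the estimator with that guard applied. It is a simplified re-implementation written for illustration: `estimate_friedrich_coefficients_sketch` is a hypothetical name, and the `signal`/`delta` construction and the final `np.polyfit` step are assumptions rather than a copy of tsfresh's code, so compare it with the source of your installed version before patching anything.

```python
import numpy as np
import pandas as pd


def estimate_friedrich_coefficients_sketch(x, m, r):
    """Fit a degree-m polynomial to the mean drift per quantile bin of the signal.

    Returns m + 1 NaNs whenever the input is too short to bin or fit.
    """
    nan_result = [np.nan] * (m + 1)
    x = np.asarray(x, dtype=float)
    if x.size < 2:
        # A single-record window leaves an empty signal; stop before calling pd.qcut
        return nan_result

    df = pd.DataFrame({"signal": x[:-1], "delta": np.diff(x)})
    try:
        df["quantiles"] = pd.qcut(df["signal"], r)
    except (ValueError, IndexError):
        # ValueError: duplicate bin edges; IndexError: empty input on pandas >= 1.4
        return nan_result

    grouped = df.groupby("quantiles", observed=True)
    means = pd.DataFrame({
        "x_mean": grouped["signal"].mean(),
        "y_mean": grouped["delta"].mean(),
    }).dropna()

    try:
        return np.polyfit(means["x_mean"], means["y_mean"], deg=m)
    except (np.linalg.LinAlgError, ValueError):
        return nan_result


print(estimate_friedrich_coefficients_sketch([5.0], m=3, r=30))
print(estimate_friedrich_coefficients_sketch(np.sin(np.linspace(0, 20, 200)), m=3, r=30))
```

The explicit length check handles the single-record case regardless of pandas version; the `IndexError` branch stays as a defensive fallback, mirroring the suggestion above.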
