Bin Edges Are all zero - Value Error #101
Comments
I ran into the problem of "constants" as well when I first started using this library. It comes from the library's self-checks, which treat consecutive constant values as an error. The issue can be resolved by reducing the number of bins, but that is not practical in most cases. The workaround is to edit discretizer.py in "site-packages\pyts\preprocessing" under the library location. Find the following line:
and comment it out. That should resolve your problem.
Hi, I tried commenting out the line of code you mentioned but I still get the error. It is a ValueError from the following function:
    def _compute_bins(self, X, y, n_timestamps, n_bins, strategy):
        if strategy == 'normal':
            bins_edges = norm.ppf(np.linspace(0, 1, self.n_bins + 1)[1:-1])
        elif strategy == 'uniform':
            timestamp_min, timestamp_max = np.min(X, axis=0), np.max(X, axis=0)
            bins_edges = _uniform_bins(timestamp_min, timestamp_max,
                                       n_timestamps, n_bins)
        elif strategy == 'quantile':
            bins_edges = np.percentile(
                X, np.linspace(0, 100, self.n_bins + 1)[1:-1], axis=0
            ).T
            if np.any(np.diff(bins_edges, axis=0) == 0):
                raise ValueError(
                    "At least two consecutive quantiles are equal. "
                    "Consider trying with a smaller number of bins or "
                    "removing timestamps with low variation."
                )
        else:
            bins_edges = self._entropy_bins(X, y, n_timestamps, n_bins)
        return bins_edges
Something interesting I found out is that when I change the strategy to uniform it seems to work. I didn't need to comment out the line you suggested above. In fact, all strategies work except quantile, which throws the error. transformer = WEASELMUSE(strategy='uniform', word_size=4, window_sizes=np.arange(5, 105))
Hi, First of all, I would like to mention that variable length time series are unfortunately badly supported in this library for the moment. The reasons for this are twofold: (i) algorithms introduced in their original papers were rarely meant to deal with variable length time series (because of the lack of such data sets in the UCR Time Series Classification repository) and I wanted to implement the algorithms as they were described in the papers, and (ii) it's obviously easier and more efficient to work with fixed length time series using NumPy arrays. Therefore, padding shorter time series with a fixed value is likely to introduce some issues. More specifically on the
An error is raised when two back-to-back bin edges are equal, because in this case a bin is empty ( Now let's have a look at the different strategies to compute the bin widths:
With Going back to your use case, this is where zero padding becomes an issue. You may have some subsequences that contain only zeros. All the Fourier coefficients will also be equal to zero. And if you have many subsequences that contain only zeros, then you will have a feature (i.e. a Fourier coefficient) that will have many zeros, and the aforementioned issue will occur. Hope this helps you a bit to understand the reasoning of what's going on under the hood. |
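To make the explanation above concrete, here is a minimal NumPy-only sketch. It is not pyts code: the helper name `quantile_edges` and the toy data are made up for illustration, and it only mirrors the percentile call quoted earlier in the thread. It shows that a feature dominated by zeros (as zero padding tends to produce) yields identical consecutive quantiles, while a dense feature does not.

```python
import numpy as np


def quantile_edges(X, n_bins):
    """Interior quantile edges per feature (illustrative helper, not pyts API).

    X has shape (n_samples, n_features); the result has shape
    (n_features, n_bins - 1).
    """
    percentiles = np.linspace(0, 100, n_bins + 1)[1:-1]
    return np.percentile(X, percentiles, axis=0).T


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
# Assumption for the example: 80% of the samples of feature 0 are exactly
# zero, which stands in for coefficients computed on zero-padded windows.
X[:80, 0] = 0.0

edges = quantile_edges(X, n_bins=4)
# True where at least two back-to-back edges of a feature are equal.
has_equal_edges = np.any(np.diff(edges, axis=1) == 0, axis=1)
print(edges[0])
print(has_equal_edges)
```

For the zero-heavy feature, the 25th, 50th and 75th percentiles all fall inside the block of zeros, so every edge is 0 and the bins between them are empty.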
Thanks for the clear explanation @johannfaouzi. So the fix seems to be that we lower the number of bins with
It would be great to see the strategies working with variable length time series data, as these occur in the majority of use cases where time series data is involved.
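As a rough way to see how far the number of bins would have to drop, the sketch below searches for the largest bin count whose quantile edges are strictly increasing for every feature. The function `max_safe_n_bins` is a hypothetical helper written for this thread, not part of pyts, and it works on a plain 2D array rather than on the features the transformer computes internally.

```python
import numpy as np


def max_safe_n_bins(X, max_bins=10):
    """Largest n_bins in [2, max_bins] with strictly increasing quantile edges.

    Hypothetical helper, not part of pyts. X has shape
    (n_samples, n_features). n_bins=2 always passes because it has a
    single interior edge, so there is nothing to compare.
    """
    for n_bins in range(max_bins, 1, -1):
        percentiles = np.linspace(0, 100, n_bins + 1)[1:-1]
        edges = np.percentile(X, percentiles, axis=0)  # (n_bins - 1, n_features)
        if np.all(np.diff(edges, axis=0) > 0):
            return n_bins
    return 2


dense = np.arange(100.0).reshape(-1, 1)
mostly_zero = dense.copy()
mostly_zero[:80, 0] = 0.0  # toy stand-in for a zero-dominated feature

print(max_safe_n_bins(dense))        # 10
print(max_safe_n_bins(mostly_zero))  # 2
```

With 80% zeros only two bins survive, which is consistent with the earlier remark that reducing the number of bins is often not practical.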
Description
I have data in the same format used for load_basic_motions(return_X_y=True)
But when I created my data set I had to pad some time series with zeros.
This made all the lengths the same, and I ended up with an ndarray X_train of shape (177, 12, 111) and a y_train of shape 177.
When I run clf.fit I get a ValueError that says the following:
The bin edges seem to all be zero.
Is this because of the padding?
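For reference, the padding step described here could look roughly like the sketch below. It is an assumption about how the data set was built, since the issue does not include the code: `zero_pad` is a hypothetical helper that right-pads each series with zeros to the longest length, giving an array of shape (n_samples, n_features, n_timestamps).

```python
import numpy as np


def zero_pad(series_list):
    """Right-pad variable-length multivariate series with zeros.

    Hypothetical helper. Each element of series_list has shape
    (n_features, length_i); the result has shape
    (n_samples, n_features, max_length).
    """
    n_features = series_list[0].shape[0]
    max_length = max(s.shape[1] for s in series_list)
    X = np.zeros((len(series_list), n_features, max_length))
    for i, s in enumerate(series_list):
        X[i, :, :s.shape[1]] = s
    return X


# Two toy series with 2 features and lengths 3 and 5.
series = [np.ones((2, 3)), np.ones((2, 5))]
X_padded = zero_pad(series)
print(X_padded.shape)
# Fraction of entries that are padding zeros.
print(np.mean(X_padded == 0))
```

Any sliding window that falls entirely inside the padded tail of a short series contains only zeros.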
Steps/Code to Reproduce
Versions
NumPy 1.20.3
SciPy 1.6.3
Scikit-Learn 0.24.2
Numba 0.53.1
Pyts 0.11.0