-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added the Boss-SP algorithm and its example. Tests are still missing. #156
base: main
Are you sure you want to change the base?
Conversation
Not a pull request meant to be accepted, but a pull request meant to share code and get feedback |
pyts/classification/bosssp.py
Outdated
level : integer (default = 3) | ||
Number of times the series is being divided. Maximum of 3. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The order of the parameters is relevant: the parameters are usually sorted based on their relevance / importance. level
is a key element of BOSSSP (compared to BOSS), so it should be higher on the list (above window_step
I would say).
pyts/classification/bosssp.py
Outdated
anova=False, drop_sum=False, norm_mean=False, norm_std=False, | ||
strategy='quantile', alphabet=None, | ||
numerosity_reduction=True, use_idf=True, smooth_idf=False, | ||
sublinear_tf=True, level=3): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise
pyts/classification/bosssp.py
Outdated
self.use_idf = use_idf | ||
self.smooth_idf = smooth_idf | ||
self.sublinear_tf = sublinear_tf | ||
self.level = level |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise
pyts/classification/bosssp.py
Outdated
y[:length//2]) | ||
words_3 = self._boss_word_extractor(X[length//2:], | ||
y[:length//2:]) | ||
words = words + words_2 + words_3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are concatenated in BOSSSP are the histograms (and not the list of words). For instance, if level=3
, the final histogram is the concatenation of 7 histograms. You need to first compute the histogram for each words_i
list, then concatenate the histogram, instead of concatenating first the words_i
lists and then computing the histogram.
pyts/classification/bosssp.py
Outdated
else: | ||
X_bow = np.asarray([' '.join(X_word[i]) for i in range(n_samples)]) | ||
|
||
words_list = np.asarray(X_bow[0].split(' ')).tolist() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You only get the transform for the first time series (X_bow[0]
).
pyts/classification/bosssp.py
Outdated
def _check_params(self, n_timestamps): | ||
if not isinstance(self.word_size, (int, np.integer)): | ||
raise TypeError("'word_size' must be an integer.") | ||
if not self.word_size >= 1: | ||
raise ValueError("'word_size' must be a positive integer.") | ||
|
||
if not isinstance(self.window_size, | ||
(int, np.integer, float, np.floating)): | ||
raise TypeError("'window_size' must be an integer or a float.") | ||
if isinstance(self.window_size, (int, np.integer)): | ||
if self.drop_sum: | ||
if not 1 <= self.window_size <= (n_timestamps - 1): | ||
raise ValueError( | ||
"If 'window_size' is an integer, it must be greater " | ||
"than or equal to 1 and lower than or equal to " | ||
"(n_timestamps - 1) if 'drop_sum=True'." | ||
) | ||
else: | ||
if not 1 <= self.window_size <= n_timestamps: | ||
raise ValueError( | ||
"If 'window_size' is an integer, it must be greater " | ||
"than or equal to 1 and lower than or equal to " | ||
"n_timestamps if 'drop_sum=False'." | ||
) | ||
window_size = self.window_size | ||
else: | ||
if not 0 < self.window_size <= 1: | ||
raise ValueError( | ||
"If 'window_size' is a float, it must be greater " | ||
"than 0 and lower than or equal to 1." | ||
) | ||
window_size = ceil(self.window_size * n_timestamps) | ||
|
||
if not isinstance(self.window_step, | ||
(int, np.integer, float, np.floating)): | ||
raise TypeError("'window_step' must be an integer or a float.") | ||
if isinstance(self.window_step, (int, np.integer)): | ||
if not 1 <= self.window_step <= n_timestamps: | ||
raise ValueError( | ||
"If 'window_step' is an integer, it must be greater " | ||
"than or equal to 1 and lower than or equal to " | ||
"n_timestamps." | ||
) | ||
window_step = self.window_step | ||
else: | ||
if not 0 < self.window_step <= 1: | ||
raise ValueError( | ||
"If 'window_step' is a float, it must be greater " | ||
"than 0 and lower than or equal to 1." | ||
) | ||
window_step = ceil(self.window_step * n_timestamps) | ||
|
||
if self.drop_sum: | ||
if not self.word_size <= (window_size - 1): | ||
raise ValueError( | ||
"'word_size' must be lower than or equal to " | ||
"(window_size - 1) if 'drop_sum=True'." | ||
) | ||
else: | ||
if not self.word_size <= window_size: | ||
raise ValueError( | ||
"'word_size' must be lower than or equal to " | ||
"window_size if 'drop_sum=False'." | ||
) | ||
|
||
if not isinstance(self.level, (int, np.integer)): | ||
raise TypeError("'level' must be an integer.") | ||
if not self.level >= 1: | ||
raise ValueError("'level' must be a positive integer.") | ||
if self.level >= 4: | ||
raise ValueError("'level' must not exceed 3.") | ||
|
||
return window_size, window_step |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise, it might be easier to use the corresponding of method of BOSS
if the code is the same. You could just add the checks for level
.
pyts/classification/bosssp.py
Outdated
words_2 = self._boss_word_extractor(X[:length//2], | ||
y[:length//2]) | ||
words_3 = self._boss_word_extractor(X[length//2:], | ||
y[:length//2:]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You split your dataset in two, but on the first axis (the axis of samples) instead of the second axis (the time axis).
pyts/classification/bosssp.py
Outdated
words_4 = self._boss_word_extractor(X[:length//4], | ||
y[:length//4]) | ||
words_5 = self._boss_word_extractor(X[length//4:length//2], | ||
y[length//4:length//2]) | ||
words_6 = self._boss_word_extractor(X[length//2:length*3//4], | ||
y[length//2:length*3//4]) | ||
words_7 = self._boss_word_extractor(X[length*3//4:], | ||
y[length*3//4:]) | ||
words = words + words_4 + words_5 + words_6 + words_7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise.
if not isinstance(self.window_size, | ||
(int, np.integer, float, np.floating)): | ||
raise TypeError("'window_size' must be an integer or a float.") | ||
if isinstance(self.window_size, (int, np.integer)): | ||
if self.drop_sum: | ||
if not 1 <= self.window_size <= (n_timestamps - 1): | ||
raise ValueError( | ||
"If 'window_size' is an integer, it must be greater " | ||
"than or equal to 1 and lower than or equal to " | ||
"(n_timestamps - 1) if 'drop_sum=True'." | ||
) | ||
else: | ||
if not 1 <= self.window_size <= n_timestamps: | ||
raise ValueError( | ||
"If 'window_size' is an integer, it must be greater " | ||
"than or equal to 1 and lower than or equal to " | ||
"n_timestamps if 'drop_sum=False'." | ||
) | ||
window_size = self.window_size | ||
else: | ||
if not 0 < self.window_size <= 1: | ||
raise ValueError( | ||
"If 'window_size' is a float, it must be greater " | ||
"than 0 and lower than or equal to 1." | ||
) | ||
window_size = ceil(self.window_size * n_timestamps) | ||
|
||
if not isinstance(self.window_step, | ||
(int, np.integer, float, np.floating)): | ||
raise TypeError("'window_step' must be an integer or a float.") | ||
if isinstance(self.window_step, (int, np.integer)): | ||
if not 1 <= self.window_step <= n_timestamps: | ||
raise ValueError( | ||
"If 'window_step' is an integer, it must be greater " | ||
"than or equal to 1 and lower than or equal to " | ||
"n_timestamps." | ||
) | ||
window_step = self.window_step | ||
else: | ||
if not 0 < self.window_step <= 1: | ||
raise ValueError( | ||
"If 'window_step' is a float, it must be greater " | ||
"than 0 and lower than or equal to 1." | ||
) | ||
window_step = ceil(self.window_step * n_timestamps) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this method is run in the _boss_word_extractor
method, if the values of window_size
and window_step
are floats, the proportions are computed on the split datasets, which means that the values of window_size
and window_step
would be different for different levels. It might not be an issue in itself, but in this case it should probably be mentioned in the documentation.
plt.ylabel("Occurences", fontsize=14) | ||
plt.title("Number of occurence of each word after BOSS-SP transformation", | ||
fontsize=15) | ||
plt.legend(loc='best') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have no label
in your plot, so there is no legend to display.
…t for the ones regarding code similar from other python modules as discussed orally
Corrected the issues mentioned in the pull request feedback, except for the ones regarding code similar from other python modules as discussed orally |
Codecov ReportPatch coverage is
📢 Thoughts on this report? Let us know!. |
@SvenBarray, would you like to add this to Not many changes have to be made:
|
No description provided.