-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add STLDecomposer to multiseries pipelines #4299
Conversation
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #4299 +/- ##
=======================================
+ Coverage 99.7% 99.7% +0.1%
=======================================
Files 357 357
Lines 39577 39587 +10
=======================================
+ Hits 39457 39467 +10
Misses 120 120
☔ View full report in Codecov by Sentry. |
@@ -806,10 +806,11 @@ def graph(self, name=None, graph_format=None): | |||
[ | |||
key + " : " + "{:0.2f}".format(val) | |||
if (isinstance(val, float)) | |||
else key + " : " + str(val) | |||
else key + " : " + str(val).replace("{", "").replace("}", "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is hard to follow 😅 can you add an explanatory comment?
@@ -442,6 +442,7 @@ def inverse_transform( | |||
y.append(y_series) | |||
y_df = pd.DataFrame(y).T | |||
y_df.index = original_index | |||
y_df.columns = y_t.columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, why is this necessary? What was the situation where the columns weren't the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The predictions weren't getting the corresponding series ID values as the column names and that's needed since the decomposer uses this to select the correct value. Before this was causing the decomposer to return NaN values. @christopherbunn figured that out so he might have more info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The predictions that are generated do not have the series ID values as their column names. Copying these names over is required so we can inverse_transform from the decomposer.
evalml/pipelines/utils.py
Outdated
seasonal_period = STLDecomposer.determine_periodicity( | ||
X, | ||
y, | ||
rel_max_order=order, | ||
) | ||
if seasonal_period is not None and seasonal_period <= DECOMPOSER_PERIOD_CAP: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way determine_periodicity
is set up, we're currently detecting a "period" on the single stacked target data column. I'm worried that that's too brittle, it could cause weird issues in the future. Could you put this in a conditional branch to ensure we only run it in the single series case, and for now just always add the decomposer for multiseries? We'll have to come back and revisit, but that should be ok for the MVP.
Resolves #4298
Acceptance Criteria (AC)