Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add threshold for RCA #5218

Merged
merged 37 commits into from
Jun 19, 2020
Merged

add threshold for RCA #5218

merged 37 commits into from
Jun 19, 2020

Conversation

suxi-ms
Copy link
Member

@suxi-ms suxi-ms commented Jun 9, 2020

Update Root Cause Analysis interface.

@suxi-ms suxi-ms requested a review from a team as a code owner June 9, 2020 09:57
/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[LocalizeRootCause](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/TimeSeries/LocalizeRootCause.cs)]
/// ]]>
/// </format>
/// </example>
public static RootCause LocalizeRootCause(this AnomalyDetectionCatalog catalog, RootCauseLocalizationInput src, double beta = 0.5)
public static RootCause LocalizeRootCause(this AnomalyDetectionCatalog catalog, RootCauseLocalizationInput src, double beta = 0.5, double rootCauseThreshold = 0.95)
Copy link
Contributor

@lisahua lisahua Jun 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: another option as suggested by Harish (for AD) is to create an Options class to avoid multiple overload in future. #Resolved

Copy link
Member Author

@suxi-ms suxi-ms Jun 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: another option as suggested by Harish (for AD) is to create an Options class to avoid multiple overload in future.

I found that the option is always served for transformer implementation. Does it makes sense to use it in the extentions catalog? #Resolved

Copy link
Contributor

@lisahua lisahua Jun 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ML.Net team treats it at a pattern of handling multiple overloaded methods thus not confined to which type of class it can be used. yet it's just a minor comments so feel free to ignore #Resolved

Copy link
Contributor

@harishsk harishsk Jun 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Four parameters to a function is not too bad. If you are expecting to enhance this API further in the future by adding more parameters, I would suggest adding an Options parameter now. If there are no plans to add new parameters now, please add an Options parameter the next time there is a new parameter added.

E.g. We have now migrated rootCauseThreshold from being a private constant in RootCauseAnaylyzer to being a parameter here. But I also see _anomalyRatioThreshold and _anomalyPreDeltaThreshold as private constants there?

Does it make sense to convert them to parameters? If so, please create an Options class and add all those parameters there.


In reply to: 438493061 [](ancestors = 438493061)

Copy link
Member Author

@suxi-ms suxi-ms Jun 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Four parameters to a function is not too bad. If you are expecting to enhance this API further in the future by adding more parameters, I would suggest adding an Options parameter now. If there are no plans to add new parameters now, please add an Options parameter the next time there is a new parameter added.

E.g. We have now migrated rootCauseThreshold from being a private constant in RootCauseAnaylyzer to being a parameter here. But I also see _anomalyRatioThreshold and _anomalyPreDeltaThreshold as private constants there?

Does it make sense to convert them to parameters? If so, please create an Options class and add all those parameters there.

In reply to: 438493061 [](ancestors = 438493061)

Thanks for the suggestion, have updated #Resolved

Copy link
Contributor

@harishsk harishsk Jun 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't think I communicated well.

The pattern in ML.NET is to have two overloaded functions. A convenience function and a full function. The convenience function takes in a minimal set of parameters that will meet the common use case and use default values for all other parameters. We always have this function. The second function is if a particular feature requires a function with lots of parameters. In this case we add a second overloaded function that takes an Options parameter where the user can set each parameter individually.

So at the very least you should have one function without an Options parameter. You add a second function with an Options parameter only if you are going to need lots of parameters.

Like I mentioned above, if all you need are four parameters, a single function is okay. Only if you were going to migrate the other constants from RootCauseAnalyzer, please add a second function with an Options parameter.


In reply to: 442571173 [](ancestors = 442571173)

@@ -421,7 +422,7 @@ private TimeSeriesPoint GetPointByDimension(Dictionary<string, TimeSeriesPoint>

private static string GetDicCode(Dictionary<string, Object> dic)
{
Copy link
Contributor

@lisahua lisahua Jun 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this an ideal way to create a key? or probably more efficient (and accurate) to maintain a tuple dictionary for <key, value> pair and invoke object.equal methods? #Resolved

Copy link
Contributor

@lisahua lisahua Jun 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also for this piece, does it assume that the value is additive? what if the values are not additive? Sample1 and Sample2, when we set the AggregateType.Unknown, is this piece still applicable?

                    p.Value = +leave.Value;
                    p.ExpectedValue = +leave.ExpectedValue;	                  
                    p.Delta = +leave.Delta;
``` #Resolved

Copy link
Member Author

@suxi-ms suxi-ms Jun 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also for this piece, does it assume that the value is additive? what if the values are not additive? Sample1 and Sample2, when we set the AggregateType.Unknown, is this piece still applicable?

                    p.Value = +leave.Value;
                    p.ExpectedValue = +leave.ExpectedValue;	                  
                    p.Delta = +leave.Delta;

Currently, for the unknown type, as the calculation might be complex, we use a default method like above to treat it.
#Resolved

Copy link
Member Author

@suxi-ms suxi-ms Jun 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this an ideal way to create a key? or probably more efficient (and accurate) to maintain a tuple dictionary for <key, value> pair and invoke object.equal methods?

It is hard to implement it as the object is very complex. #Resolved

Copy link
Contributor

@lisahua lisahua Jun 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm... will user just override the equal/Hashcode methods and ML.Net just needs to do object.EqualTo(..)? #Resolved

Copy link
Member Author

@suxi-ms suxi-ms Jun 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm... will user just override the equal/Hashcode methods and ML.Net just needs to do object.EqualTo(..)?

Thanks for the reminder, have changed #Resolved

Copy link
Contributor

@frank-dong-ms-zz frank-dong-ms-zz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to change libmf, right? Please take change from submodules(when git clone or git pull, add --recurse-submodules option as well)

@harishsk
Copy link
Contributor

harishsk commented Jun 10, 2020

To clarify @frank-dong-ms's comment above, you are seeing test failures above because your rebase didn't include an update to the submodule. Recently there was an update to the libmf submodule that fixed test failures. Please pull and rebase with --recurse-submodules and update your branch and these test failures should go away. #Resolved

@codecov
Copy link

codecov bot commented Jun 12, 2020

Codecov Report

Merging #5218 into master will decrease coverage by 3.92%.
The diff coverage is 38.35%.

@@            Coverage Diff             @@
##           master    #5218      +/-   ##
==========================================
- Coverage   73.31%   69.38%   -3.93%     
==========================================
  Files        1007      770     -237     
  Lines      187969   144849   -43120     
  Branches    20241    18435    -1806     
==========================================
- Hits       137812   100507   -37305     
+ Misses      44630    39195    -5435     
+ Partials     5527     5147     -380     
Flag Coverage Δ
#Debug 69.38% <38.35%> (-3.93%) ⬇️
#production 69.38% <38.35%> (+0.31%) ⬆️
#test ?
Impacted Files Coverage Δ
...crosoft.ML.TimeSeries/RootCauseLocalizationType.cs 49.42% <0.00%> (-1.77%) ⬇️
src/Microsoft.ML.TimeSeries/RootCauseAnalyzer.cs 54.20% <40.29%> (-2.25%) ⬇️
src/Microsoft.ML.TimeSeries/ExtensionsCatalog.cs 92.68% <50.00%> (-2.32%) ⬇️
...c/Microsoft.ML.FastTree/Utils/ThreadTaskManager.cs 79.48% <0.00%> (-20.52%) ⬇️
src/Microsoft.ML.Featurizers/Common.cs 57.14% <0.00%> (-7.62%) ⬇️
src/Microsoft.ML.AutoML/Utils/BestResultUtil.cs 53.84% <0.00%> (-3.69%) ⬇️
...rosoft.ML.AutoML/ColumnInference/TextFileSample.cs 59.60% <0.00%> (-2.65%) ⬇️
...L.AutoML/TrainerExtensions/TrainerExtensionUtil.cs 84.71% <0.00%> (-2.19%) ⬇️
...or/CodeGenerator/CSharp/TrainerGeneratorFactory.cs 81.25% <0.00%> (-2.09%) ⬇️
...rc/Microsoft.ML.Featurizers/DateTimeTransformer.cs 87.15% <0.00%> (-1.81%) ⬇️
... and 334 more

@suxi-ms
Copy link
Member Author

suxi-ms commented Jun 12, 2020

You don't need to change libmf, right? Please take change from submodules(when git clone or git pull, add --recurse-submodules option as well)

Thanks for the reminder, updated #Resolved

@suxi-ms
Copy link
Member Author

suxi-ms commented Jun 12, 2020

To clarify @frank-dong-ms's comment above, you are seeing test failures above because your rebase didn't include an update to the submodule. Recently there was an update to the libmf submodule that fixed test failures. Please pull and rebase with --recurse-submodules and update your branch and these test failures should go away.

Thanks for the reminder, updated #Resolved

@@ -187,14 +187,15 @@ public static class TimeSeriesCatalog
/// It is used when score is calculated for each root cause item. The range of beta should be in [0,1].
/// For a larger beta, root cause items which have a large difference between value and expected value will get a high score.
/// For a small beta, root cause items which have a high relative change will get a low score.</param>
/// <param name="rootCauseThreshold">A threshold to determine whether the point should be root cause. If the point's delta is equal to or larger than rootCauseThreshold multiplied by anomaly dimension point's delta, this point is treated as a root cause. Different threshold will turn out different results. Users can choose the delta according to their data and requirments. </param>
/// <example>
Copy link
Contributor

@harishsk harishsk Jun 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the range of valid values for this parameter? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the range of valid values for this parameter?

Have updated the range

Copy link
Contributor

@harishsk harishsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@harishsk harishsk merged commit 1c2469f into dotnet:master Jun 19, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Mar 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a threshold parameter for root cause analysis
5 participants