Update SentimentAnalysis to 1.0-preview #817
luisquintanilla
left a comment
Looks good @JRAlexander
Thanks, @luisquintanilla!
this is a bit vague. Maybe: the object helping discover the ML.NET trainers and transforms. It is also useful to set the random seed and logging level. Refers to: machine-learning/tutorials/SentimentAnalysis/Program.cs:25 in e169e34.
Thanks, @sfilipi! Great comments!
    // <SnippetSplitData>
    TrainCatalogBase.TrainTestData splitDataView = mlContext.BinaryClassification.TrainTestSplit(dataView, testFraction: 0.2);
    TrainTestData splitDataView = mlContext.Data.TrainTestSplit(dataView, testFraction: 0.2);
testFraction: 0.2);
do you think it is necessary to comment about what this is doing?
yes! good catch.
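To make the meaning of `testFraction: 0.2` concrete, here is a minimal sketch of a hold-out split in plain C#. `HoldOutSplitSketch` and its `Split` method are hypothetical names used only for illustration; this is not ML.NET's implementation, and the real `TrainTestSplit` may not produce an exact 80/20 row count.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of ML.NET.
public static class HoldOutSplitSketch
{
    // Shuffles the rows with a seeded generator, then reserves
    // roughly `testFraction` of them for testing and keeps the rest for training.
    public static (List<T> Train, List<T> Test) Split<T>(
        IReadOnlyList<T> rows, double testFraction, int seed)
    {
        if (testFraction <= 0.0 || testFraction >= 1.0)
            throw new ArgumentOutOfRangeException(nameof(testFraction));

        var rng = new Random(seed);

        // Sort by a random key to shuffle; the seed makes the split repeatable.
        var shuffled = rows.OrderBy(_ => rng.Next()).ToList();

        // With testFraction = 0.2 and 100 rows, 20 rows are held out.
        int testCount = (int)Math.Round(rows.Count * testFraction);

        var test = shuffled.Take(testCount).ToList();
        var train = shuffled.Skip(testCount).ToList();
        return (train, test);
    }
}
```

With 100 input rows and `testFraction` set to 0.2, this sketch trains on 80 rows and evaluates on the 20 rows the model never saw.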
    // append the training algorithm to the estimator
    // <SnippetAddTrainer>
    .Append(mlContext.BinaryClassification.Trainers.FastTree(numLeaves: 50, numTrees: 50, minDatapointsInLeaves: 20));
    .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(labelColumnName: "Label", featureColumnName: "Features"));
SdcaLogisticRegression
why not keep it to FastTree?
Because SdcaLogisticRegression gets better results.
    // The area under the ROC curve is equal to the probability that the classifier ranks
    // The AreaUnderROCCurve metric is equal to the probability that the algorithm ranks
    // a randomly chosen positive instance higher than a randomly chosen negative one
    // (assuming 'positive' ranks higher than 'negative').
would it be simpler to say that the AreaUnderROC metric is an indicator of how confident the model is in correctly classifying the positive and negative classes as such?
if you think it should be more expanded, you could add (or we can leave this to the metric description):
If this metric is closer to 1, then most positive examples are correctly identified. If it is closer to 0.5, then the class prediction accuracy is 50%, equal to randomly selecting positive and negative, and if it is closer to 0, the predictions are reversed: positive classes are predicted as negative, and vice versa.
Note: Let me double check that our ranges match the classical AROC ones. @Ivanidzo4ka if you have bandwidth to double-check.
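For reference while weighing the two wordings, the ranking definition quoted in the code comment can be checked directly on a small set of scores. The sketch below is plain C# with no ML.NET dependency; `RocSketch.PairwiseAuc` is a hypothetical name, and counting a tied pair as half a win is an assumption of this sketch rather than something stated in the thread.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of ML.NET.
public static class RocSketch
{
    // Returns the fraction of (positive, negative) pairs in which the
    // positive example received the higher score. Ties count as half.
    public static double PairwiseAuc(IReadOnlyList<bool> labels, IReadOnlyList<double> scores)
    {
        if (labels.Count != scores.Count)
            throw new ArgumentException("labels and scores must have the same length");

        double wins = 0.0;
        long pairs = 0;

        for (int i = 0; i < labels.Count; i++)
        {
            if (!labels[i]) continue;          // i must be a positive example
            for (int j = 0; j < labels.Count; j++)
            {
                if (labels[j]) continue;       // j must be a negative example
                pairs++;
                if (scores[i] > scores[j]) wins += 1.0;
                else if (scores[i] == scores[j]) wins += 0.5;
            }
        }

        if (pairs == 0)
            throw new InvalidOperationException("need at least one positive and one negative example");

        return wins / pairs;
    }
}
```

For labels `{ true, true, false, false }` with scores `{ 0.9, 0.4, 0.6, 0.1 }`, three of the four positive/negative pairs are ordered correctly, so the result is 0.75. A perfect ranking gives 1.0, a fully reversed ranking gives 0.0, and identical scores for every example give 0.5.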
    // The F1Score metric gets the classifier's F1 score.
    // The F1Score metric gets the model's F1 score.
    // The F1 score is the harmonic mean of precision and recall:
    // 2 * precision * recall / (precision + recall).
would it be simpler to just say: F1 is a measure of tradeoff between precision and recall.
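A small worked example may help when choosing between the formula and the shorter wording. The sketch below computes the formula from the code comment out of raw counts; `F1Sketch.FromCounts` is a hypothetical name, and returning 0 when there are no true positives is a convention assumed here, not something the thread specifies.

```csharp
using System;

// Hypothetical helper for illustration only; not part of ML.NET.
public static class F1Sketch
{
    // Computes 2 * precision * recall / (precision + recall)
    // from true positive, false positive and false negative counts.
    public static double FromCounts(int truePositives, int falsePositives, int falseNegatives)
    {
        if (truePositives < 0 || falsePositives < 0 || falseNegatives < 0)
            throw new ArgumentOutOfRangeException("counts must be non-negative");

        // Assumed convention: with no true positives the score is 0.
        if (truePositives == 0) return 0.0;

        double precision = truePositives / (double)(truePositives + falsePositives);
        double recall = truePositives / (double)(truePositives + falseNegatives);

        return 2.0 * precision * recall / (precision + recall);
    }
}
```

With 1 true positive, 0 false positives and 9 false negatives, precision is 1.0 and recall is 0.1; the arithmetic mean would be 0.55, but the F1 score is about 0.18, which shows how the harmonic mean penalizes an imbalance between the two.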
    // The Auc metric gets the area under the ROC curve.
    // The area under the ROC curve is equal to the probability that the classifier ranks
    // The AreaUnderROCCurve metric is equal to the probability that the algorithm ranks
AreaUnderROCCurve
Shall we keep it to the same name: AreaUnderRocCurve?
I don't understand
It's the casing ROC -> Roc
    ITransformer loadedModel;
    DataViewSchema dataViewSchema;

    using (var stream = new FileStream(_modelPath, FileMode.Open, FileAccess.Read, FileShare.Read))
using (var stream =
maybe add comment: load the model we saved previously
We are moving load and save to its own how-to.
    // <SnippetLoadModel>
    ITransformer loadedModel;
    DataViewSchema dataViewSchema;
DataViewSchema dataViewSchema;
maybe add comment: a variable to store the schema of the model, generated during loading.
maybe: convert the data points to generate predictions on, to an IDataView. Refers to: machine-learning/tutorials/SentimentAnalysis/Program.cs:204 in e169e34.
why not extend SentimentData with the PredictedLabel column. All the columns of the data are already in the DataView; users shouldn't need to do this. Refers to: machine-learning/tutorials/SentimentAnalysis/Program.cs:224 in e169e34.
Update SentimentAnalysis sample code to 1.0-preview.
Tutorial update - dotnet/docs#11816