Update SentimentAnalysis to 1.0-preview #817

JRAlexander · 2019-04-15T22:02:57Z

Update SentimentAnalysis sample code to 1.0-preview.
Tutorial update - dotnet/docs#11816

luisquintanilla

JRAlexander · 2019-04-17T15:51:02Z

sfilipi · 2019-04-17T15:51:39Z

        // during the learning process.

This is a bit vague. Maybe: the object that helps discover the ML.NET trainers and transforms. It is also useful for setting the random seed and logging level.

Refers to: machine-learning/tutorials/SentimentAnalysis/Program.cs:25 in e169e34.

sfilipi · 2019-04-17T15:51:57Z

        //Create ML Context with seed for repeatable/deterministic results

duplicate 'Create', and no space after the //

Refers to: machine-learning/tutorials/SentimentAnalysis/Program.cs:26 in e169e34.

JRAlexander · 2019-04-17T15:57:43Z

Thanks, @sfilipi! Great comments!

sfilipi · 2019-04-17T18:33:39Z

machine-learning/tutorials/SentimentAnalysis/Program.cs


            // <SnippetSplitData>
-            TrainCatalogBase.TrainTestData splitDataView = mlContext.BinaryClassification.TrainTestSplit(dataView, testFraction: 0.2);
+            TrainTestData splitDataView = mlContext.Data.TrainTestSplit(dataView, testFraction: 0.2);


testFraction: 0.2);

do you think it is necessary to comment about what this is doing?

yes! good catch.
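
To make the suggested comment concrete, here is a rough sketch of what a 20% hold-out split means, in plain C# with no ML.NET calls. `SplitSketch.Split` is a hypothetical helper written for this illustration; ML.NET's own `TrainTestSplit` may differ in how it samples rows (for example, the resulting sizes need not be exact).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not an ML.NET API.
public static class SplitSketch
{
    // Shuffles the rows with a fixed seed, then holds out `testFraction`
    // of them for evaluation and keeps the rest for training.
    public static (List<T> Train, List<T> Test) Split<T>(
        IEnumerable<T> rows, double testFraction, int seed)
    {
        if (testFraction <= 0 || testFraction >= 1)
            throw new ArgumentOutOfRangeException(nameof(testFraction));

        var rng = new Random(seed);
        var shuffled = rows.OrderBy(row => rng.Next()).ToList();

        int testCount = (int)Math.Round(shuffled.Count * testFraction);
        var test = shuffled.Take(testCount).ToList();
        var train = shuffled.Skip(testCount).ToList();
        return (train, test);
    }
}
```

With 10 rows and `testFraction: 0.2`, this keeps 8 rows for training and holds out 2 for evaluating the model.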

sfilipi · 2019-04-17T18:36:14Z

        UseModelWithSingleItem(mlContext, model);

I'd call it PredictSingleItem. I think it is more indicative of what it is doing.

Refers to: machine-learning/tutorials/SentimentAnalysis/Program.cs:45 in e169e34.

sfilipi · 2019-04-17T18:37:12Z

machine-learning/tutorials/SentimentAnalysis/Program.cs

+            // append the training algorithm to the estimator
            // <SnippetAddTrainer> 
-            .Append(mlContext.BinaryClassification.Trainers.FastTree(numLeaves: 50, numTrees: 50, minDatapointsInLeaves: 20));
+            .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(labelColumnName: "Label", featureColumnName: "Features"));


SdcaLogisticRegression

why not keep it to FastTree?

Because SdcaLogisticRegression gets better results.

sfilipi · 2019-04-17T18:55:47Z

machine-learning/tutorials/SentimentAnalysis/Program.cs

-            // The area under the ROC curve is equal to the probability that the classifier ranks
+            // The AreaUnderROCCurve metric is equal to the probability that the algorithm ranks
            // a randomly chosen positive instance higher than a randomly chosen negative one
            // (assuming 'positive' ranks higher than 'negative').


Would it be simpler to say that the AreaUnderROC metric is an indicator of how confident the model is in correctly classifying the positive and negative classes as such?

If you think it should be expanded further, you could add the following (or we can leave this to the metric description):
If this metric is closer to 1, then most positive examples are correctly ranked above negative ones. If it is closer to 0.5, then the model separates the classes no better than randomly selecting positive and negative, and if it is closer to 0, the predictions are reversed: positive classes are predicted as negative, and vice versa.

Note: Let me double check that our ranges match the classical AROC ones. @Ivanidzo4ka if you have bandwidth to double-check.
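
The ranking interpretation in the code comment can be checked with a small standalone sketch. This is plain C# with no ML.NET calls; `RocSketch.RankingAuc` is a hypothetical helper written for this illustration, and it is not claimed to be how ML.NET computes the metric.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not an ML.NET API.
public static class RocSketch
{
    // Fraction of (positive, negative) pairs in which the positive example
    // receives the higher score. Ties count as half a win.
    public static double RankingAuc(IReadOnlyList<double> scores, IReadOnlyList<bool> labels)
    {
        if (scores.Count != labels.Count)
            throw new ArgumentException("scores and labels must have the same length.");

        var positives = scores.Where((s, i) => labels[i]).ToList();
        var negatives = scores.Where((s, i) => !labels[i]).ToList();
        if (positives.Count == 0 || negatives.Count == 0)
            throw new ArgumentException("Need at least one positive and one negative example.");

        double wins = 0;
        foreach (double p in positives)
            foreach (double n in negatives)
                wins += p > n ? 1.0 : (p == n ? 0.5 : 0.0);

        return wins / ((double)positives.Count * negatives.Count);
    }
}
```

Under this pairwise definition, a model that scores every positive above every negative gets 1, a model that gives every example the same score gets 0.5, and a model that scores every negative above every positive gets 0.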

sfilipi · 2019-04-17T19:23:39Z

machine-learning/tutorials/SentimentAnalysis/Program.cs

-            // The F1Score metric gets the classifier's F1 score.
+            // The F1Score metric gets the model's F1 score.
            // The F1 score is the harmonic mean of precision and recall:
            //  2 * precision * recall / (precision + recall).


Would it be simpler to just say: F1 is a measure of the tradeoff between precision and recall?
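
For comparison, the formula in the existing code comment works out as follows. This is a minimal sketch in plain C#; `F1Sketch` and its method names are hypothetical and not part of ML.NET.

```csharp
using System;

// Hypothetical helper for illustration only; not an ML.NET API.
public static class F1Sketch
{
    // Harmonic mean of precision and recall: 2 * P * R / (P + R).
    public static double FromPrecisionRecall(double precision, double recall)
    {
        double sum = precision + recall;
        return sum == 0 ? 0 : 2 * precision * recall / sum;
    }

    // Same score computed from confusion-matrix counts.
    public static double FromCounts(int truePositives, int falsePositives, int falseNegatives)
    {
        int predictedPositive = truePositives + falsePositives;
        int actualPositive = truePositives + falseNegatives;
        double precision = predictedPositive == 0 ? 0 : (double)truePositives / predictedPositive;
        double recall = actualPositive == 0 ? 0 : (double)truePositives / actualPositive;
        return FromPrecisionRecall(precision, recall);
    }
}
```

With 8 true positives, 2 false positives, and 8 false negatives, precision is 0.8 and recall is 0.5, so F1 is 0.8 / 1.3, or about 0.615. That is below the arithmetic mean of 0.65 because the harmonic mean is pulled toward the weaker of the two values, which is the tradeoff the shorter wording refers to.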

sfilipi · 2019-04-17T19:24:05Z

machine-learning/tutorials/SentimentAnalysis/Program.cs


-            // The Auc metric gets the area under the ROC curve.
-            // The area under the ROC curve is equal to the probability that the classifier ranks
+            // The AreaUnderROCCurve metric is equal to the probability that the algorithm ranks


AreaUnderROCCurve

Shall we keep it to the same name: AreaUnderRocCurve?

I don't understand

It's the casing ROC -> Roc

sfilipi · 2019-04-17T19:42:52Z

        Console.WriteLine();

just curious, are those being shown to the user as they are? there's a lot of WriteLine() :)

Refers to: machine-learning/tutorials/SentimentAnalysis/Program.cs:173 in e169e34.

sfilipi · 2019-04-17T19:43:30Z

        // Adds some comments to test the trained model's predictions.

"data points" instead of "comments"

Refers to: machine-learning/tutorials/SentimentAnalysis/Program.cs:179 in e169e34.

sfilipi · 2019-04-17T19:44:32Z

machine-learning/tutorials/SentimentAnalysis/Program.cs

            ITransformer loadedModel;
+            DataViewSchema dataViewSchema;
+
            using (var stream = new FileStream(_modelPath, FileMode.Open, FileAccess.Read, FileShare.Read))


using (var stream =

maybe add comment: load the model we saved previously

We are moving load and save to its own how-to.

sfilipi · 2019-04-17T19:44:59Z

machine-learning/tutorials/SentimentAnalysis/Program.cs


            // <SnippetLoadModel>
            ITransformer loadedModel;
+            DataViewSchema dataViewSchema;


DataViewSchema dataViewSchema;

maybe add comment: a variable to store the schema of the model, generated during loading.

sfilipi · 2019-04-17T19:47:18Z

        // Load test data

Maybe: convert the data points we want predictions for into an IDataView.
If we call it test data, users might confuse it with the actual test/validation data we use to generate the metrics.

Refers to: machine-learning/tutorials/SentimentAnalysis/Program.cs:204 in e169e34.

sfilipi · 2019-04-17T19:47:56Z

        IDataView sentimentStreamingDataView = mlContext.Data.LoadFromEnumerable(sentiments);

I'd call it: newDataPoints or just newData

Refers to: machine-learning/tutorials/SentimentAnalysis/Program.cs:206 in e169e34.

sfilipi · 2019-04-17T19:49:32Z

        IEnumerable<(SentimentData sentiment, SentimentPrediction prediction)> sentimentsAndPredictions = sentiments.Zip(predictedResults, (sentiment, prediction) => (sentiment, prediction));

Why not extend SentimentData with the PredictedLabel column? All the columns of the data are already in the DataView; users shouldn't need to do this.

Refers to: machine-learning/tutorials/SentimentAnalysis/Program.cs:224 in e169e34.

natke · 2019-04-18T17:10:47Z

machine-learning/tutorials/SentimentAnalysis/Program.cs


-            // The Auc metric gets the area under the ROC curve.
-            // The area under the ROC curve is equal to the probability that the classifier ranks
+            // The AreaUnderROCCurve metric is equal to the probability that the algorithm ranks


It's the casing ROC -> Roc

Update SentimentAnalysis to 1.0-preview

e169e34

JRAlexander self-assigned this Apr 15, 2019

JRAlexander requested review from CESARDELATORRE, luisquintanilla, mairaw, natke and sfilipi April 15, 2019 22:03

JRAlexander mentioned this pull request Apr 16, 2019

Tutorial doesn't work in 1.0RC dotnet/docs#11859

Closed

luisquintanilla approved these changes Apr 17, 2019

sfilipi reviewed Apr 17, 2019

Removed save/load

8fa4e56

sfilipi reviewed Apr 17, 2019

Revised based on feedback

88ac06f

jralexander and others added 2 commits April 17, 2019 13:35

Revised based on feedback

9bfd64b

Revised based on feedback.

663ff2d

natke approved these changes Apr 18, 2019

Changed "AUC" display to "Area Under Roc Curve"

e511c02

JRAlexander merged commit 1fcb13a into dotnet:master Apr 18, 2019

4 participants
