Conversation

@JRAlexander
Contributor

Update SentimentAnalysis sample code to 1.0-preview.
Tutorial update - dotnet/docs#11816

Contributor

@luisquintanilla luisquintanilla left a comment


Looks good @JRAlexander

@JRAlexander
Contributor Author

Thanks, @luisquintanilla!

@sfilipi
Member

sfilipi commented Apr 17, 2019

        // during the learning process.  

this is a bit vague. Maybe: the object that helps discover the ML.NET trainers and transforms. It is also where you set the random seed and the logging level.


Refers to: machine-learning/tutorials/SentimentAnalysis/Program.cs:25 in e169e34.

@sfilipi
Member

sfilipi commented Apr 17, 2019

        //Create ML Context with seed for repeatable/deterministic results

duplicate 'Create', and no space after the //


Refers to: machine-learning/tutorials/SentimentAnalysis/Program.cs:26 in e169e34.
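
To make the two suggestions above concrete, the snippet in Program.cs could end up reading roughly like this (a sketch, not necessarily the exact wording that landed; the seed value is illustrative):

// Create the MLContext: the entry point that exposes the catalogs of ML.NET trainers and
// transforms; the seed makes results repeatable/deterministic across runs.
MLContext mlContext = new MLContext(seed: 0);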

@JRAlexander
Contributor Author

Thanks, @sfilipi! Great comments!


// <SnippetSplitData>
- TrainCatalogBase.TrainTestData splitDataView = mlContext.BinaryClassification.TrainTestSplit(dataView, testFraction: 0.2);
+ TrainTestData splitDataView = mlContext.Data.TrainTestSplit(dataView, testFraction: 0.2);
Member


testFraction: 0.2);

do you think it is necessary to comment about what this is doing?

Contributor Author


yes! good catch.
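
For readers following along, the comment this thread is asking for might look something like this (a sketch; the final tutorial wording may differ):

// Split the loaded dataset into train and test sets; testFraction: 0.2 holds out 20%
// of the rows for evaluating the trained model.
TrainTestData splitDataView = mlContext.Data.TrainTestSplit(dataView, testFraction: 0.2);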

@sfilipi
Member

sfilipi commented Apr 17, 2019

        UseModelWithSingleItem(mlContext, model);

I'd call it PredictSingleItem. I think it is more indicative of what it is doing.


Refers to: machine-learning/tutorials/SentimentAnalysis/Program.cs:45 in e169e34.
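
Whatever the method ends up being called, its job is a single in-memory prediction, which in the 1.0 API boils down to something like the following sketch (assumes the tutorial's SentimentData and SentimentPrediction classes; the sample text is illustrative):

// Create a prediction engine for one-off predictions on single data points.
PredictionEngine<SentimentData, SentimentPrediction> predictionFunction =
    mlContext.Model.CreatePredictionEngine<SentimentData, SentimentPrediction>(model);

// Score a single sample statement and print the predicted sentiment.
SentimentData sampleStatement = new SentimentData { SentimentText = "This was a very bad steak" };
SentimentPrediction prediction = predictionFunction.Predict(sampleStatement);
Console.WriteLine($"Prediction: {prediction.Prediction} | Probability: {prediction.Probability}");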

// append the training algorithm to the estimator
// <SnippetAddTrainer>
- .Append(mlContext.BinaryClassification.Trainers.FastTree(numLeaves: 50, numTrees: 50, minDatapointsInLeaves: 20));
+ .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(labelColumnName: "Label", featureColumnName: "Features"));
Member


SdcaLogisticRegression

why not keep it to FastTree?

Contributor Author


Because SdcaLogisticRegression gets better results.
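
For context, the estimator chain with the SDCA trainer appended looks roughly like this in the 1.0 API (a sketch; column and variable names follow the tutorial's conventions):

// Featurize the review text into a numeric vector, then append the SDCA logistic
// regression binary classification trainer.
var estimator = mlContext.Transforms.Text.FeaturizeText(outputColumnName: "Features", inputColumnName: nameof(SentimentData.SentimentText))
    .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(labelColumnName: "Label", featureColumnName: "Features"));

// Fit the pipeline to the training split to produce the trained model.
ITransformer model = estimator.Fit(splitDataView.TrainSet);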

- // The area under the ROC curve is equal to the probability that the classifier ranks
+ // The AreaUnderROCCurve metric is equal to the probability that the algorithm ranks
// a randomly chosen positive instance higher than a randomly chosen negative one
// (assuming 'positive' ranks higher than 'negative').
Member


would it be simpler to say that the AreaUnderROC metric is an indicator of how confident the model is in correctly classifying the positive and negative classes as such?

if you think it should be expanded further, you could add (or we can leave this to the metric description):
If this metric is closer to 1, then most positive examples are correctly identified. If it is closer to 0.5, then the class prediction accuracy is 50%, equal to randomly selecting positive and negative, and if it is closer to 0 the predictions are reversed: positive classes are predicted as negative, and vice versa.

Note: Let me double check that our ranges match the classical AROC ones. @Ivanidzo4ka if you have bandwidth to double-check.

- // The F1Score metric gets the classifier's F1 score.
+ // The F1Score metric gets the model's F1 score.
// The F1 score is the harmonic mean of precision and recall:
// 2 * precision * recall / (precision + recall).
Member


would it be simpler to just say: F1 is a measure of the tradeoff between precision and recall?
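
As a concrete anchor for both metric discussions, the 1.0 evaluation API surfaces these values as properties on the metrics object, roughly like this (a sketch; variable names are illustrative):

// Score the test split with the trained model and compute binary classification metrics.
IDataView predictions = model.Transform(splitDataView.TestSet);
CalibratedBinaryClassificationMetrics metrics = mlContext.BinaryClassification.Evaluate(predictions, "Label");

// AreaUnderRocCurve: closer to 1 means positives are almost always ranked above negatives;
// 0.5 is no better than chance. F1Score balances precision and recall.
Console.WriteLine($"Auc: {metrics.AreaUnderRocCurve:P2}");
Console.WriteLine($"F1Score: {metrics.F1Score:P2}");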


// The Auc metric gets the area under the ROC curve.
- // The area under the ROC curve is equal to the probability that the classifier ranks
+ // The AreaUnderROCCurve metric is equal to the probability that the algorithm ranks
Member


AreaUnderROCCurve

Shall we keep it to the same name: AreaUnderRocCurve

Contributor Author


I don't understand

Contributor


It's the casing ROC -> Roc

@sfilipi
Member

sfilipi commented Apr 17, 2019

        Console.WriteLine();

just curious, are those being shown to the user as they are? there's a lot of WriteLine() :)


Refers to: machine-learning/tutorials/SentimentAnalysis/Program.cs:173 in e169e34.

@sfilipi
Member

sfilipi commented Apr 17, 2019

        // Adds some comments to test the trained model's predictions.

data points


Refers to: machine-learning/tutorials/SentimentAnalysis/Program.cs:179 in e169e34.

ITransformer loadedModel;
DataViewSchema dataViewSchema;

using (var stream = new FileStream(_modelPath, FileMode.Open, FileAccess.Read, FileShare.Read))
Member


using (var stream =

maybe add comment: load the model we saved previously

Contributor Author


We are moving load and save to its own how-to.


// <SnippetLoadModel>
ITransformer loadedModel;
DataViewSchema dataViewSchema;
Member


DataViewSchema dataViewSchema;

maybe add comment: a variable to store the schema of the model, generated during loading.

@sfilipi
Member

sfilipi commented Apr 17, 2019

        // Load test data  

maybe: convert the data points we want to generate predictions on into an IDataView.
If we call it test data, users might confuse it with the actual test/validation data we use to generate the metrics.


Refers to: machine-learning/tutorials/SentimentAnalysis/Program.cs:204 in e169e34.

@sfilipi
Member

sfilipi commented Apr 17, 2019

        IDataView sentimentStreamingDataView = mlContext.Data.LoadFromEnumerable(sentiments);

I'd call it: newDataPoints or just newData


Refers to: machine-learning/tutorials/SentimentAnalysis/Program.cs:206 in e169e34.
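
Pulling the last two suggestions together, the line might read (a sketch; newData is the reviewer's proposed name):

// Convert the new in-memory data points into an IDataView so the model can score them.
IDataView newData = mlContext.Data.LoadFromEnumerable(sentiments);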

@sfilipi
Member

sfilipi commented Apr 17, 2019

        IEnumerable<(SentimentData sentiment, SentimentPrediction prediction)> sentimentsAndPredictions = sentiments.Zip(predictedResults, (sentiment, prediction) => (sentiment, prediction));

why not extend SentimentData with the PredictedLabel column? All the columns of the data are already in the DataView; users shouldn't need to do this.


Refers to: machine-learning/tutorials/SentimentAnalysis/Program.cs:224 in e169e34.
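
One way to avoid the Zip, in the spirit of this comment: read the prediction rows straight back out of the transformed IDataView (a sketch; newData is the converted IDataView from the previous sketch, and it assumes SentimentPrediction carries the original SentimentText column, for example by deriving from SentimentData, which is not shown in this diff):

// Score the new data points and pull the predictions back into .NET objects.
IDataView transformedData = model.Transform(newData);
IEnumerable<SentimentPrediction> predictedResults =
    mlContext.Data.CreateEnumerable<SentimentPrediction>(transformedData, reuseRowObject: false);

// Each prediction row now pairs the original text with the predicted label.
foreach (SentimentPrediction item in predictedResults)
{
    Console.WriteLine($"Sentiment: {item.SentimentText} | Prediction: {item.Prediction}");
}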


@JRAlexander JRAlexander merged commit 1fcb13a into dotnet:master Apr 18, 2019