GetSummaryDataView() implementation for Pca and Linear Predictors #185

Merged
merged 19 commits into from
Jun 8, 2018
Changes from 2 commits
15 changes: 15 additions & 0 deletions src/Microsoft.ML.PCA/PcaTrainer.cs
@@ -308,6 +308,7 @@ public static CommonOutputs.AnomalyDetectionOutput TrainPcaAnomaly(IHostEnvironm
// REVIEW: move the predictor to a different file and fold EigenUtils.cs to this file.
public sealed class PcaPredictor : PredictorBase<Float>,
IValueMapper,
ICanGetSummaryAsIDataView,
ICanSaveInTextFormat, ICanSaveModel, ICanSaveSummary
{
public const string LoaderSignature = "pcaAnomExec";
@@ -469,6 +470,20 @@ public void SaveAsText(TextWriter writer, RoleMappedSchema schema)
}
}

public IDataView GetSummaryDataView(RoleMappedSchema schema)

GetSummaryDataView [](start = 25, length = 18)

Could you add a unit test for this?

Member Author

will do

Member Author

added

{
var bldr = new ArrayDataViewBuilder(Host);

bldr.AddColumn("Mean vector", NumberType.R4, _mean);
Contributor

@TomFinley TomFinley May 21, 2018

Mean vector [](start = 28, length = 11)

While not explicitly prescribed, elsewhere we have held to the convention that these names ought to be valid C# identifiers... which is to say, PascalCased. #Closed
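
The naming convention above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical `ColumnNames` helper that is not part of ML.NET: it PascalCases a free-form label and applies a deliberately simplified identifier test (it ignores C# keywords and other details of the language grammar).

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not an ML.NET type.
public static class ColumnNames
{
    // Turns a free-form label such as "Projected mean vector"
    // into a PascalCased name such as "ProjectedMeanVector".
    public static string ToPascalCase(string label)
    {
        var words = label.Split(new[] { ' ', '_', '-' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Concat(words.Select(w => char.ToUpperInvariant(w[0]) + w.Substring(1)));
    }

    // Simplified check: a letter or underscore first, then only letters,
    // digits, or underscores. Keywords are not rejected.
    public static bool IsSimpleIdentifier(string name)
    {
        if (string.IsNullOrEmpty(name))
            return false;
        if (!(char.IsLetter(name[0]) || name[0] == '_'))
            return false;
        return name.All(c => char.IsLetterOrDigit(c) || c == '_');
    }
}
```

Under this sketch, "Mean vector" fails the identifier test because of the space, while its PascalCased form passes.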

bldr.AddColumn("Projected mean vector", NumberType.R4, _meanProjected);
for (var i = 0; i < _rank; ++i)
{
Contributor

@TomFinley TomFinley May 21, 2018

For single line statements like this, prefer this:

for (int i = 0; i < foo; ++i)
    yodawg += i;

to this

for (int i = 0; i < foo; ++i)
{
    yodawg += i;
}

Those extra {s are evil.

Edit: for one line statements, to be clear. I'm not a monster. #Closed

bldr.AddColumn("V" + i, NumberType.R4, _eigenVectors[i]);

@yaeldekel yaeldekel May 18, 2018

_eigenVectors [](start = 55, length = 13)

This should be added as a single vector column, each eigenvector in one row.

Contributor

@TomFinley TomFinley May 21, 2018

Indeed. Either this should be a single multidimensional vector column (with _rank as the second dimension), or you should find a way to span this across multiple rows. (I prefer the first option myself, since honestly this ought to be part of metadata.)


In reply to: 189336699 [](ancestors = 189336699)

Member Author

added as a single vector column
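
The two layouts discussed in this thread can be sketched with plain arrays standing in for data view columns. This is an illustration under stated assumptions, not the code that was merged: `EigenLayout`, `AsRows`, and `Flatten` are hypothetical names, and no ML.NET types are involved.

```csharp
using System;

// Hypothetical illustration; plain arrays stand in for IDataView columns.
public static class EigenLayout
{
    // One vector-valued column: row i holds eigenvector i.
    public static float[][] AsRows(float[][] eigenVectors)
    {
        var rows = new float[eigenVectors.Length][];
        for (int i = 0; i < eigenVectors.Length; i++)
            rows[i] = (float[])eigenVectors[i].Clone();
        return rows;
    }

    // One multidimensional vector in a single row: eigenvectors are laid out
    // back to back, so element j of eigenvector i sits at index i * dim + j.
    public static float[] Flatten(float[][] eigenVectors)
    {
        int rank = eigenVectors.Length;
        int dim = rank == 0 ? 0 : eigenVectors[0].Length;
        var flat = new float[rank * dim];
        for (int i = 0; i < rank; i++)
        {
            if (eigenVectors[i].Length != dim)
                throw new ArgumentException("All eigenvectors must have the same length.");
            Array.Copy(eigenVectors[i], 0, flat, i * dim, dim);
        }
        return flat;
    }
}
```

Either layout keeps the schema at a fixed number of columns regardless of the rank, whereas one "V" + i column per eigenvector makes the column count depend on the model.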

}

return bldr.GetDataView();
}

public ColumnType InputType
{
get { return _inputType; }
15 changes: 15 additions & 0 deletions src/Microsoft.ML.StandardLearners/Standard/LinearPredictor.cs
@@ -44,6 +44,7 @@ public abstract class LinearPredictor : PredictorBase<Float>,
ICanSaveInTextFormat,
ICanSaveInSourceCode,
ICanSaveModel,
ICanGetSummaryAsIDataView,

@yaeldekel yaeldekel May 18, 2018

ICanGetSummaryAsIDataView [](start = 8, length = 25)

Thanks Gani for making this change. Please take a look at these two classes inheriting from this class: LinearBinaryPredictor and LinearRegressionPredictor. They implement an interface called ICanGetSummaryAsIRow, which this class should implement instead of ICanGetSummaryAsIDataView. #Closed

Member Author

thx, this is done

ICanSaveSummary,
IPredictorWithFeatureWeights<Float>,
IWhatTheFeatureValueMapper,
@@ -343,6 +344,20 @@ public void SaveAsCode(TextWriter writer, RoleMappedSchema schema)

public abstract void SaveSummary(TextWriter writer, RoleMappedSchema schema);

public IDataView GetSummaryDataView(RoleMappedSchema schema)
{
var bldr = new ArrayDataViewBuilder(Host);

ValueGetter<VBuffer<DvText>> getSlotNames =
(ref VBuffer<DvText> dst) =>
MetadataUtils.GetSlotNames(schema, RoleMappedSchema.ColumnRole.Feature, Weight.Count, ref dst);

// Add the bias and the weight columns.
bldr.AddColumn("Bias", NumberType.R4, Bias);
bldr.AddColumn("Weights", getSlotNames, NumberType.R4, Weight);

@yaeldekel yaeldekel May 18, 2018

Weight [](start = 67, length = 6)

This will try to create a scalar column with multiple rows, and will not work because the bias column has only one row. #Closed
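
The row-count mismatch described above can be reproduced with a toy column-wise builder. This is a sketch under assumptions: `ToyBuilder` is a hypothetical stand-in that only enforces "every column has the same number of rows", and it does not claim to mirror how ArrayDataViewBuilder reports the error.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a column-wise builder; not an ML.NET type.
public sealed class ToyBuilder
{
    private readonly List<string> _names = new List<string>();
    private int _rowCount = -1; // -1 until the first column is added

    public int RowCount => Math.Max(_rowCount, 0);
    public IReadOnlyList<string> ColumnNames => _names;

    // Each array element becomes one row of the column.
    public void AddColumn<T>(string name, T[] rows)
    {
        if (_rowCount >= 0 && rows.Length != _rowCount)
            throw new InvalidOperationException(
                $"Column '{name}' has {rows.Length} rows, expected {_rowCount}.");
        _rowCount = rows.Length;
        _names.Add(name);
    }
}

public static class ToyBuilderDemo
{
    public static void Main()
    {
        var weights = new[] { 1f, 2f, 3f };
        var bldr = new ToyBuilder();
        bldr.AddColumn("Bias", new[] { 0.5f }); // one scalar row

        try
        {
            // Three scalar rows against a one-row bias column: rejected.
            bldr.AddColumn("Weights", weights);
        }
        catch (InvalidOperationException e)
        {
            Console.WriteLine(e.Message);
        }

        // Wrapping the weights makes them a single vector-valued row.
        bldr.AddColumn("Weights", new[] { weights });
        Console.WriteLine(bldr.RowCount);
    }
}
```

Passing the weights as one vector-valued row keeps the summary at a single row, which matches the shape of the bias column.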

return bldr.GetDataView();
}

public abstract void SaveAsIni(TextWriter writer, RoleMappedSchema schema, ICalibrator calibrator = null);

public virtual void GetFeatureWeights(ref VBuffer<Float> weights)