Continue training on OOM error && add subsampling support for trainValidationDatasetManager #6714
@@ -12,34 +15,83 @@ public interface IDatasetManager
{
}

- internal interface ICrossValidateDatasetManager
+ public interface ICrossValidateDatasetManager
Can you add some documentation for these since they are now public?
Sure
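One possible shape for that documentation is sketched below, using XML doc comments (`/// <summary>`) on the now-public interfaces. This is a minimal sketch, not the documentation that was merged. The `Dataset` member is inferred from the `Dataset` property on `CrossValidateDatasetManager` in this diff. The `Fold` member and `IDataViewPlaceholder`, a stand-in for `Microsoft.ML.IDataView` that lets the snippet compile without ML.NET, are hypothetical.

```csharp
// Hypothetical stand-in for Microsoft.ML.IDataView so the sketch compiles on its own.
public interface IDataViewPlaceholder { }

/// <summary>
/// Marker interface for objects that supply datasets to an experiment.
/// </summary>
public interface IDatasetManager { }

/// <summary>
/// Supplies a single dataset that is split into folds for cross-validation.
/// </summary>
public interface ICrossValidateDatasetManager
{
    /// <summary>
    /// Number of folds used for cross-validation (hypothetical member).
    /// </summary>
    int Fold { get; set; }

    /// <summary>
    /// The full dataset that is split into <see cref="Fold"/> folds.
    /// </summary>
    IDataViewPlaceholder Dataset { get; set; }
}
```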
}

internal class CrossValidateDatasetManager : IDatasetManager, ICrossValidateDatasetManager
{
- public IDataView Dataset { get; set; }
+ public IDataView? Dataset { get; set; }
Do you need this to be a nullable type? Can an IDataView directly have null as a value that you can just check?
This should not be nullable. I'll make the change.
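On the reviewer's question: with nullable reference types enabled, the `?` in `IDataView?` is only a compile-time annotation. Any interface-typed reference, `IDataView` included, can still hold `null` at runtime and can be checked directly. Declaring the property non-nullable tells the compiler and callers that it is always set, and the compiler warns (CS8618) if a constructor leaves it uninitialized. Below is a minimal sketch of keeping the property non-nullable and enforcing that at runtime. `CrossValidateDatasetManagerSketch`, `InMemoryDataView`, and `IDataViewPlaceholder` (a stand-in for `Microsoft.ML.IDataView`) are hypothetical names.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for Microsoft.ML.IDataView so the sketch compiles without ML.NET.
public interface IDataViewPlaceholder { }

// Trivial implementation used only to exercise the sketch.
public sealed class InMemoryDataView : IDataViewPlaceholder { }

// Dataset is non-nullable: it is required at construction time,
// so consumers never need to null-check the property.
public sealed class CrossValidateDatasetManagerSketch
{
    private IDataViewPlaceholder _dataset;

    public CrossValidateDatasetManagerSketch(IDataViewPlaceholder dataset)
    {
        // The annotation alone only yields compiler warnings; reject null at runtime too.
        _dataset = dataset ?? throw new ArgumentNullException(nameof(dataset));
    }

    public IDataViewPlaceholder Dataset
    {
        get => _dataset;
        set => _dataset = value ?? throw new ArgumentNullException(nameof(value));
    }
}
```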
Codecov Report
@@ Coverage Diff @@
## main #6714 +/- ##
==========================================
+ Coverage 68.84% 68.88% +0.04%
==========================================
Files 1215 1216 +1
Lines 250691 250890 +199
Branches 26256 26258 +2
==========================================
+ Hits 172580 172823 +243
+ Misses 71286 71245 -41
+ Partials 6825 6822 -3
Flags with carried forward coverage won't be shown.
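As a consistency check on the table: Codecov computes the coverage ratio as hits over all tracked lines, counting partial lines as not covered. Plugging in the numbers above reproduces both columns:

$$
\text{coverage} = \frac{\text{hits}}{\text{hits} + \text{misses} + \text{partials}}
$$

$$
\text{main: } \frac{172580}{172580 + 71286 + 6825} = \frac{172580}{250691} \approx 68.84\%
\qquad
\text{\#6714: } \frac{172823}{172823 + 71245 + 6822} = \frac{172823}{250890} \approx 68.88\%
$$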
Training on a subset of the training dataset helps mitigate out-of-memory (OOM) errors. This is most useful when the training dataset is very large, in which case subsampling should not hurt the metric much.

Because this feature mainly targets massive training datasets, and cross-validation is typically used on smaller ones, CrossValidateDatasetManager does not need it; the subsampling strategy is therefore only added to trainValidationDatasetManager.
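The description does not say how rows are selected. The following is a minimal sketch of uniform random subsampling with a fixed seed, written over an in-memory list rather than an ML.NET `IDataView`. `SubsamplingSketch`, `Subsample`, and its `ratio`/`seed` parameters are hypothetical names, not the API added by this PR.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SubsamplingSketch
{
    // Returns a uniformly random subset holding about `ratio` of the rows,
    // in their original order. A fixed seed makes the subset reproducible.
    public static List<T> Subsample<T>(IReadOnlyList<T> rows, double ratio, int seed)
    {
        if (ratio <= 0 || ratio > 1)
            throw new ArgumentOutOfRangeException(nameof(ratio), "ratio must be in (0, 1].");

        int take = (int)Math.Round(rows.Count * ratio);
        var rng = new Random(seed);

        // Partial Fisher-Yates shuffle over row indices: only the first `take` slots are drawn.
        var indices = Enumerable.Range(0, rows.Count).ToArray();
        for (int i = 0; i < take; i++)
        {
            int j = rng.Next(i, indices.Length);
            (indices[i], indices[j]) = (indices[j], indices[i]);
        }

        // Sort the chosen indices so the subset keeps the original row order.
        return indices.Take(take).OrderBy(i => i).Select(i => rows[i]).ToList();
    }
}
```

Within ML.NET itself, a comparable effect could come from `MLContext.Data.TrainTestSplit` with a `testFraction` of `1 - ratio`, keeping only `TrainSet`; whether the PR takes that route is not shown here.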
Fixes dotnet/machinelearning-modelbuilder#2645
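The title also mentions continuing training after an OOM error. The sketch below illustrates that general idea under stated assumptions: each trial is a delegate, and a trial that throws `OutOfMemoryException` is recorded as failed while the loop moves on to the next one. Other exception types still propagate. `TrialLoopSketch`, `RunTrials`, and `TrialResult` are hypothetical names; this is not the experiment code from the PR.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Outcome of one trial: a metric on success, an error message on OOM.
public sealed record TrialResult(int TrialId, double? Metric, string? Error);

public static class TrialLoopSketch
{
    // Runs trials in sequence. An OutOfMemoryException marks that trial as failed
    // instead of ending the whole run; any other exception is not caught.
    public static List<TrialResult> RunTrials(int trialCount, Func<int, double> runTrial)
    {
        var results = new List<TrialResult>();
        for (int id = 0; id < trialCount; id++)
        {
            try
            {
                results.Add(new TrialResult(id, runTrial(id), null));
            }
            catch (OutOfMemoryException ex)
            {
                results.Add(new TrialResult(id, null, ex.Message));
            }
        }
        return results;
    }
}
```

Catching `OutOfMemoryException` is only useful when the failed allocation belonged to the trial itself (for example, one oversized buffer), so that its memory becomes reclaimable once the trial's objects are unreachable.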