Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Continue training on OOM error && add subsampling support for trainValidationDatasetManager #6714

Merged
merged 7 commits into from
Jun 14, 2023

Conversation

LittleLittleCloud
Copy link
Contributor

@LittleLittleCloud LittleLittleCloud commented May 26, 2023

We are excited to review your PR.

So we can do the best job, please check:

  • There's a descriptive title that will make sense to other developers some time from now.
  • There's associated issues. All PR's should have issue(s) associated - unless a trivial self-evident change such as fixing a typo. You can use the format Fixes #nnnn in your description to cause GitHub to automatically close the issue(s) when your PR is merged.
  • Your change description explains what the change does, why you chose your approach, and anything else that reviewers should know.
  • You have included any necessary tests in the same PR.

Training on a sub-set of train dataset will help mitigate OOM error.
This will be helpful if the training dataset is huge, in which case subsampling won't hurt metric a lot.
Because this feature is mostly useful for massive training dataset, crossValidationDatasetManger won't need it so the subsampling strategy is only added to trainValidationDatasetManager.

fix #dotnet/machinelearning-modelbuilder#2645

@LittleLittleCloud LittleLittleCloud changed the title Continue training on OOM error Continue training on OOM error && add subsampling support for trainValidationDatasetManager May 26, 2023
@LittleLittleCloud
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@@ -12,34 +15,83 @@ public interface IDatasetManager
{
}

internal interface ICrossValidateDatasetManager
public interface ICrossValidateDatasetManager
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some documentation for these since they are now public?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

}

internal class CrossValidateDatasetManager : IDatasetManager, ICrossValidateDatasetManager
{
public IDataView Dataset { get; set; }
public IDataView? Dataset { get; set; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need this to be a nullable type? Can an IDataView directly have null as a value that you can just check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be nullable. I'll make the change

@codecov
Copy link

codecov bot commented Jun 14, 2023

Codecov Report

Merging #6714 (eac6879) into main (0278f43) will increase coverage by 0.04%.
The diff coverage is 100.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6714      +/-   ##
==========================================
+ Coverage   68.84%   68.88%   +0.04%     
==========================================
  Files        1215     1216       +1     
  Lines      250691   250890     +199     
  Branches    26256    26258       +2     
==========================================
+ Hits       172580   172823     +243     
+ Misses      71286    71245      -41     
+ Partials     6825     6822       -3     
Flag Coverage Δ
Debug 68.88% <100.00%> (+0.04%) ⬆️
production 63.38% <100.00%> (+0.04%) ⬆️
test 88.90% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...t.ML.Fairlearn/AutoML/AutoMLExperimentExtension.cs 70.73% <100.00%> (+4.06%) ⬆️
...t.ML.Fairlearn/Reductions/GridSearchTrialRunner.cs 100.00% <100.00%> (ø)
...Microsoft.ML.AutoML.Tests/AutoMLExperimentTests.cs 84.32% <100.00%> (+0.39%) ⬆️
...L.AutoML.Tests/TrainValidaionDatasetManagerTest.cs 100.00% <100.00%> (ø)

... and 26 files with indirect coverage changes

@LittleLittleCloud LittleLittleCloud merged commit 22342df into main Jun 14, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jul 14, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants