
Data classification & Value prediction: Training fails when choosing the "Split or File" validation strategy and the "grid" tuner with "Subsample option" enabled #2734

Closed
v-Hailishi opened this issue Jun 27, 2023 · 2 comments
Labels: Priority:0 (Work that we can't release without), Reported by: Test, Stale


@v-Hailishi

System Information:
Windows OS: Windows-11-Enterprise-22H2
ML.Net Model Builder 2022: 17.17.0.2332602 (Main Build)
Microsoft Visual Studio Enterprise: 2022(17.5.5)
.Net: 6.0, 7.0

Describe the bug

  • On which step of the process did you run into an issue:
    Training fails when choosing the "Split or File" validation strategy and the "grid" tuner with "Subsample option" enabled.

TestMatrix
https://testpass.blob.core.windows.net/test-pass-data/wikipedia-detox-250-line-data.tsv
https://testpass.blob.core.windows.net/test-pass-data/taxi-fare.csv

To Reproduce
Steps to reproduce the behavior:

  1. Select Create a new project from the Visual Studio start window.
  2. Choose the C# Console App (.NET Core) project template.
  3. Add Model Builder by right-clicking the project.
  4. Select the Data classification or Value prediction scenario.
  5. On the Data page, choose the "Tabular File" or SQL Server data source.
  6. On the Train page, click the "Advanced training options..." link.
  7. Go to the "Tuners" tab and select only the "grid" tuner; then go to the "Sample strategy" tab, enable "Subsample option", and save.
  8. Training fails with the error "Must be greater than zero."

Expected behavior
Training completes successfully.

Screenshot
[Two screenshots; images not preserved]


v-Hailishi added the Priority:0 and Reported by: Test labels on Jun 27, 2023
LittleLittleCloud self-assigned this on Jun 27, 2023
LittleLittleCloud added this to the July 2023 milestone on Jun 27, 2023
@LittleLittleCloud (Contributor)

This happens because enabling subsampling adds a fraction parameter to the search space, and that fraction is used to sample a subset of the full training dataset. On a small dataset, fraction * row count can be less than one row, so the sampled subset ends up empty and the "Must be greater than zero." error is thrown.

I'll fix this on the ML.NET side. In the meantime, the workaround is to avoid using the grid tuner and subsampling at the same time.
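To make the failure mode described above concrete, here is a minimal Python sketch. It assumes the subsample size is computed as fraction × row count and truncated to an integer; the function names and the example fraction values are hypothetical and are not taken from ML.NET or Model Builder. It shows how a small fraction on a dataset of about 250 rows (like the 250-line test file) yields zero rows, plus one possible guard that clamps the count to at least one row.

```python
import math


def subsample_row_count(fraction: float, total_rows: int) -> int:
    # Hypothetical: rows kept when sampling `fraction` of the data,
    # truncated down to a whole number of rows.
    return math.floor(fraction * total_rows)


def checked_row_count(fraction: float, total_rows: int) -> int:
    # Hypothetical check that mirrors the reported error: an empty
    # subsample is rejected instead of being passed on to training.
    n = subsample_row_count(fraction, total_rows)
    if n <= 0:
        raise ValueError("Must be greater than zero.")
    return n


def clamped_row_count(fraction: float, total_rows: int) -> int:
    # Hypothetical fix: keep at least one row and never more than the dataset.
    return min(total_rows, max(1, subsample_row_count(fraction, total_rows)))


if __name__ == "__main__":
    rows = 250  # roughly the size of the small test dataset
    for fraction in (0.001, 0.01, 0.1):
        print(fraction, subsample_row_count(fraction, rows), clamped_row_count(fraction, rows))
```

Clamping is only one option; a fix could instead restrict the fraction range in the search space so that fraction × row count is always at least one.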

@v-Hailishi (Author)

The bug is fixed in the latest main build, 17.17.0.2361101.
