[WIP] fix: updates the training sampling strategy to complete the last batch #538

Draft
wants to merge 1 commit into main

Conversation

@wiitt (Collaborator) commented Aug 30, 2024

Fixes #438

PR Goal?

Updates the sampling strategy in training to complete the last batch with random samples from other batches, instead of dropping it.
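
For illustration, a minimal sketch of the idea behind such a sampler is shown below. The class name `CompletingBatchSampler` and its interface are hypothetical; the actual implementation in `everyvoice/dataloader/oversampler.py` may differ.

```python
import random

from torch.utils.data import Sampler


class CompletingBatchSampler(Sampler):
    """Yields batches of indices; the incomplete last batch is topped up with
    random samples from the preceding batches instead of being dropped."""

    def __init__(self, dataset_size: int, batch_size: int, shuffle: bool = True):
        # Assumes dataset_size >= batch_size, so there are earlier samples to draw fillers from.
        self.dataset_size = dataset_size
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __iter__(self):
        indices = list(range(self.dataset_size))
        if self.shuffle:
            random.shuffle(indices)
        for start in range(0, self.dataset_size, self.batch_size):
            batch = indices[start : start + self.batch_size]
            if len(batch) < self.batch_size:
                # Complete the last batch with random samples from earlier
                # batches rather than dropping it (drop_last behaviour).
                fillers = random.sample(indices[:start], self.batch_size - len(batch))
                batch.extend(fillers)
            yield batch

    def __len__(self):
        # Every yielded batch is full, so round the batch count up, not down.
        return -(-self.dataset_size // self.batch_size)
```

Such a sampler would be passed to the `DataLoader` via its `batch_sampler` argument, e.g. `DataLoader(dataset, batch_sampler=CompletingBatchSampler(len(dataset), batch_size=16))`.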

Fixes?

Fixes #438

Feedback sought?

Whether a model works with this new sampler, and whether it produces results that are better, or at least not worse, after training.

Priority?

Low

Tests added?

No tests added, but it would be good to have some testing of this sampler.

How to test?

Place a breakpoint and inspect the composition of the last batch in an epoch. Check that the number of batches during training corresponds to expectations. Train a model in a scenario where the difference between dropping and keeping the last batch is noticeable (e.g. a very small dataset, or a dataset where the samples in the last batch contain unique phonemes). A sketch of a unit test for these checks is given below.
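
As a starting point, a unit test covering these checks might look like the following sketch. It assumes the hypothetical `CompletingBatchSampler` interface from the sketch above; the real sampler in `everyvoice/dataloader/oversampler.py` may expose a different interface.

```python
def test_last_batch_is_completed():
    dataset_size, batch_size = 10, 4  # 10 samples -> 2 full batches + 2 leftover samples
    sampler = CompletingBatchSampler(dataset_size, batch_size)

    batches = list(sampler)

    # The incomplete last batch is kept and topped up: ceil(10 / 4) = 3 batches.
    assert len(batches) == len(sampler) == 3
    # Every batch, including the last one, is full.
    assert all(len(batch) == batch_size for batch in batches)
    # No sample was dropped: every index still appears at least once.
    assert {i for batch in batches for i in batch} == set(range(dataset_size))
```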

Confidence?

Low. This code wasn't properly tested.

Version change?

No. Can be a part of a larger update.

Related PRs?

No.


semanticdiff-com bot commented Aug 30, 2024

Review changes with SemanticDiff.

Analyzed 1 of 2 files.

Overall, the semantic diff is 60% smaller than the GitHub diff.

Filename                                  Status
✔️ everyvoice/dataloader/__init__.py      59.3% smaller
everyvoice/dataloader/oversampler.py      Unsupported file format

@wiitt wiitt marked this pull request as draft August 30, 2024 21:52
@wiitt wiitt requested a review from roedoejet August 30, 2024 21:52

CLI load time: 0:00.23
Pull Request HEAD: ad8cd7f4850f6c316605a546d155c1c0ec65eb98
Imports that take more than 0.1 s:
import time: self [us] | cumulative | imported package


codecov bot commented Aug 30, 2024

Codecov Report

Attention: Patch coverage is 19.51220% with 33 lines in your changes missing coverage. Please review.

Project coverage is 75.60%. Comparing base (8fc4099) to head (ad8cd7f).
Report is 8 commits behind head on main.

Files with missing lines                 Patch %   Lines
everyvoice/dataloader/oversampler.py     20.00%    28 Missing ⚠️
everyvoice/dataloader/__init__.py        16.66%     5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #538      +/-   ##
==========================================
+ Coverage   74.48%   75.60%   +1.12%     
==========================================
  Files          45       46       +1     
  Lines        3029     3283     +254     
  Branches      491      580      +89     
==========================================
+ Hits         2256     2482     +226     
- Misses        679      704      +25     
- Partials       94       97       +3     


Successfully merging this pull request may close these issues:

- Synthesize can only process even multiples of the batch size