Making minimal sized `sdist` and `whl` using pyproject.toml #587

SamuelLarkin · 2024-11-15T20:30:48Z

PR Goal?

Turns out that our understanding of pyproject.toml with hatch was insufficient.
This PR aims to reduce what is included in a sdist and whl.
We decided to remove all tests/ because they would require us to include the wav files and more which bloats the whl and sdist.

Fixes?

fixes: #586

Feedback sought?

This PR as a long and explicit list of inclusions and exclusions, is this the best way of doing what we want to achieve?

Priority?

Needed before our next release

Tests added?

None

How to test?

Get yourself some reference prebuild whl and sdist files.

wget 'https://files.pythonhosted.org/packages/1c/c4/99440963fb2562fcf019d8949400c7467a6c10a524515e81f054cbbcb32a/everyvoice-0.1.0a3-py3-none-any.whl'
wget 'https://files.pythonhosted.org/packages/14/a4/898c980f6ee34ce240efb9f6d3ffbf24ed755497dc52751953ca5a2228e4/everyvoice-0.1.0a3.tar.gz'

hatch build

Compare the content of each files with its reference.

sdist

vimdiff \
  +':windo %sort' \
  <(tar tf dist/everyvoice-0.2.0a0.tar.gz | sed 's|^[^/]\+/||') \
  <(tar tf everyvoice-0.1.0a3.tar.gz | sed 's|^[^/]\+/||;  /.*\/$/d')

diff \
  <(tar tf everyvoice-0.1.0a3.tar.gz | sed 's|^[^/]\+/||;  /.*\/$/d' | sort) \
  <(tar tf dist/everyvoice-0.2.0a0.tar.gz | sed 's|^[^/]\+/||' | sort)

1d0
<
19,25c18
< everyvoice.egg-info/dependency_links.txt
< everyvoice.egg-info/entry_points.txt
< everyvoice.egg-info/not-zip-safe
< everyvoice.egg-info/PKG-INFO
< everyvoice.egg-info/requires.txt
< everyvoice.egg-info/SOURCES.txt
< everyvoice.egg-info/top_level.txt
---
> everyvoice/evaluation.py
36a30
> everyvoice/model/aligner/DeepForcedAligner/.gitignore
42a37
> everyvoice/model/aligner/wav2vec2aligner/.gitignore
54d48
< everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/cli/audit.py
56d49
< everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/cli/check_data.py
71,73d63
< everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/tests/__init__.py
< everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/tests/test_cli.py
< everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/tests/test_writing_callbacks.py
78a69
> everyvoice/model/feature_prediction/FastSpeech2_lightning/.gitignore
82a74
> everyvoice/model/vocoder/HiFiGAN_iSTFT_lightning/.gitignore
95,106d86
< everyvoice/tests/basic_test_case.py
< everyvoice/tests/__init__.py
< everyvoice/tests/preprocessed_audio_fixture.py
< everyvoice/tests/stubs.py
< everyvoice/tests/test_cli.py
< everyvoice/tests/test_configs.py
< everyvoice/tests/test_dataloader.py
< everyvoice/tests/test_model.py
< everyvoice/tests/test_preprocessing.py
< everyvoice/tests/test_text.py
< everyvoice/tests/test_utils.py
< everyvoice/tests/test_wizard.py
121a102
> everyvoice/wizard/simple_term_menu_win_stub.py
123a105
> .gitignore
125d106
< MANIFEST.in
126a108
> pyproject.toml
128,133d109
< requirements.dev.txt
< requirements.test.txt
< requirements.torch.txt
< requirements.txt
< setup.cfg
< setup.py

Wheel

vimdiff \
  <(unzip -l dist/everyvoice-0.2.0a0-py3-none-any.whl | sed 's|.* ||') \
  <(unzip -l everyvoice-0.1.0a3-py3-none-any.whl | sed 's|.* ||')

diff \
  <(unzip -l everyvoice-0.1.0a3-py3-none-any.whl | sed 's|.* ||' | sort) \
  <(unzip -l dist/everyvoice-0.2.0a0-py3-none-any.whl | sed 's|.* ||' | sort)

diff  <(unzip -l everyvoice-0.1.0a3-py3-none-any.whl | sed 's|.* ||' | sort)  <(unzip -l dist/everyvoice-0.2.0a0-py3-none-any.whl | sed 's|.* ||' | sort)                                    3,9c3,8
< everyvoice-0.1.0a3.dist-info/entry_points.txt
< everyvoice-0.1.0a3.dist-info/LICENSE
< everyvoice-0.1.0a3.dist-info/METADATA
< everyvoice-0.1.0a3.dist-info/RECORD
< everyvoice-0.1.0a3.dist-info/top_level.txt
< everyvoice-0.1.0a3.dist-info/WHEEL
< everyvoice-0.1.0a3-py3-none-any.whl
---
> dist/everyvoice-0.2.0a0-py3-none-any.whl
> everyvoice-0.2.0a0.dist-info/entry_points.txt
> everyvoice-0.2.0a0.dist-info/licenses/LICENSE
> everyvoice-0.2.0a0.dist-info/METADATA
> everyvoice-0.2.0a0.dist-info/RECORD
> everyvoice-0.2.0a0.dist-info/WHEEL
26a26
> everyvoice/evaluation.py
55d54
< everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/cli/audit.py
57d55
< everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/cli/check_data.py
72,74d69
< everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/tests/__init__.py
< everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/tests/test_cli.py
< everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/tests/test_writing_callbacks.py
96,107d90
< everyvoice/tests/basic_test_case.py
< everyvoice/tests/__init__.py
< everyvoice/tests/preprocessed_audio_fixture.py
< everyvoice/tests/stubs.py
< everyvoice/tests/test_cli.py
< everyvoice/tests/test_configs.py
< everyvoice/tests/test_dataloader.py
< everyvoice/tests/test_model.py
< everyvoice/tests/test_preprocessing.py
< everyvoice/tests/test_text.py
< everyvoice/tests/test_utils.py
< everyvoice/tests/test_wizard.py
122a106
> everyvoice/wizard/simple_term_menu_win_stub.py

Access to the unittest

After making a new environment with make-everyvoice-env, we can still perform python -m unittest everyvoice.tests.test_dataloader even if the tests/ directory wasn't included in the whl.

Confidence?

fair

Version change?

no

Related PRs?

None

semanticdiff-com · 2024-11-15T20:30:51Z

Review changes with

SamuelLarkin · 2024-11-15T20:33:20Z

pyproject.toml

+[tool.hatch.build.targets.sdist]
+include = ["requirements.torch.txt", "/everyvoice"]
+exclude = [
+  "*.coveragerc",
+  "*.ipynb",
+  "*.psv",
+  "*.pyc",
+  "*/.schema/*.json",
+  "*~",
+  ".git",
+  ".github",
+  ".gitlint",
+  ".pre-commit-config.yaml",
+  "LICENSE",
+  "README.md",
+  "pyproject.toml",
+  "readme.md",
+  "requirement*txt",
+  "setup.cfg",
+  "setup.py",
+  "tests/",
+  # ".gitignore",  # Needed/Used by hatch to filter-out files
+]
+
+[tool.hatch.build.targets.wheel]
+# sources = ["/everyvoice"]
+include = ["requirements.torch.txt", "/everyvoice"]
+exclude = [
+  "*.coveragerc",
+  "*.ipynb",
+  "*.psv",
+  "*.pyc",
+  "*/.schema/*.json",
+  "*~",
+  ".git",
+  ".github",
+  ".gitignore",
+  ".gitlint",
+  ".pre-commit-config.yaml",
+  "LICENSE",
+  "README.md",
+  "readme.md",
+  "requirement*txt",
+  "setup.cfg",
+  "setup.py",
+  "tests/",
+]
+


Should I fold these two under [tool.hatch.build] since there are almost identical?

Does that work? I didn't think that was possible, but if there is a way to list all this just once, yes, absolutely!

codecov · 2024-11-15T20:34:43Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.58%. Comparing base (dbc2282) to head (333bd2c).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #587   +/-   ##
=======================================
  Coverage   76.58%   76.58%           
=======================================
  Files          46       46           
  Lines        3451     3451           
  Branches      470      470           
=======================================
  Hits         2643     2643           
  Misses        706      706           
  Partials      102      102

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

github-actions · 2024-11-15T20:34:50Z

CLI load time: 0:00.29
Pull Request HEAD: 333bd2c05cff0af0b715f2863ace4d06051354fe
Imports that take more than 0.1 s:
import time: self [us] | cumulative | imported package
import time:       260 |     106994 |   typer
import time:      7235 |     223298 | everyvoice.cli

joanise · 2024-11-18T17:19:35Z

This PR as a long and explicit list of inclusions and exclusions, is this the best way of doing what we want to achieve?

For ReadAlongs/Studio, the equivalent code (scroll to pyproject.toml lines 94-97) is much shorter, so I was surprised you had to have all these exclusions here, but EV has all these files deeper, so we probably can't remove this list.

You can shorten it a tiny bit by finding patterns that express the sames things in a shorter way, but that's of little value,

One detail: include: "requirements.torch.txt" is negated by exclude: "requirement*txt", but I don't see any value in including this file in the wheel or tar ball, we only use when installing from the sandbox.

TLDR: the changes I suggest are:

factor out the two lists into tool.hatch.build since they can be made identical easily enough,
remove "requirements.torch.txt" from include.

SamuelLarkin added 2 commits November 15, 2024 15:29

feat: using hatch in CI to build our sdist and wheel

634d087

feat: making cleaner sdist & wheel

333bd2c

SamuelLarkin commented Nov 15, 2024

View reviewed changes

SamuelLarkin changed the title ~~DON'T LOOK AT THIS YET~~ Making minimal sized sdist and whl using pyproject.toml Nov 15, 2024

SamuelLarkin requested a review from joanise November 15, 2024 21:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Making minimal sized `sdist` and `whl` using pyproject.toml #587

Making minimal sized `sdist` and `whl` using pyproject.toml #587

SamuelLarkin commented Nov 15, 2024 •

edited

Loading

semanticdiff-com bot commented Nov 15, 2024

SamuelLarkin Nov 15, 2024

joanise Nov 18, 2024

codecov bot commented Nov 15, 2024

github-actions bot commented Nov 15, 2024

joanise commented Nov 18, 2024

Making minimal sized sdist and whl using pyproject.toml #587

Are you sure you want to change the base?

Making minimal sized sdist and whl using pyproject.toml #587

Conversation

SamuelLarkin commented Nov 15, 2024 • edited Loading

PR Goal?

Fixes?

Feedback sought?

Priority?

Tests added?

How to test?

sdist

Wheel

Access to the unittest

Confidence?

Version change?

Related PRs?

semanticdiff-com bot commented Nov 15, 2024

SamuelLarkin Nov 15, 2024

Choose a reason for hiding this comment

joanise Nov 18, 2024

Choose a reason for hiding this comment

codecov bot commented Nov 15, 2024

Codecov Report

github-actions bot commented Nov 15, 2024

joanise commented Nov 18, 2024

Making minimal sized `sdist` and `whl` using pyproject.toml #587

Making minimal sized `sdist` and `whl` using pyproject.toml #587

SamuelLarkin commented Nov 15, 2024 •

edited

Loading