Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Making minimal sized sdist and whl using pyproject.toml #587

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

SamuelLarkin
Copy link
Collaborator

@SamuelLarkin SamuelLarkin commented Nov 15, 2024

PR Goal?

Turns out that our understanding of pyproject.toml with hatch was insufficient.
This PR aims to reduce what is included in a sdist and whl.
We decided to remove all tests/ because they would require us to include the wav files and more which bloats the whl and sdist.

Fixes?

fixes: #586

Feedback sought?

This PR as a long and explicit list of inclusions and exclusions, is this the best way of doing what we want to achieve?

Priority?

Needed before our next release

Tests added?

None

How to test?

Get yourself some reference prebuild whl and sdist files.

wget 'https://files.pythonhosted.org/packages/1c/c4/99440963fb2562fcf019d8949400c7467a6c10a524515e81f054cbbcb32a/everyvoice-0.1.0a3-py3-none-any.whl'
wget 'https://files.pythonhosted.org/packages/14/a4/898c980f6ee34ce240efb9f6d3ffbf24ed755497dc52751953ca5a2228e4/everyvoice-0.1.0a3.tar.gz'
hatch build

Compare the content of each files with its reference.

sdist

vimdiff \
  +':windo %sort' \
  <(tar tf dist/everyvoice-0.2.0a0.tar.gz | sed 's|^[^/]\+/||') \
  <(tar tf everyvoice-0.1.0a3.tar.gz | sed 's|^[^/]\+/||;  /.*\/$/d')
diff \
  <(tar tf everyvoice-0.1.0a3.tar.gz | sed 's|^[^/]\+/||;  /.*\/$/d' | sort) \
  <(tar tf dist/everyvoice-0.2.0a0.tar.gz | sed 's|^[^/]\+/||' | sort)
1d0
<
19,25c18
< everyvoice.egg-info/dependency_links.txt
< everyvoice.egg-info/entry_points.txt
< everyvoice.egg-info/not-zip-safe
< everyvoice.egg-info/PKG-INFO
< everyvoice.egg-info/requires.txt
< everyvoice.egg-info/SOURCES.txt
< everyvoice.egg-info/top_level.txt
---
> everyvoice/evaluation.py
36a30
> everyvoice/model/aligner/DeepForcedAligner/.gitignore
42a37
> everyvoice/model/aligner/wav2vec2aligner/.gitignore
54d48
< everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/cli/audit.py
56d49
< everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/cli/check_data.py
71,73d63
< everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/tests/__init__.py
< everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/tests/test_cli.py
< everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/tests/test_writing_callbacks.py
78a69
> everyvoice/model/feature_prediction/FastSpeech2_lightning/.gitignore
82a74
> everyvoice/model/vocoder/HiFiGAN_iSTFT_lightning/.gitignore
95,106d86
< everyvoice/tests/basic_test_case.py
< everyvoice/tests/__init__.py
< everyvoice/tests/preprocessed_audio_fixture.py
< everyvoice/tests/stubs.py
< everyvoice/tests/test_cli.py
< everyvoice/tests/test_configs.py
< everyvoice/tests/test_dataloader.py
< everyvoice/tests/test_model.py
< everyvoice/tests/test_preprocessing.py
< everyvoice/tests/test_text.py
< everyvoice/tests/test_utils.py
< everyvoice/tests/test_wizard.py
121a102
> everyvoice/wizard/simple_term_menu_win_stub.py
123a105
> .gitignore
125d106
< MANIFEST.in
126a108
> pyproject.toml
128,133d109
< requirements.dev.txt
< requirements.test.txt
< requirements.torch.txt
< requirements.txt
< setup.cfg
< setup.py

Wheel

vimdiff \
  <(unzip -l dist/everyvoice-0.2.0a0-py3-none-any.whl | sed 's|.* ||') \
  <(unzip -l everyvoice-0.1.0a3-py3-none-any.whl | sed 's|.* ||')
diff \
  <(unzip -l everyvoice-0.1.0a3-py3-none-any.whl | sed 's|.* ||' | sort) \
  <(unzip -l dist/everyvoice-0.2.0a0-py3-none-any.whl | sed 's|.* ||' | sort)
diff  <(unzip -l everyvoice-0.1.0a3-py3-none-any.whl | sed 's|.* ||' | sort)  <(unzip -l dist/everyvoice-0.2.0a0-py3-none-any.whl | sed 's|.* ||' | sort)                                    3,9c3,8
< everyvoice-0.1.0a3.dist-info/entry_points.txt
< everyvoice-0.1.0a3.dist-info/LICENSE
< everyvoice-0.1.0a3.dist-info/METADATA
< everyvoice-0.1.0a3.dist-info/RECORD
< everyvoice-0.1.0a3.dist-info/top_level.txt
< everyvoice-0.1.0a3.dist-info/WHEEL
< everyvoice-0.1.0a3-py3-none-any.whl
---
> dist/everyvoice-0.2.0a0-py3-none-any.whl
> everyvoice-0.2.0a0.dist-info/entry_points.txt
> everyvoice-0.2.0a0.dist-info/licenses/LICENSE
> everyvoice-0.2.0a0.dist-info/METADATA
> everyvoice-0.2.0a0.dist-info/RECORD
> everyvoice-0.2.0a0.dist-info/WHEEL
26a26
> everyvoice/evaluation.py
55d54
< everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/cli/audit.py
57d55
< everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/cli/check_data.py
72,74d69
< everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/tests/__init__.py
< everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/tests/test_cli.py
< everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/tests/test_writing_callbacks.py
96,107d90
< everyvoice/tests/basic_test_case.py
< everyvoice/tests/__init__.py
< everyvoice/tests/preprocessed_audio_fixture.py
< everyvoice/tests/stubs.py
< everyvoice/tests/test_cli.py
< everyvoice/tests/test_configs.py
< everyvoice/tests/test_dataloader.py
< everyvoice/tests/test_model.py
< everyvoice/tests/test_preprocessing.py
< everyvoice/tests/test_text.py
< everyvoice/tests/test_utils.py
< everyvoice/tests/test_wizard.py
122a106
> everyvoice/wizard/simple_term_menu_win_stub.py

Access to the unittest

After making a new environment with make-everyvoice-env, we can still perform python -m unittest everyvoice.tests.test_dataloader even if the tests/ directory wasn't included in the whl.

Confidence?

fair

Version change?

no

Related PRs?

None

Copy link

Review changes with  SemanticDiff

Comment on lines +121 to +168
[tool.hatch.build.targets.sdist]
include = ["requirements.torch.txt", "/everyvoice"]
exclude = [
"*.coveragerc",
"*.ipynb",
"*.psv",
"*.pyc",
"*/.schema/*.json",
"*~",
".git",
".github",
".gitlint",
".pre-commit-config.yaml",
"LICENSE",
"README.md",
"pyproject.toml",
"readme.md",
"requirement*txt",
"setup.cfg",
"setup.py",
"tests/",
# ".gitignore", # Needed/Used by hatch to filter-out files
]

[tool.hatch.build.targets.wheel]
# sources = ["/everyvoice"]
include = ["requirements.torch.txt", "/everyvoice"]
exclude = [
"*.coveragerc",
"*.ipynb",
"*.psv",
"*.pyc",
"*/.schema/*.json",
"*~",
".git",
".github",
".gitignore",
".gitlint",
".pre-commit-config.yaml",
"LICENSE",
"README.md",
"readme.md",
"requirement*txt",
"setup.cfg",
"setup.py",
"tests/",
]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I fold these two under [tool.hatch.build] since there are almost identical?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that work? I didn't think that was possible, but if there is a way to list all this just once, yes, absolutely!

Copy link

codecov bot commented Nov 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.58%. Comparing base (dbc2282) to head (333bd2c).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #587   +/-   ##
=======================================
  Coverage   76.58%   76.58%           
=======================================
  Files          46       46           
  Lines        3451     3451           
  Branches      470      470           
=======================================
  Hits         2643     2643           
  Misses        706      706           
  Partials      102      102           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

CLI load time: 0:00.29
Pull Request HEAD: 333bd2c05cff0af0b715f2863ace4d06051354fe
Imports that take more than 0.1 s:
import time: self [us] | cumulative | imported package
import time:       260 |     106994 |   typer
import time:      7235 |     223298 | everyvoice.cli

@SamuelLarkin SamuelLarkin changed the title DON'T LOOK AT THIS YET Making minimal sized sdist and whl using pyproject.toml Nov 15, 2024
@joanise
Copy link
Member

joanise commented Nov 18, 2024

This PR as a long and explicit list of inclusions and exclusions, is this the best way of doing what we want to achieve?

For ReadAlongs/Studio, the equivalent code (scroll to pyproject.toml lines 94-97) is much shorter, so I was surprised you had to have all these exclusions here, but EV has all these files deeper, so we probably can't remove this list.

You can shorten it a tiny bit by finding patterns that express the sames things in a shorter way, but that's of little value,

One detail: include: "requirements.torch.txt" is negated by exclude: "requirement*txt", but I don't see any value in including this file in the wheel or tar ball, we only use when installing from the sandbox.

TLDR: the changes I suggest are:

  • factor out the two lists into tool.hatch.build since they can be made identical easily enough,
  • remove "requirements.torch.txt" from include.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

the pyproject.toml build now includes too many files
2 participants