V0.24.2 #94

Merged
merged 59 commits into from
Oct 4, 2024
Commits
77722cb
fix(bin.synthesize): correctly handle boolean arguments
eginhard May 30, 2024
29e91f2
fix(utils.generic_utils): correctly call now()
eginhard May 30, 2024
bdd44cf
docs: update readme
eginhard May 30, 2024
03de4b8
docs: fix readthedocs links
eginhard Jun 13, 2024
063e9e9
Merge pull request #38 from idiap/cli
eginhard Jun 14, 2024
e5c208d
feat(cleaners): add multilingual phoneme cleaner
eginhard Jun 14, 2024
a1495d4
fix(recipes): use multilingual phoneme cleaner in non-english recipes
eginhard Jun 14, 2024
9cfcc0a
chore(cleaners): add type hints
eginhard Jun 14, 2024
3a20f47
fix(freevc): use the specified device for pretrained speaker encoder …
ChristianRomberg Jun 16, 2024
4bc0e75
build: add numpy2 support
eginhard Jun 16, 2024
bd9b21d
Merge pull request #44 from idiap/phoneme-cleaners
eginhard Jun 17, 2024
81ac7ab
Merge pull request #47 from idiap/numpy2
eginhard Jun 17, 2024
4b6da4e
refactor(stream_generator): update special tokens for transformers>=4…
eginhard Jun 15, 2024
2a28123
refactor(stream_generator): update code for transformers>=4.41.1
eginhard Jun 15, 2024
4d9e18e
chore(stream_generator): address lint issues
eginhard Jun 15, 2024
98c0f86
Merge pull request #46 from idiap/fix-xtts-streaming
eginhard Jun 18, 2024
c9f7197
test(helpers): add test_ prefix so tests actually run
eginhard Jun 20, 2024
857cd55
test(helpers): fix test_rand_segment, test_generate_path
eginhard Jun 20, 2024
9f80e04
refactor(freevc): use existing layernorm
eginhard Jun 24, 2024
d65bcf6
chore(freevc): remove duplicate DDSConv and ElementwiseAffine
eginhard Jun 24, 2024
cd7b6da
fix: clarify types, fix missing functions
eginhard Jun 25, 2024
f8df19a
refactor: remove duplicate convert_pad_shape
eginhard Jun 20, 2024
a755328
refactor(freevc): remove duplicate sequence_mask
eginhard Jun 20, 2024
c5241d7
chore: address pytorch deprecations
eginhard Jun 25, 2024
c30fb0f
chore: remove duplicate init_weights
eginhard Jun 26, 2024
4bd3df2
refactor: remove duplicate get_padding
eginhard Jun 26, 2024
ff2cd5c
Merge pull request #49 from idiap/vc-refactors
eginhard Jun 26, 2024
59ef28d
build: move umap-learn into optional notebook dependencies
eginhard Jun 26, 2024
c693b08
build: update trainer to 0.1.4
eginhard Jun 27, 2024
28296c6
refactor: use get_git_branch from trainer
eginhard Jun 27, 2024
0fb26f9
refactor: use get_user_data_dir from trainer
eginhard Jun 27, 2024
da82d55
refactor: use load_fsspec from trainer
eginhard Jun 27, 2024
e869b9b
refactor: use load_checkpoint from trainer
eginhard Jun 27, 2024
2d06aeb
chore: remove unused TTS.utils.io module
eginhard Jun 27, 2024
808a938
build: specify minimum versions for dependencies
eginhard Jun 29, 2024
8cab2e3
ci: test lowest and highest compatible versions of dependencies
eginhard Jun 29, 2024
c1a929b
Merge pull request #51 from idiap/update-trainer
eginhard Jul 2, 2024
6ea3b75
Update xtts.py (#53)
abrahammathews2000 Jul 2, 2024
9192ef1
fix(xtts): load tokenizer file based on config as last resort
eginhard Jul 5, 2024
de35920
Merge pull request #50 from idiap/umap
eginhard Jul 25, 2024
20583a4
Merge pull request #57 from idiap/xtts-vocab
eginhard Jul 25, 2024
20bbb41
fix(xtts): update streaming for transformers>=4.42.0 (#59)
gravityrail Jul 25, 2024
8c460d0
fix(dataset): skip files where audio length can't be computed
eginhard Jul 31, 2024
9c604c1
chore(dataset): address lint issues
eginhard Jul 31, 2024
19fce2c
Merge pull request #66 from idiap/skip-broken-audio
eginhard Jul 31, 2024
d304ab2
build: update gruut version for numpy2 support
eginhard Jul 3, 2024
b1558b0
build: require numpy<2 because spacy/thinc lack support
eginhard Jul 3, 2024
7014782
build: add upper bound for transformers
eginhard Aug 5, 2024
204588f
Merge pull request #56 from idiap/update-gruut
eginhard Aug 5, 2024
233dfb5
docs(tacotron): fix wrong paper links (#74)
hykilpikonna Aug 25, 2024
1920328
feat(xtts): support hindi in tokenizer (#64)
eginhard Sep 12, 2024
865bc39
chore(bark): remove manual download of hubert model
eginhard Sep 12, 2024
242e278
ci: explicitly upload hidden files for coverage
eginhard Sep 12, 2024
6f8f15e
build: allow numpy2, which should be supported in spacy 3.8 now (#81)
eginhard Sep 13, 2024
1d39246
feat: normalize unicode characters in text cleaners (#85)
shavit Oct 2, 2024
4887a2e
fix(build): restrict spacy version to unbreak installation (#92)
KoljaB Oct 4, 2024
de22d24
build: restrict coqui trainer version
eginhard Oct 4, 2024
f667ee4
ci: switch to cibuildwheel
eginhard Sep 17, 2024
282b2da
chore: bump version to 0.24.2
eginhard Oct 4, 2024
50 changes: 14 additions & 36 deletions .github/workflows/pypi-release.yml
Expand Up @@ -8,6 +8,7 @@ defaults:
bash
jobs:
build-sdist:
name: Build source distribution
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
Expand All @@ -23,37 +24,31 @@ jobs:
with:
python-version: 3.9
- run: |
python -m pip install -U pip setuptools wheel build
python -m pip install -U pip setuptools build
- run: |
python -m build
- run: |
pip install dist/*.tar.gz
- uses: actions/upload-artifact@v4
with:
name: sdist
name: build-sdist
path: dist/*.tar.gz
build-wheels:
runs-on: ubuntu-latest
name: Build wheels on ${{ matrix.os }}
runs-on: ${{ matrix.os }}
strategy:
matrix:
python-version: ["3.9", "3.10", "3.11", "3.12"]
os: [ubuntu-latest, windows-latest, macos-latest]
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Install build requirements
run: |
python -m pip install -U pip setuptools wheel build numpy cython
- name: Setup and install manylinux1_x86_64 wheel
run: |
python setup.py bdist_wheel --plat-name=manylinux1_x86_64
python -m pip install dist/*-manylinux*.whl
- name: Build wheels
uses: pypa/cibuildwheel@v2.21.1
- uses: actions/upload-artifact@v4
with:
name: wheel-${{ matrix.python-version }}
path: dist/*-manylinux*.whl
name: build-wheels-${{ matrix.os }}
path: ./wheelhouse/*.whl
publish-artifacts:
name: Publish to PyPI
runs-on: ubuntu-latest
needs: [build-sdist, build-wheels]
environment:
Expand All @@ -62,28 +57,11 @@ jobs:
permissions:
id-token: write
steps:
- run: |
mkdir dist
- uses: actions/download-artifact@v4
with:
name: "sdist"
path: "dist/"
- uses: actions/download-artifact@v4
with:
name: "wheel-3.9"
path: "dist/"
- uses: actions/download-artifact@v4
with:
name: "wheel-3.10"
path: "dist/"
- uses: actions/download-artifact@v4
with:
name: "wheel-3.11"
path: "dist/"
- uses: actions/download-artifact@v4
with:
name: "wheel-3.12"
path: "dist/"
path: dist
pattern: build-*
merge-multiple: true
- run: |
ls -lh dist/
- name: Publish package distributions to PyPI
Expand Down
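Note on the publish step: the five per-artifact downloads are replaced by a single download using `pattern: build-*` and `merge-multiple: true`, which is why the uploads were renamed to `build-sdist` and `build-wheels-<os>`. The sketch below only illustrates which names that glob selects. It assumes the matching behaves like Python's `fnmatch`, which may differ in detail from what `actions/download-artifact` does, and `select` is a hypothetical helper.

```python
from fnmatch import fnmatch

# Artifact names uploaded by the jobs after this change.
OSES = ["ubuntu-latest", "windows-latest", "macos-latest"]
new_names = ["build-sdist"] + [f"build-wheels-{os_name}" for os_name in OSES]

# Artifact names used before this change, for comparison.
old_names = ["sdist", "wheel-3.9", "wheel-3.10", "wheel-3.11", "wheel-3.12"]


def select(names, pattern="build-*"):
    """Return the names matched by the glob (hypothetical helper, fnmatch semantics assumed)."""
    return [name for name in names if fnmatch(name, pattern)]


print(select(new_names))  # all four new artifacts match
print(select(old_names))  # none of the old names match
```

With the old names, a single pattern could not have picked up both the sdist and the wheels, so the rename and the pattern go together.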
8 changes: 6 additions & 2 deletions .github/workflows/tests.yml
Expand Up @@ -45,13 +45,17 @@ jobs:
sed -i 's/https:\/\/coqui.gateway.scarf.sh\//https:\/\/github.com\/coqui-ai\/TTS\/releases\/download\//g' TTS/.models.json
- name: Install TTS
run: |
python3 -m uv pip install --system "coqui-tts[dev,server,languages] @ ."
python3 setup.py egg_info
resolution=highest
if [ "${{ matrix.python-version }}" == "3.9" ]; then
resolution=lowest-direct
fi
python3 -m uv pip install --resolution=$resolution --system "coqui-tts[dev,server,languages] @ ."
- name: Unit tests
run: make ${{ matrix.subset }}
- name: Upload coverage data
uses: actions/upload-artifact@v4
with:
include-hidden-files: true
name: coverage-data-${{ matrix.subset }}-${{ matrix.python-version }}
path: .coverage.*
if-no-files-found: ignore
Expand Down
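Note on the install step: the shell snippet chooses uv's resolution strategy from the matrix Python version, so that the oldest interpreter also exercises the lowest declared versions of direct dependencies. The same decision written as a small Python function, purely as an illustration (`pick_resolution` is a hypothetical name, and the version strings passed in are example values):

```python
def pick_resolution(python_version: str) -> str:
    """Mirror the workflow's shell logic: lowest direct deps on 3.9, highest otherwise."""
    if python_version == "3.9":
        return "lowest-direct"
    return "highest"


for version in ["3.9", "3.12"]:
    # The workflow passes the result to: uv pip install --resolution=<value> ...
    print(version, pick_resolution(version))
```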
11 changes: 8 additions & 3 deletions README.md
Expand Up @@ -4,10 +4,10 @@
- 📣 ⓍTTSv2 is here with 16 languages and better performance across the board.
- 📣 ⓍTTS fine-tuning code is out. Check the [example recipes](https://github.com/idiap/coqui-ai-TTS/tree/dev/recipes/ljspeech).
- 📣 ⓍTTS can now stream with <200ms latency.
- 📣 ⓍTTS, our production TTS model that can speak 13 languages, is released [Blog Post](https://coqui.ai/blog/tts/open_xtts), [Demo](https://huggingface.co/spaces/coqui/xtts), [Docs](https://coqui-tts.readthedocs.io/en/dev/models/xtts.html)
- 📣 [🐶Bark](https://github.com/suno-ai/bark) is now available for inference with unconstrained voice cloning. [Docs](https://coqui-tts.readthedocs.io/en/dev/models/bark.html)
- 📣 ⓍTTS, our production TTS model that can speak 13 languages, is released [Blog Post](https://coqui.ai/blog/tts/open_xtts), [Demo](https://huggingface.co/spaces/coqui/xtts), [Docs](https://coqui-tts.readthedocs.io/en/latest/models/xtts.html)
- 📣 [🐶Bark](https://github.com/suno-ai/bark) is now available for inference with unconstrained voice cloning. [Docs](https://coqui-tts.readthedocs.io/en/latest/models/bark.html)
- 📣 You can use [~1100 Fairseq models](https://github.com/facebookresearch/fairseq/tree/main/examples/mms) with 🐸TTS.
- 📣 🐸TTS now supports 🐢Tortoise with faster inference. [Docs](https://coqui-tts.readthedocs.io/en/dev/models/tortoise.html)
- 📣 🐸TTS now supports 🐢Tortoise with faster inference. [Docs](https://coqui-tts.readthedocs.io/en/latest/models/tortoise.html)

<div align="center">
<img src="https://static.scarf.sh/a.png?x-pxid=cf317fe7-2188-4721-bc01-124bb5d5dbb2" />
Expand Down Expand Up @@ -55,6 +55,10 @@ Please use our dedicated channels for questions and discussion. Help is much mor
[discord]: https://discord.gg/5eXr5seRrv
[Tutorials and Examples]: https://github.com/coqui-ai/TTS/wiki/TTS-Notebooks-and-Tutorials

The [issues](https://github.com/coqui-ai/TTS/issues) and
[discussions](https://github.com/coqui-ai/TTS/discussions) in the original
repository are also still a useful source of information.


## 🔗 Links and Resources
| Type | Links |
Expand Down Expand Up @@ -143,6 +147,7 @@ If you plan to code or train models, clone 🐸TTS and install it locally.

```bash
git clone https://github.com/idiap/coqui-ai-TTS
cd coqui-ai-TTS
pip install -e .
```

Expand Down
1 change: 0 additions & 1 deletion TTS/.models.json
Expand Up @@ -48,7 +48,6 @@
"https://coqui.gateway.scarf.sh/hf/bark/fine_2.pt",
"https://coqui.gateway.scarf.sh/hf/bark/text_2.pt",
"https://coqui.gateway.scarf.sh/hf/bark/config.json",
"https://coqui.gateway.scarf.sh/hf/bark/hubert.pt",
"https://coqui.gateway.scarf.sh/hf/bark/tokenizer.pth"
],
"default_vocoder": null,
Expand Down
6 changes: 3 additions & 3 deletions TTS/bin/compute_attention_masks.py
Expand Up @@ -8,14 +8,14 @@
import torch
from torch.utils.data import DataLoader
from tqdm import tqdm
from trainer.io import load_checkpoint

from TTS.config import load_config
from TTS.tts.datasets.TTSDataset import TTSDataset
from TTS.tts.models import setup_model
from TTS.tts.utils.text.characters import make_symbols, phonemes, symbols
from TTS.utils.audio import AudioProcessor
from TTS.utils.generic_utils import ConsoleFormatter, setup_logger
from TTS.utils.io import load_checkpoint

if __name__ == "__main__":
setup_logger("TTS", level=logging.INFO, screen=True, formatter=ConsoleFormatter())
Expand All @@ -35,7 +35,7 @@
--data_path /root/LJSpeech-1.1/
--batch_size 32
--dataset ljspeech
--use_cuda True
--use_cuda
""",
formatter_class=RawTextHelpFormatter,
)
Expand All @@ -62,7 +62,7 @@
help="Dataset metafile inclusing file paths with transcripts.",
)
parser.add_argument("--data_path", type=str, default="", help="Defines the data path. It overwrites config.json.")
parser.add_argument("--use_cuda", type=bool, default=False, help="enable/disable cuda.")
parser.add_argument("--use_cuda", action=argparse.BooleanOptionalAction, default=False, help="enable/disable cuda.")

parser.add_argument(
"--batch_size", default=16, type=int, help="Batch size for the model. Use batch_size=1 if you have no CUDA."
Expand Down
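Note on the `--use_cuda` change: `type=bool` applies `bool()` to the raw command-line string, and every non-empty string is truthy, so `--use_cuda False` used to enable CUDA. A minimal standalone reproduction, not taken from the repository, that contrasts the old and new declarations (`argparse.BooleanOptionalAction` requires Python 3.9 or later):

```python
import argparse

# Old declaration: bool("False") is True because the string is non-empty.
old = argparse.ArgumentParser()
old.add_argument("--use_cuda", type=bool, default=False)
old_value = old.parse_args(["--use_cuda", "False"]).use_cuda
print(old_value)  # True

# New declaration: the flag takes no value and gains a --no- counterpart.
new = argparse.ArgumentParser()
new.add_argument("--use_cuda", action=argparse.BooleanOptionalAction, default=False)
print(new.parse_args([]).use_cuda)                # False
print(new.parse_args(["--use_cuda"]).use_cuda)    # True
print(new.parse_args(["--no-use_cuda"]).use_cuda) # False
```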
2 changes: 1 addition & 1 deletion TTS/bin/compute_embeddings.py
Expand Up @@ -150,7 +150,7 @@ def compute_embeddings(
default=False,
action="store_true",
)
parser.add_argument("--disable_cuda", type=bool, help="Flag to disable cuda.", default=False)
parser.add_argument("--disable_cuda", action="store_true", help="Flag to disable cuda.", default=False)
parser.add_argument("--no_eval", help="Do not compute eval?. Default False", default=False, action="store_true")
parser.add_argument(
"--formatter_name",
Expand Down
4 changes: 2 additions & 2 deletions TTS/bin/eval_encoder.py
Expand Up @@ -75,8 +75,8 @@ def compute_encoder_accuracy(dataset_items, encoder_manager):
type=str,
help="Path to dataset config file.",
)
parser.add_argument("--use_cuda", type=bool, help="flag to set cuda.", default=True)
parser.add_argument("--eval", type=bool, help="compute eval.", default=True)
parser.add_argument("--use_cuda", action=argparse.BooleanOptionalAction, help="flag to set cuda.", default=True)
parser.add_argument("--eval", action=argparse.BooleanOptionalAction, help="compute eval.", default=True)

args = parser.parse_args()

Expand Down
2 changes: 1 addition & 1 deletion TTS/bin/extract_tts_spectrograms.py
Expand Up @@ -282,7 +282,7 @@ def main(args): # pylint: disable=redefined-outer-name
parser.add_argument("--debug", default=False, action="store_true", help="Save audio files for debug")
parser.add_argument("--save_audio", default=False, action="store_true", help="Save audio files")
parser.add_argument("--quantize_bits", type=int, default=0, help="Save quantized audio files if non-zero")
parser.add_argument("--eval", type=bool, help="compute eval.", default=True)
parser.add_argument("--eval", action=argparse.BooleanOptionalAction, help="compute eval.", default=True)
args = parser.parse_args()

c = load_config(args.config_path)
Expand Down
10 changes: 5 additions & 5 deletions TTS/bin/remove_silence_using_vad.py
Expand Up @@ -80,7 +80,7 @@ def preprocess_audios():
setup_logger("TTS", level=logging.INFO, screen=True, formatter=ConsoleFormatter())

parser = argparse.ArgumentParser(
description="python TTS/bin/remove_silence_using_vad.py -i=VCTK-Corpus/ -o=VCTK-Corpus-removed-silence/ -g=wav48_silence_trimmed/*/*_mic1.flac --trim_just_beginning_and_end True"
description="python TTS/bin/remove_silence_using_vad.py -i=VCTK-Corpus/ -o=VCTK-Corpus-removed-silence/ -g=wav48_silence_trimmed/*/*_mic1.flac --trim_just_beginning_and_end"
)
parser.add_argument("-i", "--input_dir", type=str, help="Dataset root dir", required=True)
parser.add_argument("-o", "--output_dir", type=str, help="Output Dataset dir", default="")
Expand All @@ -95,20 +95,20 @@ def preprocess_audios():
parser.add_argument(
"-t",
"--trim_just_beginning_and_end",
type=bool,
action=argparse.BooleanOptionalAction,
default=True,
help="If True this script will trim just the beginning and end nonspeech parts. If False all nonspeech parts will be trim. Default True",
help="If True this script will trim just the beginning and end nonspeech parts. If False all nonspeech parts will be trimmed.",
)
parser.add_argument(
"-c",
"--use_cuda",
type=bool,
action=argparse.BooleanOptionalAction,
default=False,
help="If True use cuda",
)
parser.add_argument(
"--use_onnx",
type=bool,
action=argparse.BooleanOptionalAction,
default=False,
help="If True use onnx",
)
Expand Down
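Note on behaviour for existing callers: with `BooleanOptionalAction` the flags no longer consume a value, so the earlier invocation style `--trim_just_beginning_and_end True` leaves a stray `True` on the command line. The sketch below rebuilds two of the arguments in isolation to show the defaults, the generated `--no-` form (created only for the long option, not for `-t` or `-c`), and what happens to a trailing value. It uses `parse_known_args` so the leftover can be inspected; plain `parse_args` would exit with an "unrecognized arguments" error instead.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "-t",
    "--trim_just_beginning_and_end",
    action=argparse.BooleanOptionalAction,
    default=True,
)
parser.add_argument("-c", "--use_cuda", action=argparse.BooleanOptionalAction, default=False)

# No flags: defaults apply.
defaults = parser.parse_args([])
print(defaults.trim_just_beginning_and_end, defaults.use_cuda)

# Disable the default-True option via the generated --no- form; enable CUDA via the short flag.
disabled = parser.parse_args(["--no-trim_just_beginning_and_end", "-c"])
print(disabled.trim_just_beginning_and_end, disabled.use_cuda)

# Old-style explicit value: the flag is set, but "True" is left over as an unknown argument.
args, leftover = parser.parse_known_args(["--use_cuda", "True"])
print(args.use_cuda, leftover)
```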
57 changes: 19 additions & 38 deletions TTS/bin/synthesize.py
@@ -1,5 +1,6 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

"""Command line interface."""

import argparse
import contextlib
Expand Down Expand Up @@ -136,30 +137,16 @@
"""


def str2bool(v):
if isinstance(v, bool):
return v
if v.lower() in ("yes", "true", "t", "y", "1"):
return True
if v.lower() in ("no", "false", "f", "n", "0"):
return False
raise argparse.ArgumentTypeError("Boolean value expected.")


def main():
setup_logger("TTS", level=logging.INFO, screen=True, formatter=ConsoleFormatter())

def parse_args() -> argparse.Namespace:
"""Parse arguments."""
parser = argparse.ArgumentParser(
description=description.replace(" ```\n", ""),
formatter_class=RawTextHelpFormatter,
)

parser.add_argument(
"--list_models",
type=str2bool,
nargs="?",
const=True,
default=False,
action="store_true",
help="list available pre-trained TTS and vocoder models.",
)

Expand Down Expand Up @@ -207,7 +194,7 @@ def main():
default="tts_output.wav",
help="Output wav file path.",
)
parser.add_argument("--use_cuda", type=bool, help="Run model on CUDA.", default=False)
parser.add_argument("--use_cuda", action="store_true", help="Run model on CUDA.")
parser.add_argument("--device", type=str, help="Device to run model on.", default="cpu")
parser.add_argument(
"--vocoder_path",
Expand All @@ -226,10 +213,7 @@ def main():
parser.add_argument(
"--pipe_out",
help="stdout the generated TTS wav file for shell pipe.",
type=str2bool,
nargs="?",
const=True,
default=False,
action="store_true",
)

# args for multi-speaker synthesis
Expand Down Expand Up @@ -261,25 +245,18 @@ def main():
parser.add_argument(
"--list_speaker_idxs",
help="List available speaker ids for the defined multi-speaker model.",
type=str2bool,
nargs="?",
const=True,
default=False,
action="store_true",
)
parser.add_argument(
"--list_language_idxs",
help="List available language ids for the defined multi-lingual model.",
type=str2bool,
nargs="?",
const=True,
default=False,
action="store_true",
)
# aux args
parser.add_argument(
"--save_spectogram",
type=bool,
help="If true save raw spectogram for further (vocoder) processing in out_path.",
default=False,
action="store_true",
help="Save raw spectogram for further (vocoder) processing in out_path.",
)
parser.add_argument(
"--reference_wav",
Expand All @@ -295,8 +272,8 @@ def main():
)
parser.add_argument(
"--progress_bar",
type=str2bool,
help="If true shows a progress bar for the model download. Defaults to True",
action=argparse.BooleanOptionalAction,
help="Show a progress bar for the model download.",
default=True,
)

Expand Down Expand Up @@ -337,19 +314,23 @@ def main():
]
if not any(check_args):
parser.parse_args(["-h"])
return args


def main():
setup_logger("TTS", level=logging.INFO, screen=True, formatter=ConsoleFormatter())
args = parse_args()

pipe_out = sys.stdout if args.pipe_out else None

with contextlib.redirect_stdout(None if args.pipe_out else sys.stdout):
# Late-import to make things load faster
from TTS.api import TTS
from TTS.utils.manage import ModelManager
from TTS.utils.synthesizer import Synthesizer

# load model manager
path = Path(__file__).parent / "../.models.json"
manager = ModelManager(path, progress_bar=args.progress_bar)
api = TTS()

tts_path = None
tts_config_path = None
Expand Down
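Note on the `parse_args()` split: moving argument parsing out of `main()` makes it possible to exercise the CLI flags without loading any model. The following is a reduced sketch of that shape and not the actual function: it declares only three of the flags shown in the diff, and the optional `argv` parameter is an assumption added here so the example can be called directly (the function in the diff takes no arguments).

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse a subset of the synthesize CLI flags (illustrative only)."""
    parser = argparse.ArgumentParser()
    # Plain switches: absent means False, present means True.
    parser.add_argument("--list_models", action="store_true")
    parser.add_argument("--pipe_out", action="store_true")
    # Default-True option that can be turned off with --no-progress_bar.
    parser.add_argument("--progress_bar", action=argparse.BooleanOptionalAction, default=True)
    return parser.parse_args(argv)


args = parse_args(["--list_models", "--no-progress_bar"])
print(args.list_models, args.pipe_out, args.progress_bar)  # True False False
```

Because `str2bool` is gone, forms such as `--list_models true` are no longer accepted; the switch is either present or absent.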