ENST Camille Voice Data

A French speech corpus designed for synthesis, recorded in 2013 at Télécom ParisTech (ENST) by Camille Dianoux, a female native speaker of French.

Data format

Audio

The audio data is provided in the losslessly compressed FLAC format, which many programs can play, including Praat. The speaker was recorded in mono at a 44.1 kHz sampling rate with 16 bits per sample. No filtering of any kind has been applied to this raw data.

Phonetic segmentation

Annotations are provided as a single YAML file. It contains a list of utterances, each of which consists of

- a prompt code (file basename),
- the utterance text,
- the recording date,
- utterance start and end times (in seconds) in the FLAC file,
- the phonetic segments (obtained using the eHMM tool from FestVox 2.1), each of which has
  - a label (based on SAMPA, _ denotes silence), and
  - its duration (in seconds).

For example,

- prompt: text_0366
  text: les alpinistes installent un bivouac au pied de la montagne .
  date: 2013-06-17T15:34:58Z
  start: 3185.5455782309
  end: 3191.814965986
  segments:
  - { lab: _, dur: 0.614969 }
  - { lab: l, dur: 0.08 }
  - { lab: e, dur: 0.135 }
  - { lab: a, dur: 0.09 }
  - { lab: l, dur: 0.045 }
  - { lab: p, dur: 0.085 }
  - { lab: i, dur: 0.075 }
  - { lab: n, dur: 0.06 }
  - { lab: i, dur: 0.105 }
  - { lab: s, dur: 0.095 }
  - { lab: t, dur: 0.065 }
  - { lab: '@', dur: 0.115 }
  # etc.
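
The fields above can be turned into absolute sample offsets and per-segment boundaries with a little arithmetic. The following is a minimal sketch, not part of the corpus tooling. It assumes that the segments are laid end to end starting at the utterance start time (the format description only gives durations), and it copies the example utterance by hand into a Python dictionary rather than parsing the YAML file, which would need a YAML library. The names `to_sample` and `segment_boundaries` are hypothetical helpers introduced here for illustration.

```python
SAMPLE_RATE = 44100  # Hz, as stated in the Audio section

# The example utterance, copied by hand (only the 12 segments shown above).
utterance = {
    "prompt": "text_0366",
    "start": 3185.5455782309,
    "end": 3191.814965986,
    "segments": [
        {"lab": "_", "dur": 0.614969},
        {"lab": "l", "dur": 0.08},
        {"lab": "e", "dur": 0.135},
        {"lab": "a", "dur": 0.09},
        {"lab": "l", "dur": 0.045},
        {"lab": "p", "dur": 0.085},
        {"lab": "i", "dur": 0.075},
        {"lab": "n", "dur": 0.06},
        {"lab": "i", "dur": 0.105},
        {"lab": "s", "dur": 0.095},
        {"lab": "t", "dur": 0.065},
        {"lab": "@", "dur": 0.115},
    ],
}


def to_sample(seconds, rate=SAMPLE_RATE):
    """Convert a time in seconds to the nearest sample index."""
    return round(seconds * rate)


def segment_boundaries(utt):
    """Yield (label, start, end) in seconds, relative to the utterance start.

    Assumption: segments are contiguous and the first one begins at the
    utterance start time.
    """
    t = 0.0
    for seg in utt["segments"]:
        yield seg["lab"], t, t + seg["dur"]
        t += seg["dur"]


if __name__ == "__main__":
    first = to_sample(utterance["start"])
    last = to_sample(utterance["end"])
    print(f"{utterance['prompt']}: samples {first}..{last}")
    for lab, start, end in segment_boundaries(utterance):
        print(f"{lab}\t{start:.6f}\t{end:.6f}")
```

Keeping the boundaries relative to the utterance start avoids accumulating rounding error against the large absolute offsets in the FLAC file; add `utterance["start"]` only when an absolute position is needed.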

Extracting the corpus

Prerequisites

Java 8 (or later) and SoX must be installed.
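
Before running the build, it can help to confirm that both tools are reachable. The sketch below is only an illustration and is not part of the repository. It assumes the executables are named `java` and `sox` on your system, and it checks only that they are on the PATH, not that the Java version is 8 or later. `missing_tools` is a hypothetical helper name.

```python
import shutil


def missing_tools(tools=("java", "sox")):
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("java and sox were both found on PATH.")
```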

Assembling the data

The data processing is delegated to Gradle and the FLAML plugin.

Run ./gradlew unpackData extractTextFiles extractLabFiles extractWavFiles to download and extract all data. See the FLAML plugin documentation for details.

Run ./gradlew assemble to prepare the data for distribution.

License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. See the LICENSE.md file for details.
