Releases
v0.8
Breaking changes
Lhotse CutSampler classes now return mini-batch CutSets instead of a list of string cut IDs (Lhotse Dataset classes are adjusted correspondingly) (#345)
Cut refactoring (Cut is now an abstract base class for all cut types; what was previously called Cut is now called MonoCut) (#328)
CLI: lhotse obtain is now lhotse download (#329)
Corpora
New features
CutSampler improvements (PyTorch data API)
ZipSampler for batches constructed from different cut sources (#344 #347 #363, thanks for fixes @janvainer)
drop_last option and get_report() method for cut samplers (#357)
find_pessimistic_batches utility to help fail fast with GPU OOM (#358)
streaming variant of shuffling for lazy CutSets in samplers (#359)
a bucketing method with equal cumulative bucket duration for BucketingSampler (#365)
approximate proportional sampling in BucketingSampler (#372)
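To make two of the sampler items above more concrete, here is a minimal plain-Python sketch of the general ideas behind streaming shuffling and equal-cumulative-duration bucketing. It does not use Lhotse and is not Lhotse's implementation: the function names `streaming_shuffle` and `equal_duration_buckets`, their parameters, and the use of bare floats in place of cuts are all assumptions made for illustration.

```python
import random
from typing import Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def streaming_shuffle(
    items: Iterable[T], buffer_size: int = 1000, seed: int = 0
) -> Iterator[T]:
    """Approximately shuffle a stream using a fixed-size buffer (hypothetical helper)."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in items:
        if len(buffer) < buffer_size:
            # Still filling the buffer, nothing is emitted yet.
            buffer.append(item)
            continue
        # Buffer is full: emit a random element and store the new item in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: flush the remaining items in random order.
    rng.shuffle(buffer)
    yield from buffer


def equal_duration_buckets(
    durations: Sequence[float], num_buckets: int
) -> List[List[float]]:
    """Group sorted durations so that each bucket has a similar total duration."""
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    ordered = sorted(durations)
    target = sum(ordered) / num_buckets  # each bucket's share of the total duration
    buckets: List[List[float]] = [[]]
    running = 0.0
    for duration in ordered:
        # Open a new bucket once the current one has reached its share.
        if running >= target and len(buckets) < num_buckets:
            buckets.append([])
            running = 0.0
        buckets[-1].append(duration)
        running += duration
    return buckets
```

With durations `[1, 1, 1, 1, 2, 2, 4, 4]` and two buckets, the six short items land in one bucket and the two long items in the other, so both buckets hold 8 seconds in total even though their item counts differ. The shuffle only ever keeps `buffer_size` items in memory, which is the property that matters for lazily loaded data.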
I/O improvements
chunked OPUS file reads (#339)
chunked sphere file reads (#367, thanks @videodanchik)
faster OnTheFlyFeatures (padding audio instead of features) (#352)
ChunkedLilcomHdf5Writer (and reader) for efficient chunk reads of lilcom-compressed arrays (#334)
a global cache for re-using smart_open connection sessions (improves performance for repeated smart_open calls, e.g. to S3) (#335, thanks @oplatek)
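The session cache item above follows a common pattern: create an expensive connection object once and hand the same instance to every later call. The sketch below shows that pattern in plain Python with `functools.lru_cache`. It is an illustration under stated assumptions, not the code from #335: `FakeSession`, `get_session`, and `open_remote` are hypothetical names, and no real smart_open or S3 calls are made.

```python
from functools import lru_cache


class FakeSession:
    """Hypothetical stand-in for a connection session that is costly to create."""

    created = 0  # counts how many sessions were actually constructed

    def __init__(self, endpoint: str) -> None:
        FakeSession.created += 1
        self.endpoint = endpoint


@lru_cache(maxsize=None)
def get_session(endpoint: str) -> FakeSession:
    """Return one shared session per endpoint; repeated calls hit the cache."""
    return FakeSession(endpoint)


def open_remote(path: str, endpoint: str = "s3") -> str:
    """Pretend to open a remote path, re-using the cached session (assumed helper)."""
    session = get_session(endpoint)
    return f"{session.endpoint}://{path}"
```

Calling `open_remote` many times with the same endpoint constructs a single `FakeSession`, which is where the saving for repeated reads would come from.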
Data augmentation
Others
CutSet.trim_to_supervision has new arguments for including actual acoustic context next to the supervisions (#330 #331)
SupervisionSegment is now mutable (and all Lhotse manifests will remain mutable) (#333)
.shuffle() method for Lhotse *Set classes (#341)
lhotse fix CLI (#360)
lhotse install-sph2pipe for handling LDC corpora compressed with shorten (auto-registers sph2pipe so no further actions are needed) (#370)
General improvements
refreshed docs (#327 #328 #330)
improvements to downloading corpora (#340)
experimental dataloader that allows two levels of parallelism (#343, might be abandoned for other alternatives)
auto-detection of compatible torchaudio version for pytorch (#348)
improvements to Kaldi data dir import/export (#351 #354)
fixed cut ordering in CutSet.subset(cut_ids=...) (#353)
improvements to storing cuts as recordings (#355)
refactored lhotse.dataset.sampling file into a directory module (#366)
improvements to CLI (#369 #371, thanks @songmeixu)
improvements to setup (#377 #383, thanks @songmeixu)
Colab notebook with ESPnet + Lhotse example (#384)
improvements to Lhotse versioning (#385)