Breaking changes
- `CutSampler` classes now return mini-batch `CutSet`s instead of a list of string cut IDs (Lhotse Dataset classes are adjusted correspondingly) (Overhaul of Dataset API #345)
- `Cut` is now an abstract base class for all cut types; what was previously called `Cut` is now called `MonoCut` (Refactor and document the Cut hierarchy (Cut -> MonoCut, CutUtilsBase -> Cut) #328)
- `lhotse obtain` is now `lhotse download` (Adding GigaSpeech recipe CLI, minor GigaSpeech fixes, renaming "lhotse obtain" to "lhotse download" #329)
Corpora
New features
CutSampler improvements (PyTorch data API)
- `ZipSampler` for batches constructed from different cut sources (ZipSampler for batches constructed from different cut sources #344, Fix docstring for ZipSampler #347, Fix test involving ZipSampler #363; thanks for fixes @janvainer)
- `drop_last` option and `get_report()` method for cut samplers (Samplers: `drop_last` option and `get_report()` diagnostic method #357)
- `find_pessimistic_batches` utility to help fail fast with GPU OOM (Sampling: `find_pessimistic_batches` utility to help fail fast with GPU OOM. #358)
- `BucketingSampler` (Add a bucketing method with equal cumulative bucket duration #365)
- `BucketingSampler` (Approximate proportional sampling in BucketingSampler; remaining_duration, remaining_cuts, num_cuts properties for samplers. #372)
I/O improvements
- `OnTheFlyFeatures` (padding audio instead of features) (Padding features instead of audio in OnTheFlyFeatures #352)
- `ChunkedLilcomHdf5Writer` (and reader) for efficient chunk reads of lilcom-compressed arrays (Chunked HDF5 feature storage + minor recording fixes + adjust GigaSpeech recipe #334)
Data augmentation
Others
- `CutSet.trim_to_supervisions` has new arguments for including actual acoustic context next to the supervisions (Improving and fixing trim_to_supervisions, adding more documentation, fixes to filter_supervisions #330, Adding trim_to_supervisions to Cut; new options to include acoustic context when using trim_to_supervisions #331)
- `SupervisionSegment` is now mutable (and all Lhotse manifests will remain mutable) (Mutable supervisions #333)
- `.shuffle()` method for Lhotse `*Set` classes (Adding .shuffle() method for *Set classes #341)
- `lhotse fix` CLI (CLI: `lhotse fix` #360)
- `lhotse install-sph2pipe` for handling LDC corpora compressed with `shorten` (auto-registers sph2pipe so no further actions are needed) (Utilities to install sph2pipe #370)
General improvements
- `CutSet.subset(cut_ids=...)` (Fix cut ordering in CutSet.subset(cut_ids=...) #353)
- Split the `lhotse.dataset.sampling` file into a directory module (Splitting sampling.py to multiple files #366)
This discussion was created from the release Thin Ribbon of Snow.