change max_frames to max_duration in docs (#1419)
* change max_frames to max_duration in docs

* minor fix
pengzhendong authored Nov 23, 2024
1 parent 1880fc1 commit 9c1330a
Showing 14 changed files with 26 additions and 26 deletions.
8 changes: 4 additions & 4 deletions docs/datasets.rst
@@ -28,7 +28,7 @@ It allows for interesting collation methods - e.g. **padding the speech with noi

The items for mini-batch creation are selected by the ``Sampler``.
Lhotse defines ``Sampler`` classes that are initialized with :class:`~lhotse.cut.CutSet`'s, so that they can look up specific properties of an utterance to stratify the sampling.
-For example, :class:`~lhotse.dataset.sampling.SimpleCutSampler` has a defined ``max_frames`` attribute, and it will keep sampling cuts for a batch until they do not exceed the specified number of frames.
+For example, :class:`~lhotse.dataset.sampling.SimpleCutSampler` has a defined ``max_duration`` attribute, and it will keep sampling cuts for a batch until they do not exceed the specified number of seconds.
Another strategy — used in :class:`~lhotse.dataset.sampling.BucketingSampler` — will first group the cuts of similar durations into buckets, and then randomly select a bucket to draw the whole batch from.

For tasks where both input and output of the model are speech utterances, we can use the :class:`~lhotse.dataset.sampling.CutPairsSampler`, which accepts two :class:`~lhotse.cut.CutSet`'s and will match the cuts in them by their IDs.
@@ -38,11 +38,11 @@ A typical Lhotse's dataset API usage might look like this:
.. code-block::
from torch.utils.data import DataLoader
-from lhotse.dataset import SpeechRecognitionDataset, SimpleCutSampler
+from lhotse.dataset import K2SpeechRecognitionDataset, SimpleCutSampler
cuts = CutSet(...)
-dset = SpeechRecognitionDataset(cuts)
-sampler = SimpleCutSampler(cuts, max_frames=50000)
+dset = K2SpeechRecognitionDataset(cuts)
+sampler = SimpleCutSampler(cuts, max_duration=500)
# Dataset performs batching by itself, so we have to indicate that
# to the DataLoader with batch_size=None
dloader = DataLoader(dset, sampler=sampler, batch_size=None, num_workers=1)
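Reviewer note: the replaced values in this commit (50000 to 500 here, 20000 to 200 and 15000 to 150 in the bucketing docstring) are all consistent with a rate of 100 frames per second, i.e. a 10 ms frame shift. The commit does not state that rate, so treat it as an assumption. A minimal sketch of the conversion, using a hypothetical helper `frames_to_seconds` that is not part of Lhotse:

```python
def frames_to_seconds(num_frames: int, frames_per_second: int = 100) -> float:
    """Convert an old frame budget into a duration budget in seconds.

    frames_per_second=100 is an ASSUMPTION (10 ms frame shift) inferred from
    the numbers in this diff; substitute your feature extractor's real rate.
    """
    return num_frames / frames_per_second


# The three frame budgets that this commit rewrites as durations.
for old_max_frames in (50000, 20000, 15000):
    print(old_max_frames, "frames ->", frames_to_seconds(old_max_frames), "seconds")
```

If your features use a different frame shift, the equivalent `max_duration` will differ from the values shown in the updated docs.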
2 changes: 1 addition & 1 deletion lhotse/cut/data.py
@@ -723,7 +723,7 @@ def pad(
"""
Return a new MixedCut, padded with zeros in the recording, and ``pad_feat_value`` in each feature bin.
-The user can choose to pad either to a specific `duration`; a specific number of frames `max_frames`;
+The user can choose to pad either to a specific `duration`; a specific number of frames `num_frames`;
or a specific number of samples `num_samples`. The three arguments are mutually exclusive.
:param duration: The cut's minimal duration after padding.
2 changes: 1 addition & 1 deletion lhotse/cut/mixed.py
@@ -622,7 +622,7 @@ def pad(
"""
Return a new MixedCut, padded with zeros in the recording, and ``pad_feat_value`` in each feature bin.
-The user can choose to pad either to a specific `duration`; a specific number of frames `max_frames`;
+The user can choose to pad either to a specific `duration`; a specific number of frames `num_frames`;
or a specific number of samples `num_samples`. The three arguments are mutually exclusive.
:param duration: The cut's minimal duration after padding.
2 changes: 1 addition & 1 deletion lhotse/cut/padding.py
@@ -236,7 +236,7 @@ def pad(
"""
Return a new MixedCut, padded with zeros in the recording, and ``pad_feat_value`` in each feature bin.
-The user can choose to pad either to a specific `duration`; a specific number of frames `max_frames`;
+The user can choose to pad either to a specific `duration`; a specific number of frames `num_frames`;
or a specific number of samples `num_samples`. The three arguments are mutually exclusive.
:param duration: The cut's minimal duration after padding.
2 changes: 1 addition & 1 deletion lhotse/cut/set.py
@@ -2821,7 +2821,7 @@ def pad(
"""
Return a new MixedCut, padded with zeros in the recording, and ``pad_feat_value`` in each feature bin.
-The user can choose to pad either to a specific `duration`; a specific number of frames `max_frames`;
+The user can choose to pad either to a specific `duration`; a specific number of frames `num_frames`;
or a specific number of samples `num_samples`. The three arguments are mutually exclusive.
:param cut: DataCut to be padded.
2 changes: 1 addition & 1 deletion lhotse/dataset/audio_tagging.py
@@ -78,7 +78,7 @@ def __init__(
def __getitem__(self, cuts: CutSet) -> Dict[str, Union[torch.Tensor, List[str]]]:
"""
Return a new batch, with the batch size automatically determined using the constraints
-of max_frames and max_cuts.
+of max_duration and max_cuts.
"""
self.hdf5_fix.update()

4 changes: 2 additions & 2 deletions lhotse/dataset/sampling/bucketing.py
@@ -30,7 +30,7 @@ class BucketingSampler(CutSampler):
... # BucketingSampler specific args
... sampler_type=SimpleCutSampler, num_buckets=20,
... # Args passed into SimpleCutSampler
-... max_frames=20000
+... max_duration=200
... )
Bucketing sampler with 20 buckets, sampling pairs of source-target cuts::
@@ -40,7 +40,7 @@ class BucketingSampler(CutSampler):
... # BucketingSampler specific args
... sampler_type=CutPairsSampler, num_buckets=20,
... # Args passed into CutPairsSampler
-... max_source_frames=20000, max_target_frames=15000
+... max_source_duration=200, max_target_duration=150
... )
"""

8 changes: 4 additions & 4 deletions lhotse/dataset/sampling/cut_pairs.py
@@ -12,10 +12,10 @@ class CutPairsSampler(CutSampler):
It expects that both CutSet's strictly consist of Cuts with corresponding IDs.
It behaves like an iterable that yields lists of strings (cut IDs).
-When one of :attr:`max_frames`, :attr:`max_samples`, or :attr:`max_duration` is specified,
+When one of :attr:`max_source_duration`, :attr:`max_target_duration`, or :attr:`max_cuts` is specified,
the batch size is dynamic.
Exactly zero or one of those constraints can be specified.
-Padding required to collate the batch does not contribute to max frames/samples/duration.
+Padding required to collate the batch does not contribute to max source_duration/target_duration.
"""

def __init__(
@@ -229,7 +229,7 @@ def _next_batch(self) -> Tuple[CutSet, CutSet]:
self.source_constraints.add(next_source_cut)
self.target_constraints.add(next_target_cut)

-# Did we exceed the max_source_frames and max_cuts constraints?
+# Did we exceed the max_source_duration and max_cuts constraints?
if (
not self.source_constraints.exceeded()
and not self.target_constraints.exceeded()
@@ -249,7 +249,7 @@ def _next_batch(self) -> Tuple[CutSet, CutSet]:
# and return the cut anyway.
warnings.warn(
"The first cut drawn in batch collection violates one of the max_... constraints"
-"we'll return it anyway. Consider increasing max_source_frames/max_cuts/etc."
+"we'll return it anyway. Consider increasing max_source_duration/max_cuts/etc."
)
source_cuts.append(next_source_cut)
target_cuts.append(next_target_cut)
2 changes: 1 addition & 1 deletion lhotse/dataset/sampling/dynamic.py
@@ -335,7 +335,7 @@ def detuplify(
else next_cut_or_tpl
)

-# Did we exceed the max_frames and max_cuts constraints?
+# Did we exceed the max_duration and max_cuts constraints?
if self.constraint.close_to_exceeding():
# Yes. Finish sampling this batch.
if self.constraint.exceeded() and len(cuts) == 1:
12 changes: 6 additions & 6 deletions lhotse/dataset/sampling/simple.py
@@ -11,10 +11,10 @@ class SimpleCutSampler(CutSampler):
Samples cuts from a CutSet to satisfy the input constraints.
It behaves like an iterable that yields lists of strings (cut IDs).
-When one of :attr:`max_frames`, :attr:`max_samples`, or :attr:`max_duration` is specified,
+When one of :attr:`max_duration`, or :attr:`max_cuts` is specified,
the batch size is dynamic.
Exactly zero or one of those constraints can be specified.
-Padding required to collate the batch does not contribute to max frames/samples/duration.
+Padding required to collate the batch does not contribute to max duration.
Example usage::
@@ -197,10 +197,10 @@ def _next_batch(self) -> CutSet:
self.diagnostics.discard_single(next_cut)
continue

-# Track the duration/frames/etc. constraints.
+# Track the duration/etc. constraints.
self.time_constraint.add(next_cut)

-# Did we exceed the max_frames and max_cuts constraints?
+# Did we exceed the max_duration and max_cuts constraints?
if not self.time_constraint.exceeded():
# No - add the next cut to the batch, and keep trying.
cuts.append(next_cut)
@@ -215,9 +215,9 @@ def _next_batch(self) -> CutSet:
# and return the cut anyway.
warnings.warn(
"The first cut drawn in batch collection violates "
-"the max_frames, max_cuts, or max_duration constraints - "
+"the max_duration, or max_cuts constraints - "
"we'll return it anyway. "
-"Consider increasing max_frames/max_cuts/max_duration."
+"Consider increasing max_duration/max_cuts."
)
cuts.append(next_cut)
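Reviewer note: the hunks above describe batching under a `max_duration` budget, including the case where a single cut alone violates the constraint and is returned anyway with a warning. The following is a rough standalone sketch of that idea in plain Python. It is not Lhotse's implementation: `batches_by_duration` is a hypothetical helper, cuts are represented only by their durations in seconds, and `max_cuts` is ignored.

```python
import warnings
from typing import Iterable, Iterator, List


def batches_by_duration(
    durations: Iterable[float], max_duration: float
) -> Iterator[List[float]]:
    """Group durations (seconds) into batches whose sum stays within max_duration."""
    batch: List[float] = []
    total = 0.0
    for d in durations:
        # Close the current batch if adding this item would exceed the budget.
        if batch and total + d > max_duration:
            yield batch
            batch, total = [], 0.0
        # An item that is too long on its own is still returned, with a warning.
        if d > max_duration:
            warnings.warn(
                "A single item exceeds max_duration; returning it anyway. "
                "Consider increasing max_duration."
            )
            yield [d]
            continue
        batch.append(d)
        total += d
    # Flush whatever is left over.
    if batch:
        yield batch


if __name__ == "__main__":
    for b in batches_by_duration([200.0, 200.0, 150.0, 400.0, 600.0, 50.0], 500.0):
        print(b)
```

Only the item durations are summed here, which matches the docstring's statement that collation padding does not count toward the limit.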

2 changes: 1 addition & 1 deletion lhotse/dataset/sampling/weighted_simple.py
@@ -15,7 +15,7 @@ class WeightedSimpleCutSampler(SimpleCutSampler):
When performing sampling, it avoids having duplicated cuts in the same batch.
The sampler terminates if the number of sampled cuts reach :attr:`num_samples`
-When one of :attr:`max_frames`, :attr:`max_samples`, or :attr:`max_duration` is specified,
+When one of :attr:`max_duration`, or :attr:`max_cuts` is specified,
the batch size is dynamic.
Example usage:
2 changes: 1 addition & 1 deletion lhotse/dataset/speech_recognition.py
@@ -94,7 +94,7 @@ def __init__(
def __getitem__(self, cuts: CutSet) -> Dict[str, Union[torch.Tensor, List[str]]]:
"""
Return a new batch, with the batch size automatically determined using the constraints
-of max_frames and max_cuts.
+of max_duration and max_cuts.
"""
validate_for_asr(cuts)

2 changes: 1 addition & 1 deletion lhotse/dataset/speech_translation.py
@@ -97,7 +97,7 @@ def __init__(
def __getitem__(self, cuts: CutSet) -> Dict[str, Union[torch.Tensor, List[str]]]:
"""
Return a new batch, with the batch size automatically determined using the constraints
-of max_frames and max_cuts.
+of max_duration and max_cuts.
"""
validate_for_asr(cuts)
self.hdf5_fix.update()
2 changes: 1 addition & 1 deletion lhotse/dataset/surt.py
@@ -170,7 +170,7 @@ def __init__(
def __getitem__(self, cuts: CutSet) -> Dict[str, Union[torch.Tensor, List[str]]]:
"""
Return a new batch, with the batch size automatically determined using the constraints
-of max_frames and max_cuts.
+of max_duration and max_cuts.
"""
validate_for_asr(cuts)

