specaug speedup #6347
Conversation
Very cool PR! The speedup pushes this on par with the Numba method (Numba is maybe a few milliseconds faster, but that doesn't matter for CPU use, and the speedup is substantial for GPU).
PR looks good, but I'll ask @VahidooX for final review.
@@ -44,15 +45,15 @@ def input_types(self):
        """Returns definitions of module input types
        """
        return {
            "input_spec": NeuralType(('B', 'D', 'T'), SpectrogramType()),
Could you revert these changes?
Done... this was black's default (string normalization) that was done automatically.
@@ -120,13 +126,13 @@ class SpecCutout(nn.Module, Typing):
    def input_types(self):
        """Returns definitions of module input types
        """
-       return {"input_spec": NeuralType(('B', 'D', 'T'), SpectrogramType())}
+       return {"input_spec": NeuralType(("B", "D", "T"), SpectrogramType())}
Revert below.
done
Thanks!
LGTM, thanks for the contribution!
Seems like you need to sign your commits - https://github.com/NVIDIA/NeMo/pull/6347/checks?check_run_id=12461951009
Force-pushed from 3e0e0ac to c44deba (Compare).
Just a suggestion: please have a look at torchaudio.
Torchaudio is an optional dependency in NeMo and is not installed automatically, due to its dependency on a fixed PyTorch version and compilation. This is a more generic PR that has no requirements other than torch.
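For reference, the torchaudio route mentioned above would look roughly like the sketch below (it assumes the optional torchaudio dependency is installed; the parameter values are only illustrative and this is not part of the PR):

```python
import torch
import torchaudio.transforms as T

spec = torch.randn(32, 80, 1000)                # (batch, freq, time)
freq_mask = T.FrequencyMasking(freq_mask_param=27)   # masks a random frequency band
time_mask = T.TimeMasking(time_mask_param=100)        # masks a random time span
masked = time_mask(freq_mask(spec))
```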
@1-800-BAD-CODE thanks for the PR!
* [Core] return_config=True now extracts just config, not full tarfile (NVIDIA#6346)
  Signed-off-by: smajumdar <titu1994@gmail.com>
  Signed-off-by: shane carroll <shane.carroll@utsa.edu>
* specaug speedup
  Signed-off-by: shane carroll <shane.carroll@utsa.edu>
* comments
  Signed-off-by: shane carroll <shane.carroll@utsa.edu>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
  Signed-off-by: shane carroll <shane.carroll@utsa.edu>
---------
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: shane carroll <shane.carroll@utsa.edu>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
What does this PR do?
Faster implementation of non-numba specaug
Collection: ASR
Changelog
Rather than repeatedly modifying the features tensor in place, build a mask and fill the features tensor once. By my measurements, this is about 20x faster on GPU.
It could be faster still, but then it becomes hard to compare the output exactly against the original implementation to verify correctness.
Also, the original implementation seems to be biased away from the upper frequency bins, but again, fixing that would make it impossible to compare the outputs to the original implementation.
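To make the mask-and-fill idea concrete, here is a minimal sketch of the approach (my own illustration, not the actual code in this PR; the function name, parameter names, and defaults are assumptions):

```python
import torch


def spec_augment_masked(
    spec: torch.Tensor,
    freq_masks: int = 2,
    freq_width: int = 27,
    time_masks: int = 2,
    time_width: int = 100,
    mask_value: float = 0.0,
) -> torch.Tensor:
    """spec: (B, D, T) batch of spectrograms."""
    b, d, t = spec.shape
    mask = torch.zeros_like(spec, dtype=torch.bool)

    # Frequency masks: draw per-example start/width, build (B, D) bands, broadcast over time.
    freq_idx = torch.arange(d, device=spec.device).unsqueeze(0)  # (1, D)
    for _ in range(freq_masks):
        width = torch.randint(0, freq_width + 1, (b, 1), device=spec.device)
        start = torch.randint(0, d, (b, 1), device=spec.device)
        band = (freq_idx >= start) & (freq_idx < start + width)  # (B, D)
        mask |= band.unsqueeze(-1)                                # broadcast over T

    # Time masks: same idea along the time axis, broadcast over frequency.
    time_idx = torch.arange(t, device=spec.device).unsqueeze(0)  # (1, T)
    for _ in range(time_masks):
        width = torch.randint(0, time_width + 1, (b, 1), device=spec.device)
        start = torch.randint(0, t, (b, 1), device=spec.device)
        band = (time_idx >= start) & (time_idx < start + width)  # (B, T)
        mask |= band.unsqueeze(1)                                 # broadcast over D

    # One fused fill instead of many small in-place slice writes.
    return spec.masked_fill(mask, mask_value)
```

The key point is that all random regions are accumulated into one boolean mask and applied with a single `masked_fill`, instead of launching a separate in-place write per cut region.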
Usage
This snippet runs both the original implementation and this one, verifies that the outputs are similar, and prints latencies.
Test snippet
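The actual test snippet is collapsed above; the following is only a rough skeleton of what such a comparison could look like. The two callables are hypothetical placeholders, not NeMo symbols, and should be replaced with the real modules when reproducing:

```python
import time

import torch


def original_specaug(x):
    # Placeholder for the old in-place implementation.
    return x


def masked_specaug(x):
    # Placeholder for the new mask-and-fill implementation.
    return x


def latency_ms(fn, spec, iters=100):
    for _ in range(3):  # warm-up
        fn(spec.clone())
    if spec.is_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(spec.clone())
    if spec.is_cuda:
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters * 1e3


device = "cuda" if torch.cuda.is_available() else "cpu"
spec = torch.randn(32, 80, 1000, device=device)  # (B, D, T)

torch.manual_seed(0)
t_old = latency_ms(original_specaug, spec)
torch.manual_seed(0)
t_new = latency_ms(masked_specaug, spec)
print(f"original: {t_old:.2f} ms/iter  masked: {t_new:.2f} ms/iter")
# To check that outputs match, reseed before each call and compare with torch.allclose.
```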
Expected output:
Before your PR is "Ready for review"
Pre checks:
PR Type: