
[Wav2Vec2] Improve SpecAugment function by converting numpy based function to pytorch based function #10494

Closed · wants to merge 1 commit

Conversation

@punitvara commented Mar 3, 2021


Implements #10459

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@punitvara force-pushed the feature branch 2 times, most recently from 53a39ae to 0af1ec9 on March 3, 2021 at 06:07.
```python
mask_idc = torch.randperm(sz - min_len)[:num_mask]
mask_idc = torch.from_numpy(
    np.asarray([mask_idc[j] + offset for j in range(len(mask_idc)) for offset in range(lengths[j])])
)
mask_idcs.append(np.unique(mask_idc[mask_idc < sz]))
```
Contributor:

Let's try not to use np here. If we use torch.from_numpy(...), it's not GPU-friendly.

Author:

```python
mask_idc = torch.tensor([mask_idc[j] + offset for j in range(len(mask_idc)) for offset in range(lengths[j])])
```

I just did this. Is this right?
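For reference, the expansion above can be written entirely in torch, so the indices stay on the tensor's device; a minimal sketch, with illustrative values for `mask_idc` (one start index per mask span) and `lengths` (the span lengths):

```python
import torch

# Illustrative values; in the PR these come from the sampling step above.
mask_idc = torch.tensor([3, 10, 25])
lengths = torch.tensor([4, 2, 3])

# Repeat each start index lengths[j] times, then add the in-span offsets
# 0..lengths[j]-1, mirroring the nested list comprehension above.
starts = torch.repeat_interleave(mask_idc, lengths)
offsets = torch.arange(int(lengths.max())).expand(len(lengths), -1)
expanded = starts + offsets[offsets < lengths.unsqueeze(1)]
# expanded == tensor([ 3,  4,  5,  6, 10, 11, 25, 26, 27])
```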


```diff
- min_len = min([len(m) for m in mask_idcs])
+ min_len = torch.min(mask_idcs)
  for i, mask_idc in enumerate(mask_idcs):
```
Contributor:

Let's try to get rid of the for-loop and do tensor operations only

Author:

I am not sure how to do this. Something like the following, but I'm not sure how to put it together. Can you help here?

```python
mask[i, mask_idc] = [True, torch.randperm(mask_idc)[:min_len] if torch.tensor(mask_idcs).size() > min_len]
```
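For reference, one loop-free shape for this (a sketch, not code from the PR; it assumes the rows of candidate indices have equal length, so unequal rows would need padding first): sample `min_len` indices per row with a rand-plus-argsort trick, then set them all with a single `scatter_`. All names and sizes below are illustrative.

```python
import torch

# Illustrative sizes; `candidates` stands in for the stacked mask index rows.
batch_size, seq_len, n_candidates, min_len = 4, 50, 8, 5
candidates = torch.randint(0, seq_len, (batch_size, n_candidates))

# Random scores + argsort yield a random permutation per row; keeping the
# first min_len columns samples without replacement, like a per-row randperm.
keep = torch.rand(batch_size, n_candidates).argsort(dim=1)[:, :min_len]
idx = candidates.gather(1, keep)  # (batch_size, min_len)

mask = torch.zeros(batch_size, seq_len, dtype=torch.bool)
# One scatter_ replaces the per-row loop: mask[i, idx[i, j]] = True for all i, j.
mask.scatter_(1, idx, True)
```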

@patrickvonplaten (Contributor):

We need to run benchmark tests to see how much the speed improves, on both CPU and GPU.
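A minimal harness for this could use `torch.utils.benchmark`, which handles warmup and CUDA synchronization; a sketch, assuming hypothetical drop-in variants `compute_mask_indices_np` and `compute_mask_indices_pt` (these names are not from the PR) that take the same arguments:

```python
import torch.utils.benchmark as benchmark

def time_fn(fn):
    # Time fn with fixed SpecAugment-style arguments; blocked_autorange
    # picks the number of runs and synchronizes CUDA if fn touches the GPU.
    t = benchmark.Timer(
        stmt="fn(shape, mask_prob, mask_length)",
        globals={"fn": fn, "shape": (8, 10_000), "mask_prob": 0.05, "mask_length": 10},
    )
    return t.blocked_autorange()

# print(time_fn(compute_mask_indices_np))  # numpy baseline
# print(time_fn(compute_mask_indices_pt))  # torch version, CPU or CUDA
```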

Commit: [Wav2Vec2] Improve SpecAugment function by converting numpy based function to pytorch based function

Implements huggingface#10459

fixes some style changes
@punitvara (Author):

@patrickvonplaten Can you please help with the above comments?

@patrickvonplaten (Contributor):

Hey @punitvara,

At the moment, I sadly don't have the time to handle the big chunks of this PR. It would be great if you could try to:

  1. Find a way to benchmark your new function on GPU and show that it yields a speed-up in the forward pass compared to the old function.

  2. Try out some advanced PyTorch indexing to replace the for loops.

Taking a look at these PRs should help you: #9600, #9453, #6064

@patrickvonplaten (Contributor):

Closing due to inactivity. Sorry @punitvara, I've seen a lot of interest from other people in opening a PR for this, and this one seems to have stalled. Feel free to re-open it and give it a second shot if you want :-)

@punitvara (Author):

I got busy with some other work, so I will try to work on a different issue. If you get another PR for this, feel free to merge it.
