
feat: differentiable PyTorch backend #31

Open · wants to merge 19 commits into master
Conversation

yoyololicon

Feature

An alternative, fully differentiable backend of norbert for PyTorch users. Input tensors require one extra leading dimension so that they can be processed in batches, which comes in handy for deep learning training. This implementation has been used to train Danna-Sep [1][2]. The change is backward compatible, so it won't affect any existing projects that depend on norbert.

Usage

To use the PyTorch backend, simply replace each call to `norbert.*` with `norbert.torch.*`.
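For example, a minimal sketch with illustrative shapes (the extra leading dimension is the batch):

```python
import torch
import norbert.torch as norbert

# one extra leading batch dimension compared to the numpy API
x = torch.randn(1, 100, 513, 2, dtype=torch.complex64)  # mixture STFT: (batch, frames, bins, channels)
v = torch.rand(1, 100, 513, 2, 4)                        # source estimates: (batch, frames, bins, channels, sources)

y = norbert.wiener(v, x)  # same call as norbert.wiener, now batched and differentiable
```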

Available Functions

  • expectation_maximization
  • wiener
  • softmask
  • wiener_gain
  • apply_filter
  • get_mix_model
  • get_local_gaussian_model
  • residual_model
  • reduce_interferences

Note

To make the new backend differentiable, some in-place operations in the original code had to be rewritten, so it is not a simple one-to-one translation.
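For illustration, a sketch of the general pattern (not the actual diff; `v` is just a placeholder tensor):

```python
import torch

v = torch.rand(8, 4, requires_grad=True)

# an in-place update like this raises an error on a leaf tensor that requires grad:
# v /= v.sum(dim=-1, keepdim=True)

# the out-of-place equivalent keeps the autograd graph intact:
v_norm = v / v.sum(dim=-1, keepdim=True)
v_norm.sum().backward()  # gradients flow back to v
```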
By the way, I have some problems building the docs, so I'm not sure what the documentation will look like after this PR. If anyone could help me check the docs, I would appreciate it.

Footnotes

  1. Yu, Chin-Yun, and Kin-Wai Cheuk. "Danna-Sep: Unite to separate them all." https://arxiv.org/abs/2112.03752

  2. Mitsufuji, Yuki, et al. "Music Demixing Challenge 2021." https://doi.org/10.3389/frsip.2021.808395

@faroit
Member

faroit commented Feb 14, 2022

@yoyololicon that's great 🙌 Have you seen our implementation in https://github.com/sigsep/open-unmix-pytorch/blob/master/openunmix/filtering.py ?

  • Is yours using complex or real dtypes?
  • Can you benchmark against the one in openunmix?
  • Did you make sure the numpy and torch outputs are identical (within a margin of error)? I don't see such a regression test in this PR.

@yoyololicon
Author

@faroit

Have you seen our implementation in https://github.com/sigsep/open-unmix-pytorch/blob/master/openunmix/filtering.py ?

Thanks for the info, I hadn't come across this one before.

  • Is yours using complex or real dtypes?

Do you mean PyTorch's native complex type? Yes.

  • Can you benchmark against the one in openunmix?

Sure. I think we can profile the memory usage as well.

  • Did you make sure the numpy and torch outputs are identical (within a margin of error)? I don't see such a regression test in this PR.

As far as I can remember, we haven't directly compared them before. I agree we should add a regression test and report the numbers.
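Something along these lines, for instance (a hypothetical sketch; shapes and tolerances are placeholders, and the torch call adds the batch dimension):

```python
import numpy as np
import torch

import norbert
import norbert.torch as norbert_torch


def test_wiener_matches_numpy():
    rng = np.random.default_rng(0)
    x = rng.standard_normal((100, 513, 2)) + 1j * rng.standard_normal((100, 513, 2))
    v = rng.random((100, 513, 2, 4))

    y_np = norbert.wiener(v, x, 1)
    y_pt = norbert_torch.wiener(torch.from_numpy(v)[None], torch.from_numpy(x)[None], 1)

    np.testing.assert_allclose(y_pt.squeeze(0).numpy(), y_np, rtol=1e-5, atol=1e-5)
```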

yoyololicon marked this pull request as draft February 14, 2022 08:18
@aliutkus
Member

Thanks a lot, that's great!! I remember talking to you about doing a PR for norbert; it's fantastic that you took the time to do it.

If you could indeed run the few tests @faroit mentions, that would be perfect.
Does your implementation allow backprop even with a number of iterations > 0?

best

@yoyololicon
Author

Thanks a lot, that's great!! I remember talking to you about doing a PR for norbert; it's fantastic that you took the time to do it.

@aliutkus Yeah, I remember it, too. 😄

If you could indeed run the few tests @faroit mentions, that would be perfect. Does your implementation allow backprop even with a number of iterations > 0?

Theoretically, it can backprop through any number of iterations, but we always set it to one iteration in our training.
I'll add the tests later.
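For reference, a minimal sketch of that setup (illustrative shapes; the loss is a placeholder):

```python
import torch
import norbert.torch as norbert

x = torch.randn(1, 100, 513, 2, dtype=torch.complex64)  # mixture STFT
v = torch.rand(1, 100, 513, 2, 4, requires_grad=True)   # estimated source spectrograms

y = norbert.wiener(v, x, 1)  # one EM iteration, as in our training
y.abs().sum().backward()     # placeholder loss; gradients reach v through the EM step
assert v.grad is not None
```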

@yoyololicon
Author

yoyololicon commented Feb 17, 2022

@faroit @aliutkus I added the regression tests, and this should be ready for review.
See the benchmarks below.

Profile

Norbert Torch

-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                         Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg       CPU Mem  Self CPU Mem    # of Calls  
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                  aten::empty         0.42%      67.000us         0.42%      67.000us       4.467us      17.28 Mb      17.28 Mb            15  
                    aten::mul        33.89%       5.363ms        38.71%       6.126ms     510.500us      21.13 Mb      13.31 Mb            12  
          aten::empty_strided         0.49%      77.000us         0.49%      77.000us       5.133us      10.97 Mb      10.97 Mb            15  
                aten::resize_         0.21%      33.000us         0.21%      33.000us       8.250us      10.96 Mb      10.96 Mb             4  
                    aten::div         4.28%     678.000us         4.50%     713.000us     178.250us       5.54 Mb       5.53 Mb             4  
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
Self CPU time total: 15.827ms

Open-Unmix

-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                         Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg       CPU Mem  Self CPU Mem    # of Calls  
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                    aten::mul        19.84%      29.532ms        20.01%      29.783ms     111.131us      76.34 Mb      73.60 Mb           268  
                    aten::add         4.59%       6.826ms         4.59%       6.826ms      36.116us      47.23 Mb      47.23 Mb           189  
                  aten::empty         0.23%     343.000us         0.24%     353.000us       2.942us      25.90 Mb      25.90 Mb           120  
          aten::empty_strided         0.31%     468.000us         0.31%     468.000us       5.032us      17.42 Mb      17.42 Mb            93  
                    aten::sub         1.87%       2.782ms         1.88%       2.798ms      43.046us      14.87 Mb      14.87 Mb            65  
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
Self CPU time total: 148.867ms

Benchmark

Testing different combinations of sources and iterations across thread counts, with the other parameters fixed to reasonable values; each row label below is [nb_sources, nb_iterations].
It's consistently 2 to 3 times faster than openunmix.

[----------------- wiener -----------------]
              |  norbert torch  |  openunmix
1 threads: ---------------------------------
      [4, 0]  |        4.2      |      6.7  
      [4, 1]  |       46.8      |     80.6  
      [4, 3]  |      130.4      |    233.0  
      [8, 0]  |        4.0      |      8.2  
      [8, 1]  |       84.0      |    167.4  
      [8, 3]  |      232.6      |    470.5  
2 threads: ---------------------------------
      [4, 0]  |        2.2      |      3.7  
      [4, 1]  |       24.9      |     48.8  
      [4, 3]  |       70.5      |    138.0  
      [8, 0]  |        2.3      |      4.7  
      [8, 1]  |       46.9      |     97.8  
      [8, 3]  |      126.4      |    294.0  
4 threads: ---------------------------------
      [4, 0]  |        1.2      |      2.3  
      [4, 1]  |       16.0      |     38.3  
      [4, 3]  |       41.6      |    106.2  
      [8, 0]  |        1.9      |      3.5  
      [8, 1]  |       29.6      |     82.3  
      [8, 3]  |       80.6      |    226.7  

Times are in milliseconds (ms).
The script I used to produce these experiments:
import torch
from torch.profiler import profile, ProfilerActivity
import torch.utils.benchmark as benchmark
from itertools import product

import norbert.torch as norbert
from openunmix import filtering


nb_frames = 100
nb_bins = 513
nb_channels = 2
nb_sources = [4, 8]
nb_iterations = [0, 1, 3]

# complex mixture STFT and non-negative source spectrogram estimates
x = torch.randn(nb_frames, nb_bins, nb_channels) + 1j * \
    torch.randn(nb_frames, nb_bins, nb_channels)
x_as_real = torch.view_as_real(x)  # openunmix expects a real view of the complex tensor
v = torch.rand(nb_frames, nb_bins, nb_channels, 4)

# profile memory and CPU time of a single wiener call for both implementations
with profile(activities=[ProfilerActivity.CPU], profile_memory=True) as prof:
    norbert.wiener(v[None, ...], x[None, ...])  # note the extra batch dimension
print("Norbert Torch")
print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=5))

with profile(activities=[ProfilerActivity.CPU], profile_memory=True) as prof:
    filtering.wiener(v, x_as_real)
print("Open-Unmix")
print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=5))


# benchmark every (nb_sources, nb_iterations) combination at 1, 2, and 4 threads
results = []
for ns, ni in product(nb_sources, nb_iterations):
    label = 'wiener'
    sub_label = f'[{ns}, {ni}]'
    v = torch.rand(nb_frames, nb_bins, nb_channels, ns)
    x = torch.randn(nb_frames, nb_bins, nb_channels) + 1j * \
        torch.randn(nb_frames, nb_bins, nb_channels)
    x_as_real = torch.view_as_real(x)

    for num_threads in [1, 2, 4]:
        results.append(benchmark.Timer(
            stmt='y = wiener(v, x, i)',
            setup='from norbert.torch import wiener',
            globals={'x': x.unsqueeze(0), 'v': v.unsqueeze(0), 'i': ni},
            label=label,
            num_threads=num_threads,
            sub_label=sub_label,
            description='norbert torch',
        ).blocked_autorange(min_run_time=1))

        results.append(benchmark.Timer(
            stmt='y = wiener(v, x, i)',
            setup='from openunmix.filtering import wiener',
            globals={'x': x_as_real, 'v': v, 'i': ni},
            label=label,
            num_threads=num_threads,
            sub_label=sub_label,
            description='openunmix',
        ).blocked_autorange(min_run_time=1))

compare = benchmark.Compare(results)
compare.print()

yoyololicon marked this pull request as ready for review February 17, 2022 04:40
@faroit
Member

faroit commented Mar 4, 2022

@yoyololicon sorry for the slow response. I will have a look next week 🙏

@turian

turian commented Dec 4, 2022

This looks great. @faroit, it would be amazing to be able to pip install this.

@faroit
Member

faroit commented Dec 4, 2022

@yoyololicon Totally forgot about this one. I will have a look again.

@yoyololicon
Author

@faroit No problem. Take your time. Let's first enjoy the ISMIR conference 😆
