Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements from my branch #39

Open
wants to merge 15 commits into
base: master
Choose a base branch
from
Open

Improvements from my branch #39

wants to merge 15 commits into from

Conversation

GregorR
Copy link
Contributor

@GregorR GregorR commented Sep 21, 2018

Pursuant to our email exchange a week ago, I've sieved out the important parts for merge here:

  • The NN model is modular, and a different model can either be selected from a built-in set or loaded from an external file.

  • Many new models are included, trained in a variety of situations.

  • Maximum attenuation is parameterizable. How useful this feature is is questionable :)

  • Sample rate is parameterizable. This only affects the mapping of FFT bins to bands, so it's quite general, though it does not affect the input rate, so samples are still expected 480 at a time.

  • A bug in the !SMOOTH_BANDS mode is fixed

Not included are my broken "fix" to interpolation, the rename, or my various changes to the auto* scripts.

@jmvalin
Copy link
Member

jmvalin commented Oct 2, 2018

OK, so I still haven't had time to do a full review of your patches, but I guess I can at least give you some feedback:

  1. Regarding the ability to have multiple models, I think the patches have a few issues. I think it can be a bit confusing for the user to have to pick a model. Also, from previous experience, having a global object like "const struct RNNModel model_orig" exposed in the API itself can cause all kinds of problems. I can't remember the details, only that I've been there before and decided it was best to avoid it. At least for now, my preference would be for a compile-time model selection. I think ultimately, it may be possible to have one model that works for everything. That being said, I like your modularization work as it makes it easier to experiment.

  2. I don't think the sample rate patch works, or at least it's not the right way to do it. The problem is that even if you change the band boundaries to match the new sampling rate, the time difference between windows would still change if you keep the window size the same and change the sampling rate. That would cause a mismatch with the training conditions of the model. The only way to support different rates would be to change the frame size but not the bands. For example, for 32 kHz, the frame size would have to be 320 (640-sample FFT). This would at least make it possible to support the important rates like 8, 16, and 32 kHz. I guess rates like 44.1 kHz could always be achieved by just pretending the audio is 48 kHz since a 10% difference should be small enough.

  3. As far as I can tell, the !SMOOTH_BANDS fix is wrong. Right now, the code won't work for !SMOOTH_BANDS because it would need a model trained for that (with fewer bands), but trying to get the code to run for !SMOOTH_BANDS will just give wrong results.

I'll try to have a closer look at the details of the other patches, but at least you have some comments already.

@jmvalin
Copy link
Member

jmvalin commented Oct 3, 2018

OK, I've thought again about the sampling rate change and even the solution I was proposing isn't quite perfect (though it may be good enough, I don't know) because of the pitch estimation/filtering. Of course, an easy solution is to just resample to 48 kHz at the input and then resample back to the original rate on output.

@GregorR
Copy link
Contributor Author

GregorR commented Apr 9, 2019

Apologies for the absurdly slow response here, I have other commitments (as of course do you!)

(1) I think the fundamental problem is that regardless of whether it's possible to create a "perfect" model, it will never be all-covering, because people disagree on what is noise. The reason I found the default one to be unsuitable is that what I was recording was myself playing an instrument in my apartment, and the things I consider to be noise in that environment are very different than the things I'd consider to be noise in a crowded coffeeshop. It's not like every possible element in-between is needed, but it's unlikely that a model could be trained in such a way that it worked for both, not because of any limitations in the technique, but because they are actually different problems, albeit related ones. Perhaps a compromise would be to have only the options of the default, or to load from a file, and to include—or not—other models only as files?

(2) Yeah, with your description, I can see that it's not right. We're too far out of my expertise. I agree that just demanding the right sample rate in the first place would be a better option.

(3) The current version doesn't compile :). Literally all this change was was to make it compile. Perhaps it should just be nixed if it's neither working nor going to be fixed.

@jmvalin
Copy link
Member

jmvalin commented Apr 12, 2019

(1) I agree that we may indeed need multiple models. I still think a default build (./configure; make) should only include one model. From there, there's many possible ways of adding models. It could be a compile option --with-model=XYZ, it could be loading a file, or both.

(2) Yeah, I think attempting to handle other rates is going to be complicated. If it's really a useful feature, then the best would be to just integrate a resampler and so it transparently.

(3) Yeah, the best is to just remove that option. I'm wasn't planning on bringing it back.

@GregorR
Copy link
Contributor Author

GregorR commented Apr 25, 2019

In an attempt to keep history clean, I recreated the branch with the irrelevant revisions stripped out, so apologies for the appearance of throwing a bunch of commits on at the same time. They're the same, with a couple bugs fixed, and the following changes:

(1) Sample rate adjustment is nixed.

(2) The "fix" to !SMOOTH_BANDS has been replaced with the elimination of that non-feature to clean up the code.

(3) Multiple models are supported exclusively through reading model files.

(4) The builtin model is used only by specifying NULL as the model; the "model_orig" variable is not public. (Note that this option was always available)

Thus, this PR now consists of the following changes:

(1) The RNN model is made modularizable and loadable from a file, with the existing model being loaded if none is otherwise provided.

(2) Warnings are fixed, headers made more modular, the broken !SMOOTH_BANDS implementation is removed, and other such minor cleanups.

(3) Maximum attenuation is made parameterizable. I retain that this feature itself is of arguable utility, but it more importantly brings with it a general configuration API that should scale to future changes.

@FrnchFrgg
Copy link

Note that in my case where the noise is very low (just cleaning up a voice over recording in a not so soundproof environment), I had to add a path without RRNoise and automate the mix because RRNoise was far too eager on my "fois" which means "times" in French. Roughly one over two sounded "...ois". It seems that the "fffff" sound is close enough to RRNoise's representation of noise that it gets removed.
So having the possibility to use other models might be useful here.

While doing that, I discovered that the phase behaviour of RRNoise is funky, so it cannot be mixed 50% wet. Using a maximum attenuation parameter that can be changed between every processes would enable having an automatable "dry/wet" control (with suitable smoothing of the control to avoid zipping of course, but that responsibility would rely on the wrapper DAW plugin like https://github.com/lucianodato/speech-denoiser/ that I use).

@jmvalin
Copy link
Member

jmvalin commented May 30, 2019

OK, I reviewed the entire stack and after some rebasing (reordering+fusing some patches), I merged all but three changes. The only remaining issue is related to the max attenuation. I'm OK with the general idea, but have two issues with the implementation:

  1. The code that handles it is really convoluted for apparently no good reason. As far as I can tell, you could get almost the same behaviour by just doing a gain = max(gain, attenuation_floor).
  2. Expressing the attenuation as a gain is really counter-intuitive. I think the attenuation should be specified as a (positive) value in dB. Ideally an int value would probably be best in case there's ever a fixed-point implementation of this lib.

So the three patches I didn't merge are

  • Added parameterizable maximum attenuation.
  • Let's scale our maximum attenuation correctly...
  • Added input of model file to the demo.
    (the last one because I can't apply it without the first one)

@petterreinholdtsen
Copy link
Contributor

So, should this be reworked and rebased to only include the remaining fixes or be closed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants