Improvements from my branch #39
base: master
OK, so I still haven't had time to do a full review of your patches, but I guess I can at least give you some feedback:
I'll try to have a closer look at the details of the other patches, but at least you have some comments already.
OK, I've thought again about the sampling rate change, and even the solution I was proposing isn't quite perfect (though it may be good enough, I don't know) because of the pitch estimation/filtering. Of course, an easy solution is to just resample to 48 kHz at the input and then resample back to the original rate on output.
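The resample-at-the-edges approach can be sketched as below. `resample_linear()` is a hypothetical helper, not part of rnnoise, and linear interpolation is used only to keep the sketch short: it applies no anti-aliasing filter, so a real wrapper would want a proper band-limited resampler such as the one in speexdsp. The bookkeeping is the point here: a 480-sample frame at 48 kHz is 10 ms, which is 160 samples at 16 kHz.

```c
#include <stddef.h>

/* Hypothetical helper, not an rnnoise API: naive linear-interpolation
 * resampler. No anti-aliasing filter is applied, so downsampling will
 * alias; this only illustrates the rate bookkeeping. Returns the number
 * of samples written to out. */
static size_t resample_linear(const float *in, size_t in_len, int in_rate,
                              float *out, size_t out_cap, int out_rate)
{
    if (in_len == 0 || in_rate <= 0 || out_rate <= 0)
        return 0;
    /* Output length scales with the rate ratio (rounded down). */
    size_t out_len = (size_t)((double)in_len * out_rate / in_rate);
    if (out_len > out_cap)
        out_len = out_cap;
    for (size_t n = 0; n < out_len; n++) {
        /* Position of output sample n on the input time axis. */
        double pos = (double)n * in_rate / out_rate;
        size_t i = (size_t)pos;
        double frac = pos - (double)i;
        float a = in[i];
        float b = (i + 1 < in_len) ? in[i + 1] : in[in_len - 1];
        out[n] = (float)(a + frac * (b - a));
    }
    return out_len;
}
```

A wrapper would buffer 160 input samples (at 16 kHz), upsample them to one 480-sample frame, run the denoiser on that frame, and downsample the result back to 160 samples.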
Apologies for the absurdly slow response here; I have other commitments (as, of course, do you!).
(1) I think the fundamental problem is that, regardless of whether it's possible to create a "perfect" model, it will never be all-covering, because people disagree on what is noise. The reason I found the default model unsuitable is that I was recording myself playing an instrument in my apartment, and the things I consider noise in that environment are very different from the things I'd consider noise in a crowded coffee shop. Not every possible environment in between needs to be covered, but it's unlikely that a single model could be trained to work for both, not because of any limitation in the technique, but because they are genuinely different, albeit related, problems. Perhaps a compromise would be to offer only the default model or loading from a file, and to ship any other models, if at all, only as files?
(2) Yeah, with your description, I can see that it's not right. We're too far out of my expertise. I agree that just demanding the right sample rate in the first place would be a better option.
(3) The current version doesn't compile :). Literally all this change did was make it compile. Perhaps it should just be nixed if it's neither working nor going to be fixed.
(1) I agree that we may indeed need multiple models. I still think a default build (./configure; make) should only include one model. From there, there are many possible ways of adding models: it could be a compile option --with-model=XYZ, it could be loading a file, or both.
(2) Yeah, I think attempting to handle other rates is going to be complicated. If it's really a useful feature, then the best approach would be to just integrate a resampler and do it transparently.
(3) Yeah, the best is to just remove that option. I wasn't planning on bringing it back.
In an attempt to avoid namespace collisions, rename model_orig to rnnoise_model_orig. (Note that the previous implementation exported symbols such as vad_data, so this at least leaves only one prefixed, unnecessary public variable.)
Extend the neural network dumper to write a simple text file format, and add reader functions that read a neural network description from a FILE *.
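A minimal sketch of a dump/read pair for a single dense layer, assuming a hypothetical whitespace-separated text format: a header line `nb_inputs nb_neurons activation`, then all weights, then the biases. The `LayerDesc` struct, its activation codes, and the `write_layer()`/`read_layer()` functions are illustrative only, not the actual format or reader in the branch. Floats are written with `%.9g`, which gives enough significant digits for any `float` to survive the text round trip exactly.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical layer description; not rnnoise's DenseLayer struct. */
typedef struct {
    int nb_inputs;
    int nb_neurons;
    int activation;   /* assumed codes, e.g. 0 = tanh, 1 = sigmoid, 2 = relu */
    float *weights;   /* nb_inputs * nb_neurons values */
    float *bias;      /* nb_neurons values */
} LayerDesc;

/* Dump one layer as text. Returns 0 on success, -1 on write error. */
static int write_layer(FILE *f, const LayerDesc *l)
{
    size_t nw = (size_t)l->nb_inputs * (size_t)l->nb_neurons;
    if (fprintf(f, "%d %d %d\n", l->nb_inputs, l->nb_neurons, l->activation) < 0)
        return -1;
    /* %.9g keeps enough digits for every float to read back exactly. */
    for (size_t i = 0; i < nw; i++)
        if (fprintf(f, "%.9g\n", l->weights[i]) < 0)
            return -1;
    for (int i = 0; i < l->nb_neurons; i++)
        if (fprintf(f, "%.9g\n", l->bias[i]) < 0)
            return -1;
    return 0;
}

/* Read one layer written by write_layer(). Returns 0 on success, -1 on
 * parse or allocation failure (nothing is left allocated on failure). */
static int read_layer(FILE *f, LayerDesc *l)
{
    l->weights = NULL;
    l->bias = NULL;
    if (fscanf(f, "%d %d %d", &l->nb_inputs, &l->nb_neurons, &l->activation) != 3
        || l->nb_inputs <= 0 || l->nb_neurons <= 0)
        return -1;
    size_t nw = (size_t)l->nb_inputs * (size_t)l->nb_neurons;
    l->weights = malloc(nw * sizeof *l->weights);
    l->bias = malloc((size_t)l->nb_neurons * sizeof *l->bias);
    if (l->weights == NULL || l->bias == NULL)
        goto fail;
    for (size_t i = 0; i < nw; i++)
        if (fscanf(f, "%f", &l->weights[i]) != 1)
            goto fail;
    for (int i = 0; i < l->nb_neurons; i++)
        if (fscanf(f, "%f", &l->bias[i]) != 1)
            goto fail;
    return 0;
fail:
    free(l->weights);
    free(l->bias);
    l->weights = NULL;
    l->bias = NULL;
    return -1;
}

static void free_layer(LayerDesc *l)
{
    free(l->weights);
    free(l->bias);
    l->weights = NULL;
    l->bias = NULL;
}
```

A full model file would simply repeat this per layer in a fixed order, so the reader can rebuild the network from a single FILE *.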
In an attempt to keep history clean, I recreated the branch with the irrelevant revisions stripped out, so apologies for the appearance of throwing a bunch of commits on at the same time. They're the same commits, with a couple of bugs fixed and the following changes:
(1) The sample rate adjustment is nixed.
(2) The "fix" to !SMOOTH_BANDS has been replaced by the removal of that non-feature, to clean up the code.
(3) Multiple models are supported exclusively by reading model files.
(4) The built-in model is used only by specifying NULL as the model; the "model_orig" variable is not public. (Note that this option was always available.)
Thus, this PR now consists of the following changes:
(1) The RNN model is made modular and loadable from a file, with the existing model being used if none is otherwise provided.
(2) Warnings are fixed, headers are made more modular, the broken !SMOOTH_BANDS implementation is removed, and other such minor cleanups are made.
(3) Maximum attenuation is made parameterizable. I maintain that this feature itself is of arguable utility, but, more importantly, it brings with it a general configuration API that should scale to future changes.
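The "NULL means the built-in model" convention can be sketched as follows. `Model`, `Denoiser`, `denoiser_create()` and `denoiser_destroy()` are hypothetical stand-ins, not the branch's actual API. The sketch only shows the shape of the idea: the library keeps one private default, and a caller-supplied model (for example, one read from a file) overrides it without the default ever being exported as a public symbol.

```c
#include <stdlib.h>

/* Hypothetical stand-in; the real model struct holds the layers. */
typedef struct {
    const char *name;
} Model;

/* The built-in model stays file-local (static), so no public
 * model_orig-style symbol is exported. */
static const Model builtin_model = {"builtin"};

typedef struct {
    const Model *model;
} Denoiser;

/* NULL selects the built-in model; anything else is used as given.
 * The caller keeps ownership of a non-NULL model and must keep it
 * alive until the denoiser is destroyed. */
static Denoiser *denoiser_create(const Model *model)
{
    Denoiser *st = malloc(sizeof *st);
    if (st == NULL)
        return NULL;
    st->model = (model != NULL) ? model : &builtin_model;
    return st;
}

static void denoiser_destroy(Denoiser *st)
{
    free(st);  /* never frees the model itself */
}
```

Keeping ownership of loaded models with the caller lets several denoiser instances share one model read from disk.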
Note that in my case, where the noise is very low (just cleaning up a voice-over recording in a not-so-soundproof environment), I had to add a path without RNNoise and automate the mix, because RNNoise was far too eager on my "fois", which means "times" in French. Roughly one in two sounded like "...ois". It seems that the "fffff" sound is close enough to RNNoise's representation of noise that it gets removed. While doing that, I discovered that RNNoise's phase behaviour is funky, so it cannot be mixed 50% wet. A maximum attenuation parameter that can be changed between process calls would enable an automatable "dry/wet" control (with suitable smoothing of the control to avoid zipper noise, of course, but that responsibility would rest with the wrapper DAW plugin, like https://github.com/lucianodato/speech-denoiser/, that I use).
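One way to realize a per-call maximum-attenuation control is a floor on the per-band gains, with the parameter smoothed from frame to frame so automation does not cause zipper noise. This is a sketch under assumptions: the `AttenuationLimit` struct and `limit_set()`/`limit_apply()` are hypothetical, the limit is expressed as a linear minimum gain (0 = unlimited attenuation, 1 = bands are never attenuated) rather than in dB, and the smoothing lives inside the helper even though, as the comment says, it could just as well be the DAW wrapper's job. A limit in dB would convert via 10^(-dB/20), so 20 dB corresponds to a minimum gain of 0.1. Because it scales the one processed signal instead of mixing in a dry copy, this approach should avoid the phase problem with a 50% wet mix.

```c
/* Hypothetical helper, not an rnnoise API. */
typedef struct {
    float target;     /* requested minimum gain in [0, 1], set by the host */
    float current;    /* smoothed minimum gain actually applied */
    float smoothing;  /* one-pole coefficient in [0, 1); 0 = no smoothing */
} AttenuationLimit;

static void limit_set(AttenuationLimit *lim, float min_gain)
{
    /* Clamp so a bad automation value cannot amplify or invert. */
    if (min_gain < 0.0f) min_gain = 0.0f;
    if (min_gain > 1.0f) min_gain = 1.0f;
    lim->target = min_gain;
}

/* Call once per frame, after the per-band gains are computed and before
 * they are applied: move the smoothed limit toward the target, then raise
 * any gain below it. */
static void limit_apply(AttenuationLimit *lim, float *gains, int nb_bands)
{
    lim->current = lim->smoothing * lim->current
                 + (1.0f - lim->smoothing) * lim->target;
    for (int i = 0; i < nb_bands; i++)
        if (gains[i] < lim->current)
            gains[i] = lim->current;
}
```

With `smoothing` at 0.5 and the target jumping from 0 to 0.2, the applied floor moves 0.1, 0.15, 0.175, ... over successive frames instead of jumping at once.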
OK, I reviewed the entire stack and after some rebasing (reordering+fusing some patches), I merged all but three changes. The only remaining issue is related to the max attenuation. I'm OK with the general idea, but have two issues with the implementation:
So the three patches I didn't merge are
So, should this be reworked and rebased to only include the remaining fixes, or be closed?
Pursuant to our email exchange a week ago, I've sieved out the important parts for merge here:
The NN model is modular, and a different model can either be selected from a built-in set or loaded from an external file.
Many new models are included, trained in a variety of situations.
Maximum attenuation is parameterizable. How useful this feature is is questionable :)
Sample rate is parameterizable. This only affects the mapping of FFT bins to bands, so it's quite general, though it does not resample the input, so samples are still expected 480 at a time.
A bug in the !SMOOTH_BANDS mode is fixed.
Not included are my broken "fix" to interpolation, the rename, and my various changes to the auto* scripts.
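To illustrate what "only affects the mapping of FFT bins to bands" means, here is a sketch that recomputes band-edge bins for an arbitrary sample rate while keeping the 480-sample frame (960-point window, 481 bins). The band table is assumed to match `eband5ms` in rnnoise's denoise.c, whose entries are in units of 200 Hz; `band_edges_for_rate()` and its rounding and clamping choices are illustrative assumptions, not necessarily what the patch does. At 48 kHz each bin is 50 Hz wide, so every edge lands on exactly 4 times its table entry, matching the fixed 48 kHz mapping. At 16 kHz each bin is about 16.7 Hz wide, and every edge at or above the 8 kHz Nyquist frequency lands on the last bin.

```c
#define NB_BANDS 22
#define WINDOW_SIZE 960                  /* two 480-sample frames */
#define FREQ_SIZE (WINDOW_SIZE / 2 + 1)  /* 481 bins, 0..480 */

/* Assumed to match eband5ms in rnnoise's denoise.c (units of 200 Hz). */
static const int eband5ms[NB_BANDS] = {
    0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16, 20, 24, 28, 34, 40, 48, 60, 78, 100
};

/* Hypothetical helper: FFT bin of each band edge at the given sample rate,
 * rounded to the nearest bin and clamped to the Nyquist bin. */
static void band_edges_for_rate(int sample_rate, int bins[NB_BANDS])
{
    for (int i = 0; i < NB_BANDS; i++) {
        long hz = 200L * eband5ms[i];
        /* bin = hz / (sample_rate / WINDOW_SIZE), rounded to nearest */
        long bin = (hz * WINDOW_SIZE + sample_rate / 2) / sample_rate;
        if (bin > FREQ_SIZE - 1)
            bin = FREQ_SIZE - 1;
        bins[i] = (int)bin;
    }
}
```

Bands whose edges collapse onto the same bin have zero width, so an implementation along these lines would presumably need to skip or merge them.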