Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add missing Rubber Band padding, preventing it from eating the initial transient #11120

Merged
merged 13 commits into from
Dec 21, 2022

Conversation

robbert-vdh
Copy link
Contributor

This fixes #11093. See #11093 (comment) for more information on what's going on here. In short, both the docs and the faq mention that you should add the requested amount of padding (and then drop that padding again) after initializing or resetting the time stretching engine. If you don't do this, then any initial transient will be replaced by a short fade in. With the r3/finer algorithm this is 2048 samples, which becomes very audible. This PR adds the required padding routine, fixing the problem See the linked issue for example videos of how this behaved before and after the change. I also replaced the manual memory management with STL data structures while I was at it and fixed some memory safety issues by explicitly deleting the copy and move constructors/assignment operators. Previously invoking these by accident would result in double frees and use after frees.

These functions still use raw pointers so there's a bunch of stuff
that's difficult to catch that can still go wrong, but this at least
removes the need for having to manually deallocate these structures. And
this class didn't adhere to the rule of five, so it was very easy to
misuse it and cause double frees and use after frees.
This needs to be done or else Rubber Band will eat up the initial
transient by introducing a short fade in/2048 sample low-pass.
Copy link
Member

@Swiftb0y Swiftb0y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, the changes seem very sensible to me so far.

Copy link
Member

@Swiftb0y Swiftb0y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM. I'm not comfortable with merging until someone can test whether the R2 stretcher still sounds good.

@robbert-vdh
Copy link
Contributor Author

Just noticed I got the #ifdef in d8c8ca9 wrong. It shouldn't have been an #ifdef, RUBBERBANDV3 is always defined but it evaluates to true when using librubberband 3.x (diff). I blame programming at night.

@Swiftb0y
Copy link
Member

Swiftb0y commented Dec 8, 2022

true, my fault for suggesting it wrong.

Copy link
Member

@daschuer daschuer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for taking care. I think this needs to be tweaked a bit. See my comments. My findings are from the V2stetcher.

@@ -19,7 +17,7 @@ namespace {
// TODO (XXX): this should be removed. It is only needed to work around
// a Rubberband 1.3 bug.
// This is the default increment from RubberBand 1.8.1.
size_t kRubberBandBlockSize = 256;
constexpr size_t kRubberBandBlockSize = 256;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you able to update the comment?
Our oldest supported version is 1.8.2 from Ubuntu Focal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in, completely remove the block size limit?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.
I have double checked, the bug has been fixed in Rubberband 1.7
breakfastquay/rubberband@c26dc1d

By the way, Rubberband will never request more than getPreferredStartPad() * 2
https://github.com/breakfastquay/rubberband/blob/de56cd114a678003dfef17abbd74ebd9203964eb/src/finer/R3Stretcher.cpp#L565
Maybe we can use this info to improve our buffers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted most of adb034b in 9b8e82a. It Works On My System ™️ with librubberband 3.1.2.

#else
// This _should_ be equal to the latency in older Rubber Band versions:
// https://github.com/breakfastquay/rubberband/blob/c5f99d5ff2cba2f4f1def6c38c7843bbb9ac7a78/main/main.cpp#L652
size_t remaining_padding = m_pRubberBand->getLatency();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The input and output padding is different by the playback rate.

The interesting commit is this:
breakfastquay/rubberband@72654b0

So we need to do here something similar:
int toPad = int(round(toDrop * frequencyshift));

Copy link
Contributor Author

@robbert-vdh robbert-vdh Dec 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah good catch. Looking at the changes to the actual time stretchers it seems like that's how you get the new getStartPad() behavior in terms of getLatency(). For the R3 time stretcher the getStartDelay() value is different from the old getLatency() value, so I'll emulate that and then push the changes.

Copy link
Contributor Author

@robbert-vdh robbert-vdh Dec 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or well maybe the value was incorrect in that screenshot? Looking at https://github.com/breakfastquay/rubberband/blob/v1.8.2/src/StretcherImpl.cpp#L826-L831 getLatency() should indeed be identical to getStartDelay() (like the documentation suggests). Never mind, now I'm confusing myself. For the R2 algorithm they're the same, with the R3 algorithm they're different but I'm not sure if that was intentional. And that would only affect beta versions of librubberband anyways so it's not a concern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

#else
size_t padding_to_drop = m_pRubberBand->getLatency();
#endif
while (padding_to_drop > 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to call while m_pRubberBand->available() before this loop.
In this regards, I am not sure if we can pull out all padding immediately, or if it needs to be stay in until we have provided the real samples of the same size.

At least in the rubberband main, this is done not immediately after filling in the padding.

I am afraid dropping at this point is not correct, because we likely need to keep the rubber band buffers filled with at least the startDelay() to perform windowing around the current sample.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also also concerned how big the internal buffer of rubberband is, since I see here memory allocation:
https://github.com/breakfastquay/rubberband/blob/9001b5459c1c424a71be0829c8988a25a9edf490/src/faster/StretcherProcess.cpp#L487
Which will likely block our real time thread.

The buffer is initial m_sWindowSize * 2;
4096 @ 48 kHz

The default start pads is m_aWindowSize / 2;
1024 @ 48 kHz

https://github.com/breakfastquay/rubberband/blob/52fc576500931afb5a7374a241c5f386b14a5e29/src/faster/R2Stretcher.cpp#L131

All this makes kind of sense. To retrieve 1 sample, we need half of a window samples before the sample and half a window silence after it.

In case of Mixxx, and after seeking within a track, it is not true that these samples are zero. So I think it is required to use the real samples before the start instead of zero.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The confusing point is that we also have the available() function. In theory there must not bee any output available before not receive more than a whole m_aWindowSize. Is this the case?

I think the initial getSamplesRequired() return m_aWindowSize.
It would be interesting how many samples are available after passing the requested amount.
If it is 0 or m_aWindowSize / 2?
In the later case it is clear why a padding of m_aWindowSize / 2 is required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also also concerned how big the internal buffer of rubberband is, since I see here memory allocation:
https://github.com/breakfastquay/rubberband/blob/9001b5459c1c424a71be0829c8988a25a9edf490/src/faster/StretcherProcess.cpp#L487
Which will likely block our real time thread.

The padding is half of the window size, so that should definitely fit in Rubber Band's buffers. And thanks to the m_pRubberBand->setTimeRatio(2.0) call the buffer should be able to fit at least two of those windows before reallocating.

In case of Mixxx, and after seeking within a track, it is not true that these samples are zero. So I think it is required to use the real samples before the start instead of zero.

That was part of my experiments in solving #11125, but it's a bit more difficult than it seems. Especially because the read ahead manager is shared by multiple things and reading data through it invokes a bunch of side effects. Older samples also may not be cached (I've noticed this causing issues when beat jumping somewhere for the first time that weren't present when not reading past samples). I have not yet had time to dig into this further, so if you have suggestions for how to compensate for the latency here without messing with the rest of the deck's state (so switching back and forth between linear interpolation and Rubber Band time stretching should be seamless and not shift any timings around) then that would be great.

The confusing point is that we also have the available() function. In theory there must not bee any output available before not receive more than a whole m_aWindowSize. Is this the case?

Yes, like I mentioned before, they're using an STFT (Short-time Fourier Transform). It's an overlap-add process where you a window function to a chunk of the input data, take the discrete Fourier transform of that, process the frequency domain signal, take the inverse discrete Fourier transform of that, apply another window function, and then add that to the output buffer, partially overlapping the previous output. The reason why the required padding is half of the input buffer is because most window functions used for these things are raised cosine windows, like the Hann/Hanning function I linked below. Because you apply (= multiple the chunk of audio with) the window twice, once before the DFT, and once after the IDFT, you've effectively applied the square of the window function to the signal if you didn't do anything to the signal in the frequency domain. And if you try adding up squared Hann functions with a quarter period difference between them, you'll notice that after a period the resulting function forms a flat line. That explains why the required padding is half a window, and why you get that smooth fade in if you don't add the padding.

Some wikipedia inks if you're curious about how and why this all works:

https://en.wikipedia.org/wiki/Short-time_Fourier_transform
https://en.wikipedia.org/wiki/Overlap%E2%80%93add_method
https://en.wikipedia.org/wiki/Hann_function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But yeah I guess you are right, we shouldn't be reading and dropping samples immediately. We should be doing the same thing the rubberband cli is doing, and process the audio as normal but drop the first getStartDelay() samples of it. That also aligns with the STFT primer I posted above heh.

This of course didn't work correctly before. We should drop the first n
samples from the output, but we can't do that before processing more
input. This also fixes the uncompensated latency/clicks issue from mixxxdj#11125
in the process.
@robbert-vdh
Copy link
Contributor Author

@daschuer I fixed the way padding is dropped (wasn't thinking straight heh). Like I explained in the STFT primer in #11120 (comment), the scaled silence samples should be dropped from the output. You can't just drop them beforehand (I think I wrote this PR when it was already late at night heh).

The padding is now correctly dropped, fixing both the transient eating problem and also fixing #11125 (which was sort of caused by the previous incorrect padding dropping behavior). With these changes scratching while having keylock enabled is finally smooth. As is beat jumping to audio that is already cached. Please give this PR another try with these changes!

@daschuer
Copy link
Member

The issue with the "wrong" silence padding is kind of hidden, because of the fading of one buffer when seeking.
It is however notable with loops and a 440 kHz sine tone:
grafik

The keylock change is in the middle, perfectly in sync using Rubberband faster.

This is how it looks with soundtouch:

grafik

In case of Soundtouch we see the same issue, but we see also the keylock change and can measure a delay of ~0.018 s = 512 Frames.

@daschuer
Copy link
Member

I have investigated this a bit more and the issue is that we have no reset() call in case of looping.
Looping is performed inside the ReadAheadManager::getNextSamples() and fading is done with the input signal.
So there is no reset required.
I have no idea why the signal looks different though.

SINT frames_available = m_pRubberBand->available();
SINT frames_to_read = math_min(frames_available, frames);
const SINT frames_available = m_pRubberBand->available();
const SINT frames_to_read = math_min(frames_available, frames);
SINT received_frames = static_cast<SINT>(m_pRubberBand->retrieve(
m_bufferPtrs.data(), frames_to_read));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to read here frames_to_read + frame_offset;
Even though the buffer is more than big enough, it would be correct to check for its size.
We may consider to use a way smaller buffer than MAX_BUFFER_LEN b.t.w.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, did you miss to push a commit?

The function expects frames and we need to discard frame_offset from the retrieved buffer. So we need to retrieve frames + frame_offset.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See 755a4a4.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The linked code makes that the function returns less frames than it might could have read. So we should not clamp the value to frames , but to frames + frame_offset in line 132

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No we should not. I can't check what m_pRubberBand->available() returns here, but logically it would return 1024 if it's doing overlap-add with four times overlap. At a 1.0 pitch ratio/when the time stretcher should not be making any adjustments, we need to throw away half a window's worth of samples, which is 2048 in this case. So in that situation the first two calls to this function must read and return 0 samples or else we've been reading the processed padding that we're trying to throw away here.

The linked code makes that the function returns less frames than it might could have read.

It can only read a small window of output at a time. Presumably this is 1/4th of the FFT window size. So 1024 samples. Which means that the first two calls to this function (each producing 1024 samples of garbage output) should be thrown away.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a bit debug code to shows the issue after a seek:
qDebug() << "padding = " << m_remainingPaddingInOutput << "frames = " << frames << "available = " << frames_available;

debug [Engine] padding  =  1025 frames  =  256 available  =  0
debug [Engine] padding  =  1025 frames  =  256 available  =  512
debug [Engine] padding  =  769 frames  =  256 available  =  768                 
debug [Engine] padding  =  513 frames  =  256 available  =  1024
debug [Engine] padding  =  257 frames  =  256 available  =  1280
debug [Engine] padding  =  1 frames  =  256 available  =  1536
debug [Engine] padding  =  0 frames  =  1 available  =  1792
debug [Engine] padding  =  0 frames  =  256 available  =  1791
debug [Engine] padding  =  0 frames  =  256 available  =  1535
debug [Engine] padding  =  0 frames  =  256 available  =  1279

In the second call, we read only 256 frames from the available 512. The available frames are stacked up before they are reduced again. We need to read all available frames in that case. This needs to be fixed.

Copy link
Contributor Author

@robbert-vdh robbert-vdh Dec 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah I see what you mean now! Fixed in d3aa210.

Somewhat related, would there be any interest in exposing the Rubberband R3 + short window option? I added it for myself here: robbert-vdh@af570f4 Its performance is in line with R2 (and also no huge spikes when searching), and quality wise it's more or less a sidegrade to R2. It doesn't have R2's time domain transient preservation, but it also doesn't introduce spurious transients and it does still sound better than R2 without transient preservation. Hmm maybe not. It usually does better than R2, but it sometimes also transforms kickdrums into smeary messes.

// before using it. Otherwise it will eat add a short fade-in, destroying
// the initial transient.
size_t remaining_padding = getPreferredStartPad();
std::fill_n(m_buffers[0].span().begin(), kRubberBandBlockSize, 0.0f);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment that this filling zeros is the correct solution here, because we always fade in from zero after seek to avoid click sounds. This means that the first half of the buffer needs to be zero anyway.

I have verified it with a recording in Audacity that this looks good.

My fist assumption to use here real samples is correct but they have zero gain = 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the idea of the zero padding is that if the STFT process is working correctly, then those samples will not have any impact on the rest of the output. See my comment above. I added a link to the comment in the code.

@@ -19,7 +17,7 @@ namespace {
// TODO (XXX): this should be removed. It is only needed to work around
// a Rubberband 1.3 bug.
// This is the default increment from RubberBand 1.8.1.
size_t kRubberBandBlockSize = 256;
constexpr size_t kRubberBandBlockSize = 256;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.
I have double checked, the bug has been fixed in Rubberband 1.7
breakfastquay/rubberband@c26dc1d

By the way, Rubberband will never request more than getPreferredStartPad() * 2
https://github.com/breakfastquay/rubberband/blob/de56cd114a678003dfef17abbd74ebd9203964eb/src/finer/R3Stretcher.cpp#L565
Maybe we can use this info to improve our buffers.

This mostly reverts adb034b as this bug
should in theory only affect librubberband 1.3 and the oldest supported
version is currently 1.8.2.
SINT frames_available = m_pRubberBand->available();
SINT frames_to_read = math_min(frames_available, frames);
const SINT frames_available = m_pRubberBand->available();
const SINT frames_to_read = math_min(frames_available, frames);
SINT received_frames = static_cast<SINT>(m_pRubberBand->retrieve(
m_bufferPtrs.data(), frames_to_read));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, did you miss to push a commit?

The function expects frames and we need to discard frame_offset from the retrieved buffer. So we need to retrieve frames + frame_offset.

This was comparing against the wrong thing in
755a4a4.
Instead of reading his in small blocks and then throwing away all output
a couple times until we threw away all of the padding.
Copy link
Member

@daschuer daschuer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Thank you.

@daschuer
Copy link
Member

We have one big pending issue, that is not touched by this PR:

In case of seeks we do the following in one callback:

  • Read one chunk at the old position
  • Process the padding
  • Read the new chunk

This means with the default 1024 Buffer size
We need to process:
1024 * 2 + 1025 Frames.

This is roughly three times the CPU.
In a 5 ms low latency setup, it is 6 x the CPU.

So it seems the cross fading during a seek of the stretched samples is probably a bad idea due to the high risk of buffer underruns effects all Tracks and the tempo.

It would be great if we could move the cross fading to the input side and keep rubber band settled. This will delay seeks by extra 1025 though ... mmmm.

Ideas?

@daschuer
Copy link
Member

#11142

@daschuer
Copy link
Member

:-/ Oh That's odd. I was wrong. The additional processing of the Padding does make this CPU load issue worse ...
What to do? Hide it behind a compile switch?

@robbert-vdh
Copy link
Contributor Author

This is roughly three times the CPU.

Not quite. Keep in mind that Rubber Band processes the input in windows of n samples at a time. With the R2 algorithm (and default window size settings) that's 2048 samples, and with the R3 algorithm that's 4096 samples. If the Rubberband process function needs to produce, say, 128 samples, then it always needs to run 2048/4096 samples through the algorithm before it starts producing output. So when you add the padding the next window after the first one will be processed a little sooner, but after a reset the time stretcher will still only process a single window's worth of samples.

return received_frames;
}

void EngineBufferScaleRubberBand::deinterleaveAndProcess(
const CSAMPLE* pBuffer, SINT frames, bool flush) {
DEBUG_ASSERT(frames <= static_cast<ssize_t>(m_buffers[0].size()));
Copy link
Member

@JoergAtGithub JoergAtGithub Dec 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I try to compile this locally on Windows with VS2022, it fails with an error, that identifier ssize_t is undefined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I didn't realize ssize_t isn't from the standard library, and it compiles fine with MSVC on the CI. Probably through some transitive dependency. I guess ptrdiff_t would be the portable alternative that comes closest to ssize_t.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Afaik its part of the C standard library. I guess you could also #include <cstddef>?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't appear to be. Libusb and the other libraries Mixxx uses that also use ssize_t seem to explicitly define it on Windows.

robbert-vdh added a commit to robbert-vdh/mixxx that referenced this pull request Dec 22, 2022
`ssize_t` is only officially defined on POSIX, and apparently this
results in compiler errors with Visual Studio 2022:

mixxxdj#11120 (comment)

`ptrdiff_t` is the closest type defined by the standard.
napaalm pushed a commit to napaalm/mixxx that referenced this pull request Jul 10, 2023
…and-padding

Add missing Rubber Band padding, preventing it from eating the initial transient
napaalm pushed a commit to napaalm/mixxx that referenced this pull request Jul 10, 2023
…and-padding

Add missing Rubber Band padding, preventing it from eating the initial transient
napaalm added a commit to napaalm/mixxx that referenced this pull request Jul 10, 2023
napaalm pushed a commit to napaalm/mixxx that referenced this pull request Jul 10, 2023
…and-padding

Add missing Rubber Band padding, preventing it from eating the initial transient
napaalm added a commit to napaalm/mixxx that referenced this pull request Jul 10, 2023
napaalm pushed a commit to napaalm/mixxx that referenced this pull request Jul 10, 2023
…and-padding

Add missing Rubber Band padding, preventing it from eating the initial transient
napaalm added a commit to napaalm/mixxx that referenced this pull request Jul 10, 2023
napaalm pushed a commit to napaalm/mixxx that referenced this pull request Jul 11, 2023
…and-padding

Add missing Rubber Band padding, preventing it from eating the initial transient
napaalm added a commit to napaalm/mixxx that referenced this pull request Jul 11, 2023
napaalm added a commit to napaalm/mixxx that referenced this pull request Dec 30, 2023
…-rubberband-padding"

This reverts commit 0a15ae8, reversing
changes made to f0d9272.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rubberband R3 time stretching algorithm adds a fade in/eats the transient on playback start
4 participants