-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EngineEffectsDelay: effects chain delay handling #4810
EngineEffectsDelay: effects chain delay handling #4810
Conversation
This commit solves the delay issue for the effects chain. The solution is useful for the dry/wet or dry+wet mode. If the effects chain has a specific delay based on the effects processing latency, the dry signal is delayed to ensure dry and wet signals overlap. To delay the dry signal, the sum of all chain effects group delay is calculated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this new feature.
@davidchocholaty I'm sorry for taking so long to review this. If you are not otherwise busy in the meantime, it would be great if you could add some unit tests and microbenchmarks to this PR while you are waiting for my review. |
Okay, never mind. Yeah, I started to work on the unit tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before I review any more code. I'd appreciate if you could incorporate the existing comments first. The code would already be a lot simpler if you can assume that pIn != pOutput
and that m_delayBufferSize
is known at compile-time.
I agree with that. The same problem is pointed out by @daschuer for locking and I agree that the buffer resizing shouldn't be used. However, the interesting point of this problem is to set the optimum delay buffer value. On the theoretical side, the number of delay samples can't be predicted for the effects chain, but it doesn't make much sense to work with huge delay values. It isn't possible to work with the Pitch shift effect latency value, because the effect latency is pretty huge for now. Here I think that it can reasonable to open a discussion about what is the delay value for the effects chain which is acceptable.
I would like to just point out the reason, why this condition is used. The process() method call by EngineEffectChain uses the call, that the input buffer is the same as the output buffer. The reason is, that the input buffer with a dry signal should be delayed. To avoid using a temporary buffer, the input buffer can be used for the method processing output. It doesn't make sense to call the copy method if the buffers (input and output) are the same. |
what about this snippet from constexpr double kdMaxDelayPot = 500;
const int kiMaxDelay = static_cast<int>((kdMaxDelayPot + 8) / 1000 *
mixxx::audio::SampleRate::kValueMax * mixxx::kEngineChannelCount);
Though we can still discuss that. We might want to be more conservative about the effects delay since that delay is actually audible and must be as short as possible. |
I posted some thoughts here: #4810 (comment) Essentially its a tradeoff we must benchmark. Copying to a temporary buffer is extra work, but it enables an optimization opportunity. Copying the extra buffer is also really quick on modern CPUs (0,5-9 microseconds or something (see comments in |
Also, can you share some of reasons again for not reusing or changing |
Of course. For example for following reasons:
|
Oh, I'm sorry. Thank you for this information. So, IMO there are two possibilities to from what class will
I agree, that the benchmark will reveal the correct variant. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you investigated using mixxx::ReadAheadSampleBuffer
for this purpose instead? I think it would reduce much of the code complexity.
In fact not. I'm sorry for the misunderstanding, but for what part of code do you mean to use |
No, I was suggesting to use it as the |
I really like that idea with a ring buffer. It looks like it could be a nice solution for the current workflow with a delay buffer. |
The commit changes the type of a value which represents propagated delay. This commit changes the code to use frames for propagating delay instead of samples. The reason, why propagating frames is better than samples is that it doesn't depend on the number of channels. However, the inner EngineEffectsDelay structure uses samples for calculations still, so the number of samples is recalculated by propagated number of frames and a current number of channels. To clean up the workflow, the input buffer isn't used as the output buffer too and the temporary buffer replaces it. The size of the delay buffer is known in the compile time.
Just for information, the current new version doesn't contain the use of std::span and ring buffer for now due to it depends on more sophisticated code changes and they don't fit into this commit topic. |
I agree that the switch to |
On second thought. You can just use a |
I agree. It's ok. I will implement the ring buffer and I have to finish tests and benchmark implementation. |
This commit adds an assertion for negative values of the number of delay frames set by EngineEffectsDelay::setDelayFrames. The handling uses VERIFY_OR_DEBUG_ASSERT for checking the number of delay samples. This value is used for inner calculations and should not be negative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, the code is already easier to understand. If you abstract out the index-bookkeeping by use of a FIFO queue, it will be even simpler. I tried to not comment on this that will be made obsolete by the FIFO queue.
const int kiMaxDelay = static_cast<int>(0.508 * | ||
mixxx::audio::SampleRate::kValueMax * mixxx::kEngineChannelCount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const int kiMaxDelay = static_cast<int>(0.508 * | |
mixxx::audio::SampleRate::kValueMax * mixxx::kEngineChannelCount); | |
constexpr int kiMaxDelay = static_cast<int>(0.508 * | |
mixxx::audio::SampleRate::kValueMax * mixxx::kEngineChannelCount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have also no definition for a i prefix. 0.508 is a magic number can we name it? What is the unit of kMaxDelay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should refine this anyways. Also rounding up to the next power of 2 helps the optimizer because the expensive modulo operation can be transformed into a binary AND.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have also no definition for a i prefix. 0.508 is a magic number can we name it? What is the unit of kMaxDelay?
I agree that is a magic number. The value was used from the EngineDelay using maximum delay pot. The value can be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should refine this anyways. Also rounding up to the next power of 2 helps the optimizer because the expensive modulo operation can be transformed into a binary AND.
I agree. That's a good point, thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value can be changed.
Yes we should probably change it, which is why I didn't complain about it being a magic constant yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, just for summarizing an easy calculation. Following the current calculation, with magic constant 0.508 from enginedelay.cpp
and using mixxx::audio::SampleRate::kValueMax
(which is 192 000) and mixxx::kEngineChannelCount
(which is stereo), the following power of two is 2^18 (262 144) or next 2^19 (524 288). I am thinking about whether it makes sense to think about a greater buffer because the following power of two (2^20) is over 1 million.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mhmmm. The fact that we need to account for the (unlikely) worstcase (192Khz samplingrate) makes this tradeoff somewhat difficult to evaluate. I wish we could allocate the buffer based on the actual buffer size currently being used instead of the worst case.
What do we think is an acceptable maximum delay factor? 2x the current buffer size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree that it is a little bit tricky, that so big buffer size is possible, but not so common. For the current pitch shift effect version, the delay is about 4 buffer sizes, or more, which is unacceptable. For now, I don't have a specific amount of maximum delay. I think, that about 2 buffer sizes are a good tradeoff. Anyway, I will test the different delay settings, what is maximum delay by ear and feeling (with only using dry signal and cross-fading). I will inform you about the results.
|
||
/// This method is used for obtaining the delay of the output buffer | ||
/// compared to the input buffer based on the internal effect processing. | ||
/// The method returns the number of frames by which the input buffer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// The method returns the number of frames by which the input buffer | |
/// The method returns the number of frames by which the dry signal |
/// needs to be delayed so that buffers at the input and output | ||
/// of the effect overlap. The return value represents the current effect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// needs to be delayed so that buffers at the input and output | |
/// of the effect overlap. The return value represents the current effect | |
/// needs to be delayed so that buffers for the dry and wet signal (output | |
/// of the effect) overlap. The return value represents the current effect |
const int kiMaxDelay = static_cast<int>(0.508 * | ||
mixxx::audio::SampleRate::kValueMax * mixxx::kEngineChannelCount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have also no definition for a i prefix. 0.508 is a magic number can we name it? What is the unit of kMaxDelay?
// and copy it to the destination buffer. | ||
pOutput[i] = m_pDelayBuffer[delaySourcePos]; | ||
delaySourcePos = (delaySourcePos + 1) % kiMaxDelay; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you feel lucky you may fiddle around to either use factorized functions or make this loop vetcorized.
This can be checked like described here:
Line 14 in 0ad01b6
// LOOP VECTORIZED below marks the loops that are processed with the 128 bit SSE |
This requires that you chop the loop in chunks to get around the % kiMaxDelay on every sample.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how much vectorizing potential there actually is. I think the n % kiMaxDelay
always breaks the autovectorizer because you can't use vectorizing instructions when n
is at the wrap-around boundary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This requires that you chop the loop in chunks to get around the % kiMaxDelay on every sample.
I missed that in my last note in this thread (which is why that note essentially repeated what you just said). The problem I see is that we simply can't do that. At least the current interface and implementation allows the delay to be any frame frame number. I don't think we can make the interface more granular so we'd have to find a workaround in the implementation.
Thank you for your review @daschuer. I still respectfully recommend to ignore your review comments for now because they are all obsolete once we switch the implementation to a FIFO queue. |
I am afraid a FIFO will have issue with the ramping. When the delay changes. |
Here we use mixxx/src/soundio/sounddevicenetwork.cpp Line 147 in d5cf277
It is based on the PortAudio Ringbuffer and uses memcopy internally which is vectorized! |
I have compared with main and there we have ~195 ms offset. My Audio buffer is 85,3 ms. |
I hear also a crackling noise when turning the knob with this PR. I have the impression that it is more than in master. Is there soothing wrong with a ramping when the delay changes? |
I have crosschecked it an I can confirm that the crackling is not from the delay compensation. |
Since the span stuff is unrelated to the delay stuff we may consider to spit this PR, but not necessarily. |
If @davidchocholaty really has nothing else to do, he's free to do so (see it as an exercise in using git), but I don't really see the value in splitting this into a separate prerequisite PR (compared to the effort that takes to do so). |
Right, the Pitch shift delay latency isn't completely resolved. The problem is that RubberBand doesn't produce results for every input buffer, so the final delay has to be calculated more precisely between effect |
Sure, if rubberband delay handling requires more special care, we can do that in a separate PR. |
Thank you very much for pointing out the problem and double check. In any case, it is a bug it would be good to solve. IMO the problem is for the RubberBand result because for crossing the zero pitch or such a huge pitch change, the RubberBand doesn't have to produce sample values for the whole output buffer or for some of the output buffers at all. So, these buffers of uncalculated samples will be filled with zeros. Due to the first produced result of RubberBand, after zero samples aren't crossfaded, the crackling occurs. So, I agree that the problem isn't in delay handling, but IMO in the RubberBand's results-producing handling. |
I'm open-minded to splitting the |
Well, the only change required IMO is that it shouldn't be |
This commit adds constexpr keyword to the both implemented function, concretely to mixxx::spanutil::castToSizeType and mixxx::spanutil::spanFromPtrLen.
Unfortunately there is also crackling in other places on the rotary knob.
That works for me.
OK, after this is done and Ubuntu is release we may merge. Alternative we add this: |
Okay, it looks like a more complicated problem with a deeper investigation needed. IMO the best approach would be to solve the Pitch shift effect latency in separated PR first, and then fix this problem.
Sounds good to me, thank you. |
Lets wait for the ubuntu release. As previously discovered, using aliases lead to a rabbithole of problems. |
This commit fixes the const correctness of the span method.
This commit removes the Pitch shift effect delay reporting based it needed a deeper investigation and will be implemented in the separated PR (branch).
Based on our previous conversations I would like to sum up the status of this PR. I would like to keep, that this PR will just contain the delay handling for the effect chain independently to the Pitch shift effect, for which will be delay handling implemented in separated PR (now in progress, the PR will be published asap). Then the last mentioned issue will be fixed in separated PR after solving the Pitch shift effect latency handling. IMO the mentioned issues for this PR were fixed. I will re-request the reviews. If will be everything okay, IMO the PR is ready for re-running the failing commits after removing the ubuntu-20 CI. Do you agree? |
The plan looks good. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
This commit adds checking if the data type of size is the unsigned data type.
@daschuer your review is still red. any final comments or can we merge? |
We need to merge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you.
Perfect, thank you for merging. |
Solves the delay issue for the effects chain. The solution
is useful for the dry/wet or dry+wet mode. If the effects chain
has a specific delay based on the effects processing latency,
the dry signal is delayed to ensure dry and wet signals overlap.
To delay the dry signal, the sum of all chain effects group delay
is calculated. In the current version, the solution is especially used
for the Pitch shift effect. The known issue is, that the Pitch shift
effect doesn't provide the right amount of latency and the delay
is pretty huge. This problem has to be solved for the Pitch shift
effect implementation first. The effect group delay propagation
was implemented through the use of the EffectProcessorImpl,
EffectProcessor and EngineEffect. The special class wasn't created
for the purpose to preserve the hierarchy of effects implementation
as much as possible.