
Fix for crash when using the JACK backend and quickly reconfiguring #529

Merged · 1 commit · Aug 29, 2020
Conversation

hselasky
Contributor

Found some bugs here and there and managed to improve overall jitter with my audio card.

@corrados
Contributor

Thanks for your code. I'll add some comments to your new code.

src/client.h (review thread, resolved)
src/client.cpp (review thread, resolved)
src/client.cpp:
return;
}

// In common audio subsystems, there may be multiple audio sizes in use.
Contributor

Interesting concept. Is this Linux-specific or does this apply to all supported OSs?

Contributor Author

It applies to all OSes where Jamulus is not the only client of the sound card subsystem.

Usually sound mixers have a pre-defined, fixed interval for buffering.

Furthermore, USB hardware only supports small buffers in sizes of 2 ms, 3 ms, 4 ms and so on. When Jamulus specifies a 64-sample buffer, the USB hardware technically cannot program that! So you end up with audio bursts, which in turn affect the timing of the UDP packets, and I found this to be the root cause of all jitter problems during my tests with FreeBSD. Too bad Windows doesn't have a nanosecond clock; I'm not fully sure how this patch will work there. Maybe someone can help testing?

Contributor

I can do it when I am back at home in a week. If Windows does not support it, we should move your code in the Jack Sound.cpp file for now. Actually with my audio devices I tested, I did not have any higher jitter from the audio driver. What audio hardware are you using? And how did you measure the jitter?

Contributor Author

@hselasky · Aug 19, 2020

I have an X32-RACK audio device configured for 8-channel audio, of which I use 2 channels for Jamulus. I measured the jitter by using Wireshark and looking at the time between data packets going out on the wire. After this patch I saw a major improvement.

I'm disabling this for Win32, because I think the timer there is not good enough (milliseconds only).

src/main.cpp (review thread, resolved)
src/util.h (review thread, resolved)
@hselasky
Contributor Author

Thanks for your code. I'll add some comments to your new code.

Thank you!

@hselasky
Contributor Author

I added in one more patch. I hope it is not too much :-)

@WolfganP

Wouldn't switching from int to float add a lot of processing time? (Int arithmetic is usually much faster than float.)

@corrados
Contributor

I added in one more patch. I hope it is not too much :-)

Well, actually it is. I would prefer smaller pull requests.

@corrados
Contributor

Switch all Jamulus audio sample processing to use floats instead of...

Can you please give some more background information why you did this?

@hselasky
Contributor Author

Yes: the Opus codec supports float, and I thought it would be cheap to add support for more than 16-bit sound. Many of the professional audio devices you recommend for use with Jamulus do 24-bit audio. There is a distinct difference between 16-bit and 24-bit audio; when the audio level is low, 24 bits still give you a clear signal, and you simply don't lose bits.

On the server side, double is used for mixing today. This is overkill! Using float will give a slight performance gain.
On the client side, using float instead of int16_t will not have any significant performance impact. We are already multiplying with doubles, so I think we are good here too.

@hselasky
Contributor Author

Wouldn't switching from int to float add a lot of processing time? (Int arithmetic is usually much faster than float.)

The server is using "double" today. And using "float" is expected to reduce the processing time a bit.

@hselasky
Contributor Author

@corrados : If some of the commits are good to go, I can put them in a separate pull request, or maybe you can just push them separately and I'll rebase this pull request.

@corrados
Contributor

corrados commented Aug 21, 2020

Let's do things step-by-step and not all together.

1. Obvious little fix like in socket the public slot -> ok, should be merged -> done

2. Crash which you fixed by Mutex: The Mutex fixes the crash but not the root of the problem. I would like to see where the root of the problem is and fix that, since the way the code is right now it should not crash.

3. Delaying audio packets to minimize jitter: Interesting concept. Your code is right now in client.cpp and applies to all OSs. If you have an audio interface which is rock solid regarding timing, your code may introduce up to a block size of additional delay, so it may make things worse. But if you have an audio driver which gives us bursts, I agree that your code lowers the jitter, which makes smaller jitter buffers possible at the server. But this is a special case: none of the audio cards I have used with Jamulus show this bursty behaviour. I would not apply this algorithm by default for everyone.

4. Converting to float: This is a big change in the core of the Jamulus signal processing. I see your point with the 24 bits, but I don't think it gives you any noticeable improvement in the "jam session" use case; 24 bits may make sense in a studio environment. Have you done speed evaluations regarding OPUS working on float instead of short? Is it equally fast? I also have concerns about special functionality like the clipping indicator (which, I guess, will not work correctly with your new code) and, e.g., the fader group function, which works on relative levels and where I think you will need double precision to avoid error propagation. BTW, changing the current signal processing code to work on float instead of double in the server is a trivial and small change.
   Bottom line: I am not sure if Jamulus really needs this change.

@hselasky
Contributor Author

Will you merge #1?

@hselasky
Contributor Author

Regarding #3, what kind of audio cards are these? What interface do they use? What OS?

@hselasky
Contributor Author

Regarding #4, I don't see any noticeable increase in CPU usage; it's like 10 -> 11 % CPU for both server and client on my machine.

@corrados
Contributor

Will you merge #1?

yes

@corrados
Contributor

Regarding #3, what kind of audio cards are these? What interface do they use? What OS?

I have tried with Soundblaster Audigy on Lubuntu, Behringer UCA 202 on Windows and Mac, Lexicon Omega on Windows and Mac.

@hselasky
Contributor Author

Regarding #3, what kind of audio cards are these? What interface do they use? What OS?

I have tried with Soundblaster Audigy on Lubuntu, Behringer UCA 202 on Windows and Mac, Lexicon Omega on Windows and Mac.

Did you enable the jitter-measuring debug code for these setups in the audio callback?

BTW: I changed my patch a bit, to attack the UDP transmission instead of the AudioProcess routine. Encoding/decoding takes a variable amount of time, so to get that out of the equation, I've moved the delay around a bit.

@hselasky
Contributor Author

@corrados : Something else before I forget it:

celt/entdec.c: _this->nbits_total=EC_CODE_BITS+1
celt/entenc.c: _this->nbits_total=EC_CODE_BITS+1;

The first bit in every OPUS frame is unused! It may be possible to use this for something like a toggle...

@corrados
Contributor

Will you merge #1?

yes

done

@hselasky
Contributor Author

Will you merge #1?

yes

done

Thank you - just rebased my patches.

@corrados
Contributor

corrados commented Aug 21, 2020

Did you enable the jitter-measuring debug code for these setups in the audio callback?

I just did this for the Soundblaster Audigy card under Linux:
[plot: audio callback jitter, Soundblaster Audigy under Linux]
The spikes are caused by other processes interfering with Jamulus (causing x-runs). As you can see, the regular jitter is very small so there is no need for traffic shaping for that audio card.

Can you post such a plot for your audio hardware? I am curious how it looks in your case where you get bursts of audio blocks.

@corrados
Contributor

Regarding #4, I don't see any noticeable increase in CPU usage; it's like 10 -> 11 % CPU for both server and client on my machine.

I think we should not test it on a normal desktop CPU but on low end hardware like the Raspberry Pi Zero, see: #483 (comment)

@hselasky
Contributor Author

[plots: three jitter measurements from hselasky's setup]

@corrados : Can you test a USB-based audio device?

@corrados
Contributor

corrados commented Aug 21, 2020

Thanks for the plots. That is very interesting. I'll evaluate the jitter of my USB devices as soon as I am back at home. I'm now very curious what it will look like...

Can you please add which operating system you have used for these plots?

@hselasky
Contributor Author

Can you please add which operating system you have used for these plots?

I'm using FreeBSD with an XHCI USB controller.

@hselasky
Contributor Author

@corrados

I think we should not test it on a normal desktop CPU but on low end hardware like the Raspberry Pi Zero, see: #483

I tested on a RPI3 running FreeBSD:

Jamulus running on RPI3:
w/o patches: 10.11% CPU usage approx
w/ float patches: 8.96% CPU usage approx

However, a new problem arose:

ntpdate 0.freebsd.pool.ntp.org
22 Aug 11:12:14 ntpdate[4271]: adjust time server 192.36.143.130 offset -0.021860 sec

The RPI3 produces an effective sample rate of 48060 Hz, when there is no packet loss, compared to my other test computer, meaning the jitter buffer on the client "goes nuts" after a while :-(

I think we need some tuning parameter here, or calibration, for this to work better!

@hselasky
Contributor Author

@corrados : I have an idea how we can solve all of these jitter issues once and for all, by analyzing the timestamps on all packets received on the client :-) Let me work on it a bit.

The main problem I see is that we send 3 packets every 4 milliseconds for the smallest buffer size, or 3 packets every 8 milliseconds. There is a simple statistical remainder trick that can be used here, but we need a bigger (prime) number than 3, I think!

The goal is to not depend on a high-resolution timer; a millisecond timer, like the one available in Windows, would suffice.

Meanwhile I would like you to consider my floating point patch. I can put it first in the commit list to make merging easier!

@corrados
Contributor

corrados commented Aug 23, 2020

Meanwhile I would like you to consider my floating point patch. I can put it first in the commit list to make merging easier!

Yes, I need a pull request that only contains these changes. Then I can start reviewing your changes.

@hselasky
Contributor Author

@corrados : Once #535 is resolved, we can resume this one!

@corrados
Contributor

This is what I get on my Lexicon Omega under Windows:
[plot: jitter, Lexicon Omega under Windows]
So it gives me a jitter of about 1 ms.

@hselasky
Contributor Author

Something like that is expected. It likely means the USB stack in Windows is using a 1 ms audio buffer, which is required by the XHCI USB host controller.

@hselasky
Contributor Author

@corrados : I have the plans ready for a cool jitter correction algorithm. All I need is one bit per frame, and I see the first bit of every OPUS frame appears unused, so I'm asking for permission to use that bit for something!

@corrados
Contributor

Can you please give some more info on why you need the bit?

@hselasky
Contributor Author

@corrados : I want to implement a "noise-based" (not sure what you would call it) sequence number for every OPUS frame. That means there is, for example, a 13-bit frame number incrementing one by one; then, via a formula, only one of the 13 bits is OR'ed into every OPUS frame.

The reason we need a frame number is that it is very difficult to estimate the packet loss accurately when no timing information is provided with every UDP packet from the sender side.

Also, 48 kHz is not 48 kHz. Over time the clocks will drift (sender and receiver side), but if we had a sequence number, this clock drift could easily be eliminated by libsamplerate on the client side!

Without a sequence number it is impossible to know how many packets were dropped on the wire.

--HPS

@corrados
Contributor

corrados commented Aug 24, 2020

This goes in the wrong direction. The idea of Jamulus is to keep it simple. About 10 years back I also did tests with resampling, but that did not give any improvement. The idea is: if a packet is lost, it is lost. No need to track the packet number. You can read more about this in http://llcon.sourceforge.net/PerformingBandRehearsalsontheInternetWithJamulus.pdf

If you want to include a resampler, you have the problem that we are working on blocks of data. If you have a clock difference, you would have to resample and get, e.g., 129 samples out of the resampler instead of 128. But you have to transmit 128, so what do you do with the additional sample? You would have to introduce a new buffer, so you will have additional latency. The only meaningful clock correction would be hardware based (maybe GPS-based), but this is out of scope for the Jamulus software.

it is very difficult to estimate the packet loss

For what reason do you need the packet loss for improving the sound card jitter?

@hselasky
Contributor Author

I understand that Jamulus doesn't care about:

a) reordering of packets
b) lost packets

Using the OPUS codec to conceal jitter might be more time-clever than using libsamplerate.
OK, agreed, let's forget about libsamplerate.

For what reason do you need the packet loss for improving the sound card jitter?

When you know the exact clock rate difference, you can subtract or add it to the sound card jitter, to reduce the size of the jitter buffer needed.

Right now, in the "perfect" link scenario, all clock differences are accounted for as jitter. It is as if Jamulus needs packet loss to keep the jitter buffer down!

You asked about testing a Raspberry Pi, and what I found is that the system clock is not that accurate.

The RPI, for example, generates 751 64-sample OPUS packets per second, but the client only consumes 750 per second.
So every second with "perfect" transmission the jitter buffer grows by 1 packet. This also adds up for the latency, right?

Even for 1 second of continuous data, we could reduce the latency by 1.33 ms if we knew some sequence numbers here and there, which is possible by filling the first bit of every OPUS frame with a predictable "noise" pattern.

I think it is simply wrong to assume that "the internet connection" will always have packet drops.

@corrados
Contributor

When you know the exact clock rate difference, you can subtract or add it to the sound card jitter, to reduce the size of the jitter buffer needed.

Right, you could do that. But then you have to drop a packet from time to time to correct for the different clock rates. The jitter buffer just does exactly this anyway.

So every second with "perfect" transmission the jitter buffer grows by 1 packet. This also adds up for the latency, right?

This depends on the selected jitter buffer size. If the jitter buffer size is set so that it corrects 1 packet per second, then you have exactly the same behavior.

I think it is simply wrong to assume that "the internet connection" will always have packet drops.

Sure, you cannot say that the network always drops packets (do I do that in my paper? ;-) ). It depends on what type of network you have: there are more reliable connections and less reliable connections. E.g., over WLAN you will get packet loss in most cases.

Usually Jamulus users will have network issues caused by other devices overloading the network with updates, video streams, etc. In that case you will not be able to estimate the clock offset reliably. Only in the special case of a near-perfect network do you have a chance to estimate it and drop some audio blocks at the right time to compensate for the clock offset. But again, this is what the jitter buffer already does.

@corrados
Contributor

Here is the jitter I get if I use the Lexicon Omega under MacOS with a 128-sample block size:
[plot: jitter, 128-sample block size]
And this is with a 64-sample block size:
[plot: jitter, 64-sample block size]
As you can see, I have very low jitter.
When I do my Jamulus jam sessions, I usually use 64 samples under MacOS. That gives me very good performance.

@hselasky
Contributor Author

I see. Let me do a proof of concept on a branch of my own and see if it makes any difference, hopefully without breaking too many of the established concepts.

@hselasky
Contributor Author

over WLAN you will get packet loss in most cases.

Yeah. Been there, done that. It does not work, even with expensive devices like those from Ubiquiti.

@hselasky
Contributor Author

I reduced this pull request to only contain two small fixes, hopefully increasing the chance of it being merged.

@corrados
Contributor

corrados commented Aug 26, 2020

Could you please split the pull request further? That makes integration and testing easier. One pull request for each "topic", e.g. "replacing the precise timer for ping", would be great. Thanks.

@hselasky
Contributor Author

@corrados : Yes.

Need to ensure that JACK callbacks see the running flag as false after
being stopped. Use a QMutex for this.

Signed-off-by: Hans Petter Selasky <hps@selasky.org>
@hselasky hselasky changed the title Various fixes Fix for crash when using the JACK backend and quickly reconfiguring Aug 26, 2020
@corrados corrados merged commit 05783a6 into jamulussoftware:master Aug 29, 2020