-
Notifications
You must be signed in to change notification settings - Fork 228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch all Jamulus audio sample processing to use floats instead of a mix of double and int16_t #535
Switch all Jamulus audio sample processing to use floats instead of a mix of double and int16_t #535
Conversation
const int32_t iCurSam = static_cast<int32_t> ( | ||
pSound->vecsTmpAudioSndCrdStereo [frmNum * oboeStream->getChannelCount() + channelNum] ); | ||
floatData[frmNum * oboeStream->getChannelCount() + channelNum] = (float) iCurSam/ _MAXSHORT; | ||
const float fCurSam = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you compiled and run the Android code to check that it works with your modifications? Or did you just change the code and have not compiled or tested it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I haven't check Android. Need some help here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, I cannot compile Jamulus for Android. The Android development was done mainly by Simon. But as far as I know, he still had issues on the Android platform. So maybe we can live with the fact that the changes in the Android interface are very small. So I think it is ok to take your changes untested.
@@ -79,7 +79,7 @@ class FmtSubChunk | |||
static const uint32_t sampleRate = 48000; // because it's Jamulus | |||
const uint32_t byteRate; // sampleRate * numChannels * bitsPerSample/8 | |||
const uint16_t blockAlign; // numChannels * bitsPerSample/8 | |||
static const uint16_t bitsPerSample = 16; | |||
static const uint16_t bitsPerSample = 24; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pljones You should review these changes.
{ | ||
name = _name; | ||
|
||
for(int i = 0; i < numChannels * iServerFrameSizeSamples; i++) | ||
{ | ||
*out << pcm[i]; | ||
/* samples must be stored in little endian order */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is something, @pljones should review.
windows/sound.cpp
Outdated
@@ -615,243 +776,107 @@ void CSound::bufferSwitch ( long index, ASIOBool ) | |||
GetSelCHAndAddCH ( pSound->vSelectedInputChannels[i], pSound->lNumInChan, | |||
iSelCH, iSelAddCH ); | |||
|
|||
#define CSOUND_GET_DATA(format,endian,factor) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually do not like these types of defines. The resulting code does not contain that much redundancy but the readability of the define as well as the debugging is much worse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know. Either you repeat that piece of code a dozen times, or you use that macro. I see no simple way to resolve this? Do you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may not re-invent the wheel but use what is available and well tested. E.g.: https://app.assembla.com/spaces/portaudio/git/source/master/src/hostapi/asio/pa_asio.cpp#ln608
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see my changes are in-line with what PortAudio does. I believe my changes are more compact.
Should we add support for PortAudio as a backend?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not with this pull request. There is an Issue for that already: #116
I believe my changes are more compact.
It's not about being more compact, it's more about having a code which is tested and known to work correctly for all bit depths. Verifying the functionality is the most critical part. The most Jamulus users use Windows. The sound interface is critical for stability and reliability of the Jamulus software. That's why I am concerned using almost untested code here.
We should take the portaudio code here and later on we might integrate the portaudio library as an additional option for the sound interface in Jamulus.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have a look at it again tonight.
windows/sound.cpp
Outdated
{ | ||
// mix input channels case: | ||
int16_t* pASIOBufAdd = static_cast<int16_t*> ( pSound->bufferInfos[iSelAddCH].buffers[index] ); | ||
CSOUND_GET_DATA(sample16,Le,1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tested all possible ASIO types that they work correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I haven't. I've only made sure that the code compiles.
windows/sound.cpp
Outdated
@@ -864,240 +889,91 @@ void CSound::bufferSwitch ( long index, ASIOBool ) | |||
{ | |||
const int iSelCH = pSound->lNumInChan + pSound->vSelectedOutputChannels[i]; | |||
|
|||
#define CSOUND_PUT_DATA(format,endian,factor) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above comment regarding the function defines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know. Either you repeat that piece of code a dozen times, or you use that macro. I see no simple way to resolve this? Do you?
There is a branch https://github.com/corrados/jamulus/tree/multithreading_improvements which I am recently worked on. It contains changes in the server which, I assume, will introduce merge conflicts. This branch has higher priority than yours since we are in a process of improving the multi-CPU core performance of the Jamulus server. The branch is not ready but hopefully will be ready in some days. Just to inform you about it. |
BTW I forgot to mention: Thanks for creating a new pull request which only contains changes for the float type change :-). Please note that since there are so many changes, it will take some time to test/verify/integrate your code. |
@bflamig Have you found some time for your review yet? |
UPDATE: I had numerous errors in the following. I have corrected them. Guess I was too tired when I wrote the post. @corrados, @@hselasky**. Have been busy. Just started looking at this new code today. I have focused, for now, on just the 32 bit lsb ASIO type. Once that is understood properly, the others should go smoothly. Why focus on that type? Because the ASIO drivers of the most common (because they are inexpensive) interfaces, just as the Scarlett 2i2 and 4i4, or the Behringer UMC 202HD and UMC 404HD seem to be using 32 bit little-endian ASIO samples -- at least that's what I find when running the debugger on a Jamulus client with these devices. I suspect that a significant percentage of the Jamulus users are using these devices because they are the best bang for the buck -- good quality at an inexpensive price. Is that the case? Anybody know? Overall, the architecture for @hselasky's ASIO floating point stuff (that is, the templated buffer conversion architecture) seems like a reasonable one to me. However, after really looking more closely, I'm not sure about the conversion details. Many questions abound, that maybe you or someone who knows can answer: The ASIOInt32LSB type (I may have spelled it wrong): What exactly does that mean? We know there are no audio interfaces that can really have 32 bit audio because it's impossible to implement any A/D (or D/A) to 32 bit precision. Most of the common interfaces these days support "24 bit" audio, and even that is very difficult. But let's say for sake of discussion that they are using 24 bit precision. The question I have, and haven't found a satisfactory answer for, is which bits of a 32 bit sample are the 24 bits located in? Is the most significant 24 bits (with the remaining 8 bits being dither, or zero, or just plain garbage). Is the least significant 24 bits? I realized that I don't really know and the audio device interface documents don't say! So I'm looking for that answer. The latest sample32LSB class that @hselasky provides has some constant factors that ... well, I don't understand them. In earlier incarnations, I thought I understood what he was using, but know I don't. @hselasky perhaps you can explain what you are thinking here? Also, what scale should be used for the floating point representation? -1.0 to 1.0? That was the original concept I think that @hselasky was using, but I'm not sure his new constants are providing that. And what exactly does the floating point encoding of OPUS expect for the scale? Is it really -1.0 to +1.0? I tried to wade through the very dense OPUS documentation to find out -- and looked at the very dense OPUS source code, and I still have no idea. But let's supposed we want to convert all 32 bits of an integer into float and back, using -1.0f to +1.0f for the scale: I have discovered there is a way to convert full 32 bit integers (-2147483648 to +2147483647) into (-1.0f to +1.0f) but it can't easily be done using single precision floating point constants. I discovered that if the constants are in double precision, then it's possible, mostly. To wit, running an example under the Microsoft Visual Studio compiler (where little endian rules the day):
I tried various different test cases of float values between -1.0f to +1.0f and int values between -2147483648 to +2147483647, including the corner cases, and used "round tripping" to uncover any problems. For example: my32LSB buffer; buffer.put(1.0f); // results in buffer.data = 0x7fffffff; float f = buffer.get(); // results in 1.0f; buffer.put(-1.0f); // results in buffer.data = 0x80000000; f = buffer.get(); // results = -1.0f; buffer.put(0.5f); // results in buffer.data = 0x3fffffff; // ** I had this wrong before ** f = buffer.get(); // results in 0.5f; buffer.put(-0.5f); // results in buffer.data = 0xc0000001; // ** I had this wrong before ** f = buffer.get(); // results in = -0.5f; This is great and all, but then I tried the following: buffer.data = 0x12345678; f = buffer.get(); // results in 0.142222226f buffer.put(f); // results in 0x1234567f, NOT 0x12345678 // ** I had this wrong before ** Why does round tripping not work here? Because as @hselasky realized in an earlier post on this thread, 32 bit floating point is not a precise enough store to exactly convert 32 bit integers back and forth. But that's okay, because it's highly unlikely that 32 bit integer precision is needed here anyway, given as how there are no audio interfaces that are going to return more than 24 bit precision, and even that is being generous. I emphasize that above, the conversion factors used were double floats (64 bit), not single floats (32 bits), even though I passed in and returned single floats (since that's what the new buffers will use). If you try to use 32 bit float for the conversion factors, you'll run into all sorts of problems. I note with interest that PortAudio seems to use a mix of scale factors: 2147483647 and 2147483648. I haven't fully grokked (that is, understood) what they are doing. ** UPDATE: After closer examination of the PortAudio code, I have since determined that PortAudio is using the very same trick I'm using here that I discovered on my own: Express the factors as doubles, and then cast as float or int as appropriate. PortAudio also uses the two constants I gave above. Therefore, it would seem to me that this is the correct approach. One more thing. In the above, I did no byte swapping. It's not necessary here, since, if the ASIO code is running under x86 windows, (and when wouldn't it be?) so the native int's are already in little-endian form. Am I wrong about this? Why is this important? Because, if, as I suspect, a significant percentage of Jamulus users running in Windows are going to have the ASIO types returned by the use of the popular Scarlett and Behringer devices in 32 LSB form, and thus, the 32LSB class will be used a lot, and ought to be optimized as much possible. The simple implementation above does this. |
@bflamig : Sometimes 32-bit types are used just for the sake of the alignment. Yes, float means some bits will be zeroed, but I think it doesn't matter right now. We already zero the lower 16-bits .... |
Regarding byte swapping, yes we can use a packed uint32_t to get the data, but I think it shouldn't matter much CPU wise. |
Currently Jamulus uses [-1.0 to 1.0], where -1.0 and +1.0 are both included. |
(1) I corrected my post. I had some errors in the constants I used. I was looking at the wrong test program I had written. After correcting the constants, I went back and looked deeper into port audio's code, and they are using the same technique I've illustrated in the post: To wit, the use of doubles for the constant factors, and they use the same two factors I'm using. (2) I think you meant int32_t, not uint32_t. It has to be signed. (3) As far as CPU wise, that's hard to say, because using four packed bytes instead of a 32 bit integer type, you are depending on the compiler optimizing the code, and that can't be guaranteed. Using the direct int32_t type gets around that problem. It will be optimized on all compilers. (4) The -1.0 to 1.0 scale is duly noted. I had forgotten that Jamulus was already clipping to those values, like in the sliders, etc. |
Forgot to say that for sure, for the types that need endian swapping, then yes, using four packed bytes is perfectly reasonable. In these cases I tried to determine whether it is better to use the single integer and then use bit shifting to swap the bytes, or use the four packed bytes approach you had. I have come to no certain conclusions yet, but haven't spent a lot of time on it. |
Is this branch now ready for integration? Or @hselasky are you planning to adjust some things based on the feedback from bflamig? |
@corrados: I don't have any more patches pending right now. I've been using this on FreeBSD for a while. No issues seen. |
My comments were not addressed. I have since done some more testing on @hselasky's code for his sample32LSB class. I have found it to be both numerically unstable, and not optimal -- the bit shifting and byte copying are superfluous and at least one compiler does not optimize away the unnecessary actions. First, the asymmetric scale factors used are problematic. If you do a series of roundtrip tests, you'll see what should be the same values slowly shift over time. I present the following test program to illustrate this:
This code was compiled and tested using the MSVC 2019 C++ compiler. It shows how sample values can slowly drift downwards during "round tripping" (a get() followed by a put(), over and over). This may not show up audibly in real-life, but it is there, lurking, waiting to cause problems somewhere down the line. If you change the first "#if 1" to "#if 0", then the propose replacement is used. It does not exhibit this behavior. For intermediate values there may be changes at first during round-tripping, because of imprecision with 32 bit floating point, but the change takes place only once, and then the value settles there. A full scale integer value of 2147483647 converts exactly to 1.0 and back to 2147483647. Likewise for -2147483648 to -1.0 and back (though here the internal representation is off by one bit, but it settles there.) For intermediate values, the initial internal integer representation does get changed at first, by as many as 1 to 3 bits, but once there, it stays there. It does not drift. I duly repeat that the proposed replacement with its tricky double precision constants uses the exact same scaling technique as Port Audio. Second, the superfluous byte swapping and bit shifting code does not get optimized with MSVC 2019 even in Release mode. In particular, neither the bit-shifting in get() nor the byte copying in put() get optimized away. Other compilers may be different. I haven't tried them yet. The replacement code will always be optimized no matter what compiler is used because the conversions are simple multiplies. |
a mix of double and int16_t . MacOS, Linux and Android already do this, and ASIO also supports it. Change recording format to be 24-bit WAV, to get the full precision of the mixed audio! This patch gives the user the full resolution of the audio device, both when receiving audio and transmitting audio. Cleanup LSB/MSB sample processing in ASIO driver while at it. LSB and MSB indicate little-endian and big-endian data format presumably. Signed-off-by: Hans Petter Selasky <hps@selasky.org>
@corrados: I've now updated the code with your suggestions.
|
@hselasky. Cool! Now for the bad news! Oh boy! :) I don't know if you looked at your sample16LSB class much, but it suffers the same scaling issues -- even more so. However, after much investigation (I spent way more time on this than I ever dreamed of doing) I have determined there's nothing you can do about it. And no other solutions I found had any better results. Who knew these simple scaling calculations could be so finicky! I've actually been doing a survey of scaling techniques from two other libraries -- Port Audio and RtAudio, and of course the current code in Jamulus (which doesn't quite fit precisely because it doesn't do the scaling for floating point in the "same direction" because it hasn't needed to -- but the scaling that's there gives the "spirit" of what @corrados was doing. I did the survey for both the 16 bit cases and the 32 bit cases. What I've found is that none of the approaches are "perfect" and it appears they can't be, completely. Interestingly, RtAudio uses just single precision floating point, and a rounding technique that works pretty well, for both 16 bit and 32 bit integers, to float and back, but it requires calling an extra rounding function and an extra min() function. That's a lot more instructions to execute. However, even with it, I found what I think is a bug in their float to int conversion for 32 bits. It has a numeric overflow if you pass in a 1.0, just like your old code does. Wow! And that code has been around for many years. (Yes, I am using the newest version .) Of all the methods, it appears Port Audio does the best overall, but they are all fairly similar, except when they're not :) BTW: I just tested the byte moving and swapping code you had in there for the sample32LSB class using the Clang compiler. It actually optimizes in Release mode worse than MSVC 2019 -- meaning not at all. I wasn't expecting that. In your classes that actually need byte swapping, it's possible that Clang will do well, as it might recognize the patterns and substitute the single byte swap instructions. That's what some people say on the internet anyway. It's probably more true for bit shifting patterns than byte swapping patterns. When I get some more time, I'll investigate that. I'm just curious. |
I'll use an integration branch for the merge now in my repo. |
for ( i = 0; i < iServerFrameSizeSamples; i++ ) | ||
{ | ||
vecsSendData[i] = Double2Short ( vecdIntermProcBuf[i] ); | ||
vecfSendData[i] = clipFloat ( vecfIntermProcBuf[i] ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it more efficient to pass a reference to the vector to the clipper and clip the entire vector in the method called, rather than a call per vector entry? Or is clipFloat simple enough to get inlined?
for ( int i = 3; i < static_cast<int> ( sqrt ( static_cast<double> ( number ) ) ) + 1; i += 2 ) | ||
const int max = static_cast<int> ( sqrtf ( static_cast<float> ( number ) ) ) + 1; | ||
|
||
for ( int i = 3; i < max; i += 2 ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spacing
} | ||
|
||
return static_cast<short> ( dInput ); | ||
return qBound ( -1.0f, fInput, 1.0f ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is so simple I'd prefer it written out inline when used. It saves wondering what the method does / how it's defined, etc.
@pljones You should not review the original pull request anymore. I have merged it on my integrate_float branch and already work on fixing the code. That will take some time. Then I will create a new pull request from that branch and on that you should add your review comments. Thanks. |
Yep, got it - only spotted these when reading through to catch up with where things were and didn't want to forget. |
@bflamig I just merge the "float" code to Git master. Unfortunately, the Windows audio interface gives me a very bad clipping behaviour now :-(. Instead of clipping to the highest value it seems that it jumps from max-positive to min-negative value giving very bad clipping sound. We have to get this fixed soon, otherwise I have to go back to the original Windows audio interface and convert from int16 to float. |
@bflamig FYI, no need to hurry anymore. I undo the merge to master. The code is now again just on a branch, it's called integrate_float2. |
<mailto:reply@reply.github.com> @corrados/jamulus
I just saw this email. Hadn’t checked this email account in a while.
I’ve seen this behavior before. It has to do with converting from float to int without using double precision arithmetic, when the float value is 1.0. That’s why I had tweaked the scale factors to use double precision. I note that Port Audio does exactly this trick. I note that RtAudio does not, and it has a bug similar to this!
However, what you are seeing may not be that, but something related. If you pass a value that’s bigger than 1.0, even if slightly, you can get this behavior. Perhaps that’s what happening. Whatever it is, it appears to be only in your merged code, not in integrate_float2 – at least, the latter works for me.
I’ve seen your later comments that you aren’t going to use this code for now. Understandable. However, I wouldn’t mind trying to track this bug down. Unfortunately, my GitHub skills are a bit lacking, so I haven’t succeeded in merging integrate_float2 into your latest master code (locally of course on my own machine). When I figure that out, I’ll play around with it when I get a few moments.
Bryan
From: Volker Fischer <notifications@github.com>
Sent: Sunday, October 4, 2020 12:55 PM
To: corrados/jamulus <jamulus@noreply.github.com>
Cc: Bryan Flamig <bryan@robustcircuitdesign.com>; Mention <mention@noreply.github.com>
Subject: Re: [corrados/jamulus] Switch all Jamulus audio sample processing to use floats instead of a mix of double and int16_t (#535)
@bflamig <https://github.com/bflamig> I just merge the "float" code to Git master. Unfortunately, the Windows audio interface gives me a very bad clipping behaviour now :-(. Instead of clipping to the highest value it seems that it jumps from max-positive to min-negative value giving very bad clipping sound. We have to get this fixed soon, otherwise I have to go back to the original Windows audio interface and convert from int16 to float.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub <#535 (comment)> , or unsubscribe <https://github.com/notifications/unsubscribe-auth/AQUBA3XB6ZBMYK7SEF4INZ3SJDHKZANCNFSM4QITK7SA> .
|
No need to merge that code into master. You can use the code on integrate_float2 without having the latest changes from master. |
In that case, as I’ve mentioned. I’m not seeing any problems. Integrate_float2 works fine for me with 32 bit little endian device samples, both floating and integer, on a Windows client. (I get the float32 samples by using the Reaper “virtual” ASIO driver, and set it to use float samples). It gives me a way to test at least two different sampling types. What audio devices were you using in your tests?
I haven’t tried running the new code as a server on Windows (I was running the float server code on Linux). Perhaps that’s when the clipping bug will show up.
FYI: I’m experimenting with this code because I’m between jobs so I have some time. It gives me a chance to learn the Jamulus code in more detail. I like learning new things and seeing how others do things. Plus, we use Jamulus for our band practices right now. (I’m in a traditional folk contra-dance band, playing fiddle and percussion. I use my own custom midi hand percussion instrument that I’ve designed and built, both hardware and software. It has 6/7 voices, and I use Reaper and Addictive Drums to convert midi to audio and then send that to/from Jamulus.)
Bryan
From: Volker Fischer <notifications@github.com>
Sent: Tuesday, October 6, 2020 10:49 AM
To: corrados/jamulus <jamulus@noreply.github.com>
Cc: Bryan Flamig <bryan@robustcircuitdesign.com>; Mention <mention@noreply.github.com>
Subject: Re: [corrados/jamulus] Switch all Jamulus audio sample processing to use floats instead of a mix of double and int16_t (#535)
Unfortunately, my GitHub skills are a bit lacking, so I haven’t succeeded in merging integrate_float2 into your latest master code
No need to merge that code into master. You can use the code on integrate_float2 without having the latest changes from master.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub <#535 (comment)> , or unsubscribe <https://github.com/notifications/unsubscribe-auth/AQUBA3V7LCRTIBFSI6WLSPLSJNKAXANCNFSM4QITK7SA> .
|
MacOS, Linux and Android already do this, and ASIO also supports it.
Change recording format to be 24-bit WAV, to get the full precision
of the mixed audio!
This patch gives the user the full resolution of the audio device,
both when receiving audio and transmitting audio.
Cleanup LSB/MSB sample processing in ASIO driver while at it.
LSB and MSB indicate little-endian and big-endian data format
presumably.
Signed-off-by: Hans Petter Selasky hps@selasky.org