Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement dithering #694

Merged
merged 29 commits into from
May 26, 2021

Conversation

roderickvd
Copy link
Member

@roderickvd roderickvd commented Apr 13, 2021

Dithering lowers digital-to-analog conversion ("requantization") error, linearizing output, lowering distortion and replacing it with a constant, fixed noise level, which is more pleasant to the ear than the distortion.

Guidance:

  • On S24, S24_3 and S24, the default is to use triangular dithering. Depending on personal preference you may use Gaussian dithering instead; it's not as good objectively, but it may be preferred subjectively if you are looking for a more "analog" sound akin to tape hiss.

  • Advanced users who know that they have a DAC without noise shaping have a third option: high-passed dithering, which is like triangular dithering except that it moves dithering noise up in frequency where it is less audible. Note: 99% of DACs are of delta-sigma design with noise shaping, so unless you have a multibit / R2R DAC, or otherwise know what you are doing, this is not for you.

  • Don't dither or shape noise on S32 or F32. On F32 it's not supported anyway (there are no integer conversions and so no rounding errors) and on S32 the noise level is so far down that it is simply inaudible even after volume normalisation and control.

New command line option:

--dither DITHER Specify the dither algorithm to use - [none, gpdf,
                tpdf, tpdf_hp]. Defaults to 'tpdf' for formats S16
                S24, S24_3 and 'none' for other formats.

Notes:

This PR also features some opportunistic improvements. Worthy of mention are:

  1. matching reference Vorbis sample conversion techniques for lower noise
  2. a cleanup of the convert API

Dithering lowers digital-to-analog conversion ("requantization") error,
lowering distortion and replacing it with a constant, fixed noise
level, which is more pleasant to the ear than the distortion. Doing so
can with a noise-shaped dither can increase the dynamic range of 96 dB
CD-quality audio to a perceived 120 dB.

Guidance: experts can configure many different configurations of
ditherers and noise shapers. For the rest of us, these are sane
defaults depending on the output format:

| Format     | Ditherer         | Noise shaper             |
+------------+------------------+--------------------------+
| S16        | Triangular (tri) | Wannamaker (fw9 or fw24) |
| S24, S24_3 | High Pass (hp)   | None                     |
| S32        | None             | None                     |
| F32        | Not supported    | Not supported            |

Notes:

 * Try swapping ditherers with Gaussian if you prefer a more analog
   sound.

 * Try swapping noise shapers with Fraction Saving if you are running
   on power-constrained hardware. In this case, disable dithering.
@roderickvd
Copy link
Member Author

I'm pretty excited about this one, as I'm pretty certain it elevates the sound quality of librespot to the best in class. Please help doing some final sanity checking and code review.

@JasonLG1979
Copy link
Contributor

@roderickvd this can't be configured in the vorbis decoder itself?

@JasonLG1979
Copy link
Contributor

JasonLG1979 commented Apr 14, 2021

Looking at your noise shaping code you note that it basically shouldn't be used at all with delta-sigma DACs. Pretty much all DACs except niche NOS DACs are delta-sigma DACs.

I also wonder what additional noise shaping will do to the ultra-sonic part of the audio. (because noise shaping has already been applied in the mixing and mastering process and decoding process) A lot of even really expensive and otherwise objectively good DACs have horrible ultra-sonic filters. Ultra-sonic noise can damage audio equipment, especially class D amps and speakers.

@roderickvd
Copy link
Member Author

@roderickvd this can't be configured in the vorbis decoder itself?

No and it shouldn't. Dithering and shaping should take place right before final requantization, after volume control, normalisation and any other filtering.

Looking at your noise shaping code you note that it basically shouldn't be used at all with delta-sigma DACs.

That's the case for the Lipshitz and Wannamaker filters. The fraction saver works fine on delta-sigma. Thinking about this now I should turn off noise shaping by default, I'll get that done.

Pretty much all DACs except niche NOS DACs are delta-sigma DACs.

There is quite the uprising of (quasi-) NOS multibit DACs, e.g. Schiit, Soekris, Audio-GD.

I also wonder what additional noise shaping will do to the ultra-sonic part of the audio. (because noise shaping has already been applied in the mixing and mastering process) A lot of even really expensive and otherwise objectively good DACs have horrible ultra-sonic filters. Ultra-sonic noise can damage audio equipment, especially class D amps and speakers.

Well as you point out, it's the DAC's job to filter above Nyquist. Usually if filters are "bad", they have an early roll-off or cause modulation distortion. It's something else entirely if they fail to roll-off above Nyquist and leave ultrasonic noise intact! That's just broken. Also properly designed, non-DIY-modded DACs will have an output capacitor for low pass filtering in hardware even if the filter software is of poor design.

I think when you refer to poor filters, you are more referring to the first case of poor sound quality, rather than broken and potentially destructive behavior. Of course we can only assume the DAC is of a safe design.

@ghost
Copy link

ghost commented Apr 14, 2021

New command-line options:

--dither DITHER Override the dither algorithm to use - [none, rect, sto, tri, hp, gauss].
--shape-noise SHAPE_NOISE Override the noise shaping algorithm to use - [none, fract, iew5, iew9, fw3, fw9, fw24].

Can the description state what is the default option, i.e. what happens when we don't specify these parameters?

@JasonLG1979
Copy link
Contributor

I think when you refer to poor filters, you are more referring to the first case of poor sound quality, rather than broken and potentially destructive behavior. Of course we can only assume the DAC is of a safe design.

Have you ever checked out https://www.audiosciencereview.com/ they do A LOT of objective DAC measurements. From their measurements it's way more likely that a DAC doesn't roll off enough ultra-sonics then too much. If assumptions are to be made I would assume a poor DAC implementation over an ideal one.

@JasonLG1979
Copy link
Contributor

There is quite the uprising of (quasi-) NOS multibit DACs, e.g. Schiit, Soekris, Audio-GD.

Those are still very niche in the grand scheme of things. 99.99% of DACs out there are delta-sigma DACs. In fact most of the chips are made by just 3 manufacturers, ESS, AKM and TI and those are all delta-sigma DACs.

@JasonLG1979
Copy link
Contributor

JasonLG1979 commented Apr 14, 2021

I'm just a little weary of a feature that could damage someone's hardware, even if the chance is less than microscopic. At the very least I think the noise shaping noise should be limited in spectrum to 20kHz, if it isn't already, to at least not add any ultra-sonics on it's own.

I'm somewhat talking out of my butt here so don't take anything I say as me trying to say I actually know what I'm talking about,lol!!! If you know better you know better. Just giving you some things to think about. Basically any statement I make could also be heard as a question.

@JasonLG1979
Copy link
Contributor

No and it shouldn't. Dithering and shaping should take place right before final requantization, after volume control, normalisation and any other filtering.

But it does happen during mixing, mastering and the decoding of lossy audio.

@Johannesd3
Copy link
Contributor

I'm sure you're doing a great job, but I'm not sure this is the appropriate place for such a sophisticated sound processing. As far as I understood, dithering and noise shaping are some kind of filters which take PCM, modify it, and return new PCM. Unlike volume normalization or volume control, they don't have to interact with the Spotify backend, but are statically applied, using parameters set at the start of the program. Please correct me if I'm wrong.

I also think this adds a lot of complexity to the code. Every backend was modified to implement something rather specific. One could even imagine that more kinds of filters are added to librespot, and it's not really nice to touch every backend every time, using only "macro magic" to abstract over it.

If there would be at least some more general approach, for example to create pipelines of filters, so you can modify the processing without changing all backends. But IIUC this would be very close to what gstreamer does. And maybe too much for a simple Spotify player.

Most people won't hear any difference, so it's IMO not necessary to support it out of the box. If someone really needs this, they could also invest some time in an advanced audio setup, e.g. piping the audio into some application that supports this.

@roderickvd
Copy link
Member Author

roderickvd commented Apr 14, 2021

I think when you refer to poor filters, you are more referring to the first case of poor sound quality, rather than broken and potentially destructive behavior. Of course we can only assume the DAC is of a safe design.

Have you ever checked out https://www.audiosciencereview.com/ they do A LOT of objective DAC measurements. From their measurements it's way more likely that a DAC doesn't roll off enough ultra-sonics then too much. If assumptions are to be made I would assume a poor DAC implementation over an ideal one.

Sure. As a recent example, Amir even strongly recommended one DAC that has noise shaping above 45 kHz: https://www.audiosciencereview.com/forum/index.php?threads/topping-d30pro-review-balanced-dac.20259/
Ultrasonic noise on class D becomes an issue around the switching frequency of the amplifier which is often between 200 kHz and 2 MHz.

Picking another topic off of ASR on this subject: https://www.audiosciencereview.com/forum/index.php?threads/how-does-ultrasonic-noise-affect-the-sound-quality.11558/ So while it's good that you brought this up, I think it's safe to say that we're in the green here.

Anyway, noise shaping is now off by default as of 6ea089c, which I agree with.

No and it shouldn't. Dithering and shaping should take place right before final requantization, after volume control, normalisation and any other filtering.

But it does happen during mixing, mastering and the decoding of lossy audio.

I'm not sure what you are arriving at. Yes, there is or should be dithering during mixing and mastering analog-to-digital (quantization). This does not waive dithering when going back again from digital-to-analog (requantization).

There is no dithering or libvorbis or lewton and earlier I explained why. And if you don't dither during requantization, you will have greater audible noise then when you would have dithered.

I don't think I need to expound here why dithering is a useful feature, please refer to the seminal and publicly available academic works of Lipshitz and Wannamaker or even Wikipedia.

@JasonLG1979
Copy link
Contributor

@roderickvd calm down,lol!!!

Like I said if you know you know. I was just trying to wrap my brain around what you're doing here. As I've said I was semi talking out my butt. I agree with the benefit of dithering and noise shaping especially when used with software volume and gain leveling I'm just wondering now if there really needs to be options beyond what works (safely) for resource constrained devices and what works for everything else like maybe just a "good" and "better" option that defaults to "good".

@roderickvd
Copy link
Member Author

I'm sure you're doing a great job, but I'm not sure this is the appropriate place for such a sophisticated sound processing. As far as I understood, dithering and noise shaping are some kind of filters which take PCM, modify it, and return new PCM. Unlike volume normalization or volume control, they don't have to interact with the Spotify backend, but are statically applied, using parameters set at the start of the program. Please correct me if I'm wrong.

This is correct. In the context of #648 this is a direct candidate for moving into a separate crate or daemon, which I'm in full support of but that's not today.

I also think this adds a lot of complexity to the code. Every backend was modified to implement something rather specific. One could even imagine that more kinds of filters are added to librespot, and it's not really nice to touch every backend every time, using only "macro magic" to abstract over it.

That's exactly why I propose the Requantizer class, which encapsulates all current and future behavior. Add this API now and be done with it for the future.

Most people won't hear any difference, so it's IMO not necessary to support it out of the box. If someone really needs this, they could also invest some time in an advanced audio setup, e.g. piping the audio into some application that supports this.

I say that as long as there is audio playback in librespot, we should make it the best we can, then move it out later.

@roderickvd
Copy link
Member Author

@roderickvd calm down,lol!!!

No worries mate. May I still bother you for performance testing on your RPi Zero?

@JasonLG1979
Copy link
Contributor

JasonLG1979 commented Apr 14, 2021

No worries mate. May I still bother you for performance testing on your RPi Zero?

For sure. Absolutely. I will test it out later today or tonight.

And let me say as an audio enthusiast on a budget I thank you for doing what you can to make librespot sound as good as it can. I can't wait for your work to filter down to us raspotify users.

@JasonLG1979
Copy link
Contributor

Speaking of the Pi Zero is that our potato now? Because the librespot devs really need to officially pick a potato.

( potato = lowest spec target device )

@roderickvd
Copy link
Member Author

New command-line options:

--dither DITHER Override the dither algorithm to use - [none, rect, sto, tri, hp, gauss].
--shape-noise SHAPE_NOISE Override the noise shaping algorithm to use - [none, fract, iew5, iew9, fw3, fw9, fw24].

Can the description state what is the default option, i.e. what happens when we don't specify these parameters?

Added in 00b36be.

@ghost
Copy link

ghost commented Apr 14, 2021

New command-line options:

--dither DITHER Override the dither algorithm to use - [none, rect, sto, tri, hp, gauss].
--shape-noise SHAPE_NOISE Override the noise shaping algorithm to use - [none, fract, iew5, iew9, fw3, fw9, fw24].

Can the description state what is the default option, i.e. what happens when we don't specify these parameters?

Added in 00b36be.

Thank you!

Maybe for future reference, could you please tell us why dithering is not supported for F32 and S32, i.e. does it mean that for these modes dithering is not required, i.e. these modes are the most transparent of all?

@roderickvd
Copy link
Member Author

roderickvd commented Apr 14, 2021

Maybe for future reference, could you please tell us why dithering is not supported for F32 and S32, i.e. does it mean that for these modes dithering is not required, i.e. these modes are the most transparent of all?

Dithering is supported, but completely unnecessary on S32 because any quantization noise is so far down the noise floor, that both the noise itself and any modulation distortion is next to zero. Simply inaudible.

Dithering on F32 makes no sense because floats more or less represent a continuous-valued signal and so have no quantization noise resulting from rounding to integer values.

There is one case that I should document, which is the case where your backend supports F32 or S32 but your hardware doesn't, and the backend converts down without dithering. In that case convert down in librespot, not in the backend. Example: my DAC supports up to S24 audio over I2S, and Alsa accepts up to F32. Alsa would convert down without dithering, so I run librespot with --format S24 --dither hp for optimal sound quality instead of F32 or S32.

@JasonLG1979
Copy link
Contributor

JasonLG1979 commented Apr 14, 2021

Alsa accepts up to F32. Alsa would convert down without dithering

If you run though dmix. If you directly address your DAC, trying to give it a format or sampling rate it doesn't support will just fail. Dmix by default resamples/converts everything to 16 bit 48 khz.

You can use aplay -l to find your card. and something like aplay -Dhw:0,0 --dump-hw-params /usr/share/sounds/alsa/Front_Right.wav (replacing 0,0 with the card # and device #) to tell what sampling rates and formats your DAC actually supports.

You'll see something that looks like FORMAT: S16_LE S24_LE S32_LE and RATE: [8000 192000].

For the best quality as far as format is concerned I'd either configure dmix to the highest supported format if you need software mixing or just address the card directly if you don't and skip dmix. Either way you'd want to configure librespot to that same highest supported format.

As far as the sampling rate goes if you run though dmix you'd also want to make sure that's it's 44100 to match what librespot outputs unless you want your audio resampled for whatever reason. Resampling has questionable benefit and can be expensive CPU wise.

@JasonLG1979
Copy link
Contributor

@roderickvd Sorry I'm a little late. I'm building your branch right now on my Pi Zero with the --release --no-default-features --features "alsa-backend" compile flags the resulting binary should, if all goes well, be arm-unknown-linux-gnueabihf compatible so basically it should tun on at least any Raspberry Pi running a 32 bit version of Raspberry Pi OS. (and a few other small ARM boards?) if anyone would also like to test the build but doesn't want to compile it, let me know and I'll post the binary somewhere for you to download.

I'll unscientifically test a few different config options to get a ballpark of performance (basically how bad does it hurt the Pi Zero's CPU) compared to the current stable branch.

@roderickvd
Copy link
Member Author

Edit: Or is that just the ref "out" method?

Yes the original link was to the decoder. The encoder just takes [-1.0..=1.0] normalised samples, it's not set in stone how to arrive at that. The example encoder also uses method 1 though, just like the decoder, so it's safe to say that's what's assumed: https://github.com/xiph/vorbis/blob/master/examples/encoder_example.c#L196-L202

I pushed a fix in 68f409c. It's somewhat stray in the scope of this PR, but is easy to extract if necessary.

@JasonLG1979
Copy link
Contributor

Right on. Now I'm curious to see if there's even an audible difference,lol!!! But being technically correct is always good though none the less even if it's not audible.

@JasonLG1979
Copy link
Contributor

@roderickvd "Method 1" seems to sound at least as good or better than "Method 0" on my Pi Zero with a HIFIBERRY DAC+ ZERO DAC HAT. Not that anyone could realistically tell the difference on a $15 slave DAC with a SNR in the low 90's dB. I'm about 99.999% sure it's confirmation bias. I'll test it out on my (much better but still cheap) JDS Labs OLDAC tomorrow but I expect about the same amount of bias and certainty,lol!!!

@roderickvd
Copy link
Member Author

Same here even though my DAC has a SNR of 129 dB. It’s measurable but likely not directly audible. There is excellent evidence however that reduced harmonic distortion does reduce listening fatigue en improves perception unconsciously.

Back to the topic of command line options, I’ll come up with some alternatives later and hopefully move to final decision stage on this one.

@JasonLG1979
Copy link
Contributor

Same here even though my DAC has a SNR of 129 dB. It’s measurable but likely not directly audible. There is excellent evidence however that reduced harmonic distortion does reduce listening fatigue en improves perception unconsciously.

The guy to ask might be Amir at Audio Science Review. He has the test equipment to verify how much of an objective difference there really is. Not that he has the time or would even care,lol!!!

Back to the topic of command line options, I’ll come up with some alternatives later and hopefully move to final decision stage on this one.

Right on. I look forward to testing out whatever you come up with.

- Remove separate noise shaping module and options. It does not cater to
  99% users so does not justify the complexity.

- Rename option values to `--dither` to better match what other audio
  libraries use.

- Rework configuration so dithering is an `Option`. This saves a dynamic
  dispatch to a no-op `NoneDitherer` when dithering is set to none.

- Change default ditherer to `tpdf` instead of `tpdf_hp` which should be
  even safer on all DACs. (`tpdf_hp` is the simplest form of FIR noise
  shaping)

- Removed `rect` (`rpdf`) and `sto` ditherers, that are suboptimal to
  the others in all cases. Keeping them around is a bit academic.
@roderickvd
Copy link
Member Author

Working on this I came to my senses as I saw the huge number of cases I was trying to cater to. Instead of coming up with some extensive mapping, I really cut down the number of features 35296bf.

For starters, I took out the entire NoiseShaper module and two of the Ditherer types. Please read the commit message for more details.

What do you think now it's trimmed? All it needs now is Wiki documentation on --dither {none|gpdf|tpdf|tpdf_hp} while users that don't care or don't know are set up with the sane and safe tpdf by default.

@JasonLG1979
Copy link
Contributor

Working on this I came to my senses as I saw the huge number of cases I was trying to cater to. Instead of coming up with some extensive mapping, I really cut down the number of features 35296bf.

For starters, I took out the entire NoiseShaper module and two of the Ditherer types. Please read the commit message for more details.

What do you think now it's trimmed? All it needs now is Wiki documentation on --dither {none|gpdf|tpdf|tpdf_hp} while users that don't care or don't know are set up with the sane and safe tpdf by default.

That makes a lot more sense to me. Choice is good but to many complicated choices not so much. I like this much better.

@JasonLG1979
Copy link
Contributor

JasonLG1979 commented May 23, 2021

It might not be a bad idea to make a note that if your DAC supports S32 (I'm not aware of any DACs that support F32?) a user should set librespot to S32. In any event the highest supported format should always be used. That may be counter-intuitive to people who are unfamiliar with how lossy audio works and that want "bit-perfect" audio. They may assume S16 is the best format to use since the file before encoding was more than likely 16 bit.

@roderickvd
Copy link
Member Author

We can add it to the wiki.

@roderickvd roderickvd changed the title Implement dithering and noise shaping Implement dithering May 23, 2021
@JasonLG1979
Copy link
Contributor

@roderickvd Can there be a 0.2.1(?) release with among other things this and the volume fix. I think it would be a pretty solid release to tide people over until the new API is figured out.

Copy link
Contributor

@Johannesd3 Johannesd3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last time I looked at it, it was much more confusing :)

@Johannesd3
Copy link
Contributor

Would be 0.3.0, it's all breaking.

@JasonLG1979
Copy link
Contributor

JasonLG1979 commented May 23, 2021

Would be 0.3.0, it's all breaking.

There would be new args but nothing would break if a user used the same args from 0.2 that I'm aware of?

@Johannesd3
Copy link
Contributor

It would/could break other applications depending on librespot as library.

@JasonLG1979
Copy link
Contributor

It would/could break other applications depending on librespot as library.

True. Whatever then make it 0.3? I guess I may be enthusiastic for the improvements but I'd like to see this out the door because I think it adds a lot of value. I'd hate to see this sit on the back burner until after the new API was figured out. I think a lot of people would really like to see (or rather hear) these changes.

@sashahilton00
Copy link
Member

This can be merged into dev whenever, the api for librespot is unstable anyway, we just try to stick with bumping the minor on breaking changes, and the patch on features/fixes atm.

@JasonLG1979
Copy link
Contributor

This can be merged into dev whenever, the api for librespot is unstable anyway, we just try to stick with bumping the minor on breaking changes, and the patch on features/fixes atm.

Bummer. I was hoping for a release so things like raspotify and the various streamer distros like Moode and Volumio would pick it up. Like I said I really think it would be a good stop gap until the new API is completely figured out. It really represents a leap forward as far as having a "reference quality" player in librespot (well as "reference quality" as you can get for a lossy format). Having it in the wild now would also help shake out any potential bugs earlier before the new API goes stable.

@sashahilton00
Copy link
Member

Bummer. I was hoping for a release so things like raspotify and the various streamer distros like Moode and Volumio would pick it up. Like I said I really think it would be a good stop gap until the new API is completely figured out. It really represents a leap forward as far as having a "reference quality" player in librespot (well as "reference quality" as you can get for a lossy format). Having it in the wild now would also help shake out any potential bugs earlier before the new API goes stable.

There's nothing to say that the gap between 0.2.0 and 0.3.0 has to be any great length of time, it just ended up that way for the last release. If we end up bumping the minor 3 times in a week then so be it, shouldn't make any difference to people using librespot, they can just pin versions as necessary.

@JasonLG1979
Copy link
Contributor

JasonLG1979 commented May 23, 2021

There's nothing to say that the gap between 0.2.0 and 0.3.0 has to be any great length of time, it just ended up that way for the last release. If we end up bumping the minor 3 times in a week then so be it, shouldn't make any difference to people using librespot, they can just pin versions as necessary.

Good to hear. I guess I just misunderstood what you said. For projects that use librespot as a monolithic binary, 0.3.x with this included over 0.2.0 would be a no-brainer.

@roderickvd
Copy link
Member Author

roderickvd commented May 23, 2021

#745 alone warrants a 0.2.1 release exactly because it is upsetting to other applications that use this as a library.

I'll push a change for @Johannesd3 review shortly, if there are any other comments, please let me know (or send a thumbs up) so I can do a final sweep.

@Johannesd3
Copy link
Contributor

This failure is ok, happens once a while

@roderickvd
Copy link
Member Author

Yeah I'll merge if the others pass.

@roderickvd roderickvd merged commit bb3dd64 into librespot-org:dev May 26, 2021
@roderickvd roderickvd deleted the dithering-and-noise-shaping branch May 26, 2021 19:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking includes a breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants