Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

float sample_rate #891

Open
vincentqb opened this issue Aug 26, 2020 · 15 comments
Open

float sample_rate #891

vincentqb opened this issue Aug 26, 2020 · 15 comments

Comments

@vincentqb
Copy link
Contributor

vincentqb commented Aug 26, 2020

torchaudio currently treats sample_rate as an integer. In #890, it was mentioned that float may be more appropriate.

  • Librairies we use for saving/loading may not support float sample_rate.
  • C++ extensions currently use int.
  • torchaudio signatures use int.

Thoughts?

@faroit
Copy link
Contributor

faroit commented Aug 26, 2020

ping @f0k @bmcfee @justinsalamon @nils-werner to get some more dsp guys on board here...

@bmcfee
Copy link

bmcfee commented Aug 26, 2020

💯 for floating point. There's no good reason to require integer sampling rates, and any interoperability requirements (eg C++ extensions) should have an explicit cast so it's clear what's going on.

@psmaragdis
Copy link

Floating point. How would you decimate 44,100 by a factor of 8 otherwise?!

@justinsalamon
Copy link

justinsalamon commented Aug 26, 2020

Looks like there's no need for further comments on my part :)

@faroit
Copy link
Contributor

faroit commented Aug 27, 2020

Looks like there's no need for further comments on my part :)

any comment by @psmaragdis should indeed taken seriously :-D

Here is a list of how some popular audio packages handle sample rate types

so the majority of c/c++ libs use int, also one more compelling argument is that the sample rate in WAV files is stored as an integer.

While I would love to use float as its physically more correct, I guess it is too far out of scope to make a change here... @vincentqb what do you think?

@vincentqb
Copy link
Contributor Author

@faroit -- just to make sure I understand: your point is that because of other libraries, int should be the used, right?

Let's say someone resamples a float 44100 hz waveform to 441000/8 hz = 5512.5 hz as suggested above. :) What would be the expected behavior when saving the signal in a file format or i/o library that only supports int? Should an error be raised because the sample rate is not convertible to int, int(5512.5) != 5512.5? The user would then have to explicitly resample to an int-valued sample rate to continue?

@faroit
Copy link
Contributor

faroit commented Aug 27, 2020

@faroit -- just to make sure I understand: your point is that because of other libraries, int should be the used, right?

i am just saying that its more common. Actually why don't we take int and floats for the resample function and use int everywhere else?

Let's say someone resamples a float 44100 hz waveform to 441000/8 hz = 5512.5 hz as suggested above. :) What would be the expected behavior when saving the signal in a file format or i/o library that only supports int? Should an error be raised because the sample rate is not convertible to int, int(5512.5) != 5512.5? The user would then have to explicitly resample to an int-valued sample rate to continue?

Interesting ;-) I had no idea, so I tried this and resampled a 44.1k file using sox and ffmpeg

sox 441.wav  -r 5512.5 out.wav
ffmpeg -i 441.wav -ar 5512.5 out.wav
  • sox didn't complain but saved the file as with 5512.0 Hz
  • ffmpeg complaint that Expected int64 for ar but found 5512.9

@psmaragdis
Copy link

I wouldn't limit this software based on what I/O libraries do. We don't use 16bit samples in our algorithms because some formats only support that, right? It shouldn't be the same for the sample rate.

Keep in mind that some audio people work at very different scales (e.g. infrasound or ultrasound). Rounding a fractional sample rate for them can completely ruin their algorithms.

@faroit textbook resampling is only defined for integer upsampling/decimation that's why the resampling function inputs are integers, they actually define the flow graph that's being used. Their ratio though can be fractional, and lead to a fractional value, e.g. 44100/8.

@psmaragdis
Copy link

Of course the real hot question is whether the sample rate should be complex valued :)

@vincentqb
Copy link
Contributor Author

offline discussion with @sw005320 who votes for integer, though I'll let him chime in if he wants to elaborate :)

@f0k
Copy link
Contributor

f0k commented Sep 6, 2020

I'm leaning towards float -- it seems a bit arbitrary to require an even number of samples in a second. If we're afraid of rounding errors, we could optionally support fractions.Fraction.

Taking this from the other side, what are the obstacles we can expect with non-integer sample rates? Are there serious problems? Would be curious to hear @sw005320's arguments!

@sw005320
Copy link

sw005320 commented Sep 7, 2020

After I read through the discussion, I agree with the other people's consensus, which is float. I was thinking that we could always cast it to float internally if needed, and we can keep the original integer representation when we load the WAV file, but it would be tricky. I think we should just use float.

@nils-werner
Copy link

nils-werner commented Sep 17, 2020

Just a sidenote, because this originated in a discussion about type hints: for type hints, int is included in float, so people can continue using ints as before, and only switch to floats if needed.

@aliutkus
Copy link
Contributor

float, and round if needed, possibly issuing a warning ?

@vincentqb
Copy link
Contributor Author

vincentqb commented Sep 25, 2020

Alrighty, I'm convinced -- we'll go ahead with float for sample rate :)

We'll start with handling sample rate on the python side.

  • as a first step, we'll take the integer part of a sample_rate before sending it to any backend for saving or C++ extensions for processing. and load sample rate as an int and convert to float afterwards.
  • we'll add a note to inform the user on top of the release notes: either a warning, or a sentence in the documentation.
  • for algorithms (e.g. resampling) that only support integer sample rates, we'll add a note as above, and take the integer part
  • for algorithms that silently were doing integer division (e.g. fix resample type hint #888, updating silent integer division #890) where it was not needed, we'll migrate explicitly to float division.
  • we'll change the python type hints to float.

Once we have that, I'd suggest a version later than the above,

  • if we decide to make all backends behave the same for the end user, my recommendation is to add a flag to the load/save function, force_float with default to False, and make backends that do not support it raise an error when True. we can revisit this at that point.

@carolineechen carolineechen mentioned this issue May 5, 2021
6 tasks
mthrok pushed a commit to mthrok/audio that referenced this issue Dec 13, 2022
Co-authored-by: holly1238 <77758406+holly1238@users.noreply.github.com>
mpc001 pushed a commit to mpc001/audio that referenced this issue Aug 4, 2023
[FX] Add example of tracer that records module qualname for each node
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants