Update and move convention section to CONTRIBUTING.md
yangarbiter committed Jul 28, 2021
1 parent ec3ab99 commit 15fffe3
Showing 2 changed files with 34 additions and 39 deletions.
34 changes: 34 additions & 0 deletions CONTRIBUTING.md
Expand Up @@ -129,6 +129,40 @@ make html

The built docs should now be available in `docs/build/html`

## Conventions

As a good software development practice, we try to stick to existing variable
names and tensor shapes.
The following are some of the conventions that we follow.

- We use an ellipsis "..." as a placeholder for the rest of the dimensions of a
tensor, e.g. optional batching and channel dimensions. If batching, the
"batch" dimension should be the first dimension.
- Tensors are assumed to have "channel" dimension coming before the "time"
dimension.
- The bins in frequency domain (freq and mel) are assumed to come before the
"time" dimension but after the "channel" dimension. This makes the tensors
consistent with PyTorch's dimensions.
- For size names, the prefix `n_` is used (e.g. "a tensor of size (`n_freq`,
`n_mels`)") whereas dimension names do not have this prefix (e.g. "a tensor of
dimension (channel, time)")
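
As an illustration of the ordering conventions above, here is a minimal sketch
in plain Python that works on shape tuples only. The helper name
`spectrogram_shape` and the example sizes are hypothetical and are not part of
torchaudio; the point is only that any leading "..." dimensions (such as batch)
pass through unchanged while "freq" is placed between "channel" and "time".

```python
def spectrogram_shape(waveform_shape, n_freq, n_frames):
    """Map a waveform shape (..., channel, time) to (..., channel, freq, time).

    Hypothetical helper for illustration; it only manipulates shape tuples.
    """
    if len(waveform_shape) < 2:
        raise ValueError("expected at least (channel, time)")
    # Everything before (channel, time) is the "..." part, e.g. a batch dimension.
    *leading, channel, _time = waveform_shape
    return (*leading, channel, n_freq, n_frames)


# Unbatched input: (channel, time)
unbatched = spectrogram_shape((2, 16000), n_freq=201, n_frames=81)
# Batched input: (batch, channel, time), with batch as the first dimension
batched = spectrogram_shape((8, 2, 16000), n_freq=201, n_frames=81)
print(unbatched, batched)
```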

Here are some examples of commonly used variables with their names,
meanings, and shapes (or units):

* `waveform`: a tensor of audio samples with dimensions (..., channel, time)
* `sample_rate`: the sampling rate of the audio (samples per second)
* `specgram`: a spectrogram tensor with dimensions (..., channel, freq, time)
* `mel_specgram`: a mel spectrogram with dimensions (..., channel, mel, time)
* `hop_length`: the number of samples between the starts of consecutive frames
* `n_fft`: the number of Fourier bins
* `n_mels`, `n_mfcc`: the number of mel and MFCC bins
* `n_freq`: the number of bins in a linear spectrogram
* `f_min`: the lowest frequency of the lowest band in a spectrogram
* `f_max`: the highest frequency of the highest band in a spectrogram
* `win_length`: the length of the STFT window
* `window_fn`: a function that creates a window, e.g. `torch.hann_window`
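
To make the relationship between some of these sizes concrete, the sketch below
computes `n_freq` and the number of frames from `n_fft`, `hop_length`, and the
number of samples. It assumes a one-sided STFT with centered frames (the
behaviour of `torch.stft` with `onesided=True` and `center=True`); other
settings give different sizes. The function name `linear_spectrogram_sizes` is
hypothetical and used only for illustration.

```python
def linear_spectrogram_sizes(n_fft, hop_length, n_samples):
    """Return (n_freq, n_frames) for a one-sided, centered STFT.

    Assumption: one-sided output and centered (padded) frames.
    """
    # A one-sided spectrum of a real signal keeps n_fft // 2 + 1 bins.
    n_freq = n_fft // 2 + 1
    # With centered frames, one frame starts every hop_length samples.
    n_frames = 1 + n_samples // hop_length
    return n_freq, n_frames


# One second of audio at sample_rate = 16000
n_freq, n_frames = linear_spectrogram_sizes(n_fft=400, hop_length=200, n_samples=16000)
print(n_freq, n_frames)
```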

## License

By contributing to Torchaudio, you agree that your contributions will be licensed
39 changes: 0 additions & 39 deletions README.md
Expand Up @@ -138,45 +138,6 @@ API Reference

API Reference is located here: http://pytorch.org/audio/

Conventions
-----------

With torchaudio being a machine learning library and built on top of PyTorch,
torchaudio is standardized around the following naming conventions. Tensors are
assumed to have "channel" as the first dimension and time as the last
dimension (when applicable). Both of these dimensions make the tensors consistent with PyTorch's dimensions.
For size names, the prefix `n_` is used (e.g. "a tensor of size (`n_freq`, `n_mel`)")
whereas dimension names do not have this prefix (e.g. "a tensor of
dimension (channel, time)")

* `waveform`: a tensor of audio samples with dimensions (channel, time)
* `sample_rate`: the sampling rate of the audio (samples per second)
* `specgram`: a tensor of spectrogram with dimensions (channel, freq, time)
* `mel_specgram`: a mel spectrogram with dimensions (channel, mel, time)
* `hop_length`: the number of samples between the starts of consecutive frames
* `n_fft`: the number of Fourier bins
* `n_mel`, `n_mfcc`: the number of mel and MFCC bins
* `n_freq`: the number of bins in a linear spectrogram
* `min_freq`: the lowest frequency of the lowest band in a spectrogram
* `max_freq`: the highest frequency of the highest band in a spectrogram
* `win_length`: the length of the STFT window
* `window_fn`: a function that creates a window, e.g. `torch.hann_window`

Transforms expect and return the following dimensions.

* `Spectrogram`: (channel, time) -> (channel, freq, time)
* `AmplitudeToDB`: (channel, freq, time) -> (channel, freq, time)
* `MelScale`: (channel, freq, time) -> (channel, mel, time)
* `MelSpectrogram`: (channel, time) -> (channel, mel, time)
* `MFCC`: (channel, time) -> (channel, mfcc, time)
* `MuLawEncode`: (channel, time) -> (channel, time)
* `MuLawDecode`: (channel, time) -> (channel, time)
* `Resample`: (channel, time) -> (channel, time)
* `Fade`: (channel, time) -> (channel, time)
* `Vol`: (channel, time) -> (channel, time)

Here, and in the documentation, we use an ellipsis "..." as a placeholder for the rest of the dimensions of a tensor, e.g. optional batching and channel dimensions.

Contributing Guidelines
-----------------------
