Update and move convention section to CONTRIBUTING.md

pytorch · Jul 28, 2021 · 15fffe3
1 parent ec3ab99
Showing 2 changed files with 34 additions and 39 deletions.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -129,6 +129,40 @@ make html
 
 The built docs should now be available in `docs/build/html`
 
+## Conventions
+
+As a good software development practice, we try to stick to existing variable
+names and shapes (for tensors).
+The following are some of the conventions that we follow.
+
+- We use an ellipsis "..." as a placeholder for the rest of the dimensions of a
+  tensor, e.g. optional batching and channel dimensions. If batching, the
+  "batch" dimension should come first.
+- Tensors are assumed to have "channel" dimension coming before the "time"
+  dimension.
+- The bins in frequency domain (freq and mel) are assumed to come before the
+  "time" dimension but after the "channel" dimension. This makes the tensors
+  consistent with PyTorch's dimensions.
+- For size names, the prefix `n_` is used (e.g. "a tensor of size (`n_freq`,
+  `n_mels`)") whereas dimension names do not have this prefix (e.g. "a tensor of
+  dimension (channel, time)")
+
+Here are some examples of commonly used variables with their names,
+meanings, and shapes (or units):
+
+* `waveform`: a tensor of audio samples with dimensions (..., channel, time)
+* `sample_rate`: the sampling rate of the audio (samples per second)
+* `specgram`: a tensor of spectrogram with dimensions (..., channel, freq, time)
+* `mel_specgram`: a mel spectrogram with dimensions (..., channel, mel, time)
+* `hop_length`: the number of samples between the starts of consecutive frames
+* `n_fft`: the number of Fourier bins
+* `n_mels`, `n_mfcc`: the number of mel and MFCC bins
+* `n_freq`: the number of bins in a linear spectrogram
+* `f_min`: the lowest frequency of the lowest band in a spectrogram
+* `f_max`: the highest frequency of the highest band in a spectrogram
+* `win_length`: the length of the STFT window
+* `window_fn`: a function that creates windows, e.g. `torch.hann_window`
+
 ## License
 
 By contributing to Torchaudio, you agree that your contributions will be licensed

diff --git a/README.md b/README.md
@@ -138,45 +138,6 @@ API Reference
 
 API Reference is located here: http://pytorch.org/audio/
 
-Conventions
------------
-
-With torchaudio being a machine learning library and built on top of PyTorch,
-torchaudio is standardized around the following naming conventions. Tensors are
-assumed to have "channel" as the first dimension and time as the last
-dimension (when applicable). Both of these dimensions make the tensors consistent with PyTorch's dimensions.
-For size names, the prefix `n_` is used (e.g. "a tensor of size (`n_freq`, `n_mel`)")
-whereas dimension names do not have this prefix (e.g. "a tensor of
-dimension (channel, time)")
-
-* `waveform`: a tensor of audio samples with dimensions (channel, time)
-* `sample_rate`: the rate of audio dimensions (samples per second)
-* `specgram`: a tensor of spectrogram with dimensions (channel, freq, time)
-* `mel_specgram`: a mel spectrogram with dimensions (channel, mel, time)
-* `hop_length`: the number of samples between the starts of consecutive frames
-* `n_fft`: the number of Fourier bins
-* `n_mel`, `n_mfcc`: the number of mel and MFCC bins
-* `n_freq`: the number of bins in a linear spectrogram
-* `min_freq`: the lowest frequency of the lowest band in a spectrogram
-* `max_freq`: the highest frequency of the highest band in a spectrogram
-* `win_length`: the length of the STFT window
-* `window_fn`: for functions that creates windows e.g. `torch.hann_window`
-
-Transforms expect and return the following dimensions.
-
-* `Spectrogram`: (channel, time) -> (channel, freq, time)
-* `AmplitudeToDB`: (channel, freq, time) -> (channel, freq, time)
-* `MelScale`: (channel, freq, time) -> (channel, mel, time)
-* `MelSpectrogram`: (channel, time) -> (channel, mel, time)
-* `MFCC`: (channel, time) -> (channel, mfcc, time)
-* `MuLawEncode`: (channel, time) -> (channel, time)
-* `MuLawDecode`: (channel, time) -> (channel, time)
-* `Resample`: (channel, time) -> (channel, time)
-* `Fade`: (channel, time) -> (channel, time)
-* `Vol`: (channel, time) -> (channel, time)
-
-Here, and in the documentation, we use an ellipsis "..." as a placeholder for the rest of the dimensions of a tensor, e.g. optional batching and channel dimensions.
-
 Contributing Guidelines
 -----------------------
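
The shape conventions moved into CONTRIBUTING.md by this commit can be sketched with a small, dependency-free example. The helper `specgram_shape` below is hypothetical (it is not part of torchaudio); it only shows how a `waveform` of dimensions (..., channel, time) maps to a `specgram` of dimensions (..., channel, freq, time). It assumes a one-sided STFT with centered frames, so that `n_freq = n_fft // 2 + 1` and the number of frames is `time // hop_length + 1`; other STFT settings would give different sizes.

```python
from typing import Tuple


def specgram_shape(
    waveform_shape: Tuple[int, ...], n_fft: int, hop_length: int
) -> Tuple[int, ...]:
    """Hypothetical helper: (..., channel, time) -> (..., channel, freq, time).

    Assumes a one-sided STFT with centered (padded) frames.
    """
    # Everything before "time" (optional batch, then channel) is kept as-is,
    # which is what the ellipsis "..." stands for in the conventions.
    *leading, time = waveform_shape
    n_freq = n_fft // 2 + 1            # size of the "freq" dimension
    n_frames = time // hop_length + 1  # size of the new "time" dimension
    return (*leading, n_freq, n_frames)


# A batch of 4 stereo clips, one second each at sample_rate = 16000.
print(specgram_shape((4, 2, 16000), n_fft=400, hop_length=200))
# (4, 2, 201, 81)
```

Note that "freq" lands after "channel" and before "time", and that the leading batch dimension passes through unchanged, matching the ordering described in the conventions.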