From 2796d2b93297fa123b5f3bb7de80a1cb00ad99a8 Mon Sep 17 00:00:00 2001
From: Yao-Yuan Yang
Date: Tue, 20 Jul 2021 16:28:37 -0700
Subject: [PATCH] Update and move convention section to CONTRIBUTING.md

---
 CONTRIBUTING.md | 33 +++++++++++++++++++++++++++++++++
 README.md       | 39 ---------------------------------------
 2 files changed, 33 insertions(+), 39 deletions(-)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 67ea6caed2..63a2c8baf9 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -129,6 +129,39 @@ make html
 
 The built docs should now be available in `docs/build/html`
 
+## Conventions
+
+As a good software development practice, we try to stick to existing variable
+names and shapes (for tensors).
+The following are some of the conventions that we follow.
+
+- We use an ellipsis "..." as a placeholder for the rest of the dimensions of a
+  tensor, e.g. optional batching and channel dimensions. If batching, the
+  "batch" dimension should be the first dimension.
+- Tensors are assumed to have the "channel" dimension coming before the "time"
+  dimension. The bins in the frequency domain (freq and mel) are assumed to come
+  before the "time" dimension but after the "channel" dimension. This
+  ordering makes the tensors consistent with PyTorch's dimensions.
+- For size names, the prefix `n_` is used (e.g. "a tensor of size (`n_freq`,
+  `n_mels`)") whereas dimension names do not have this prefix (e.g. "a tensor of
+  dimension (channel, time)")
+
+Here are some examples of commonly used variables with their names,
+meanings, and shapes (or units):
+
+* `waveform`: a tensor of audio samples with dimensions (..., channel, time)
+* `sample_rate`: the sampling rate of the audio (samples per second)
+* `specgram`: a spectrogram tensor with dimensions (..., channel, freq, time)
+* `mel_specgram`: a mel spectrogram with dimensions (..., channel, mel, time)
+* `hop_length`: the number of samples between the starts of consecutive frames
+* `n_fft`: the number of Fourier bins
+* `n_mels`, `n_mfcc`: the number of mel and MFCC bins
+* `n_freq`: the number of bins in a linear spectrogram
+* `f_min`: the lowest frequency of the lowest band in a spectrogram
+* `f_max`: the highest frequency of the highest band in a spectrogram
+* `win_length`: the length of the STFT window
+* `window_fn`: a function that creates a window, e.g. `torch.hann_window`
+
 ## License
 
 By contributing to Torchaudio, you agree that your contributions will be licensed
diff --git a/README.md b/README.md
index b5b0d32f55..8e004efc72 100644
--- a/README.md
+++ b/README.md
@@ -138,45 +138,6 @@ API Reference
 
 API Reference is located here: http://pytorch.org/audio/
 
-Conventions
------------
-
-With torchaudio being a machine learning library and built on top of PyTorch,
-torchaudio is standardized around the following naming conventions. Tensors are
-assumed to have "channel" as the first dimension and time as the last
-dimension (when applicable). Both of these dimensions make the tensors consistent with PyTorch's dimensions.
-For size names, the prefix `n_` is used (e.g. "a tensor of size (`n_freq`, `n_mel`)")
-whereas dimension names do not have this prefix (e.g. "a tensor of
-dimension (channel, time)")
-
-* `waveform`: a tensor of audio samples with dimensions (channel, time)
-* `sample_rate`: the rate of audio dimensions (samples per second)
-* `specgram`: a tensor of spectrogram with dimensions (channel, freq, time)
-* `mel_specgram`: a mel spectrogram with dimensions (channel, mel, time)
-* `hop_length`: the number of samples between the starts of consecutive frames
-* `n_fft`: the number of Fourier bins
-* `n_mel`, `n_mfcc`: the number of mel and MFCC bins
-* `n_freq`: the number of bins in a linear spectrogram
-* `min_freq`: the lowest frequency of the lowest band in a spectrogram
-* `max_freq`: the highest frequency of the highest band in a spectrogram
-* `win_length`: the length of the STFT window
-* `window_fn`: for functions that creates windows e.g. `torch.hann_window`
-
-Transforms expect and return the following dimensions.
-
-* `Spectrogram`: (channel, time) -> (channel, freq, time)
-* `AmplitudeToDB`: (channel, freq, time) -> (channel, freq, time)
-* `MelScale`: (channel, freq, time) -> (channel, mel, time)
-* `MelSpectrogram`: (channel, time) -> (channel, mel, time)
-* `MFCC`: (channel, time) -> (channel, mfcc, time)
-* `MuLawEncode`: (channel, time) -> (channel, time)
-* `MuLawDecode`: (channel, time) -> (channel, time)
-* `Resample`: (channel, time) -> (channel, time)
-* `Fade`: (channel, time) -> (channel, time)
-* `Vol`: (channel, time) -> (channel, time)
-
-Here, and in the documentation, we use an ellipsis "..." as a placeholder for the rest of the dimensions of a tensor, e.g. optional batching and channel dimensions.
-
 Contributing Guidelines
 -----------------------
 