Visual Morphology of Vowels

Demo: webfft.net/acf

Image is clickable.

These images are auto-correlation spectrograms of vowels.

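  • Definition of auto-correlation: ACF(X)=FFT[abs(FFT(X))^2]. It splits the input X into amp*cos(freq*t+phi) waves, drops the phases phi and squares the amplitudes amp. With the phases gone, only cosines centered at zero lag remain, so ACF(X) is a symmetric function. (For real-valued X the power spectrum abs(FFT(X))^2 is even, so the forward FFT used here equals the inverse FFT up to a constant factor.)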
  • The ACF images below render abs(ACF(X))/max(abs(ACF(X))) to avoid oversaturation. The ACF values aren't squared and aren't log10-scaled.
  • Low and high frequencies are rendered in different colors by applying band-pass filters (BPF): FFT[BPF*abs(FFT(X))^2]. The low-frequency ACF is rendered with color (12,3,1) and the high-frequency ACF with color (1,3,12). Because these color components exceed 1, bright regions oversaturate on purpose, which reveals more detail without resorting to log10 scaling (which doesn't look good).
  • Sample rate: 16 kHz. Frame size: 4096 samples. Rule of thumb: frame size = 1/4 of the sample rate, so one ACF frame captures about 1/4 s of sound (4096/16000 = 0.256 s). Consecutive frames overlap heavily.
  • The waveform is padded with zeros at both ends to avoid abrupt edges in the ACF images. In complex sounds, different frequencies fade out at different rates, which gives their ACF images a distinctive shape.
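
The steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the demo: the helper names (`fft`, `band_mask`, `acf`, `colorize`) are made up for this example, the band-pass filter is assumed to be an ideal brick-wall mask, the 1 kHz split between "low" and "high" is an arbitrary choice, each band is normalized separately, and the two bands are assumed to be mixed additively and clipped to [0, 1].

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def band_mask(n, sample_rate, f_lo, f_hi):
    """Brick-wall band-pass mask over FFT bins (assumed filter shape).
    Passes bins whose frequency f satisfies f_lo <= f < f_hi."""
    mask = []
    for k in range(n):
        f = min(k, n - k) * sample_rate / n  # fold bin k into [0, fs/2]
        mask.append(1.0 if f_lo <= f < f_hi else 0.0)
    return mask

def acf(frame, mask=None):
    """abs(FFT[BPF * abs(FFT(X))^2]) divided by its maximum."""
    power = [abs(c) ** 2 for c in fft(frame)]  # phases dropped, amplitudes squared
    if mask is not None:
        power = [p * m for p, m in zip(power, mask)]
    mags = [abs(c) for c in fft(power)]
    peak = max(mags) or 1.0  # avoid dividing by zero on a silent frame
    return [m / peak for m in mags]

def colorize(low, high):
    """Additive mix of the two bands, clipped to [0, 1] (assumed scheme)."""
    lo_rgb, hi_rgb = (12, 3, 1), (1, 3, 12)
    return [tuple(min(1.0, l * a + h * b) for a, b in zip(lo_rgb, hi_rgb))
            for l, h in zip(low, high)]

# One 4096-sample frame at 16 kHz: a 200 Hz tone plus a weaker 3 kHz tone.
fs, n = 16000, 4096
frame = [math.sin(2 * math.pi * 200 * t / fs)
         + 0.5 * math.sin(2 * math.pi * 3000 * t / fs) for t in range(n)]
low = acf(frame, band_mask(n, fs, 0, 1000))          # assumed split at 1 kHz
high = acf(frame, band_mask(n, fs, 1000, fs / 2))
row = colorize(low, high)  # one row of RGB pixels, indexed by lag
```

Because the color components go up to 12, any ACF value above 1/12 already saturates its strongest channel, which is the oversaturation mentioned above. A full spectrogram would repeat this for each overlapping frame of the zero-padded waveform.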

The vowel sounds below are taken from the IPA table on Wikipedia. The tag "ncnfr" means near-close near-front rounded. Images are clickable.

For comparison, here is a visualization of 63 drum/kick sounds (clickable):

10/2022