Demo: webfft.net/acf
These images are auto-correlation spectrograms of vowels.
- Definition of auto-correlation: ACF(X) = FFT[abs(FFT(X))^2]. It splits the input X into amp*cos(freq*t+phi) waves, drops the phases phi and squares the amplitudes amp. For this reason, ACF(X) is a symmetric function.
- The ACF images below render abs(ACF(X))/max(abs(ACF(X))) to avoid oversaturation. The ACF values aren't squared and aren't log10-scaled.
- Low frequencies and high frequencies are rendered with different colors by applying bandpass filters: FFT[BPF*abs(FFT(X))^2]. The low frequency ACF is rendered with color (12,3,1) and the high frequency ACF with color (1,3,12). Oversaturating these colors reveals more detail without resorting to log10-scaling (which doesn't look good).
- Sample rate: 16 kHz. Frame size: 4096. The rule of thumb: frame size = 1/4 of sample rate. This means that one ACF frame captures about 1/4 sec of sound, and the frames overlap heavily.
- The waveform is padded with zeros at both ends to avoid abrupt edges on the ACF images. In complex sounds, different frequencies fade out at different rates, which gives their ACF images a distinctive shape.
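The steps in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the demo: the 1 kHz crossover between the low and high bands, the brick-wall shape of the bandpass filter, the per-band normalization, the additive clipped color mix, the hop size and the amount of zero padding are all made up for the example. The names `band_acf`, `normalize`, `colorize`, `render_frame` and `padded_frames` are hypothetical. The outer FFT is kept as in the definition; since abs(FFT(X))^2 is real and symmetric, it differs from the inverse FFT only by a constant factor, which the normalization removes anyway.

```python
import cmath
import math

SAMPLE_RATE = 16000   # Hz, from the text
FRAME_SIZE = 4096     # samples, from the text (about 1/4 s at 16 kHz)
CROSSOVER_HZ = 1000   # hypothetical low/high band boundary, not from the text


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def band_acf(frame, lo_hz, hi_hz, sample_rate=SAMPLE_RATE):
    """FFT[BPF * abs(FFT(X))^2] with a brick-wall BPF (an assumed filter shape)."""
    n = len(frame)
    power = [abs(c) ** 2 for c in fft(frame)]  # drops phases, squares amplitudes
    masked = []
    for k, p in enumerate(power):
        freq = min(k, n - k) * sample_rate / n  # bins k and n-k share one frequency
        masked.append(p if lo_hz <= freq < hi_hz else 0.0)
    return fft(masked)


def normalize(acf):
    """abs(ACF) / max(abs(ACF)); an all-zero input is returned as zeros."""
    mags = [abs(c) for c in acf]
    peak = max(mags) or 1.0
    return [m / peak for m in mags]


def colorize(low, high, low_rgb=(12, 3, 1), high_rgb=(1, 3, 12)):
    """Weight each band by its color and clip to [0, 1] (assumed blending)."""
    return [
        tuple(min(1.0, a * wl + b * wh) for wl, wh in zip(low_rgb, high_rgb))
        for a, b in zip(low, high)
    ]


def render_frame(frame):
    """One image row: normalized low and high band ACFs mixed into RGB."""
    nyquist = SAMPLE_RATE / 2
    low = normalize(band_acf(frame, 0, CROSSOVER_HZ))
    high = normalize(band_acf(frame, CROSSOVER_HZ, nyquist + 1))
    return colorize(low, high)


def padded_frames(wave, size=FRAME_SIZE, hop=256):
    """Zero-pad both ends, then yield overlapping frames (pad and hop are assumed)."""
    padded = [0.0] * size + list(wave) + [0.0] * size
    for start in range(0, len(padded) - size + 1, hop):
        yield padded[start:start + size]


# Synthetic test frame: a 250 Hz tone plus a weaker, phase-shifted 3 kHz tone.
frame = [
    math.cos(2 * math.pi * 250 * t / SAMPLE_RATE)
    + 0.5 * math.cos(2 * math.pi * 3000 * t / SAMPLE_RATE + 1.0)
    for t in range(FRAME_SIZE)
]
pixels = render_frame(frame)
print(len(pixels), pixels[0])
```

The hand-written `fft` is there only to keep the sketch free of dependencies; a real renderer would use an FFT library. Each call to `render_frame` gives one row (or ray) of pixels, and feeding it the frames from `padded_frames` gives the full image.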
The vowel sounds below are taken from the IPA table on Wikipedia. The tag "ncnfr" means near-close near-front rounded.
For comparison, here is a visualization of 63 drum/kick sounds:
10/2022