We describe here the various approaches to frequency analysis of audio signals.
The most "basic" tool for frequency analysis is the Fourier transform. It describes a chunk of time-domain data as a set of frequency-domain components. Those components are complex numbers, which can be read either as two quadratures (real and imaginary parts) or as an amplitude and a phase. Usually the spectral power is displayed, that is, the square of the amplitude. The frequency scale of the Fourier transform is linear. Computing a Fourier transform is very efficient thanks to the Fast Fourier Transform (FFT) algorithm.
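To make the steps above concrete, here is a minimal sketch in pure Python: a textbook recursive radix-2 FFT, followed by the conversion of the complex components into spectral power on a linear frequency axis. The function names (`fft`, `power_spectrum`) and the test signal (a 1000 Hz sine sampled at 8000 Hz) are illustrative assumptions, not part of the original design. In practice an optimized library routine would be used instead.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of the even-indexed samples
    odd = fft(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def power_spectrum(x, sample_rate):
    """Return (frequencies in Hz, spectral power) for bins 0 .. Nyquist."""
    n = len(x)
    spectrum = fft(x)
    half = n // 2 + 1
    # Bin k sits at k * sample_rate / n: the frequency axis is linear.
    freqs = [k * sample_rate / n for k in range(half)]
    # Spectral power is the squared amplitude of each complex component.
    power = [abs(c) ** 2 for c in spectrum[:half]]
    return freqs, power


# Hypothetical test signal: a 1000 Hz sine sampled at 8000 Hz, 64 samples.
SAMPLE_RATE = 8000
N = 64
signal = [math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE) for i in range(N)]

freqs, power = power_spectrum(signal, SAMPLE_RATE)
peak_bin = max(range(len(power)), key=power.__getitem__)
peak_freq = freqs[peak_bin]
```

With these assumed parameters the bins are 125 Hz apart, so the sine falls exactly on bin 8 and `peak_freq` is 1000 Hz.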
Unfortunately, even though the Fourier transform makes it easy to analyze the data over the full range of audible frequencies (roughly 20 to 20000 Hz), it does so on a linear scale, which does not correspond to our perception of frequencies. Display the spectrum of a speech recording on a linear scale and you will see that most of the power is concentrated at the lower end of the spectrum. Analyzing audio data on a linear frequency scale gives an unsatisfyingly low resolution at low frequencies, and wastes resolution at high frequencies.
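The mismatch can be quantified by counting how many linearly spaced FFT bins land in each octave of the audible range. The sketch below does this; the 44100 Hz sample rate and the 1024-point length are assumptions chosen for illustration, and `bins_per_octave` is a hypothetical helper name.

```python
def bins_per_octave(sample_rate, fft_length, f_min=20.0, f_max=20000.0):
    """Count the FFT bins whose center lies in each octave from f_min to f_max."""
    bin_width = sample_rate / fft_length
    rows = []
    low = f_min
    while low < f_max:
        high = min(2 * low, f_max)
        # Bin k is centered on k * bin_width (linear spacing).
        count = sum(
            1
            for k in range(fft_length // 2 + 1)
            if low <= k * bin_width < high
        )
        rows.append((low, high, count))
        low *= 2
    return rows


# Assumed parameters: 44100 Hz sample rate, 1024-point FFT.
rows = bins_per_octave(44100, 1024)
for low, high, count in rows:
    print(f"{low:8.0f} - {high:8.0f} Hz : {count} bins")
```

Under these assumptions the lowest octave (20 to 40 Hz) contains no bin at all, the next one contains a single bin, and the top octave contains more than two hundred: the number of bins roughly doubles with each octave, whereas our perception gives each octave about the same weight.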
By trial and error, one can find that an FFT length between 1024 and 4096 samples is a good compromise between resolution in the low-frequency components and refresh time.
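The compromise follows from two simple relations: the bin spacing is the sample rate divided by the FFT length, and the time needed to collect one frame is the FFT length divided by the sample rate. The sketch below tabulates both for the lengths mentioned above, assuming a 44100 Hz sample rate and non-overlapping frames (neither is stated in the text); `fft_tradeoff` is a hypothetical helper name.

```python
def fft_tradeoff(fft_length, sample_rate):
    """Return (bin spacing in Hz, frame duration in ms) for one FFT frame."""
    bin_width_hz = sample_rate / fft_length
    frame_ms = 1000.0 * fft_length / sample_rate
    return bin_width_hz, frame_ms


# Assumed sample rate; the text does not specify one.
SAMPLE_RATE = 44100
table = {n: fft_tradeoff(n, SAMPLE_RATE) for n in (1024, 2048, 4096)}

for n, (bin_width_hz, frame_ms) in table.items():
    print(f"N={n:5d}  bin spacing={bin_width_hz:6.2f} Hz  frame={frame_ms:6.2f} ms")
```

Doubling the length halves the bin spacing but doubles the frame duration: at the assumed rate, 1024 samples give bins about 43 Hz apart and a frame of about 23 ms, while 4096 samples give bins about 11 Hz apart and a frame of about 93 ms.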
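The alternative is to analyze the data on a nonlinear scale. This is done by a "constant-Q transform", in which the ratio of center frequency to bandwidth (the Q factor) is the same for every analysis band, so that the bands are spaced logarithmically in frequency. There are some academic papers about this transform and its algorithmic complexity, but it seems that few implementations have been used in the audio world.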
Instead of a formal constant-Q transform, we chose to achieve the same result by using a set of filters with a constant Q factor. This set is also described as an "octave filter bank", or as a "fraction-of-octave filter bank". More precisely, the filter bank is a collection of band-pass filters (digital IIR elliptic filters) covering one octave, plus a decimation filter (low-pass IIR elliptic filter) to go from one octave to its lower neighbor.
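The structure described above can be illustrated by computing the band layout only, without designing the elliptic filters themselves. The sketch below makes several assumptions that the text does not state: bands are spaced geometrically within each octave, the octave being filtered is the top one at the current sample rate (from a quarter to half of that rate), and each decimation halves the sample rate. The name `octave_filter_bank_layout` and the parameter values are hypothetical.

```python
def octave_filter_bank_layout(sample_rate, n_octaves, bands_per_octave):
    """List band edges, center frequency and Q for each band of each octave."""
    layout = []
    rate = float(sample_rate)
    for octave in range(n_octaves):
        # Assumption: the band-pass filters cover the top octave at this rate.
        octave_low = rate / 4
        for b in range(bands_per_octave):
            f_low = octave_low * 2 ** (b / bands_per_octave)
            f_high = octave_low * 2 ** ((b + 1) / bands_per_octave)
            f_center = (f_low * f_high) ** 0.5  # geometric mean of the edges
            layout.append({
                "octave": octave,
                "rate": rate,
                "f_low": f_low,
                "f_high": f_high,
                "f_center": f_center,
                "q": f_center / (f_high - f_low),
            })
        # Assumption: the decimation stage halves the sample rate, so the
        # next-lower octave occupies the same normalized frequency range.
        rate /= 2
    return layout


# Assumed parameters: 44100 Hz input, 8 octaves, 3 bands per octave.
layout = octave_filter_bank_layout(44100, 8, 3)
q_values = [band["q"] for band in layout]
normalized_centers = {round(band["f_center"] / band["rate"], 12) for band in layout}
```

Every band ends up with the same Q (about 4.3 for three bands per octave), and the center frequencies expressed as a fraction of the current sample rate take only as many distinct values as there are bands per octave. This is what makes the scheme attractive: under these assumptions, a single set of band-pass filters designed for one octave can be reused for every octave after decimation.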