aud.events

Olivier Lartillot edited this page Jan 15, 2019 · 6 revisions

Estimates a so-called “onset” detection curve, showing the successive bursts of energy that correspond to the successive events. Peak picking is automatically performed on the onset detection curve in order to show the estimated positions of the events (notes, chords, etc.).

aud.events('ragtime')

https://miningsuite.googlecode.com/svn/wiki/SigOnsets_ex1.png

From audio signal

The onset detection curve can be computed in various ways:

  • aud.events(..., 'Envelope') computes an amplitude envelope, using sig.envelope (default choice). The envelope extraction method can be specified, as in sig.envelope, using either the 'Spectro' or the 'Filter' option:
    • Either the 'Spectro' option (default):
      • aud.events(..., 'SpectroFrame', fl, fh) specifies the frame length fl (in s.) and the hop factor fh (as a value between 0 and 1). Default values: fl = .1 s., fh = .1
      • the frequency reassignment method can be specified: 'Freq' (default), 'Mel', 'Bark' or 'Cents' (cf. mirspectrum).
      • aud.events(..., 'PowerSpectrum', 0) turns off the computation of the power of the spectrum.
      • aud.events(..., 'Terhardt') toggles on the 'Terhardt' operation (cf. mirspectrum).
      • aud.events(..., 'PreSilence') adds further frames at the beginning of the audio sequence by adding silence before the actual start of the sequence.
      • aud.events(..., 'PostSilence') adds further frames at the end of the audio sequence by adding silence after the actual end of the sequence.
    • or the 'Filter' option. Related options in mirenvelope can be specified: 'FilterType', 'Tau', 'CutOff', 'PreDecim', 'Hilbert', with the same default values as for mirenvelope.
      • aud.events(..., 'Filterbank', N) specifies the number of channels for the filterbank decomposition (mirfilterbank), the default value being N = 40. N = 0 toggles off the filterbank decomposition.
      • aud.events(..., 'FilterbankType', t) specifies the type of filterbank decomposition (cf. mirfilterbank).
      • aud.events(..., 'PreSilence') adds further silence at the beginning of the audio sequence.
      • aud.events(..., 'PostSilence') adds further silence at the end of the audio sequence.
    • aud.events(..., 'Sum', 'off') toggles off the channel summation (sig.sum) that is performed by default.
    • Other available options, related to sig.envelope: 'HalfwaveCenter', 'Log', 'MinLog', 'Mu', 'Power', 'Diff', 'HalfwaveDiff', 'Lambda', 'Center', 'Smooth', 'PostDecim', 'Sampling', 'UpSample', all with the same defaults as in sig.envelope. In addition, sig.envelope's 'Normal' option can be controlled as well, with a default set to 1.
  • aud.events(..., 'SpectralFlux') computes a spectral flux. Options related to mirflux can be passed here as well:
    • 'Inc' (toggled on by default here),
    • 'Halfwave' (toggled on by default here),
    • 'Complex' (toggled off by default as usual),
    • 'Median' (toggled on by default here, with the same default parameters as in mirflux).

Whatever the chosen method, the detection curve is finally converted into an envelope (using mirenvelope), and further operations are performed in this order:

  • ‘Center’ (performed if ‘Center’ was specified while calling mirevents).
  • ‘Normal’ (always performed by default).

aud.events accepts any of the following input data types:

  • envelope curves (resulting from sig.envelope),
  • fluxes (resulting from sig.flux)
  • waveforms, which can be:
    • segmented (using sig.segment),
    • decomposed into channels (using sig.filterbank),
    • decomposed into frames or not (using sig.frame):
      • if the audio waveform is decomposed into frames, the detection curve will be based on the spectral flux;
      • if the audio waveform is not decomposed into frames, the default detection curve will be based on the envelope;
  • file name or the ‘Folder’ keyword,
  • any other object: it is decomposed into frames (if not already decomposed) using the parameters specified by the ‘Frame’ option; the flux will be automatically computed by default.

From symbolic sequence

Each event in the sequence generates a burst in the signal. By default, each burst is simply a Dirac, i.e., just one sample with high value.

  • aud.events(..., 'Gauss', d): each burst is a Gaussian of standard deviation d, in seconds.
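
As a rough illustration of the two burst shapes, the following sketch builds a detection curve from a list of event times in plain MATLAB. It does not call MiningSuite: the sampling rate sr, the duration dur, the event times t_ev and all variable names are assumptions made for the example, and the actual implementation in aud.events may differ (for instance in how the bursts are scaled).

```matlab
% Hypothetical inputs (not taken from MiningSuite):
sr   = 100;              % sampling rate of the detection curve, in Hz
dur  = 3;                % total duration, in s
t_ev = [0.5 1.2 2.0];    % event times, in s
d    = 0.05;             % standard deviation of each Gaussian burst, in s

t = (0:round(dur*sr)-1) / sr;     % time axis, in s

% Default behaviour: one Dirac (a single non-zero sample) per event.
dirac = zeros(size(t));
dirac(round(t_ev*sr) + 1) = 1;

% 'Gauss' behaviour: one Gaussian of standard deviation d per event.
gauss = zeros(size(t));
for k = 1:numel(t_ev)
    gauss = gauss + exp(-(t - t_ev(k)).^2 / (2*d^2));
end
```

Plotting dirac and gauss against t shows three isolated spikes in the first case and three smooth bumps, each centred on an event time, in the second.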

Event detection

  • aud.events(..., 'Detect', d) specifies options related to the peak picking from the detection curve:
    • d = 'Peaks' (default choice): local maxima are chosen as event positions;
    • d = 'Valleys': local minima are chosen as event positions;
    • d = 0, or 'no', or 'off': no peak picking is performed.

Options associated with the mirpeaks function can be specified as well. In particular:

  • aud.events(..., 'Contrast', c) with default value here c = .01,
  • aud.events(..., 'Threshold', t) with default value here t = 0,
  • aud.events(..., 'Single') selects only the highest peak.
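
The general idea of peak picking on a detection curve can be sketched in plain MATLAB as follows. This is a simplified illustration and not the algorithm used by mirpeaks: the curve values are made up, only the 'Threshold' and 'Single' behaviours are imitated, and the 'Contrast' criterion is left out entirely.

```matlab
% Hypothetical detection curve, normalised between 0 and 1.
curve = [0 .2 .9 .3 .1 .5 .4 .05 1 .6 0];
t     = .1;    % threshold: peaks lower than t are ignored

% A sample is a local maximum if it is strictly higher than both neighbours.
inner = curve(2:end-1);
ismax = inner > curve(1:end-2) & inner > curve(3:end);
peaks = find(ismax) + 1;             % indices of the local maxima in curve

% 'Threshold': keep only the peaks whose amplitude reaches t.
peaks = peaks(curve(peaks) >= t);

% 'Single': keep only the highest remaining peak.
[~, best] = max(curve(peaks));
single    = peaks(best);
```

Under the same assumptions, the 'Valleys' variant amounts to running the same search on -curve.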

Attack and decay

The 'Attack' and 'Decay' options estimate the beginning and end of the attack and decay phases of each event.

  • aud.events(..., 'Attack', meth) (or 'Attacks', meth) detects attack phases using the method meth, which can be either 'Derivate' (default) or 'Effort'.
aud.events('pianoA4.wav', 'Attacks')

The 'Derivate' method works as follows:

  • A slow detection curve is computed in order to find the events, defined here as the local maxima in the curve. More precisely, for each event we only consider the local minimum preceding the local maximum: it gives a rough estimation of the onset time of the note. This curve is computed using the method chosen by the user ('Spectro' (default), 'Filter', etc.) and with other options set by default (for instance fl = .1 s., fh = .1 for the 'Spectro' method) but with 'PreSilence' and 'PostSilence' options set as specified by the user.

  • A fast detection curve is computed in order to find the first local maximum of each note, and to estimate precisely the attack phase. It is this curve that can be controlled by the options passed to aud.events, with some changes in the default parameters:

    • If the method 'Spectro' (default) is chosen, the default frame sizes are this time fl = .03 s. and fh = .02 (Nymoen et al. 2017).
    • If the chosen method is 'Filter' instead, the default 'FilterType' is set to 'Butter', the 'Filterbank' option is turned off and the 'Hilbert' transform is turned on.
  • The attack phase is searched for in the temporal region between the onset time (given by the slow curve) and the first local maximum (given by the fast curve). More precisely, in order to reject any low-amplitude peak preceding the attack phase, the search for the attack phase starts at the earliest temporal position where the amplitude of the curve is at 20% of the amplitude of the local maximum.

  • The onset time, where the attack phase begins, is set at the instant where the rate of the curve (i.e., the value of the first derivative of the curve) is o% of the maximal rate of the curve in the attack phase, where o is set by default to 10%, but can be modified using the parameter 'OnsetThreshold' (expressed as a value between 0 and 1).

  • The attack time, where the attack phase ends, is set at the instant where the rate of the curve is a% of the maximal rate of the curve in the attack phase, where a is set by default to 7.5%, but can be modified using the parameter 'AttackThreshold' (expressed as a value between 0 and 1).
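
The two rate thresholds can be illustrated on a synthetic rising curve in plain MATLAB. This is only a sketch under several assumptions, not the MiningSuite code: the curve env is an artificial logistic rise, the onset is searched before the instant of maximal rate and the attack time after it (the text above does not state the search direction), and the 20% amplitude pre-selection step is omitted.

```matlab
% Hypothetical fast detection curve: a smooth rise centred on 0.1 s.
sr  = 1000;                              % sampling rate of the curve, in Hz
t   = (0:200) / sr;                      % time axis, in s
env = 1 ./ (1 + exp(-(t - 0.1) / 0.01)); % artificial attack shape

o = 0.10;     % 'OnsetThreshold' default, as a fraction of the maximal rate
a = 0.075;    % 'AttackThreshold' default, as a fraction of the maximal rate

rate = diff(env) * sr;                   % first derivative of the curve
[rmax, imax] = max(rate);                % maximal rate in the attack phase

% Onset: first instant, before the maximal rate, where the rate reaches o*rmax.
i_on  = find(rate(1:imax) >= o * rmax, 1, 'first');

% Attack time: last instant, after the maximal rate, where the rate is
% still at least a*rmax.
i_att = imax - 1 + find(rate(imax:end) >= a * rmax, 1, 'last');

onset_time  = t(i_on);                   % start of the attack phase, in s
attack_time = t(i_att);                  % end of the attack phase, in s
```

With this symmetric test curve the onset falls before the steepest point and the attack time after it; because a is smaller than o, the attack time lies slightly further from the steepest point than the onset does.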

The 'Effort' method comes from Timbre Toolbox (Peeters et al., 2011). The parameter 'Alpha', by default set to 3.75, controls the multiplicative effort ratio. To get the same results as in Timbre Toolbox, use the following options:

aud.events(..., 'Attack', 'Effort', 'Filter', 'CutOff', 5, 'Down', 0, 'Alpha', 3)
  • aud.events(..., 'Decay', r) (or 'Decays') detects decay phases, using the 'Derivate' method.
aud.events('pianoA4.wav', 'Decays')
  • The decay phase is searched for in the temporal region between the last local maximum (given by the fast curve) and the offset time (given by the slow curve). More precisely, in order to reject any low-amplitude peak succeeding the decay phase, the search for the decay phase starts at the latest temporal position where the amplitude of the curve is at 20% of the amplitude of the local maximum.

  • The offset time, where the decay phase ends, is set at the instant where the rate of the curve (i.e., the value of the first derivative of the curve) is o% of the maximal rate of the curve in the decay phase, where o is set by default to 10%, but can be modified using the parameter 'OffsetThreshold' (expressed as a value between 0 and 1).

  • The decay time, where the decay phase starts, is set at the instant where the rate of the curve is d% of the maximal rate of the curve in the decay phase, where d is set by default to 20%, but can be modified using the parameter 'DecayThreshold' (expressed as a value between 0 and 1).

Segmentation

The temporal localization of events can be used for segmentation of the initial waveform:

o = aud.events('ragtime.wav'); sig.segment('ragtime.wav', o)

Frame decomposition

The detection curve can be further decomposed into frames if the 'Frame' option has been specified, with a default frame length of 3 seconds and a hop factor of 10% (0.3 seconds).
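
The relation between frame length, hop factor and frame positions can be worked out in a few lines of plain MATLAB. The duration dur is a made-up value, and keeping only complete frames is an assumption of this sketch; how aud.events handles a trailing incomplete frame is not stated here.

```matlab
dur = 10;      % hypothetical duration of the detection curve, in s
fl  = 3;       % frame length, in s (default of the 'Frame' option)
fh  = 0.1;     % hop factor, as a fraction of the frame length (default)

hop    = fl * fh;                       % hop between frames, in s
nfr    = floor((dur - fl) / hop) + 1;   % number of complete frames
starts = (0:nfr-1) * hop;               % start time of each frame, in s
ends   = starts + fl;                   % end time of each frame, in s
```

Successive frames overlap by fl - hop seconds, so with the default values each frame shares 2.7 seconds with the next one.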

Preselected Model

Complete (or nearly complete) models are available:

  • aud.events(..., 'Scheirer') follows the model proposed in (Scheirer, 1998). It corresponds to
aud.events(..., 'Filter', 'FilterbankType', 'Scheirer', 'FilterType', 'HalfHann', 'Sampling', 200, 'HalfwaveDiff', 'Sum', 0, 'Detect', 0)

Accessible Output

Accessible using the 'get' method.

If the 'Attacks' or 'Decays' option is toggled on, the output belongs to the aud.Envelope class, which adds the following outputs:

  • 'Onsets': the abscissa positions of the starts of the attack phases, in sample index,
  • 'Attacks': the abscissa positions of the ends of the attack phases, in sample index,
  • 'Decays': the abscissa positions of the starts of the decay phases, in sample index,
  • 'Offsets': the abscissa positions of the ends of the decay phases, in sample index.