
sig.segment

Olivier Lartillot edited this page Feb 8, 2018 · 4 revisions

Segmentation based on provided time positions

  • A signal a can be segmented manually, at temporal positions given directly by the user, in the form:
sg=sig.segment(a,v)

where v is an array of numbers corresponding to time positions in seconds.

  • A signal a can be segmented using the output p of a peak-picking operation applied to data computed from a itself, using the following syntax:
sg=sig.segment(a,p)

If p is a frame-decomposed scalar curve, the audio waveform a will be segmented at the middle of each frame containing a peak.

  • Automated segmentation methods are also provided, and can be called using the syntax:
sg=sig.segment(a,m)

where m is the name of one of the following segmentation methods: 'Novelty' (default, cf. sig.novelty), 'HCDF' (cf. sig.hcdf) or 'RMS' (cf. sig.rms).
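For the peak-based form, the rule "segmented at the middle of each frame containing a peak" can be illustrated with plain MATLAB arithmetic. This is a minimal sketch, not the toolbox's own code: the frame length, the hop and the peak frame indices below are hypothetical values chosen only for illustration.

```matlab
% Illustrative sketch only: none of these values come from sig.segment.
frameLen   = 0.05;     % assumed frame length in seconds
hop        = 0.05;     % assumed hop between frame starts (no overlap)
peakFrames = [3 8];    % hypothetical indices of frames containing a peak

% Start time of each peak frame, counting frames from 1.
frameStart = (peakFrames - 1) * hop;

% Segmentation positions: the middle of each of those frames.
cutTimes = frameStart + frameLen / 2;

disp(cutTimes)         % approximately 0.125 and 0.375 seconds
```

Under these assumptions, the resulting positions play the same role as the array v of time positions in the manual form.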

sig.segment accepts as its main input only a sig.Signal object that is not frame-decomposed, not channel-decomposed, and not already segmented. Alternatively, a file name or the 'Folder' keyword can be used. The first argument of sig.segment is the audio to be segmented, which need not be the signal from which the segmentation positions were computed: for instance, the segmentation curve can be computed from a downsampled version of the signal, and the actual segmentation performed on the original audio file.

Segmentation based on a particular method

sig.segment(..., 'Novelty') (default method)

Peak detection applied to the novelty curve (sig.novelty) returns the temporal positions of feature discontinuities, which are then used for the actual segmentation of the audio sequence. Some parameters of sig.novelty are accessible in sig.segment: 'Distance', 'Measure' and 'KernelSize'. Some parameters of sig.peaks are accessible as well: 'Total' (set by default to Inf) and 'Contrast' (set by default to .1). By default, the novelty is computed from MFCCs (aud.mfcc), whose 'Rank' parameter can be specified. The default frame size is 50 ms, with no overlap between frames.

sig.segment(..., 'RMS')

Segmentation at positions of long silences. A frame-decomposed RMS curve is computed using sig.rms (with default options). Each segment starts at a temporal position where the RMS rises to a given 'On' threshold and ends at the next temporal position where the RMS drops back to a given 'Off' threshold.

Options

  • sig.segment(...,'Off', t1) specifies the RMS 'Off' threshold. Default value: t1 = .01

  • sig.segment(...,'On', t2) specifies the RMS 'On' threshold. Default value: t2 = .02
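The interplay of the two thresholds can be sketched in plain MATLAB as a hysteresis loop. This is a hedged illustration of the idea described above, not the toolbox implementation: the RMS values, the frame duration, the use of frame start times as boundaries, and the choice of inclusive comparisons (>= and <=) are all assumptions.

```matlab
% Illustrative sketch only: hypothetical frame-wise RMS values.
rmsCurve = [0 .005 .03 .05 .015 .005 0 .025 .04 .008];
frameDur = 0.05;    % assumed frame duration in seconds
onThr    = .02;     % 'On' threshold (documented default t2)
offThr   = .01;     % 'Off' threshold (documented default t1)

segments = zeros(0, 2);   % one row per segment: [start end] in seconds
active = false;           % true while inside a segment
for k = 1:numel(rmsCurve)
    if ~active && rmsCurve(k) >= onThr
        % RMS rises to the 'On' threshold: open a segment.
        active = true;
        startTime = (k - 1) * frameDur;
    elseif active && rmsCurve(k) <= offThr
        % RMS drops back to the 'Off' threshold: close the segment.
        active = false;
        segments(end + 1, :) = [startTime, (k - 1) * frameDur];
    end
end
if active
    % Close a segment still open at the end of the curve.
    segments(end + 1, :) = [startTime, numel(rmsCurve) * frameDur];
end

disp(segments)
```

With these values, the frame at .015 does not close the first segment, because it lies between the two thresholds. Keeping 'Off' below 'On' in this way avoids rapid toggling when the RMS hovers around a single threshold.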

Further analysis of segmented waveform

The output can be passed to any further analysis, for instance:

sp = sig.spectrum(sg, 'dB')