Home
Computerized detection of sounds in real time entails additional complexity beyond doing the same task offline. In particular, latency requirements force us to work with small block sizes, and there are resource (CPU, memory) constraints that must be met for the algorithm to process all of the input in time.
Typically, in order to address those two issues, it makes sense to pre-allocate most or all of the memory that will be used for processing, and to pre-compute any values that will be needed. Not only is frequent memory allocation expensive, it also makes the developer prone to introducing memory leaks, which are critical to avoid in a process that often must run for weeks or months.
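For instance, a detector's one-time setup can be kept entirely separate from its per-block work. The sketch below is illustrative only; all of its names (`detector_state`, `detector_init`, `detector_process`) are assumptions, not RADd definitions:

```c
#include <math.h>
#include <stdlib.h>

/* Illustrative detector state -- all names here are hypothetical. */
typedef struct {
    float *window;      /* pre-computed analysis window */
    float *scratch;     /* pre-allocated working buffer */
    size_t block_size;
} detector_state;

/* One-time setup: allocate all memory and pre-compute constants. */
int detector_init(detector_state *st, size_t block_size)
{
    st->block_size = block_size;
    st->window  = malloc(block_size * sizeof *st->window);
    st->scratch = malloc(block_size * sizeof *st->scratch);
    if (!st->window || !st->scratch)
        return -1;
    /* Pre-compute, e.g., a periodic Hann window. */
    for (size_t i = 0; i < block_size; ++i)
        st->window[i] = 0.5f - 0.5f * cosf(6.2831853f * (float)i / (float)block_size);
    return 0;
}

/* Per-block processing: no allocation, no pre-computation. */
void detector_process(detector_state *st, const float *block)
{
    for (size_t i = 0; i < st->block_size; ++i)
        st->scratch[i] = block[i] * st->window[i];
    /* ... detection logic operating on st->scratch ... */
}
```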
Small block sizes make detectors especially difficult to compute, because all processing must be done in a way that respects block edges and maintains continuity across them.
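Even a simple running-energy measure illustrates the point: its state must survive across block boundaries so that block-by-block results match an offline pass over the same samples. The names in this sketch are hypothetical:

```c
#include <stddef.h>

/* Hypothetical running-energy state carried across blocks. */
typedef struct {
    double energy;  /* exponentially smoothed signal energy */
    double alpha;   /* smoothing coefficient, fixed at init */
} energy_state;

void energy_init(energy_state *st, double alpha)
{
    st->energy = 0.0;
    st->alpha  = alpha;
}

/* Process one block; because the state persists between calls,
 * the value at a block edge is identical to an offline run. */
void energy_process(energy_state *st, const float *block, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        st->energy = st->alpha * st->energy
                   + (1.0 - st->alpha) * (double)block[i] * (double)block[i];
}
```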
However, if we recognize that most detectors can be factored into a first-stage detector followed by a classifier that reduces false positives, we can make the process somewhat easier by designing the first stage once and reusing it.
Second-stage classifiers need not handle edges in any special way, and only need to keep up with the stream of detected events rather than the entire stream of data. Hence classifiers are (at least in principle, disregarding the algorithms themselves) simpler to write. However, when working with limited resources, it is still important to separate initialization from computation as much as possible.
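A minimal sketch of what that separation looks like on the classifier side, assuming a hypothetical event record (none of these names or fields come from RADd):

```c
/* Hypothetical detected-event record handed to a classifier. */
typedef struct {
    double start_time;   /* event onset, seconds        */
    double duration;     /* event length, seconds       */
    float  peak_score;   /* first-stage detector score  */
} event_t;

/* Illustrative classifier: initialization and per-event work
 * are kept separate, just as with the detector stage. */
typedef struct {
    float threshold;     /* decision threshold, set once at init */
} classifier_state;

void classifier_init(classifier_state *st, float threshold)
{
    /* All setup happens before the event stream starts. */
    st->threshold = threshold;
}

/* Returns nonzero if the event is accepted as a true detection. */
int classifier_classify(const classifier_state *st, const event_t *ev)
{
    return ev->peak_score >= st->threshold;
}
```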
RADd makes this factorization explicit by providing two plugin APIs, one for detectors and one for classifiers. Both APIs formalize the separation of initialization from computation by placing them in distinct method calls. Of course, the plugin author is free to ignore these formalisms, but writing in this way makes it easier to fulfill real-time requirements.
RADd provides the common infrastructure: it buffers audio and detected events, writes output to disk, and reads from the audio hardware (or a file stream, for testing). The detector/classifier developer need only implement the core of the algorithm. Perhaps the easiest way to understand this is to show the full API, which consists of only five core function definitions. The base API is intended to be used with functions written in C/C++, but wrappers can be, and have been, written for other languages.
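The real definitions live in the RADd sources; purely to convey the shape a five-function plugin interface can take, here is a hypothetical sketch. Every name and signature below is an assumption, not RADd's actual API:

```c
#include <stddef.h>

typedef struct plugin_state plugin_state;  /* opaque per-plugin state    */
typedef struct event event_t;              /* detected-event record      */

/* 1. Allocate state and pre-compute constants before streaming starts. */
plugin_state *plugin_create(const char *config);

/* 2. Process one block of audio, accumulating any detections. */
int plugin_process(plugin_state *st, const float *block, size_t n);

/* 3. Report the detections accumulated so far. */
size_t plugin_get_events(plugin_state *st, event_t *out, size_t max_events);

/* 4. Flush any state held back for edge continuity at end of stream. */
int plugin_flush(plugin_state *st);

/* 5. Release all resources. */
void plugin_destroy(plugin_state *st);
```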
Created by Matt Robbins, Eric Spaulding, and Sam Fladung of the Bioacoustic Research Program.