pwolf15/tinyml


TODO

  • Training the on/off model
  • Gathering your own data
  • Deploy visual wake word to Arduino and Sparkfun Edge

Chapter 4: Hello World

  • Goal: use a sine wave model to switch an LED on and off

Steps

  1. Obtain a simple dataset.
  2. Train a deep learning model.
  3. Evaluate the model's performance.
  4. Convert the model to run on-device.
  5. Write code to perform on-device inference.
  6. Build the code into a binary.
  7. Deploy the binary to a microcontroller.

  • Google Colaboratory: online environment for running Jupyter notebooks
  • TensorFlow: tools for building, training, evaluating, and deploying machine learning models
  • Keras: TensorFlow's high-level API that makes it easy to build and train deep learning networks
  • TensorFlow Lite: set of tools for deploying TensorFlow models to mobile and embedded devices

  • data is split into training, validation, and test sets
  • regression: model that takes an input value and uses it to predict a numeric output value
  • Keras Sequential model: each layer of neurons stacked on top of the next
  • Dense layer: fully connected; each input is fed into every single one of its neurons during inference
  • output = activation((input * weight) + bias)
  • ReLU: rectified linear unit, an activation function
  • Since ReLU is nonlinear, it allows multiple layers of neurons to join forces and model complex nonlinear relationships, in which the y value doesn't increase by the same amount for every increment of x. (See the sketch below.)
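
A minimal Keras sketch of the ideas above, assuming TensorFlow 2.x; the two 16-neuron hidden layers mirror the book's improved sine model, but treat the sizes as illustrative:

```python
from tensorflow import keras

# Sequential: layers stacked one on top of the next
model = keras.Sequential([
    # Dense = fully connected: every input feeds every neuron;
    # the ReLU activation makes the layer nonlinear
    keras.layers.Dense(16, activation='relu', input_shape=(1,)),
    keras.layers.Dense(16, activation='relu'),
    keras.layers.Dense(1),  # single numeric output: the predicted sin(x)
])
```
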
  • compile() configures important arguments used in training and prepares the model to be trained
  • model.summary(): prints summary information about the architecture
  • size of the network (in memory) mostly depends on the number of parameters, meaning the total number of weights and biases
  • Keras fit(): performs training (compile/fit sketch after this list)
  • each run-through of the dataset == one epoch
  • batch: pieces of data passed to the network; the outputs' correctness is measured in aggregate and the network's weights and biases are adjusted accordingly
  • batch_size: how many pieces of training data to feed into the network before measuring its accuracy and updating its weights and biases. Suggestion: start with a batch size of 16 or 32.
  • training metrics: loss (output of the loss function), mae (mean absolute error on training data), val_loss and val_mae (the same measured on validation data)
  • goal: stop training when the model is no longer improving, or when training loss drops below validation loss (overfitting)
  • if the model approximates poorly, an easy way to improve performance is to add another layer
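
A hedged sketch of compiling and training, assuming the model above and pre-split x_train/y_train and x_validate/y_validate arrays; the optimizer and epoch count follow the book's example but are illustrative:

```python
model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
model.summary()  # prints the architecture and parameter counts

history = model.fit(
    x_train, y_train,
    epochs=600,      # run-throughs of the dataset
    batch_size=16,   # pieces of data per weight/bias update; 16-32 is a good start
    validation_data=(x_validate, y_validate),
)
# history.history holds loss, mae, val_loss, and val_mae for each epoch
```
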

  • TensorFlow Lite Converter: converts models into a space-efficient format for use on memory-constrained devices; can apply optimizations that further reduce model size and make it run faster on small devices
  • TensorFlow Lite Interpreter: runs an appropriately converted TensorFlow Lite model using the most efficient operations for a given device

  • Converter: Keras model => FlatBuffer file
  • quantization: weights and biases are stored as floats by default; quantization allows storing them as integers, at reduced precision (conversion sketch below)
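
A minimal conversion sketch, assuming TensorFlow 2.x and a hypothetical representative_dataset generator that yields sample inputs for quantization calibration:

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]       # enable quantization
converter.representative_dataset = representative_dataset  # calibration samples (assumed)
tflite_model = converter.convert()                         # FlatBuffer bytes

with open('sine_model_quantized.tflite', 'wb') as f:
    f.write(tflite_model)
```
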

Executing TensorFlow Lite models

  1. Instantiate an interpreter.
  2. Call some methods that allocate memory for the model.
  3. Write the input to the input tensor.
  4. Invoke the model.
  5. Read the output from the output tensor.
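
The same five steps in TensorFlow Lite's Python interpreter, as a minimal sketch assuming the model file written above keeps float inputs and outputs:

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='sine_model_quantized.tflite')  # 1. instantiate
interpreter.allocate_tensors()                                               # 2. allocate memory

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

x = np.array([[2.0]], dtype=np.float32)                 # scalar input as a 2D tensor
interpreter.set_tensor(input_details[0]['index'], x)    # 3. write the input tensor
interpreter.invoke()                                    # 4. invoke the model
y = interpreter.get_tensor(output_details[0]['index'])  # 5. read the output tensor
print(y)  # approximately sin(2.0)
```
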

computation graph: all logic that makes up the architecture of our deep learning network.

Size of the model for the sine wave example ==> about 2 KB; the quantized version is only 224 bytes smaller

https://colab.research.google.com/github/tensorflow/tensorflow/blob/master/tensorflow/lite/micro/examples/hello_world/train/train_hello_world_model.ipynb#scrollTo=l4-WhtGpvb-E

Application code

  • preprocessing: transforms inputs to be compatible with the model
  • TF Lite interpreter: runs the model, model makes predictions on data
  • postprocessing: interprets the model's output and makes decisions
  • output handling: uses device capabilities to respond to a prediction
  • xxd: translates the model file into a C++ source array (e.g., `xxd -i model.tflite`)
  • tensor arena: area of memory used to store the model's input, output, and intermediate tensors; choosing the right arena size requires trial and error
  • TfLiteTensor struct: provides an API for tensors
  • Keras layers accept input as a 2D tensor, even for a scalar value
  • TfLitePtrUnion: union through which a tensor's input data is set
  • for multi-dimensional data, add values to the input as flattened values (see the sketch below)
  • model = graph of operations that the interpreter executes to transform the input data into an output
  • TF_LITE_MICRO_EXPECT_NEAR: asserts that values are equal within a threshold
  • C++ convention: prefix constants with a k
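
A small numpy illustration of that flattening, with hypothetical shapes; in the C++ API, the same values are written one after another into the input tensor's data buffer:

```python
import numpy as np

spectrogram = np.arange(49 * 40, dtype=np.float32).reshape(49, 40)  # 2D features
flat_input = spectrogram.reshape(1, -1)  # one contiguous row of flattened values
print(flat_input.shape)  # (1, 1960)
```
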

Chapter 6

  • The microcontroller is connected to the circuit board using pins; GPIO pins are general-purpose input/output
  • GPIO pins are digital: on or off
  • Some microcontrollers also have analog input pins
  • Arduino board used: Arduino Nano 33 BLE Sense
  • Pulse-width modulation (PWM) is used to dim an LED: switch an output pin on and off extremely rapidly, and the pin's effective output voltage becomes a function of the ratio between time spent in the on and off states (see the sketch after this list)
  • Arduino often uses the serial port for debugging information
  • The Serial Plotter can display a graph of values it receives via serial
  • ErrorReporter outputs data via Arduino's serial interface
  • Needed to install a board support package
  • Also needed to add my user to the dialout group
  • Change the animation: added a back-and-forth animation by incrementing a number and outputting a symbol (+ or -) in serpentine fashion; each line is 16 characters
  • Change what you're driving: drove GPIO pin #2 (connected to a speaker and an LED)
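
A back-of-the-envelope sketch of the duty-cycle relationship; the 3.3 V supply figure is an assumption about the board:

```python
# Average output voltage under PWM is the supply voltage scaled by the
# duty cycle (the fraction of each period the pin spends high).
v_supply = 3.3       # volts (assumed for the Nano 33 BLE Sense)
duty_cycle = 0.25    # pin is high 25% of the time
v_effective = v_supply * duty_cycle
print(f"{v_effective:.2f} V")  # ~0.82 V average -> dimmer LED
```
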

Chapter 7

  • Wake-word detection: listens for words that trigger streaming audio to a full speech-detection model
  • Wake-word detection is a part of cascading architecture, where a tiny, efficient model "wakes up" a larger, more resource-hungry model.
  • 18 KB model classifies spoken audio
  • model trained to recognize "yes" and "no"
  • application will indicate that it has detected a word by lighting an LED or displaying data on a screen

General TinyML application architecture (sketched below)

  1. Obtains an input.
  2. Preprocesses the input to extract features suitable to feed into a model.
  3. Runs inference on the processed input.
  4. Postprocesses the model's output to make sense of it.
  5. Uses the resulting information to make things happen.
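
A hedged Python sketch of that loop; every function here is a dummy placeholder standing in for one of the book's C++ components:

```python
# Hypothetical placeholders for the book's components.
def capture_audio():       return [0.0] * 480                 # audio provider
def make_spectrogram(a):   return [[0.0] * 40] * 49           # feature provider
def run_inference(f):      return {'yes': 0.1, 'no': 0.1}     # TF Lite interpreter
def recognize_command(s):  return max(s, key=s.get)           # command recognizer
def respond_to_command(c): print('heard:', c)                 # command responder

while True:
    audio = capture_audio()               # 1. obtain an input
    features = make_spectrogram(audio)    # 2. preprocess into model features
    scores = run_inference(features)      # 3. run inference
    command = recognize_command(scores)   # 4. postprocess the output
    respond_to_command(command)           # 5. make things happen
```
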

Wake-word application architecture complexities

  1. Audio data as input, which requires heavy processing before it can be fed to the model.
  2. The model is a classifier, outputting class probabilities; we'll need to parse and make sense of its output.
  3. The app is designed to perform inference continually, on live data; we'll need to write code to make sense of a stream of inferences.
  4. The model is larger and more complex; we'll push the hardware to the limits of its capabilities.

Speech Commands dataset

  • four classes: yes, no, unknown, silence
  • 65,000 one-second utterances of 30 short words
  • the model takes spectrograms: 2D arrays made up of slices of frequency information, each taken from a different time window
  • convolutional neural network: works well with multidimensional tensors in which information is contained in the relationships between groups of adjacent values, such as images
  • goal: display a spectrogram
  • components:
  • main loop
  • audio provider: captures audio data via the microphone
  • remember: you want to minimize the number of ops you import to your device
  • feature provider: converts raw audio data into spectrogram format
  • TF Lite interpreter
  • model: data array
  • command recognizer: aggregates results and determines on average whether a known word was heard
  • command responder: uses the device's output capabilities to let the user know whether a command was heard

Each spectrogram == a 2D array with 40 columns and 49 rows, where each row represents a 30-ms sample of audio split into 43 frequency buckets. Each 30-ms slice goes through a fast Fourier transform, which analyzes the frequency distribution of the audio in the sample and creates an array of 256 frequency buckets, each with a value from 0 to 255. These are averaged together into groups of six, leaving us with 43 buckets. (See micro_features_generator; a rough numpy sketch follows.)
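
A rough numpy sketch of producing one row as described; the real micro_features_generator differs in detail (filterbank, scaling, noise reduction), and 256 doesn't divide evenly by six, hence the partial final group:

```python
import numpy as np

SLICE_SAMPLES = 480                            # 30 ms of audio at 16 kHz
audio_slice = np.random.randn(SLICE_SAMPLES)   # stand-in for real samples
windowed = audio_slice * np.hanning(SLICE_SAMPLES)     # Hann window tapers the edges
spectrum = np.abs(np.fft.rfft(windowed, n=512))[:256]  # 256 frequency buckets
row = spectrum[:252].reshape(42, 6).mean(axis=1)       # average groups of six
print(row.shape)  # (42,) -- the leftover partial group brings the count to ~43
```
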

The command recognizer uses multiple inferences to determine whether a command's scores cross a given threshold (hedged sketch below).
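
A hedged sketch of that aggregation, assuming a per-inference "yes" probability; the window size and threshold are made-up values:

```python
from collections import deque

WINDOW = 10        # hypothetical number of recent inferences to average
THRESHOLD = 0.8    # hypothetical detection threshold

recent_yes_scores = deque(maxlen=WINDOW)

def process(yes_score):
    """Return 'yes' only when the windowed average crosses the threshold."""
    recent_yes_scores.append(yes_score)
    if len(recent_yes_scores) == WINDOW:
        if sum(recent_yes_scores) / WINDOW > THRESHOLD:
            return 'yes'
    return None
```
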

Sparkfun Edge Setup: https://learn.sparkfun.com/tutorials/using-sparkfun-edge-board-with-ambiq-apollo3-sdk/all

Can I use wake word detection to hook up with Pynq and determine what I'm saying?

Training the wake word model

  • Configuring parameters

  • Installing the correct dependencies

  • Monitoring training using something called TensorBoard

  • Running the training script

  • Converting the training output into a model we can use

  • Training is usually 5-10x faster with a GPU

  • cross_entropy: the model's loss, which quantifies how far the model's predictions are from the correct values

  • checkpoint files: files written by the training script that contain the weights and biases produced during the training process

  • A TensorFlow model contains two things:

  1. weights and biases from training
  2. a graph of operations that combine the model's input with these weights and biases to produce the model's output

https://colab.research.google.com/github/tensorflow/tensorflow/blob/master/tensorflow/lite/micro/examples/micro_speech/train/train_micro_speech_model.ipynb#scrollTo=-RHFQVLJnQTa

Creating the model file => freezing: create a static representation of the graph with the weights frozen into it. The result is still a TensorFlow model; the TensorFlow-to-TensorFlow Lite conversion uses a converter called toco. (This doesn't apply to the example above, which skips ahead to the C file generation step.)

Feature generation is the main preprocessing step for the input. The reason for doing it is that the resulting data is easier to classify and differentiate. It is also much smaller (1,960 spectrogram values vs. 16,000 raw samples per second of audio). Most models operate on processed data, but some, like DeepMind's WaveNet, operate on raw audio.

How does feature generation work?

  • mel-frequency cepstral coefficients (MFCC) are the common approach
  • a different approach is used here; it's used by Google in production, but not published
  1. Fast Fourier transform over a given time slice (in our case 30 ms), with the data filtered by a Hann window; the Hann window reduces the influence of samples at either end of the window.
  2. The result is grouped into 256 frequency buckets.
  3. The buckets are scaled down by a nonlinear Mel function so low frequencies have more resolution (sketch below).
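
A small sketch of the standard Hz-to-Mel mapping for step 3; the production frontend's exact filterbank may differ:

```python
import numpy as np

freqs_hz = np.linspace(0, 8000, 256)  # bucket center frequencies up to Nyquist
freqs_mel = 2595.0 * np.log10(1.0 + freqs_hz / 700.0)  # standard Hz-to-Mel formula
# Equal steps in Mel span small Hz ranges at low frequencies (more resolution
# there) and large Hz ranges at high frequencies.
```
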

See "Trainable Frontend for Robust and Far-Field Keyword Spotting" by Yuxuan Wang et al. There is also a background noise reduction step, in which even and odd buckets use different coefficients. Per-channel amplitude normalization occurs after this to boost the signal based on the running average noise.

  • Convolutional layers are used for spotting 2D patterns in the input. Think of convolution as moving a series of rectangular filters across the image, with the result at each pixel for each filter corresponding to how similar the filter is to that patch of the image.
  • Fully connected layer: one weight for every value in the input tensor. The result indicates how closely the input matches the weights, after comparing every value. Each class has its own weights, so there's an ideal pattern for "silence", "unknown", "yes", and "no".
  • Softmax layer: effectively helps increase the difference between the highest output and its nearest competitors.
  • There is a ReLU activation function after each layer, in addition to biases; ReLU helps results converge much more quickly.
  • The model is run repeatedly for 10 to 15 seconds (an empirical choice), and the score is averaged across time. (A Keras sketch of the architecture follows.)
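
A hedged Keras sketch of a tiny_conv-style architecture matching that description; the filter count and sizes are illustrative, not the exact published hyperparameters:

```python
from tensorflow import keras

model = keras.Sequential([
    # 49x40 single-channel spectrogram input; filters slide across it in 2D
    keras.layers.Conv2D(8, (10, 8), strides=(2, 2), padding='same',
                        activation='relu', input_shape=(49, 40, 1)),
    keras.layers.Flatten(),
    # fully connected: one weight per input value per class
    # ("silence", "unknown", "yes", "no")
    keras.layers.Dense(4),
    keras.layers.Softmax(),  # sharpens the gap between the top scores
])
model.summary()
```
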

More sophisticated speech recognition algorithms accept a stream of input and use recurrent neural networks instead of the single-layer convolutional neural network used here. Having streaming baked into the model means you don't need the postprocessing done here.

Training your own model

  • need to gather thousands of audio samples in most cases
  • the Open Speech Recording app was used to collect the Speech Commands dataset

Chapter 9: Person Detection

  • simpler preprocessing than audio
  • uses a CNN, since CNNs work well with multidimensional tensors where information lies in the relationships between adjacent values
  • the model uses raw pixel data, unlike audio (where raw data had to be translated into a spectrogram)
  • uses the Visual Wake Words dataset
  • 250 KB model
  • 96 x 96 x 1 grayscale images as input
  • state-of-the-art image classifiers often work with 320 x 320 images
  • MobileNet architecture: well-known and battle-tested architecture for image classification on devices like mobile phones
  • inference takes longer due to the size of the input and model => one inference every several seconds, vs. multiple inferences per second for the wake-word model
  • keep in mind an output tensor might have more dimensions than expected; this can be due to the architecture or to the implementation (see the sketch below)
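
A quick numpy illustration of handling those extra dimensions; the shape and class ordering here are assumptions, not the model's documented layout:

```python
import numpy as np

# Stand-in for interpreter.get_tensor(...) output with extra singleton axes.
output = np.zeros((1, 1, 1, 2), dtype=np.float32)  # shape is illustrative
scores = np.squeeze(output)                        # drop singleton axes -> (2,)
no_person_score, person_score = scores             # ordering is an assumption
```
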

Chapter 10: Training the Person Detection Model

  • Training this model takes several days and hundreds of GB of storage
  • uses TF-Slim (an older TensorFlow training interface)

About

TinyML code examples
