This is based on SEFR: A Fast Linear-Time Classifier for Ultra-Low Power Devices and its implementation sefr-classifier/sefr, which was originally a binary classifier. I use the one-vs-rest (OVR) strategy to expand it into a working multi-class version.
The idea is to quickly calculate weighted averages and find a hyperplane between classes, so training needs far less computing time and memory, and can actually run on-board on low-end microcontrollers, including 16 MHz AVR processors with as little as 2 KB of RAM. This also makes it possible to re-train the model directly on-device whenever new data become available.
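For the binary case, training boils down to a few vectorized passes over the data. Below is a minimal NumPy sketch of the idea as I read it from the paper; the function names are mine, sefr.py's internals may differ, and it assumes nonnegative feature values:

import numpy as np

def sefr_train(X, y):
    # Binary SEFR: X is (n_samples, n_features) with nonnegative values,
    # y holds 0/1 labels. Returns (weights, bias).
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    pos, neg = (y == 1), (y == 0)
    # per-feature average of each class
    mu_pos, mu_neg = X[pos].mean(axis=0), X[neg].mean(axis=0)
    # weights: normalized difference between the two class averages
    weights = (mu_pos - mu_neg) / (mu_pos + mu_neg + 1e-7)
    # project every sample onto the weight vector
    scores = X @ weights
    # bias: a weighted midpoint of the two classes' average scores,
    # placing the hyperplane between the projected class centers
    n_pos, n_neg = pos.sum(), neg.sum()
    bias = -(n_neg * scores[pos].mean() + n_pos * scores[neg].mean()) / (n_pos + n_neg)
    return weights, bias

def sefr_predict(X, weights, bias):
    # samples on the positive side of the hyperplane get label 1
    return (np.asarray(X, dtype=float) @ weights + bias > 0).astype(int)

Every step is a single pass over the data, which is why training time stays linear in the number of samples and features.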
Note that I made a minor change from the authors' paper: the weights and bias are calculated based on "not 0", "not 1", "not 2"... instead of 0, 1, 2. In other words, I treat "not N" as the positive label and "N" as the negative label. For prediction, the model finds the least likely "not N" label (so the sample is most likely to be N). I've found that this approach generates more accurate results on many datasets.
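Using the hypothetical sefr_train helper from the sketch above, this one-vs-rest variant with negated labels would look roughly like the following (again a sketch, not sefr.py's actual code):

import numpy as np

def fit_ovr(X, y):
    # train one "not N" model per class: samples NOT belonging to
    # class n are the positive (1) examples
    return {n: sefr_train(X, (y != n).astype(int)) for n in np.unique(y)}

def predict_ovr(X, models):
    labels = sorted(models)
    # hyperplane score of "not N" for every sample, one column per class
    scores = np.column_stack([np.asarray(X, dtype=float) @ models[n][0] + models[n][1]
                              for n in labels])
    # the least likely "not N" is the most likely N
    return np.array(labels)[np.argmin(scores, axis=1)]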
On the other hand, SEFR is not very accurate when the differences between classes are unclear, so it is more suitable for structured data or simple image patterns.
File | Usage |
---|---|
sefr_binary_visiualization.py | The original binary SEFR classifier with demonstration results (the graphs you see above) |
sefr.py | A scikit-learn-like Python classifier class |
sefr.ino | Arduino C++ implementation (can run on AVRs) |
sefr.go | Golang/TinyGo implementation (can run on any 32-bit board supported by TinyGo) |
sefr_micrpoython.py | MicroPython implementation (ESP8266, ESP32 & Raspberry Pi Pico) |
sefr_circuitpython_ulab.py | CircuitPython implementation (for CP 7.0.0+ firmware that includes the ulab module) |
Here is an example using the famous Iris dataset (3 classes, 4 features, 150 instances):
from sefr import SEFR # import from sefr.py
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, classification_report
# load Iris dataset
# source: https://archive.ics.uci.edu/ml/datasets/iris
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',
header=None, names=('sepal length', 'sepal width', 'petal length', 'petal width', 'class'))
# extract data and target and convert to ndarray
X = df.drop(['class'], axis=1).to_numpy()
y = df['class'].to_numpy()
# encode labels to integers
le = LabelEncoder()
y = le.fit_transform(y)
class_names = le.classes_ # save class names
# prepare training and test dataset
X_train, X_test, y_train, y_test = \
train_test_split(X, y, test_size=0.2, random_state=0)
# train model and predict labels
clf = SEFR()
clf.fit(X_train, y_train)
predicted = clf.predict(X_test)
cv_predicted = cross_val_predict(clf, X_train, y_train, cv=5)
# view prediction results
print('Training time:', clf.training_time, 'ns')
print('Training CV score:', accuracy_score(y_train, cv_predicted).round(3))
print('Test accuracy:', accuracy_score(y_test, predicted).round(3))
print('')
print('Test classification report:')
print(classification_report(y_test, predicted, target_names=class_names))
Which generates the result below:
Training time: 0 ns
Training CV score: 0.942
Test accuracy: 0.967

Test classification report:
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        11
Iris-versicolor       1.00      0.92      0.96        13
 Iris-virginica       0.86      1.00      0.92         6

       accuracy                           0.97        30
      macro avg       0.95      0.97      0.96        30
   weighted avg       0.97      0.97      0.97        30
Here is another example of using the MNIST dataset (10 classes, 28x28 images, 70,000 instances):
from sefr import SEFR # import from sefr.py
from tensorflow.keras.datasets import mnist
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.metrics import accuracy_score, classification_report
# load mnist dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# flatten the 28x28 images into 784-element vectors
img_size = X_train.shape[1]
X_train = X_train.reshape(-1, img_size ** 2)
X_test = X_test.reshape(-1, img_size ** 2)
# train model and predict labels
clf = SEFR()
clf.fit(X_train, y_train)
predicted = clf.predict(X_test)
cv_predicted = cross_val_predict(clf, X_train, y_train, cv=5)
# view prediction results
print('Training time:', clf.training_time, 'ns')
print('Training CV score:', accuracy_score(y_train, cv_predicted).round(3))
print('Test accuracy:', accuracy_score(y_test, predicted).round(3))
print('')
print('Test classification report:')
print(classification_report(y_test, predicted))
Which gets you:
Training time: 1141000000 ns
Training CV score: 0.797
Test accuracy: 0.809

Test classification report:
              precision    recall  f1-score   support

           0       0.96      0.76      0.85       980
           1       0.94      0.88      0.91      1135
           2       0.91      0.78      0.84      1032
           3       0.86      0.77      0.81      1010
           4       0.81      0.77      0.79       982
           5       0.66      0.83      0.74       892
           6       0.91      0.87      0.89       958
           7       0.97      0.72      0.82      1028
           8       0.60      0.86      0.70       974
           9       0.69      0.85      0.76      1009

    accuracy                           0.81     10000
   macro avg       0.83      0.81      0.81     10000
weighted avg       0.83      0.81      0.81     10000
It takes only 1.141 seconds (on my machine) to train an image recognition model with about 80% accuracy.
All the microcontroller versions have a built-in Iris dataset (in some of them quantized into integers to speed up calculation). They perform training (using the whole dataset) on startup, then predict existing instances with 0~30% random noise added. Below is some serial output from an Arduino Uno:
Test data: 6.10 2.70 5.20 3.00
Predicted label: 2 / actual label: 2 / (SEFR training time: 68 ms)
Test data: 6.20 3.30 1.40 0.30
Predicted label: 0 / actual label: 0 / (SEFR training time: 68 ms)
Test data: 8.30 3.30 4.00 2.30
Predicted label: 2 / actual label: 2 / (SEFR training time: 68 ms)
Test data: 3.50 2.40 2.40 1.30
Predicted label: 1 / actual label: 1 / (SEFR training time: 68 ms)
Test data: 3.60 3.80 1.60 0.20
Predicted label: 0 / actual label: 0 / (SEFR training time: 68 ms)
For the Iris dataset, it only takes 0.068 seconds to train on a 16 MHz AVR microcontroller.
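In Python terms, each firmware demo does roughly the following (a sketch using the SEFR class from sefr.py and scikit-learn's copy of Iris; the actual on-device code is written in each port's own language):

import random
from sklearn.datasets import load_iris
from sefr import SEFR  # the classifier class from sefr.py

X, y = load_iris(return_X_y=True)
clf = SEFR()
clf.fit(X, y)  # the firmwares train on their built-in copy at startup

for _ in range(5):
    # pick a known instance and perturb every feature by up to +/-30%
    i = random.randrange(len(X))
    noisy = [f * (1 + random.uniform(-0.3, 0.3)) for f in X[i]]
    print('Test data:', ' '.join('%.2f' % f for f in noisy))
    print('Predicted label:', clf.predict([noisy])[0], '/ actual label:', y[i])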
The MicroPython version, with minor modifications, can also be used as a pure Python (3.4+) implementation.
I've demonstrated SEFR in a color recognition experiment project, which uses only an Arduino Nano and some cheap sensors to train and run the classifier. See my Hackster.io page for details.