This is a MicroPython binding for ESP-DL (Deep Learning) models that enables face detection, face recognition, human detection, and image classification on ESP32 devices.
I spent a lot of time and effort to make this. If you find this project useful, please consider donating to support my work.
FaceDetector: Detects faces in images and provides bounding boxes and facial featuresFaceRecognizer: Recognizes enrolled faces and manages a face databaseHumanDetector: Detects people in images and provides bounding boxesImageNet: Classifies images into predefined categories
You can find precompiled images in two ways:
- In the Actions section for passed workflows under artifacts
- By forking the repo and manually starting the action
- Clone the required repositories:
git clone https://github.com/cnadler86/mp_esp_dl_models.git
git clone https://github.com/cnadler86/micropython-camera-API.git
git clone https://github.com/cnadler86/mp_jpeg.git- Build the firmware: Make sure you have the complete ESP32 build environment for MicroPython available.
cd boards/
idf.py -D MICROPY_DIR=<micropython-dir> -D MICROPY_BOARD=<BOARD_NAME> -D MICROPY_BOARD_VARIANT=<BOARD_VARIANT> -B build-<your-build-name> build
cd build-<your-build-name>
python ~/micropython/ports/esp32/makeimg.py sdkconfig bootloader/bootloader.bin partition_table/partition-table.bin micropython.bin firmware.bin micropython.uf2All models require input images in RGB888 format. You can use mp_jpeg to decode camera images to the correct format.
The FaceDetector module detects faces in images and can optionally provide facial feature points.
FaceDetector(width=320, height=240, features=True)Parameters:
width(int, optional): Input image width. Default: 320height(int, optional): Input image height. Default: 240features(bool, optional): Whether to return facial feature points. Default: True
-
run(framebuffer)
Detects faces in the provided image.
Parameters:
framebuffer: RGB888 image data (required)
Returns: List of dictionaries with detection results, each containing:
score: Detection confidence (float)box: Bounding box coordinates [x1, y1, x2, y2]features: Facial feature points [(x,y) coordinates for: left eye, right eye, nose, left mouth, right mouth] if enabled, None otherwise
The FaceRecognizer module manages a database of faces and can recognize previously enrolled faces.
FaceRecognizer(width=320, height=240, db_path="face.db")Parameters:
width(int, optional): Input image width. Default: 320height(int, optional): Input image height. Default: 240db_path(str, optional): Path to the face database file. Default: "face.db"
-
run(framebuffer)
Detects and recognizes faces in the provided image.
Parameters:
framebuffer: RGB888 image data (required)
Returns: List of dictionaries with recognition results, each containing:
score: Detection confidencebox: Bounding box coordinates [x1, y1, x2, y2]features: Facial feature points (if enabled)person: Recognition result containing:id: Face IDsimilarity: Match confidence (0-1)name: Person name (if provided during enrollment)
-
enroll(framebuffer, validate=False, name=None)
Enrolls a new face in the database.
Parameters:
framebuffer: RGB888 image datavalidate(bool, optional): Check if face is already enrolled. Default: Falsename(str, optional): Name to associate with the face. Default: None
Returns:
- ID of the enrolled face
-
delete_face(id)
Deletes a face from the database.
Parameters:
id(int): ID of the face to delete
-
print_database()
Prints the contents of the face database.
The HumanDetector module detects people in images.
HumanDetector(width=320, height=240)Parameters:
width(int, optional): Input image width. Default: 320height(int, optional): Input image height. Default: 240
-
run(framebuffer)
Detects people in the provided image.
Parameters:
framebuffer: RGB888 image data
Returns: List of dictionaries with detection results, each containing:
score: Detection confidencebox: Bounding box coordinates [x1, y1, x2, y2]
The ImageNet module classifies images into predefined categories.
ImageNet(width=320, height=240)Parameters:
width(int, optional): Input image width. Default: 320height(int, optional): Input image height. Default: 240
-
run(framebuffer)
Classifies the provided image.
Parameters:
framebuffer: RGB888 image data
Returns: List alternating between class names and confidence scores:
[class1, score1, class2, score2, ...]
from espdl import FaceDetector
import camera
from jpeg import Decoder
# Initialize components
cam = camera.Camera()
decoder = Decoder()
face_detector = FaceDetector()
# Capture and process image
img = cam.capture()
framebuffer = decoder.decode(img) # Convert to RGB888
results = face_detector.run(framebuffer)
if results:
for face in results:
print(f"Face detected with confidence: {face['score']}")
print(f"Bounding box: {face['box']}")
if face['features']:
print(f"Facial features: {face['features']}")from espdl import FaceRecognizer
import camera
from jpeg import Decoder
# Initialize components
cam = camera.Camera()
decoder = Decoder()
recognizer = FaceRecognizer(db_path="/faces.db")
# Enroll a face
img = cam.capture()
framebuffer = decoder.decode(img)
face_id = recognizer.enroll(framebuffer, name="John")
print(f"Enrolled face with ID: {face_id}")
# Later, recognize faces
img = cam.capture()
framebuffer = decoder.decode(img)
results = recognizer.run(framebuffer)
if results:
for face in results:
if face['person']:
print(f"Recognized {face['person']['name']} (ID: {face['person']['id']})")
print(f"Similarity: {face['person']['similarity']}")The following table shows the frames per second (fps) for different image sizes and models. The results are based on a test with a 2MP camera and a ESP32S3.
| Frame Size | FaceDetector | HumanDetector |
|---|---|---|
| QQVGA | 14.5 | 6.6 |
| R128x128 | 21 | 6.6 |
| QCIF | 19.7 | 6.5 |
| HQVGA | 18 | 6.3 |
| R240X240 | 16.7 | 6.1 |
| QVGA | 15.2 | 6.6 |
| CIF | 13 | 5.5 |
| HVGA | 11.9 | 5.3 |
| VGA | 8.2 | 4.4 |
| SVGA | 6.2 | 3.8 |
| XGA | 4.1 | 2.8 |
| HD | 3.6 | 2.6 |
-
Image Format: Always ensure input images are in RGB888 format. Use mp_jpeg for JPEG decoding from camera.
-
Memory Management:
- Close/delete detector objects when no longer needed
- Consider memory constraints when choosing image dimensions
-
Face Recognition:
- Enroll faces in good lighting conditions
- Multiple enrollments of the same person can improve recognition
- Use
validate=Trueduring enrollment to avoid duplicates
-
Storage:
- Face database is persistent across reboots
- Consider backing up the face database file