Semantic Search for Detections (#11899)
* Initial re-implementation of semantic search

* put docker-compose back and make reindex match docs

* remove debug code and fix import

* fix docs

* manually build pysqlite3 as binaries are only available for x86-64

* update comment in build_pysqlite3.sh

* only embed objects

* better error handling when genai fails

* ask ollama to pull requested model at startup

* update ollama docs

* address some PR review comments

* fix lint

* use IPC to write description, update docs for reindex

* remove gemini-pro-vision from docs as it will be unavailable soon

* fix OpenAI doc available models

* fix api error in gemini and metadata for embeddings
hunterjm authored and NickM-27 committed Aug 23, 2024
1 parent 65ca3c8 commit a3d7141
Showing 48 changed files with 1,246 additions and 168 deletions.
13 changes: 13 additions & 0 deletions docker/main/Dockerfile
@@ -148,6 +148,8 @@ RUN apt-get -qq update \
gfortran openexr libatlas-base-dev libssl-dev\
libtbb2 libtbb-dev libdc1394-22-dev libopenexr-dev \
libgstreamer-plugins-base1.0-dev libgstreamer1.0-dev \
# sqlite3 dependencies
tclsh \
# scipy dependencies
gcc gfortran libopenblas-dev liblapack-dev && \
rm -rf /var/lib/apt/lists/*
@@ -161,6 +163,10 @@ RUN wget -q https://bootstrap.pypa.io/get-pip.py -O get-pip.py \
COPY docker/main/requirements.txt /requirements.txt
RUN pip3 install -r /requirements.txt

# Build pysqlite3 from source to support ChromaDB
COPY docker/main/build_pysqlite3.sh /build_pysqlite3.sh
RUN /build_pysqlite3.sh

COPY docker/main/requirements-wheels.txt /requirements-wheels.txt
RUN pip3 wheel --wheel-dir=/wheels -r /requirements-wheels.txt

@@ -188,6 +194,13 @@ ARG APT_KEY_DONT_WARN_ON_DANGEROUS_USAGE=DontWarn
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES="compute,video,utility"

# Turn off Chroma Telemetry: https://docs.trychroma.com/telemetry#opting-out
ENV ANONYMIZED_TELEMETRY=False
# Allow resetting the chroma database
ENV ALLOW_RESET=True
# Disable tokenizer parallelism warning
ENV TOKENIZERS_PARALLELISM=true

ENV PATH="/usr/lib/btbn-ffmpeg/bin:/usr/local/go2rtc/bin:/usr/local/tempio/bin:/usr/local/nginx/sbin:${PATH}"

# Install dependencies
35 changes: 35 additions & 0 deletions docker/main/build_pysqlite3.sh
@@ -0,0 +1,35 @@
#!/bin/bash

set -euxo pipefail

SQLITE3_VERSION="96c92aba00c8375bc32fafcdf12429c58bd8aabfcadab6683e35bbb9cdebf19e" # 3.46.0
PYSQLITE3_VERSION="0.5.3"

# Fetch the source code for the pinned release of SQLite.
if [[ ! -d "sqlite" ]]; then
wget https://www.sqlite.org/src/tarball/sqlite.tar.gz?r=${SQLITE3_VERSION} -O sqlite.tar.gz
tar xzf sqlite.tar.gz
cd sqlite/
LIBS="-lm" ./configure --disable-tcl --enable-tempstore=always
make sqlite3.c
cd ../
rm sqlite.tar.gz
fi

# Grab the pysqlite3 source code.
if [[ ! -d "./pysqlite3" ]]; then
git clone https://github.com/coleifer/pysqlite3.git
fi

cd pysqlite3/
git checkout ${PYSQLITE3_VERSION}

# Copy the sqlite3 source amalgamation into the pysqlite3 directory so we can
# create a self-contained extension module.
cp "../sqlite/sqlite3.c" ./
cp "../sqlite/sqlite3.h" ./

# Create the wheel and put it in the /wheels dir.
sed -i "s|name='pysqlite3-binary'|name=PACKAGE_NAME|g" setup.py
python3 setup.py build_static
pip3 wheel . -w /wheels
7 changes: 7 additions & 0 deletions docker/main/requirements-wheels.txt
@@ -30,3 +30,10 @@ ws4py == 0.5.*
unidecode == 1.3.*
onnxruntime == 1.18.*
openvino == 2024.1.*
# Embeddings
onnx_clip == 4.0.*
chromadb == 0.5.0
# Generative AI
google-generativeai == 0.6.*
ollama == 0.2.*
openai == 1.30.*
@@ -0,0 +1 @@
chroma
Empty file.
@@ -0,0 +1 @@
chroma-pipeline
4 changes: 4 additions & 0 deletions docker/main/rootfs/etc/s6-overlay/s6-rc.d/chroma-log/run
@@ -0,0 +1,4 @@
#!/command/with-contenv bash
# shellcheck shell=bash

exec logutil-service /dev/shm/logs/chroma
1 change: 1 addition & 0 deletions docker/main/rootfs/etc/s6-overlay/s6-rc.d/chroma-log/type
@@ -0,0 +1 @@
longrun
Empty file.
28 changes: 28 additions & 0 deletions docker/main/rootfs/etc/s6-overlay/s6-rc.d/chroma/finish
@@ -0,0 +1,28 @@
#!/command/with-contenv bash
# shellcheck shell=bash
# Take down the S6 supervision tree when the service exits

set -o errexit -o nounset -o pipefail

# Logs should be sent to stdout so that s6 can collect them

declare exit_code_container
exit_code_container=$(cat /run/s6-linux-init-container-results/exitcode)
readonly exit_code_container
readonly exit_code_service="${1}"
readonly exit_code_signal="${2}"
readonly service="ChromaDB"

echo "[INFO] Service ${service} exited with code ${exit_code_service} (by signal ${exit_code_signal})"

if [[ "${exit_code_service}" -eq 256 ]]; then
if [[ "${exit_code_container}" -eq 0 ]]; then
echo $((128 + exit_code_signal)) >/run/s6-linux-init-container-results/exitcode
fi
elif [[ "${exit_code_service}" -ne 0 ]]; then
if [[ "${exit_code_container}" -eq 0 ]]; then
echo "${exit_code_service}" >/run/s6-linux-init-container-results/exitcode
fi
fi

exec /run/s6/basedir/bin/halt
@@ -0,0 +1 @@
chroma-log
16 changes: 16 additions & 0 deletions docker/main/rootfs/etc/s6-overlay/s6-rc.d/chroma/run
@@ -0,0 +1,16 @@
#!/command/with-contenv bash
# shellcheck shell=bash
# Start the ChromaDB service

set -o errexit -o nounset -o pipefail

# Logs should be sent to stdout so that s6 can collect them

# Tell S6-Overlay not to restart this service
s6-svc -O .

echo "[INFO] Starting ChromaDB..."

# Replace the bash process with the ChromaDB process, redirecting stderr to stdout
exec 2>&1
exec /usr/local/chroma run --path /config/chroma --host 0.0.0.0
@@ -0,0 +1 @@
120000
1 change: 1 addition & 0 deletions docker/main/rootfs/etc/s6-overlay/s6-rc.d/chroma/type
@@ -0,0 +1 @@
longrun
Empty file.
2 changes: 1 addition & 1 deletion docker/main/rootfs/etc/s6-overlay/s6-rc.d/log-prepare/run
@@ -4,7 +4,7 @@

set -o errexit -o nounset -o pipefail

dirs=(/dev/shm/logs/frigate /dev/shm/logs/go2rtc /dev/shm/logs/nginx /dev/shm/logs/certsync)
dirs=(/dev/shm/logs/frigate /dev/shm/logs/go2rtc /dev/shm/logs/nginx /dev/shm/logs/certsync /dev/shm/logs/chroma)

mkdir -p "${dirs[@]}"
chown nobody:nogroup "${dirs[@]}"
14 changes: 14 additions & 0 deletions docker/main/rootfs/usr/local/chroma
@@ -0,0 +1,14 @@
#!/usr/bin/python3
# -*- coding: utf-8 -*-
__import__("pysqlite3")

import re
import sys

sys.modules["sqlite3"] = sys.modules.pop("pysqlite3")

from chromadb.cli.cli import app

if __name__ == "__main__":
sys.argv[0] = re.sub(r"(-script\.pyw|\.exe)?$", "", sys.argv[0])
sys.exit(app())
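
This wrapper swaps the standard-library `sqlite3` module for the freshly built `pysqlite3` before Chroma's CLI loads, since ChromaDB requires a newer SQLite than many distributions ship. A minimal sketch of how the swap can be verified inside the container (the version check below is illustrative, not part of this commit):

```python
# Perform the same module swap as the wrapper above, then report the version.
import sys

__import__("pysqlite3")
sys.modules["sqlite3"] = sys.modules.pop("pysqlite3")

import sqlite3

# Should print 3.46.0, the amalgamation pinned in build_pysqlite3.sh,
# rather than the distribution's bundled SQLite.
print(sqlite3.sqlite_version)
```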
135 changes: 135 additions & 0 deletions docs/docs/configuration/genai.md
@@ -0,0 +1,135 @@
---
id: genai
title: Generative AI
---

Generative AI can be used to automatically generate descriptions based on the thumbnails of your events. This helps with [semantic search](/configuration/semantic_search) in Frigate by providing detailed text descriptions as a basis for the search query.

## Configuration

Generative AI can be enabled for all cameras or only for specific cameras. Three providers are currently available to integrate with Frigate.

If the provider you choose requires an API key, you may either directly paste it in your configuration, or store it in an environment variable prefixed with `FRIGATE_`.

```yaml
genai:
enabled: True
provider: gemini
api_key: "{FRIGATE_GEMINI_API_KEY}"
model: gemini-1.5-flash

cameras:
front_camera: ...
indoor_camera:
genai: # <- disable GenAI for your indoor camera
enabled: False
```

## Ollama

[Ollama](https://ollama.com/) allows you to self-host large language models and keep everything running locally. It provides a nice API over [llama.cpp](https://github.com/ggerganov/llama.cpp). It is highly recommended to host this server on a machine with an Nvidia graphics card or on an Apple silicon Mac for best performance. Most of the 7b-parameter 4-bit vision models will fit within 8 GB of VRAM. There is also a [Docker container](https://hub.docker.com/r/ollama/ollama) available.

### Supported Models

You must use a vision-capable model with Frigate. Current model variants can be found [in their model library](https://ollama.com/library). At the time of writing, this includes `llava`, `llava-llama3`, `llava-phi3`, and `moondream`.

:::note

You should have at least 8 GB of RAM available (or VRAM if running on GPU) to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.

:::

### Configuration

```yaml
genai:
enabled: True
provider: ollama
base_url: http://localhost:11434
model: llava
```
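
Before enabling this in Frigate, it can help to confirm the Ollama server responds with a description for a sample thumbnail. A minimal sketch using the `ollama` Python client pinned in requirements-wheels.txt (the model name and image path are placeholders):

```python
import ollama

# Point the client at the same server Frigate will use.
client = ollama.Client(host="http://localhost:11434")

# Ask the vision model to describe a saved Frigate thumbnail.
response = client.generate(
    model="llava",
    prompt="Describe the person in the image with as much detail as possible.",
    images=["./thumbnail.jpg"],  # file paths or base64-encoded images
)
print(response["response"])
```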

## Google Gemini

Google Gemini has a free tier allowing [15 queries per minute](https://ai.google.dev/pricing) to the API, which is more than sufficient for standard Frigate usage.

### Supported Models

You must use a vision-capable model with Frigate. Current model variants can be found [in their documentation](https://ai.google.dev/gemini-api/docs/models/gemini). At the time of writing, this includes `gemini-1.5-pro` and `gemini-1.5-flash`.

### Get API Key

To start using Gemini, you must first get an API key from [Google AI Studio](https://aistudio.google.com).

1. Accept the Terms of Service
2. Click "Get API Key" from the right-hand navigation
3. Click "Create API key in new project"
4. Copy the API key for use in your config

### Configuration

```yaml
genai:
enabled: True
provider: gemini
api_key: "{FRIGATE_GEMINI_API_KEY}"
model: gemini-1.5-flash
```
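
To sanity-check the key and model outside of Frigate, here is a short sketch with the `google-generativeai` client pinned in requirements (the image path and key are placeholders):

```python
import google.generativeai as genai
import PIL.Image

# Configure with the same key Frigate reads from FRIGATE_GEMINI_API_KEY.
genai.configure(api_key="YOUR_GEMINI_API_KEY")

model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content(
    [
        PIL.Image.open("thumbnail.jpg"),
        "Describe the person in the image with as much detail as possible.",
    ]
)
print(response.text)
```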

## OpenAI

OpenAI does not have a free tier for their API. With the release of gpt-4o, pricing has been reduced and each generation should cost fractions of a cent if you choose to go this route.

### Supported Models

You must use a vision-capable model with Frigate. Current model variants can be found [in their documentation](https://platform.openai.com/docs/models). At the time of writing, this includes `gpt-4o` and `gpt-4-turbo`.

### Get API Key

To start using OpenAI, you must first [create an API key](https://platform.openai.com/api-keys) and [configure billing](https://platform.openai.com/settings/organization/billing/overview).

### Configuration

```yaml
genai:
enabled: True
provider: openai
api_key: "{FRIGATE_OPENAI_API_KEY}"
model: gpt-4o
```
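
A comparable sketch against the OpenAI API with the pinned `openai` client; `gpt-4o` expects images as data URLs, so the thumbnail is base64-encoded first (the key and path are placeholders):

```python
import base64
from openai import OpenAI

client = OpenAI(api_key="YOUR_OPENAI_API_KEY")

# gpt-4o accepts images inline as base64 data URLs.
with open("thumbnail.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the person in the image."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{encoded}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```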

## Custom Prompts

Frigate sends multiple frames from the detection along with a prompt to your Generative AI provider asking it to generate a description. The default prompt is as follows:

```
Describe the {label} in the sequence of images with as much detail as possible. Do not describe the background.
```

:::tip

Prompts can use variable replacements like `{label}`, `{sub_label}`, and `{camera}` to substitute information from the detection as part of the prompt.

:::

You are also able to define custom prompts in your configuration.

```yaml
genai:
enabled: True
provider: ollama
base_url: http://localhost:11434
model: llava
prompt: "Describe the {label} in these images from the {camera} security camera."
object_prompts:
person: "Describe the main person in these images (gender, age, clothing, activity, etc). Do not include where the activity is occurring (sidewalk, concrete, driveway, etc). If delivering a package, include the company the package is from."
car: "Label the primary vehicle in these images with just the name of the company if it is a delivery vehicle, or the color make and model."
```
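
The placeholder substitution behaves like standard Python string formatting; a hypothetical illustration (Frigate performs this internally when building the prompt for the provider):

```python
# Hypothetical illustration of placeholder substitution; not user-facing code.
prompt = "Describe the {label} in these images from the {camera} security camera."
print(prompt.format(label="person", camera="front_camera"))
# -> Describe the person in these images from the front_camera security camera.
```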

### Experiment with prompts

Each provider also has a public-facing chat interface for its models. Download a few different thumbnails or snapshots from Frigate and experiment in the playground to get descriptions to your liking before updating the prompt in Frigate.

- OpenAI - [ChatGPT](https://chatgpt.com)
- Gemini - [Google AI Studio](https://aistudio.google.com)
- Ollama - [Open WebUI](https://docs.openwebui.com/)
5 changes: 5 additions & 0 deletions docs/docs/configuration/index.md
@@ -56,6 +56,11 @@ go2rtc:
password: "{FRIGATE_GO2RTC_RTSP_PASSWORD}"
```

```yaml
genai:
api_key: "{FRIGATE_GENAI_API_KEY}"
```

## Common configuration examples

Here are some common starter configuration examples. Refer to the [reference config](./reference.md) for detailed information about all the config values.
29 changes: 29 additions & 0 deletions docs/docs/configuration/reference.md
@@ -465,6 +465,35 @@ snapshots:
# Optional: quality of the encoded jpeg, 0-100 (default: shown below)
quality: 70

# Optional: Configuration for semantic search capability
semantic_search:
# Optional: Enable semantic search (default: shown below)
enabled: False
# Optional: Re-index embeddings database from historical events (default: shown below)
reindex: False

# Optional: Configuration for AI generated event descriptions
# NOTE: Semantic Search must be enabled for this to do anything.
# WARNING: Depending on the provider, this will send thumbnails over the internet
# to Google or OpenAI's LLMs to generate descriptions. It can be overridden at
# the camera level (enabled: False) to enhance privacy for indoor cameras.
genai:
# Optional: Enable AI description generation (default: shown below)
enabled: False
# Required if enabled: Provider must be one of ollama, gemini, or openai
provider: ollama
# Required if provider is ollama. May also be used for an OpenAI API compatible backend with the openai provider.
base_url: http://localhost:11434
# Required if gemini or openai
api_key: "{FRIGATE_GENAI_API_KEY}"
# Optional: The default prompt for generating descriptions. Can use replacement
# variables like "label", "sub_label", "camera" to make more dynamic. (default: shown below)
prompt: "Describe the {label} in the sequence of images with as much detail as possible. Do not describe the background."
# Optional: Object specific prompts to customize description results
# Format: {label}: {prompt}
object_prompts:
person: "My special person prompt."

# Optional: Restream configuration
# Uses https://github.com/AlexxIT/go2rtc (v1.8.3)
go2rtc:
38 changes: 38 additions & 0 deletions docs/docs/configuration/semantic_search.md
@@ -0,0 +1,38 @@
---
id: semantic_search
title: Using Semantic Search
---

Semantic search works by embedding images and/or text into a numerical vector representation. Frigate supports two such models, both of which run locally: [OpenAI CLIP](https://openai.com/research/clip) and [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). Embeddings are then saved to a local instance of [ChromaDB](https://trychroma.com).
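
As a rough sketch of the pattern (not Frigate's internal code), this is how documents and queries flow through a ChromaDB collection; the collection name and port are illustrative assumptions:

```python
import chromadb

# The bundled server from this commit listens on 0.0.0.0 (default port 8000).
client = chromadb.HttpClient(host="localhost", port=8000)
collection = client.get_or_create_collection(name="event_descriptions")

# Store a description; Chroma embeds it with the collection's embedding function.
collection.add(
    ids=["event-1"],
    documents=["A person in a red jacket walking a dog past the driveway"],
)

# A semantic query returns the closest stored descriptions, not exact matches.
results = collection.query(query_texts=["someone walking an animal"], n_results=1)
print(results["documents"])
```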

## Configuration

Semantic Search is a global configuration setting.

```yaml
semantic_search:
enabled: True
reindex: False
```

:::tip

The embeddings database can be re-indexed from the existing detections in your database by adding `reindex: True` to your `semantic_search` configuration. Depending on the number of detections you have, it can take up to 30 minutes to complete and may max out your CPU while indexing. Make sure to set the config back to `False` before restarting Frigate again.

:::

### OpenAI CLIP

This model is able to embed both images and text into the same vector space, which allows `image -> image` and `text -> image` similarity searches. Frigate uses this model on detections to encode the thumbnail image and store it in Chroma. When searching detections via text in the search box, Frigate will perform a `text -> image` similarity search against this embedding. When clicking "FIND SIMILAR" next to a detection, Frigate will perform an `image -> image` similarity search to retrieve the closest matching thumbnails.
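
A minimal sketch of the `text -> image` comparison described above, using the `onnx_clip` package pinned in requirements (the exact constructor arguments are an assumption):

```python
import numpy as np
from onnx_clip import OnnxClip
from PIL import Image

clip = OnnxClip(batch_size=1)

# Embed a Frigate thumbnail and a candidate caption into the same vector space.
image_embedding = clip.get_image_embeddings([Image.open("thumbnail.jpg")])[0]
text_embedding = clip.get_text_embeddings(["a person walking a dog"])[0]

# Cosine similarity: higher values mean the text better matches the image.
similarity = np.dot(image_embedding, text_embedding) / (
    np.linalg.norm(image_embedding) * np.linalg.norm(text_embedding)
)
print(similarity)
```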

### all-MiniLM-L6-v2

This is a sentence embedding model that has been fine-tuned on over 1 billion sentence pairs. This model is used to embed detection descriptions and perform searches against them. Descriptions can be created and/or modified on the search page when clicking on the info icon next to a detection. See [the Generative AI docs](/configuration/genai.md) for more information on how to automatically generate event descriptions.

## Usage Tips

1. Semantic search is used in conjunction with the other filters available on the search page. Use a combination of traditional filtering and semantic search for the best results.
2. The comparison between text and image embedding distances generally means that results matching `description` will appear first, even if a `thumbnail` embedding may be a better match. Play with the "Search Type" filter to help find what you are looking for.
3. Make your search language and tone closely match your descriptions. If you are using thumbnail search, phrase your query as an image caption.
4. Semantic search on thumbnails tends to return better results when matching large subjects that take up most of the frame. Small things like "cat" tend not to work well.
5. Experiment! Find a detection you want to test and start typing keywords to see what works for you.