aidata is a command-line tool for extract, transform, load, and download operations on AI data. It serves a number of projects at MBARI that require detection, clustering, or classification workflows.
Full documentation on the commands is available at https://docs.mbari.org/internal/ai/data.
It supports loading sdcat-formatted output and downloading from Tator and Redis databases. Support for other data sources, e.g. FathomNet, is also possible, so we decided to keep the name generic.
This also supports loading media from a directory or URL, and transforming data into various formats for machine learning, e.g. COCO, CIFAR, or PASCAL VOC format.
- Python 3.10 or higher
- A Tator API token and Redis password for the .env file. Contact the MBARI AI team for access.
- Docker for development and testing only
Install from PyPI:
pip install mbari-aidata
Create the .env file with the following contents in the root directory of the project:
TATOR_TOKEN=your_api_token
REDIS_PASSWORD=your_redis_password
ENVIRONMENT=testing or production
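Before running any command, it can help to confirm that the .env file defines the expected variables. The following is a minimal sketch using only the Python standard library. The helpers `read_env_file` and `missing_keys` are hypothetical and are not part of mbari-aidata, which may load the file differently.

```python
from pathlib import Path

# Variable names taken from the .env example above.
REQUIRED_KEYS = ("TATOR_TOKEN", "REDIS_PASSWORD", "ENVIRONMENT")


def read_env_file(path):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

Calling `missing_keys(read_env_file(".env"))` returns an empty list when all three variables are set.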
Create a configuration file in the root directory of the project:
touch config_cfe.yml
This file will be used to configure the project data, such as mounts, plugins, and database connections.
mbari_aidata download --version Baseline --labels "Diatoms, Copepods" --config config_cfe.yml
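The same download can be launched from a Python script. This is a sketch that only assembles the arguments shown in the command above and hands them to `subprocess.run`. `build_download_command` and `run_download` are hypothetical helpers, and the sketch assumes `mbari_aidata` is on the PATH after the pip install.

```python
import subprocess


def build_download_command(version, labels, config):
    """Assemble the CLI call using only the flags shown in the example above."""
    return [
        "mbari_aidata", "download",
        "--version", version,
        "--labels", ", ".join(labels),
        "--config", config,
    ]


def run_download(version, labels, config):
    """Run the download and raise CalledProcessError on a non-zero exit code."""
    cmd = build_download_command(version, labels, config)
    return subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with the comma-separated label string.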
Example configuration file:
# config_cfe.yml
# Config file for CFE project production
mounts:
  - name: "image"
    path: "/mnt/CFElab"
    host: "mantis.shore.mbari.org"
    nginx_root: "/CFElab"
  - name: "video"
    path: "/mnt/CFElab"
    host: "mantis.shore.mbari.org"
    nginx_root: "/CFElab"
plugins:
  - name: "extractor"
    module: "mbari_aidata.plugins.extractors.tap_cfe_media"
    function: "extract_media"
redis:
  host: "doris.shore.mbari.org"
  port: 6382
vss:
  project: "902111-CFE"
  model: "google/vit-base-patch16-224"
tator:
  project: "902111-CFE"
  host: "mantis.shore.mbari.org"
  image:
    attributes:
      iso_datetime:
        type: datetime
      depth:
        type: float
  video:
    attributes:
      iso_start_datetime:
        type: datetime
  box:
    attributes:
      Label:
        type: string
      score:
        type: float
      cluster:
        type: string
      saliency:
        type: float
      area:
        type: int
      exemplar:
        type: bool
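A quick structural check can catch a misspelled or missing section before a long-running command fails. The sketch below assumes the file has already been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`) and that the five top-level sections in the example are all required. The helpers `missing_sections` and `mount_names`, and that requirement itself, are assumptions for illustration, not documented aidata behavior.

```python
# Top-level sections that appear in the example configuration above.
EXPECTED_SECTIONS = ("mounts", "plugins", "redis", "vss", "tator")


def missing_sections(config):
    """Return the expected top-level sections absent from a parsed config."""
    return [name for name in EXPECTED_SECTIONS if name not in config]


def mount_names(config):
    """List the name of every entry under mounts."""
    return [mount["name"] for mount in config.get("mounts", [])]


# A trimmed stand-in for the dictionary a YAML parser would return.
example = {
    "mounts": [
        {"name": "image", "path": "/mnt/CFElab"},
        {"name": "video", "path": "/mnt/CFElab"},
    ],
    "redis": {"host": "doris.shore.mbari.org", "port": 6382},
    "tator": {"project": "902111-CFE", "host": "mantis.shore.mbari.org"},
}

print(missing_sections(example))  # ['plugins', 'vss']
print(mount_names(example))       # ['image', 'video']
```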
A Docker version is also available at mbari/aidata:latest or mbari/aidata:latest:cuda-124.
mbari_aidata download --help
- Download data, such as images and boxes, into various formats for machine learning, e.g. COCO, CIFAR, or PASCAL VOC format
mbari_aidata load --help
- Load data, such as images and boxes, into either a Postgres or Redis database
mbari_aidata db --help
- Commands related to database management
mbari_aidata transform --help
- Commands related to transforming downloaded data
mbari_aidata -h
- Print help message and exit.
Source code is available at github.com/mbari-org/aidata.
See the Development Guide for more information on how to set up the development environment.
updated: 2025-01-28