Web-Detector — RESTful API service for object detection, based on Detectron2. Web-Detector is a test task for a SWE.
Web-Detector is designed to run on multiple high-performance nodes. Below is an example of using docker-compose to run Web-Detector on a single machine, from which you can quite easily understand what is going on.
Requirements:
- Nvidia GPU (it should be OK with an AMD GPU or CPU only, but I'm not sure; tested only on a GTX 980 Ti);
- `docker>=20.10.6`;
- `docker-compose>=1.29.1`;
- `nvidia-container-toolkit>=1.3.3`.
Btw, since the project depends very much on the environment and has only been tested on one machine, I will just leave that machine's spec here:
- Kernel: 5.10.32-1-MANJARO
- CPU: Intel i7-5930K (12) @ 4.600GHz
- GPU: NVIDIA GeForce GTX 980 Ti
- Memory: 15918MiB
If you get `services.detector_detector.deploy.resources.reservations value Additional properties are not allowed ('devices' was unexpected)`:
- Are you sure you installed `nvidia-container-toolkit>=1.3.3`?
- Update docker / docker-compose;
- Comment out the `deploy` section of `docker-compose.yaml` (lines 68-73).
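If the `deploy` section is in place but detection still fails, a quick sanity check is to confirm the container actually sees the GPU. A minimal sketch, assuming torch (which Detectron2 requires) is available in the image:

```python
import torch

# Run inside the detector container; both lines should mention your GPU.
print("CUDA available:", torch.cuda.is_available())
print("Devices:", [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())])
```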
If you get `CudaError('Unknown Error')` / `CudaError('Out of memory')` / `CudaError('...')`:
- Decrease the `--autoscale` of workers in `docker-compose.yaml`:
  - Set `--autoscale 1,1` in service `detector_detector` (line 80);
  - Set `--autoscale 1,1` in service `detector_processor` (line 49).
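For reference, `--autoscale max,min` bounds the worker's process pool, so `1,1` pins it to a single process. A sketch of the same worker started from Python instead of the compose command line (broker URL and app name are assumptions):

```python
from celery import Celery

# Hypothetical app; --autoscale=1,1 means max=1, min=1 pool processes.
app = Celery("detector", broker="amqp://guest@rabbitmq//")

if __name__ == "__main__":
    app.worker_main(["worker", "-Q", "detector", "--autoscale=1,1", "--loglevel=INFO"])
```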
Setup:
- Run `cp .env.example .env`;
- Edit the `.env` file (change passwords / secret keys);
- Mount any fs-storage to `./video_storage`;
- You can easily swap Celery's backend (for now it uses redis, but it's pretty expensive, so you can use mongo :) ): just remove the `redis` container from `docker-compose.yaml` and change `CELERY_BACKEND_URL` in `.env` (a minimal sketch follows this list).
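A minimal sketch of what that backend swap amounts to on the Python side; the module layout and env-variable defaults here are assumptions, not the project's actual code:

```python
import os

from celery import Celery

# The result backend is just a URL: point CELERY_BACKEND_URL at redis,
# mongo, or any other celery-supported store via the .env file,
# e.g. CELERY_BACKEND_URL=mongodb://mongo:27017/celery_results
app = Celery(
    "detector",
    broker=os.environ.get("CELERY_BROKER_URL", "amqp://guest@rabbitmq//"),
    backend=os.environ.get("CELERY_BACKEND_URL", "redis://redis:6379/0"),
)
```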
Then run: `$> MIGRATE=true USER_ID=$UID docker-compose up --build`
- Swagger spec for auth on `localhost:5000/`;
- Swagger spec for API on `localhost:5000/api`.

If you use swagger to make requests, you should prepend `Bearer ` to the generated access token. E.g. token = `super-token` -> authorize in swagger using token = `Bearer super-token`.
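The same header outside swagger, as a sketch; the endpoint path and form field name are assumptions, check the Swagger spec for the real ones:

```python
import requests

token = "super-token"  # the access token issued by the auth service

with open("sample.mp4", "rb") as video:
    resp = requests.post(
        "http://localhost:5000/api/video",  # hypothetical endpoint path
        headers={"Authorization": f"Bearer {token}"},  # note the "Bearer " prefix
        files={"video": video},
    )
print(resp.status_code, resp.text)
```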
Where:
- Processor queue — RabbitMQ Celery queue named `processor` for 'processor' celery workers;
- Detector queue — RabbitMQ Celery queue named `detector` for 'detector' celery workers;
- Processor worker — Celery autoscalable worker, processing tasks from the 'processor' queue;
- Detector worker — Celery autoscalable worker, processing tasks from the 'detector' queue;
- Video storage — just a mounted directory (host fs), can be (and should be) easily replaced with any mountable fs storage;
- Results storage — Redis Celery backend, can be easily replaced with any celery-supported backend (e.g. mongodb).
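For illustration, here is how the two queues could be wired up in Celery; the task module paths are assumed, not taken from the project:

```python
from celery import Celery

app = Celery("detector", broker="amqp://guest@rabbitmq//")

# Each worker type subscribes only to its own queue (-Q processor / -Q detector),
# so routing tasks by module keeps the two pools independent.
app.conf.task_routes = {
    "detector.processor.tasks.*": {"queue": "processor"},
    "detector.detector.tasks.*": {"queue": "detector"},
}
```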
- Video is uploaded to the flask app;
- Flask app generates a uuid4 for the video and saves the video to 'video storage';
- Flask app creates a new task `process_video` (in the 'processor' queue) and passes the video's uuid4 to it;
- Processor worker takes the `process_video` task from the 'processor' queue and opens the video (from 'video storage' by uuid4);
- Processor worker splits the video into frames;
- Processor worker splits the frames into chunks (you can change `CHUNK_SIZE` in `detector.processor.config`);
- Processor worker creates a chord of detector tasks `detect_frames` (in the 'detector' queue) with callback `merge_chunks` (in the 'processor' queue); see the sketch after this list;
- Detector worker takes a `detect_frames` task from the 'detector' queue;
- Detector worker tries to load the predictor object from thread-local storage; if there is no predictor there, it creates and caches one;
- Detector worker detects the number of GPUs and runs detection not frame-by-frame but in a parallel manner on all GPUs. You can easily disable this in `detector.detector.predictor`;
- Detector worker detects all objects on all frames in the chunk and saves the result to redis (celery backend);
- When all detector tasks are completed, the `merge_chunks` task executes on a processor worker;
- Processor worker takes all per-chunk results, merges them and saves them to redis (celery backend);
- The detection result can then easily be obtained from redis (celery backend).
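A condensed sketch of the chord and the thread-local predictor cache described above. The task signatures, broker/backend URLs and the `build_predictor` factory are assumptions, not the project's actual code:

```python
import threading

from celery import Celery, chord

app = Celery("detector", broker="amqp://guest@rabbitmq//", backend="redis://redis:6379/0")
_local = threading.local()

def build_predictor():
    # Stand-in for the real Detectron2 predictor construction.
    return lambda frame: []

def get_predictor():
    # Create the expensive predictor once per thread, then reuse it.
    if not hasattr(_local, "predictor"):
        _local.predictor = build_predictor()
    return _local.predictor

@app.task(queue="detector")
def detect_frames(chunk):
    predictor = get_predictor()
    return [predictor(frame) for frame in chunk]  # per-frame detections

@app.task(queue="processor")
def merge_chunks(chunk_results, video_uuid):
    # Chord callback: receives the list of all per-chunk results first.
    return {"video": video_uuid, "detections": [d for c in chunk_results for d in c]}

def schedule_detection(chunks, video_uuid):
    # Fan out one detect_frames task per chunk; merge_chunks fires once all complete.
    return chord(detect_frames.s(chunk) for chunk in chunks)(merge_chunks.s(video_uuid))
```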
- Flask app / processor workers / detector workers use different docker images, hence different environments, so they can easily be scaled out;
- Since processor/detector workers are celery-based, they auto-sync, so we can easily increase the number of nodes (even while the service is live);
- The detector workers' image is based on nvidia-cuda and optimized for parallel runs on multiple GPUs;
- Since processor/detector workers are celery-based, they have simple autoscaling (new workers are started on a node when needed).
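One way to watch nodes join is Celery's built-in inspection API; a small sketch (broker URL is an assumption):

```python
from celery import Celery

app = Celery("detector", broker="amqp://guest@rabbitmq//")

# Lists every live worker and the queues it consumes; a newly added
# detector node should show up here with the 'detector' queue.
print(app.control.inspect().active_queues())
```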
- Redis as celery backend is pretty expensive (you can use mongo or whatever you want): update `CELERY_BACKEND_URL` in `.env` and `docker-compose.yaml`;
- Disable per-task GPU multiprocessing in `detect_frames` tasks: change `detector.detector.predictor` (see the sketch after this list);
- Adjust workers' autoscaling: change the `docker-compose.yaml` commands for `detector_processor` and `detector_detector`.
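What the per-task GPU fan-out roughly looks like, and what disabling it degenerates to. This is a sketch under assumed names (the real logic lives in `detector.detector.predictor`, and `detect_on_device` is a hypothetical per-frame detector):

```python
import torch
from multiprocessing.pool import ThreadPool

def detect_chunk(frames, detect_on_device):
    # detect_on_device(frame, gpu_index) runs one frame on one GPU.
    n_gpus = torch.cuda.device_count()
    if n_gpus <= 1:
        # With multiprocessing disabled, this simple loop is all that remains:
        return [detect_on_device(frame, 0) for frame in frames]
    # Spread frames across all visible GPUs, one worker thread per GPU.
    with ThreadPool(n_gpus) as pool:
        return pool.starmap(
            detect_on_device,
            ((frame, i % n_gpus) for i, frame in enumerate(frames)),
        )
```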
Also, you can save frames to fs storage (as videos), but this will require a few more changes; check the `detector` package.