The repository includes all the scripts necessary for object classification, utilizing YOLOv8 for object detection, segmentation, and classification. It integrates with RabbitMQ to receive messages and retrieves media from the Kerberos Vault based on the message details. Upon receiving the video, objects are detected, segmented, and classified, while the primary colors of the objects are simultaneously calculated. Each frame is annotated with this information and can be saved locally, and the results are also stored in a JSON object. More features are available and are detailed in the sections below.
To correctly install the necessary dependencies, run the following commands. It is recommended to use a virtual environment for this process:
python -m venv venv
source ./venv/bin/activate
pip install -r requirements.txt
This repository offers numerous options and additional features to optimally configure it to your needs. Below is a list of all available features and their corresponding .env variable names. These variables can be modified in the included .env file.
YOLOv8 offers a range of models catering to various accuracy-performance trade-offs. Among these, yolov8n.pt is the most performance-focused, while yolov8x.pt emphasizes accuracy. Intermediate models such as yolov8s, yolov8m, and yolov8l progressively balance performance and accuracy. The aforementioned models are classification models only. Additionally, the object classification supports segmentation models, which have similar names but include '-seg' (e.g., yolov8n-seg.pt). Segmentation models provide the advantage of removing the background and overlapping objects for main color calculation, which will be detailed further in the color prediction feature description.
The utilized model can be changed via the MODEL_NAME .env variable.
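As a minimal sketch of how this variable could be used, assuming the ultralytics package (which provides the YOLOv8 models) and python-dotenv for reading the .env file:

import os

from dotenv import load_dotenv  # python-dotenv, assumed here for reading the .env file
from ultralytics import YOLO

load_dotenv()

# MODEL_NAME selects the accuracy/performance trade-off, e.g. "yolov8n.pt"
# (fastest), "yolov8x.pt" (most accurate) or "yolov8n-seg.pt" (with masks).
model = YOLO(os.getenv("MODEL_NAME", "yolov8n.pt"))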
The object classification system will automatically check for incoming messages and process them. If there is a queue build-up, it will continue to process media until the queue is empty. This functionality leverages the uugai-python-dynamic-queue
dependency. More information can be found in the corresponding GitHub repository. Initialization is straightforward, as demonstrated in the code snippet below, which also lists the corresponding .env variables.
# Initialize a message broker using the python_queue_reader package
rabbitmq = RabbitMQ(
    queue_name=var.QUEUE_NAME,
    target_queue_name=var.TARGET_QUEUE_NAME,
    exchange=var.QUEUE_EXCHANGE,
    host=var.QUEUE_HOST,
    username=var.QUEUE_USERNAME,
    password=var.QUEUE_PASSWORD)

# Receive a message from the queue
message = rabbitmq.receive_message()
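Building on this, a minimal drain-the-queue loop might look as follows. This is a sketch only: whether receive_message blocks or returns an empty value when the queue is empty depends on the dependency (the sketch assumes the latter), and process_message is a hypothetical handler.

import time

# Keep consuming until the queue is empty, then poll again after a short wait.
while True:
    message = rabbitmq.receive_message()
    if not message:
        time.sleep(3)  # mirrors the "waiting for 3 seconds" log line
        continue
    process_message(message)  # hypothetical handler: fetch media and classify it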
The incoming messages provide the necessary information to retrieve media from the Kerberos Vault. The received media can then be easily written to a video file, allowing it to be used as input for the model. This functionality leverages the uugai-python-kerberos-vault
dependency. More information can be found in the corresponding GitHub repository, and additional details about Kerberos Vault itself can be found here. Initialization is straightforward, as demonstrated in the code snippet below, which also lists the corresponding .env variables.
# Initialize Kerberos Vault
kerberos_vault = KerberosVault(
    storage_uri=var.STORAGE_URI,
    storage_access_key=var.STORAGE_ACCESS_KEY,
    storage_secret_key=var.STORAGE_SECRET_KEY)

# Retrieve media from the Kerberos Vault, in this case a video-file
resp = kerberos_vault.retrieve_media(
    message=message,
    media_type='video',
    media_savepath=var.MEDIA_SAVEPATH)
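Once retrieve_media has written the video to MEDIA_SAVEPATH, it can be opened with OpenCV and passed to the model loaded earlier. A minimal sketch, assuming OpenCV and the ultralytics tracking API; the actual pipeline adds frame skipping, annotation, and color prediction on top of this.

import cv2

cap = cv2.VideoCapture(var.MEDIA_SAVEPATH)
while cap.isOpened():
    success, frame = cap.read()
    if not success:
        break
    # Track objects across frames so every detection keeps a persistent id.
    results = model.track(frame, persist=True)
cap.release()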
The primary focus of this repository is object classification, achieved using YOLO's pretrained classification or segmentation models as described in the 'utilized model' subsection. Based on your preferences, there are configurable parameters that modify the classification process. These parameters are divided into performance-based and application-based categories. The available parameters are listed below:
- MODEL_NAME: As discussed in the 'utilized model' section, this parameter allows you to choose a model that balances performance and accuracy according to your needs. For more details, please refer to the earlier section.
- CLASSIFICATION_FPS: This parameter allows you to adjust the number of frames sent for classification. Lowering the FPS can improve performance by reducing the number of classifications required. However, setting the FPS too low may result in missing fast-moving objects and decreased tracking accuracy.
- MAX_NUMBER_OF_PREDICTIONS: This feature allows you to set a limit on the number of predictions performed, enabling you to shorten a video if desired. If no limit is needed, set this parameter to a high value.
- MIN_DISTANCE: This parameter defines the minimum distance an object must travel before it is considered 'dynamic'. The distance is calculated as the sum of the distances between centroids for each classified frame. Note that this distance can be affected by shifting bounding boxes, especially for objects that are difficult to detect.
- MIN_STATIC_DISTANCE: This parameter also defines the minimum distance an object must travel before being marked as dynamic. However, this distance is measured as the Euclidean distance between the centroids of the first and last bounding boxes. While this method is not sensitive to shifting bounding boxes, it may not detect dynamic objects that start and end in the same location.
- MIN_DETECTIONS: This parameter specifies the minimum number of times an object must be detected before it is saved in the results. This feature is useful for filtering out unwanted sporadic background detections or faulty misclassifications.
- ALLOWED_CLASSIFICATIONS: This parameter encompasses the classification model's configuration, specifying the classes to be included for detection and those to be excluded. The selection of classes is model-dependent. For the default pretrained YOLOv8 models, an 'id' and 'class' table is provided below.
ID | Class | ID | Class | ID | Class | ID | Class | ID | Class |
---|---|---|---|---|---|---|---|---|---|
0 | person | 16 | dog | 32 | sports ball | 48 | sandwich | 64 | mouse |
1 | bicycle | 17 | horse | 33 | kite | 49 | orange | 65 | remote |
2 | car | 18 | sheep | 34 | baseball bat | 50 | broccoli | 66 | keyboard |
3 | motorcycle | 19 | cow | 35 | baseball glove | 51 | carrot | 67 | cell phone |
4 | airplane | 20 | elephant | 36 | skateboard | 52 | hot dog | 68 | microwave |
5 | bus | 21 | bear | 37 | surfboard | 53 | pizza | 69 | oven |
6 | train | 22 | zebra | 38 | tennis racket | 54 | donut | 70 | toaster |
7 | truck | 23 | giraffe | 39 | bottle | 55 | cake | 71 | sink |
8 | boat | 24 | backpack | 40 | wine glass | 56 | chair | 72 | refrigerator |
9 | traffic light | 25 | umbrella | 41 | cup | 57 | couch | 73 | book |
10 | fire hydrant | 26 | handbag | 42 | fork | 58 | potted plant | 74 | clock |
11 | stop sign | 27 | tie | 43 | knife | 59 | bed | 75 | vase |
12 | parking meter | 28 | suitcase | 44 | spoon | 60 | dining table | 76 | scissors |
13 | bench | 29 | frisbee | 45 | bowl | 61 | toilet | 77 | teddy bear |
14 | bird | 30 | skis | 46 | banana | 62 | tv | 78 | hair drier |
15 | cat | 31 | snowboard | 47 | apple | 63 | laptop | 79 | toothbrush |
In most standard use-cases, the ALLOWED_CLASSIFICATIONS parameter would conform to the following format; a sketch of how this and several of the parameters above come into play follows the example:
ALLOWED_CLASSIFICATIONS = "0, 1, 2, 3, 4, 5, 6, 7, 8, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 26, 28"
The FIND_DOMINANT_COLORS environment variable enables the calculation of the main colors of detected objects. This feature uses the uugai-python-color-prediction dependency to determine the primary colors. More information about its functionality and available parameters can be found in the corresponding GitHub repository. The main colors are saved in BGR and HLS formats, and they are also mapped to a string using a slightly customized version of the HSL-79 color naming system. Additional details about this color naming system can be found here.
The choice between a classification or segmentation model significantly impacts the performance of the main color calculation. For classification models, the color calculation includes everything inside the bounding box. This object can be cropped using a feature in the uugai-python-color-prediction
dependency. However, this method does not support off-centered objects or overlapping bounding boxes. Segmentation models, on the other hand, provide the necessary mask to isolate the object from the background and exclude any overlapping objects, with only a slight decrease in performance. Depending on the video quality, downsampling can be adjusted within the function call.
The COLOR_PREDICTION_INTERVAL
environment variable allows you to adjust the interval for color prediction. Setting this variable to 1 means that the dominant colors are calculated for every frame, ensuring high accuracy. Higher integer values reduce the frequency of dominant color calculations, which increases efficiency but may decrease accuracy.
Additionally, the MIN_CLUSTERS
and MAX_CLUSTERS
environment variables allow you to adjust the number of dominant colors to be found. For example, setting MIN_CLUSTERS
to 1 and MAX_CLUSTERS
to 8 enables the function to find the optimal number of clusters using the inertias of KMeans clustering, along with an elbow point finder to identify the best fit. This method is the most accurate but requires calculating many clusters for each object.
Alternatively, setting MIN_CLUSTERS
and MAX_CLUSTERS
to the same value dictates the exact number of dominant colors to calculate. For example, setting both to 3 will find exactly 3 main clusters. This approach is more performant but may be less accurate if the actual number of dominant colors differs from the specified value.
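For intuition only (this is not the uugai-python-color-prediction API), a KMeans search with an inertia-based cluster choice could look roughly like the sketch below, using scikit-learn and a deliberately crude elbow heuristic.

import numpy as np
from sklearn.cluster import KMeans

def dominant_colors(pixels: np.ndarray, min_clusters: int = 1, max_clusters: int = 8):
    """pixels: (N, 3) array of BGR values taken from the (masked) object."""
    if min_clusters == max_clusters:
        # Exact number of dominant colors requested: a single KMeans fit.
        km = KMeans(n_clusters=min_clusters, n_init=10).fit(pixels)
        return km.cluster_centers_.astype(int)
    # Otherwise fit a range of cluster counts and keep the inertias,
    # which an elbow-point finder then uses to pick the best k.
    fits = [KMeans(n_clusters=k, n_init=10).fit(pixels)
            for k in range(min_clusters, max_clusters + 1)]
    inertias = [f.inertia_ for f in fits]
    # Crude elbow heuristic: the k where the inertia curve bends the most.
    elbow = int(np.argmax(np.diff(inertias, n=2))) + 1 if len(inertias) > 2 else 0
    return fits[elbow].cluster_centers_.astype(int)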
Multiple additional features are available, each tailored to specific use-case scenarios. These encompass various verbose and saving functionalities.
Depending on the use-case scenario, the annotated frame can be displayed while processing. This functionality is controlled by the PLOT environment variable. In situations where visual representation is unnecessary, such as when solely retrieving data without graphical output, it can be disabled as follows: PLOT = "False".
The annotated frame displays the bounding boxes of detected objects, along with their primary colors when color detection is activated. These bounding boxes are color-coded: green for dynamic objects and red for static ones. Additionally, their trajectories are plotted, accompanied by their class and confidence score.
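To give an idea of what this annotation involves (not the repository's exact drawing code), a sketch with OpenCV, where box, centroids, is_static, and label are assumed to come from the tracking results:

import cv2
import numpy as np

def annotate(frame, box, centroids, is_static, label):
    """box: (x1, y1, x2, y2); centroids: tracked (x, y) trajectory points."""
    # Green for dynamic objects, red for static ones (BGR color order).
    color = (0, 0, 255) if is_static else (0, 255, 0)
    x1, y1, x2, y2 = map(int, box)
    cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
    # Class name and confidence score above the box.
    cv2.putText(frame, label, (x1, y1 - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1)
    # Trajectory drawn as a polyline through the tracked centroids.
    points = np.array(centroids, dtype=np.int32).reshape(-1, 1, 2)
    cv2.polylines(frame, [points], isClosed=False, color=color, thickness=2)
    return frame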
Another option is to save the annotated video. This can be achieved by configuring the environment variable SAVE_VIDEO
to "True"
. Additionally, the save path for the video can be specified using OUTPUT_MEDIA_SAVEPATH = "path/to/your/output_video.mp4"
.
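A sketch of how the annotated frames could be written out with OpenCV, reusing the properties of the input capture; cap and annotated_frame are assumed to come from the processing loop:

import cv2

fourcc = cv2.VideoWriter_fourcc(*"mp4v")
fps = cap.get(cv2.CAP_PROP_FPS)
size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
        int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
writer = cv2.VideoWriter(var.OUTPUT_MEDIA_SAVEPATH, fourcc, fps, size)

writer.write(annotated_frame)  # called once per processed frame
writer.release()               # matches the "Releasing video writer" log step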
An alternative option is to generate an image containing all bounding boxes and trajectories. This process involves utilizing the initial frame of the video to draw the first bounding box of the object and its respective trajectory. However, this feature is contingent upon the minimum detection criteria specified by the MIN_DETECTIONS
parameter. Additionally, it provides insights into whether an object remained static or dynamic throughout the video duration.
The generation of this image can be enabled by setting the environment variable CREATE_BBOX_FRAME
to "True"
. Moreover, you can specify whether to save the bounding box frame and its save path using SAVE_BBOX_FRAME = "True"
and BBOX_FRAME_SAVEPATH = "path/to/your/output_bbox.jpg"
, respectively.
The creation of a JSON data object containing all the classification data is typically left enabled; however, if this repository is solely used for visual inspection without any subsequent post-processing, it can be disabled using CREATE_RETURN_JSON = "False". Furthermore, you can decide whether to save this object and customize its save path by adjusting SAVE_RETURN_JSON = "True" and RETURN_JSON_SAVEPATH = "path/to/your/json.json". The JSON object is structured as follows:
{
    "operation": "classify",
    "data": {
        "objectCount": int,
        "properties": [str],
        "details": [
            {
                "id": int,
                "classified": str,
                "distance": float,
                "staticDistance": float,
                "isStatic": bool,
                "frameWidth": int,
                "frameHeight": int,
                "frame": int,
                "frames": [int],
                "occurence": int,
                "traject": [[float]],
                "trajectCentroids": [[float]],
                "colorsBGR": [[[int]]],
                "colorsHLS": [[[int]]],
                "colorsStr": [[str]],
                "colorStr": [[str, int]],
                "valid": true,
                "w": 0,
                "x": 0,
                "y": 0
            },
            ...
        ]
    }
}
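If post-processing happens elsewhere, the saved JSON object can simply be loaded and filtered; a small sketch assuming the save path configured above:

import json

with open(var.RETURN_JSON_SAVEPATH) as f:
    result = json.load(f)

# Example: keep only the objects that were marked as dynamic.
dynamic_objects = [d for d in result["data"]["details"] if not d["isStatic"]]
print(result["data"]["objectCount"], "objects,", len(dynamic_objects), "dynamic")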
The final two environment variables influence the verbosity options and are split into two categories: TIME_VERBOSE
and LOGGING
.
The LOGGING
environment variable controls the output messages depending on the application's usage. The output can be one of the following:
- If no message is received from RabbitMQ:
1) Receiving message from RabbitMQ
No message received, waiting for 3 seconds
...
- If messages are being processed:
1) Receiving message from RabbitMQ
2) Retrieving media from Kerberos Vault
3) Using device: cpu
4) Opening video file: data/input/in_video.mp4
5) Classifying frames
6) Annotating bbox frame
7) Creating ReturnJSON object
- 14 objects where detected. Of which 11 objects where detected more than 5 times.
... (optional time verbose output)
8) Releasing video writer and closing video capture
The TIME_VERBOSE
environment variable includes extra time-related verbosity options, adding the following lines to the output:
- Classification took: 20.4 seconds, @ 5 fps.
- 2.05s for preprocessing and initialisation
- 18.35s for processing of which:
- 12.48s for class prediction
- 1.31s for color prediction
- 4.56s for other processing
- 0.0s for postprocessing
- Original video: 29.7 seconds, @ 25.0 fps @ 1280x720. File size of 1.2 MB
This project exists thanks to all the people who contribute.