
07. Configure and Run


Configuration

This repo contains three main scripts that perform the following tasks:

1. extract_features.py: signature extraction pipeline
2. generate_matches.py: compares the extracted signatures and saves potential matches as a CSV
3. template_matching.py: uses source templates to query the extracted embeddings and generates a report containing potential matches

Important notebooks include (located inside the notebooks folder):

1. Visualization and Annotation Tool.ipynb: Allows the output of the generate_matches script to be reviewed and annotated.
2. Template Matching Demo.ipynb: Allows the output of the extract_features script to be queried against known videos / images [as defined in custom templates built by the user]

These scripts read their settings from the 'config.yaml' file, which defines where to collect data from, hyperparameters, and other options:

video_source_folder: Directory where the source video files are located

destination_folder: Destination of the output files generated from the scripts

root_folder_intermediate: Folder name used for the intermediate representations (make sure it's compatible with the next parameter)

match_distance: Distance threshold that determines whether two videos are a match [FLOAT - 0.0 to 1.0]

video_list_filename: Name of the file that contains the list of processed video files (to be saved by the extraction script)

filter_dark_videos: [true / false] Whether to remove dark videos from final output files.

filter_dark_videos_thr: [1-10 int range] Ideally a number between 1 and 10. Higher numbers mean the filter is less strict when removing dark videos.

min_video_duration_seconds: Minimum video duration in seconds

detect_scenes: [true / false] Whether to run scene detection or not.

use_pretrained_model_local_path: [true / false] Whether to use the pretrained model from your local file system

pretrained_model_local_path: Absolute path to pretrained model in case the user doesn't want to download it from S3

use_db: [true / false] Whether to save the results to a database

conninfo: Connection string (eg. postgres://[USER]:[PASSWORD]@[URL]:[PORT]/[DBNAME]). When using our Docker workflow, URL should default to "videodeduplication_postgres_1" instead of localhost

keep_fileoutput: [true / false] Whether to keep the regular file output even when results are saved to the database

templates_source_path: Directory where templates of interest are located. This should be the path to a directory where each sub-folder contains the images related to one template - eg: if set to datadrive/templates/, that folder could contain sub-folders like plane, smoke, or bomb, each holding its respective images.
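
For reference, a config.yaml using these parameters might look like the sketch below. All values shown are illustrative placeholders - adapt the paths and thresholds to your own environment:

# example values only - adjust to your setup
video_source_folder: /datadrive/videos
destination_folder: /datadrive/output
root_folder_intermediate: representations
match_distance: 0.75
video_list_filename: video_list.txt
filter_dark_videos: true
filter_dark_videos_thr: 2
min_video_duration_seconds: 3
detect_scenes: true
use_pretrained_model_local_path: false
pretrained_model_local_path: ''
use_db: false
# when running inside our Docker workflow, use videodeduplication_postgres_1 as the host
conninfo: postgres://postgres:admin@localhost:5432/videodeduplicationdb
keep_fileoutput: true
templates_source_path: /datadrive/templates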


Running

Within the Docker container's command line

Extract video signatures

python extract_features.py

Arguments:

'--config', '-cp': Path to the project config file [default: 'config.yml']
'--list-of-files', '-lof': Path to a txt file with a list of files for processing - overrides the source folder from the config file [default: '']
'--frame-sampling', '-fs': Sets the sampling strategy (values from 1 to 10 - eg sample one frame every X seconds) - overrides frame sampling from the config file [default: 1]
'--save-frames', '-sf': Whether to save the frames sampled from the videos - overrides save_frames on the config file [default: False]
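
For example, to process a custom list of files and sample one frame every two seconds (the file name my_videos.txt is a hypothetical placeholder):

python extract_features.py --config config.yml --list-of-files my_videos.txt --frame-sampling 2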

Generate Matches

python generate_matches.py

Template Object Matching

python template_matching.py

Exif Extraction

python extract_exif.py

Single Video Processing

python process_video.py [FILE_PATH] [OUTPUT_DIR]

Arguments:

'FILE_PATH': Path to the video file
'OUTPUT_DIR': Path where the output of running the script will be saved [default: 'data/']
'--config', '-cp': Path to the project config file [default: 'config.yml']
'--save-frames': Whether to save video frames [default: True]
'--save-features/--no-features': Whether to save features [default: True]
'--save-signatures/--no-signatures': Whether to save signatures [default: True]
'--save-db': Whether to save the results to the database [default: True]
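
For example, to process a single video and write the results to the default data/ directory (the input path below is a hypothetical placeholder):

python process_video.py /datadrive/videos/example.mp4 data/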