07. Configure and Run
This repo contains three main scripts that perform the following tasks:
1. extract_features.py: Signature extraction pipeline
2. generate_matches.py: Converts the extracted signatures into matches (saved as CSV)
3. template_matching.py: Uses source templates to query the extracted embeddings and generates a report containing potential matches
Important notebooks (located inside the notebooks folder) include:
1. Visualization and Annotation Tool.ipynb: Allows the output of the generate_matches script to be reviewed and annotated.
2. Template Matching Demo.ipynb: Allows the output of the extract_features script to be queried against known videos / images [as defined in custom templates built by the user]
These scripts use the 'config.yaml' file to define where to collect data from, hyperparameters, and other settings. The main parameters are described below; a sketch of an example configuration follows the list.
video_source_folder: Directory where the source video files are located
destination_folder: Destination of the output files generated from the scripts
root_folder_intermediate: Folder name used for the intermediate representations (make sure it's compatible with the next parameter)
match_distance: Distance threshold that determines whether two videos are a match [FLOAT - 0.0 to 1.0]
video_list_filename: Name of the file that contains the list of processed video files (to be saved by the extraction script)
filter_dark_videos: [true / false] Whether to remove dark videos from final output files.
filter_dark_videos_thr: [1-10 int range] Ideally a number between 1 and 10. Higher numbers mean the filter is less strict when removing dark videos.
min_video_duration_seconds: Minimum video duration in seconds
detect_scenes: [true / false] Whether to run scene detection or not.
use_pretrained_model_local_path: [true / false] Whether to use the pretrained model from your local file system
pretrained_model_local_path: Absolute path to the pretrained model, in case the user doesn't want to download it from S3
use_db: [true / false] Whether to save results to a database
conninfo: Connection string (e.g. postgres://[USER]:[PASSWORD]@[URL]:[PORT]/[DBNAME]). When using our Docker workflow, URL should default to "videodeduplication_postgres_1" instead of localhost
keep_fileoutput: [true / false] Whether to keep regular file output even when results are saved to the DB
templates_source_path: Directory where the templates of interest are located. It should point to a directory in which each sub-folder contains images related to one template (e.g. if set to datadrive/templates/, this folder could contain sub-folders like plane, smoke, or bomb, each holding its respective images)
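Putting these parameters together, a minimal config.yaml might look like the sketch below. All paths and values are illustrative placeholders rather than the repo's shipped defaults; adjust them to your environment.
video_source_folder: /datadrive/videos            # directory containing the source videos (illustrative path)
destination_folder: /datadrive/output              # where output files are written
root_folder_intermediate: representations          # folder name for intermediate representations
match_distance: 0.75                               # 0.0-1.0 distance threshold for declaring a match
video_list_filename: processed_videos.txt          # list of processed videos written by the extraction script
filter_dark_videos: true                           # drop dark videos from the final output
filter_dark_videos_thr: 2                          # 1-10; higher values are less strict
min_video_duration_seconds: 3                      # ignore videos shorter than this
detect_scenes: true                                # run scene detection
use_pretrained_model_local_path: false             # load the pretrained model from the local file system
pretrained_model_local_path: /path/to/model        # only relevant when the flag above is true
use_db: true                                       # save results to the database
conninfo: postgres://user:password@videodeduplication_postgres_1:5432/videodedup   # placeholder credentials and DB name
keep_fileoutput: true                              # keep file output even when saving to the DB
templates_source_path: /datadrive/templates        # sub-folders such as plane/, smoke/, bomb/ with images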
Within the Docker command line
Extract video signatures
python extract_features.py
Arguments:
'--config', '-cp': Path to the project config file [default: 'config.yml']
'--list-of-files', '-lof': Path to a txt file with a list of files to process - overrides the source folder from the config file [default: '']
'--frame-sampling', '-fs': Sets the sampling strategy (values from 1 to 10 - e.g. sample one frame every X seconds) - overrides frame sampling from the config file [default: 1]
'--save-frames', '-sf': Whether to save the frames sampled from the videos - overrides save_frames on the config file [default: False]
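For example, assuming the flags accept values as shown above, processing the files listed in a text file while sampling one frame every 2 seconds might look like this (the list file name is hypothetical):
python extract_features.py --config config.yml --list-of-files my_videos.txt --frame-sampling 2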
Generate Matches
python generate_matches.py
Template Object Matching
python template_matching.py
Exif Extraction
python extract_exif.py
Single Video Processing
python process_video.py [FILE_PATH] [OUTPUT_DIR]
Arguments:
'FILE_PATH': Path to the video file
'OUTPUT_DIR': Path where the output of running the script will be saved [default: 'data/']
'--config', '-cp': Path to the project config file [default: 'config.yml']
'--save-frames': Whether to save video frames [default: True]
'--save-features/--no-features': Whether to save features [default: True]
'--save-signatures/--no-signatures': Whether to save signatures [default: True]
'--save-db': Whether to save results to the database [default: True]
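For example, a single-video run that writes its output under data/ and skips saving features might look like the following (the input path is hypothetical, and the toggle syntax assumes the flags behave as listed above):
python process_video.py /datadrive/videos/example.mp4 data/ --config config.yml --no-features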