I used the OpenSubtitles.org English movie subtitle dataset from https://opus.nlpl.eu/OpenSubtitles-v2018.php
- build-dataset.py: This script uses IMDB to find the genres of each subtitle and creates id_match.csv.
- build-pickle.py: This script creates a pickle database from dataset.csv.
- run-cross-validation.py: This script loads the built pickle files, runs training and cross-validation, and writes result.txt.
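
The pickle and cross-validation steps can be pictured roughly as below. This is only a minimal sketch using the Python standard library, not the code in the scripts themselves: the function names (`build_pickle`, `load_pickle`, `k_fold_indices`) and the CSV column names (`imdb_id`, `genre`) are assumptions made for illustration, and the actual model training is left out.

```python
import csv
import io
import pickle


def build_pickle(csv_stream, pickle_stream):
    """Read rows from a CSV text stream and pickle them as a list of dicts.

    Hypothetical stand-in for what build-pickle.py might do with dataset.csv.
    """
    rows = list(csv.DictReader(csv_stream))
    pickle.dump(rows, pickle_stream, protocol=pickle.HIGHEST_PROTOCOL)
    return len(rows)


def load_pickle(pickle_stream):
    """Load the rows back, as a cross-validation script would before training."""
    return pickle.load(pickle_stream)


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds.

    The first (n_samples % k) folds get one extra sample each.
    """
    start = 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


if __name__ == "__main__":
    # Tiny in-memory example; the column names are assumed, not taken from the repo.
    source = io.StringIO("imdb_id,genre\n1,Drama\n2,Comedy\n3,Drama\n")
    buffer = io.BytesIO()
    build_pickle(source, buffer)
    buffer.seek(0)
    rows = load_pickle(buffer)
    for train, test in k_fold_indices(len(rows), 3):
        print("train:", train, "test:", test)
```

In-memory streams are used here so the sketch runs without any files; with real data the same functions could be given file objects opened with `open(..., newline="")` for the CSV and `open(..., "wb")` / `open(..., "rb")` for the pickle.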