
Performance evaluation of PySceneDetect in terms of both latency and accuracy #481

@awkrail

Description


Problem/Use Case

Currently, PySceneDetect does not support evaluation. However, evaluating its performance is crucial for further development. I propose a feature to integrate evaluation code into PySceneDetect. This issue describes the procedure.

Solutions

Datasets

To evaluate performance, we need datasets that consist of videos and manually annotated shots. I surveyed shot detection on Google Scholar and found that the following datasets have been proposed. I think BCC and RAI are a good starting point: they are frequently used in the shot detection literature, and their small size makes them easy to download. In addition, Kinetics-GEBD, ClipShot, and AutoShot collected their videos from YouTube, so using them in our evaluation protocol may violate YouTube's policy.

| Dataset | Conference | Domain | #videos | Avg. video length (sec) | #citations | Paper title |
|---|---|---|---|---|---|---|
| BCC | ACMMM15 | Broadcast | 11 | 2,945 | 133 | A deep siamese network for scene detection in broadcast videos |
| RAI | CAIP15 | Broadcast | 10 | 591 | 86 | Shot and scene detection via clustering for re-using broadcast video |
| Kinetics-GEBD | ICCV21 | General | 55351 | n/a | 81 | Generic Event Boundary Detection: A Benchmark for Event Detection |
| ClipShot | ACCV18 | General | 4039 | 237 | 54 | Fast Video Shot Transition Localization with Deep Structured Models |
| AutoShot | CVPR Workshop 23 | General | 853 | 39 | 13 | AutoShot: A Short Video Dataset and State-of-the-Art Shot Boundary Detection |
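
To connect these datasets to the metric below, here is a minimal loading sketch. It assumes a hypothetical plain-text annotation format with one boundary frame number per line; the actual RAI and BCC annotation formats need to be checked before implementation.

```python
from pathlib import Path


def load_boundaries(annotation_file: str) -> list[int]:
    """Load ground-truth shot boundary frame numbers for one video
    (assumed format: one frame number per line; this is an assumption)."""
    lines = Path(annotation_file).read_text().splitlines()
    return sorted(int(line) for line in lines if line.strip())
```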

Metrics

The previous literature uses recall, precision, and F1 score to evaluate shot detection methods. Let $\hat{Y}=(\hat{y}_1, \hat{y}_2, \cdots, \hat{y}_k, \cdots, \hat{y}_K)$ be the predicted shot boundary frame numbers and $Y=(y_1, y_2, \cdots, y_l, \cdots, y_L)$ be the manually annotated shot boundary frame numbers.
Recall, precision, and F1 are calculated as in the following Python code:

```python
def compute_f1(hat_ys, ys):
    # A prediction hat_y counts as correct if abs(hat_y - y) <= threshold
    # for some ground-truth boundary y.
    threshold = 5
    correct = 0
    for hat_y in hat_ys:
        if min(abs(hat_y - y) for y in ys) <= threshold:
            correct += 1
    recall = correct / len(ys)
    precision = correct / len(hat_ys)
    f1 = 2 * recall * precision / (recall + precision)
    return recall, precision, f1
```

Note that this code provides only a rough overview of the evaluation process. For a precise implementation, I will need to handle edge cases (e.g., the many-to-one case in which two predictions hat_y match the same ground-truth y); one possible approach is sketched below.
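
For example, a one-to-one constraint could be enforced by letting each ground-truth boundary be matched by at most one prediction, as in the greedy sketch below. This is only one possible matching policy and not necessarily the one used in prior work.

```python
def count_matches_one_to_one(hat_ys, ys, threshold=5):
    """Greedy one-to-one matching: each ground-truth boundary y can be
    consumed by at most one prediction hat_y, so two predictions near the
    same y no longer both count as correct."""
    unmatched = set(ys)
    correct = 0
    for hat_y in sorted(hat_ys):
        # Ground-truth boundaries still available within the tolerance window.
        candidates = [y for y in unmatched if abs(hat_y - y) <= threshold]
        if candidates:
            best = min(candidates, key=lambda y: abs(hat_y - y))
            unmatched.remove(best)
            correct += 1
    return correct
```

Recall, precision, and F1 would then be computed from this match count exactly as above.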

Implementation

I believe two evaluation modes are necessary: a local mode and a CI mode.
For the local mode, I created an evaluation/ directory in the repository root and wrote Python scripts to run evaluations on a local machine.
For the CI mode, based on the evaluation/ directory, we set up GitHub Actions to run the evaluation commands automatically whenever new commits are pushed.
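
To illustrate what the local mode could look like, here is a minimal sketch of a hypothetical evaluation/evaluate.py entry point. The file name, CLI flags, annotation format, and the evaluation.metrics module are assumptions on my side; only the detect()/ContentDetector() calls are existing PySceneDetect API. The CI mode would simply invoke the same script from a GitHub Actions workflow.

```python
# Hypothetical evaluation/evaluate.py (sketch): run PySceneDetect on one video,
# measure latency, and score the detected boundaries against ground truth.
import argparse
import time

from scenedetect import ContentDetector, detect

# compute_f1 is the metric sketch from the Metrics section above; the module
# path evaluation.metrics is an assumption.
from evaluation.metrics import compute_f1


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--video", required=True, help="path to an input video")
    parser.add_argument("--annotation", required=True,
                        help="ground-truth boundary frame numbers, one per line")
    args = parser.parse_args()

    start = time.time()
    scenes = detect(args.video, ContentDetector())
    latency = time.time() - start

    # Predicted boundaries: the start frame of every scene except the first.
    hat_ys = [scene[0].get_frames() for scene in scenes[1:]]
    ys = [int(line) for line in open(args.annotation) if line.strip()]

    recall, precision, f1 = compute_f1(hat_ys, ys)
    print(f"latency={latency:.2f}s recall={recall:.3f} "
          f"precision={precision:.3f} f1={f1:.3f}")


if __name__ == "__main__":
    main()
```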

Questions

How do we store the RAI and BCC video datasets? Because the video files are larger than GitHub's file-size limit (100 MB), we need a storage service.
Zenodo is one candidate because it allows us to host datasets for academic purposes and to download them in a CLI-friendly manner (e.g., with curl or wget).
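
For illustration, downloading a hosted archive could be as simple as the sketch below; the record ID and file name are placeholders, since no Zenodo record exists yet.

```python
import urllib.request

# Placeholder URL; a real Zenodo record would need to be created first.
ZENODO_URL = "https://zenodo.org/records/<RECORD_ID>/files/rai_shots.zip"


def download_dataset(url: str = ZENODO_URL, out_path: str = "rai_shots.zip") -> None:
    """Fetch a dataset archive over HTTPS (roughly equivalent to `wget <url>`)."""
    urllib.request.urlretrieve(url, out_path)


if __name__ == "__main__":
    download_dataset()
```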
