README.md

Benchmark

The benchmark was run on the FIFA dataset.

You can get the dataset with curl: curl http://www.philippe-fournier-viger.com/spmf/datasets/FIFA.txt --output FIFA.dat.

The model was trained on 20,450 sequences with an average length of 34 and an alphabet of 2,990 distinct elements.
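The dataset figures above can be recomputed from the file itself. The following is a minimal sketch, assuming FIFA.dat follows the SPMF sequence format (one sequence per line, items separated by -1, each sequence terminated by -2, and optional metadata lines starting with #, % or @). The function name sequence_stats is hypothetical and is not part of this repository.

```python
from typing import Iterable, Tuple


def sequence_stats(lines: Iterable[str]) -> Tuple[int, float, int]:
    """Return (number of sequences, average length, alphabet size)."""
    count = 0
    total_items = 0
    alphabet = set()
    for line in lines:
        line = line.strip()
        # Assumption: lines starting with #, % or @ are comments or metadata.
        if not line or line[0] in "#%@":
            continue
        # Assumption: -1 separates items and -2 ends a sequence (SPMF format).
        items = [tok for tok in line.split() if tok not in ("-1", "-2")]
        if not items:
            continue
        count += 1
        total_items += len(items)
        alphabet.update(items)
    average = total_items / count if count else 0.0
    return count, average, len(alphabet)


if __name__ == "__main__":
    # Tiny in-memory sample; pass an open file handle on data/FIFA.dat
    # instead to check the real dataset.
    sample = ["1 -1 2 -1 3 -1 -2", "2 -1 4 -1 -2"]
    print(sequence_stats(sample))  # (2, 2.5, 4)
```

To check the real file, call sequence_stats on an open handle, for example with open("data/FIFA.dat") as f: print(sequence_stats(f)).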

The benchmark was run on a PC with 8 GB of RAM, 8 cores, and an Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz.

The threshold_query used is 1.

With FIFA.dat in the data folder, you can run the benchmark from the benchmark folder: python benchmark.py.

Subseq predicted the entire dataset in approximately 14 minutes, which is an average of 41 ms per prediction.
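The per-prediction average follows directly from the total run time. The short sketch below reproduces the arithmetic, assuming one prediction per sequence (20,450 in total); the helper name average_latency_ms is hypothetical.

```python
def average_latency_ms(total_minutes: float, n_predictions: int) -> float:
    """Convert a total run time in minutes to an average latency in milliseconds."""
    return total_minutes * 60 * 1000 / n_predictions


if __name__ == "__main__":
    # 14 minutes over 20,450 predictions (assumed: one prediction per sequence).
    print(round(average_latency_ms(14, 20_450)))  # 41
```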

Subseq is slower than CPT, mainly because it performs many full-text searches.