A deep learning model in Python and TensorFlow, trained to automatically identify cover songs. The model is a simple 4-layer siamese convolutional network with an added affine layer on the output. A precision of 65% suggests that cover song identification on this dataset can be framed as a pairwise (siamese) comparison problem. Data augmentation, soft attention and a triplet architecture are planned for future work.
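The siamese setup and the precision figure above can be illustrated with a small framework-agnostic NumPy sketch. This is not the project's TensorFlow code. The chroma-like input (12 features per frame), layer widths, kernel size, global average pooling, combining the two embeddings by absolute difference before the affine layer, and the 0.5 decision threshold used for precision are all assumptions made for illustration. Both inputs pass through the same tower with the same weights, which is what makes the network siamese.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_relu(x, w, b):
    """Valid 1-D convolution over time, then ReLU.
    x: (time, in_ch), w: (kernel, in_ch, out_ch), b: (out_ch,)"""
    k = w.shape[0]
    steps = x.shape[0] - k + 1
    out = np.stack([np.tensordot(x[t:t + k], w, axes=([0, 1], [0, 1])) for t in range(steps)])
    return np.maximum(out + b, 0.0)

def init_tower(in_ch=12, channels=(16, 16, 32, 32), kernel=3):
    """Four conv layers (assumed widths), He-style random init."""
    params = []
    for out_ch in channels:
        w = rng.normal(0.0, np.sqrt(2.0 / (kernel * in_ch)), size=(kernel, in_ch, out_ch))
        params.append((w, np.zeros(out_ch)))
        in_ch = out_ch
    return params

def embed(x, tower):
    for w, b in tower:
        x = conv1d_relu(x, w, b)
    return x.mean(axis=0)  # global average pooling over time (assumption)

def siamese_score(x1, x2, tower, w_out, b_out):
    # Shared weights: the SAME tower embeds both inputs.
    e1, e2 = embed(x1, tower), embed(x2, tower)
    z = np.abs(e1 - e2) @ w_out + b_out  # affine layer on the combined output
    return 1.0 / (1.0 + np.exp(-z))      # sigmoid: "is this pair a cover?" score

def precision(scores, labels, threshold=0.5):
    """TP / (TP + FP) over pair predictions; threshold is an assumption."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    return tp / (tp + fp) if tp + fp else 0.0

# Usage with random, untrained weights and hypothetical inputs.
tower = init_tower()
w_out, b_out = rng.normal(size=32) * 0.1, 0.0
a = rng.random((64, 12))  # hypothetical features: 64 frames x 12 pitch classes
b = rng.random((64, 12))
p = siamese_score(a, b, tower, w_out, b_out)
print(p)
```

With random weights the scores carry no meaning; the sketch only shows the structure. Because the branches share weights and are combined by absolute difference, the score is symmetric in its two inputs.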
The dataset is the SecondHandSongs dataset, a subset of the Million Song Dataset.
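Training a siamese network requires labelled pairs rather than single tracks. A minimal sketch of building them follows, assuming the data has already been loaded into a mapping from a clique ID (a group of versions of the same work) to its track IDs. The loading step, the `make_pairs` helper, the IDs, and the 1:1 ratio of negative to positive pairs are hypothetical choices for illustration, not the project's actual pipeline.

```python
import itertools
import random

def make_pairs(cliques, n_negative_per_positive=1, seed=0):
    """Build shuffled (track_a, track_b, label) pairs.
    cliques: hypothetical dict mapping clique ID -> list of track IDs
    that are versions of the same work. label 1 = cover pair, 0 = not."""
    rng = random.Random(seed)
    # Positives: every pair of tracks inside the same clique.
    positives = [(a, b, 1)
                 for tracks in cliques.values()
                 for a, b in itertools.combinations(tracks, 2)]
    # Negatives: one track each from two different, randomly chosen cliques.
    clique_ids = list(cliques)
    negatives = []
    while len(negatives) < n_negative_per_positive * len(positives):
        c1, c2 = rng.sample(clique_ids, 2)
        negatives.append((rng.choice(cliques[c1]), rng.choice(cliques[c2]), 0))
    pairs = positives + negatives
    rng.shuffle(pairs)
    return pairs

# Usage with made-up IDs.
cliques = {"work_1": ["TR_A", "TR_B", "TR_C"], "work_2": ["TR_D", "TR_E"]}
pairs = make_pairs(cliques)
print(len(pairs))
```

Balancing positives and negatives keeps precision from being dominated by the far more numerous non-cover pairs.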