This repository contains the code and examples for our paper InverseMV: Composing Piano Scores with a Convolutional Video-Music Transformer. The Video-Music Transformer (VMT) is an attention-based multi-modal model that generates piano music for a given video.
We release a new dataset of over 7 hours of piano scores with fine-grained alignment between pop music videos and MIDI files. Our complete InverseMV dataset is available here.
Here are example video fragments from our dataset. Note that we did not apply any post-production: each file combines the original video with a WAVE file converted from the MIDI output of the model.
The original music of the videos.
100-001_original.mp4
The music generated by our VMT model.
100-001_vmt.mp4
The music generated by the baseline Seq2Seq model.
100-001_seq2seq.mp4
Please cite our paper if you use InverseMV in your work:
@article{lin2021inversemv,
  title={InverseMV: Composing Piano Scores with a Convolutional Video-Music Transformer},
  author={Lin, Chin-Tung and Yang, Mu},
  journal={arXiv preprint arXiv:2112.15320},
  year={2021}
}