Welcome to the GitHub repository for the PixMus project, a comprehensive exploration of using diffusion models conditioned on video content for synthesizing background music. This repository includes my thesis, presentation slides, the curated dataset, and example output videos demonstrating the capabilities of the PixMus model.
The thesis document details the theoretical background, methodologies, experiments, and results of the PixMus model. It covers the application of diffusion models in generating music that aligns with the emotional and thematic elements of videos.
The PixMus dataset is specially curated to facilitate research in video-conditioned music generation. It pairs carefully selected video clips with corresponding background music, making it well suited for training and evaluating music generation models.
- Dataset Overview: Includes 53,378 samples with a mix of videos and thumbnails.
- Access the Dataset: The dataset is hosted on Hugging Face; a minimal loading sketch is shown after this list.
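
The snippet below is a minimal sketch of loading the dataset with the Hugging Face `datasets` library. It assumes `datasets` is installed (`pip install datasets`); the repository ID is a placeholder, so substitute the actual dataset path from its Hugging Face page.

```python
# Minimal sketch: load the PixMus dataset from the Hugging Face Hub.
# The repository ID below is a placeholder; replace it with the actual
# dataset path listed on the Hub.
from datasets import load_dataset

dataset = load_dataset("<username>/PixMus")  # placeholder repo ID

# Inspect the available splits and a single example
print(dataset)
print(dataset["train"][0])
```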
Below are four output videos from the PixMus model, showcasing the quality of the generated background music and how well it synchronizes with the video content.
Contributions to this project are welcome, whether they involve enhancing the model, expanding the dataset, or improving the documentation. Please feel free to fork the repository, make your changes, and submit a pull request.
For any questions or further information, please contact me at [tilaksharma1114@gmail.com](mailto:tilaksharma1114@gmail.com).
Thank you for visiting this repository, and I hope you find the resources helpful for your research or projects!