okio-ai/lp-music-caps

🔉 LP-MusicCaps: LLM-Based Pseudo Music Captioning

Demo Video

This is an implementation of LP-MusicCaps: LLM-Based Pseudo Music Captioning. The project aims to generate captions for music in two ways:

  1. Tag-to-Caption: starting from existing tags, we use OpenAI's GPT-3.5 Turbo API to generate high-quality, contextually relevant captions from music tags.
  2. Audio-to-Caption: using pairs of music audio and pseudo captions, we train a cross-modal encoder-decoder model for end-to-end music captioning.

LP-MusicCaps: LLM-Based Pseudo Music Captioning

SeungHeon Doh, Keunwoo Choi, Jongpil Lee, Juhan Nam
To appear at ISMIR 2023

News

  • 23.12.12 Our paper has been invited to the TISMIR journal. Stay tuned for the extended version.
  • 23.11.10 Our paper has been nominated for the ISMIR Best Paper Award (5/104)

TL;DR

Open Source Material

These materials are available online for future research. An example of the dataset is provided in the notebook.

Installation

To run this project locally, follow the steps below:

  1. Install python and PyTorch:

    • python==3.10
    • torch==1.13.1 (Please install it according to your CUDA version.)
  2. Other requirements:

    • pip install -e .
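The pinned versions above can be checked before installing the remaining requirements. The following is a minimal sketch, not part of the repository: `check_environment` is a hypothetical helper, and matching only the version prefix is an assumption made so that CUDA-specific builds (for example `1.13.1+cu117`) still count as the expected release.

```python
import sys
from importlib import metadata

# Versions listed in the installation steps above.
EXPECTED_PYTHON = (3, 10)
EXPECTED_TORCH = "1.13.1"


def check_environment():
    """Hypothetical helper: compare the local setup with the pinned versions."""
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        torch_version = None  # PyTorch is not installed in this environment

    return {
        "python_ok": sys.version_info[:2] == EXPECTED_PYTHON,
        "torch_version": torch_version,
        # Assumption: CUDA builds carry a suffix, so only the prefix is compared.
        "torch_ok": torch_version is not None
        and torch_version.startswith(EXPECTED_TORCH),
    }


if __name__ == "__main__":
    print(check_environment())
```

A `False` entry in the report only means the environment differs from the versions listed here; other versions may or may not work.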

Quick Start: Tag to Caption

cd lpmc/llm_captioning
python run.py --prompt {writing, summary, paraphrase, attribute_prediction} --tags <music_tags>

Replace <music_tags> with the tags you want to generate a caption for. Separate multiple tags with commas, for example: beatbox, finger snipping, male voice, amateur recording, medium tempo.

tag_to_caption generation writing results:

query: 
write a song description sentence including the following attributes
beatbox, finger snipping, male voice, amateur recording, medium tempo
----------
results: 
"Experience the raw and authentic energy of an amateur recording as mesmerizing beatbox rhythms intertwine with catchy finger snipping, while a soulful male voice delivers heartfelt lyrics on a medium tempo track."
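The query shown above appears to be an instruction line followed by the comma-separated tags. The sketch below reproduces that layout for the `writing` prompt only. The helper names `parse_tags` and `build_writing_query` are hypothetical, and the actual prompt construction in `run.py` may differ; the instruction text is copied from the example query, and the wording for the other prompt types is not shown in this README, so it is left out.

```python
# Instruction text copied from the "writing" example query above.
WRITING_INSTRUCTION = (
    "write a song description sentence including the following attributes"
)


def parse_tags(raw_tags):
    """Split a comma-separated tag string into a clean list of tags."""
    return [tag.strip() for tag in raw_tags.split(",") if tag.strip()]


def build_writing_query(raw_tags):
    """Hypothetical helper: instruction on one line, tags on the next."""
    tags = parse_tags(raw_tags)
    if not tags:
        raise ValueError("at least one tag is required")
    return WRITING_INSTRUCTION + "\n" + ", ".join(tags)


if __name__ == "__main__":
    print(
        build_writing_query(
            "beatbox, finger snipping, male voice, amateur recording, medium tempo"
        )
    )
```

Stripping whitespace and dropping empty entries keeps the query stable when tags are typed with irregular spacing or a trailing comma.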

Quick Start: Audio to Caption

cd demo
python app.py

# or
cd lpmc/music_captioning
wget https://huggingface.co/seungheondoh/lp-music-caps/resolve/main/transfer.pth -O exp/transfer/lp_music_caps
python captioning.py --audio_path ../../dataset/samples/orchestra.wav
{'text': "This is a symphonic orchestra playing  a piece that's riveting, thrilling and exciting. 
The peace would be suitable in a movie when something grand and impressive happens. 
There are clarinets, tubas, trumpets and french horns being played. The brass instruments help create that sense of a momentous occasion.", 
'time': '0:00-10:00'}

{'text': 'This is a classical music piece from a movie soundtrack. 
There is a clarinet playing the main melody while a brass section and a flute are playing the melody. 
The rhythmic background is provided by the acoustic drums. The atmosphere is epic and victorious. 
This piece could be used in the soundtrack of a historical drama movie during the scenes of an army marching towards the end.', 
'time': '10:00-20:00'}

{'text': 'This is a live performance of a classical music piece. There is a harp playing the melody while a horn is playing the bass line in the background. 
The atmosphere is epic. This piece could be used in the soundtrack of a historical drama movie during the scenes of an adventure video game.', 
'time': '20:00-30:00'}
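The output above contains one caption per consecutive window, labelled `0:00-10:00`, `10:00-20:00`, and so on. The sketch below shows one way such windows and labels could be produced. It assumes each window covers 10 seconds of audio, which the labels suggest but this README does not state. `time_label` and `split_into_chunks` are hypothetical helpers, not functions from `captioning.py`.

```python
def time_label(index, chunk_seconds=10):
    """Format a window label in the same style as the example output."""
    start = index * chunk_seconds
    end = start + chunk_seconds
    return f"{start}:00-{end}:00"


def split_into_chunks(samples, sample_rate, chunk_seconds=10):
    """Hypothetical helper: cut a sample sequence into labelled windows."""
    chunk_len = sample_rate * chunk_seconds
    chunks = []
    for index, offset in enumerate(range(0, len(samples), chunk_len)):
        chunks.append(
            {
                "time": time_label(index, chunk_seconds),
                # The final window may be shorter than chunk_len.
                "samples": samples[offset:offset + chunk_len],
            }
        )
    return chunks


if __name__ == "__main__":
    # Toy input: 25 "seconds" of audio at 1 sample per second.
    for chunk in split_into_chunks(list(range(25)), sample_rate=1):
        print(chunk["time"], len(chunk["samples"]))
```

In this sketch a short final window is kept as-is; a real pipeline might pad or drop it before passing it to the captioning model.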

Re-Implementation

To re-implement the experiments, see lpmc/llm_captioning and lpmc/music_captioning.

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.

Acknowledgement

We would like to thank WavCaps for the audio-captioning training code and deezer-playntell for the content-based captioning evaluation protocol. We would also like to thank OpenAI for providing the GPT-3.5 Turbo API, which powers this project.

Citation

Please consider citing our paper in your publications if this project helps your research. The BibTeX reference is as follows.

@article{doh2023lp,
  title={LP-MusicCaps: LLM-Based Pseudo Music Captioning},
  author={Doh, SeungHeon and Choi, Keunwoo and Lee, Jongpil and Nam, Juhan},
  journal={arXiv preprint arXiv:2307.16372},
  year={2023}
}
