DRCap-lxq #133

Merged
merged 7 commits into from
Sep 26, 2024

Andreas-Xi
Collaborator

This PR adds the recipes for the paper "DRCap: Decoding CLAP Latents with Retrieval-augmented Generation for Zero-shot Audio Captioning".

  • Created examples/drcap_zeroshot_aac to enable zero-shot AAC training.
  • Created src/slam_llm/models/CLAP to enable using the CLAP model as the encoder
  • Modified src/slam_llm/datasets/audio_dataset.py to enable RAG during training and during inference
  • Modified src/slam_llm/models/encoder.py and src/slam_llm/models/slam_model.py to enable encoding via CLAP text/audio encoders
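
For readers unfamiliar with the RAG step mentioned above, here is a minimal sketch of retrieving similar captions by embedding similarity and placing them in a prompt. It is not the code in this PR: the function names `cosine` and `retrieve_top_k`, the toy 2-D embeddings, and the prompt wording are all assumptions made for illustration, and a real setup would presumably use CLAP latents instead.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(query, datastore, k=2):
    """Return the k captions whose embeddings are most similar to `query`.

    `datastore` is a list of (caption, embedding) pairs (hypothetical layout).
    """
    ranked = sorted(datastore, key=lambda item: cosine(query, item[1]), reverse=True)
    return [caption for caption, _ in ranked[:k]]


# Toy 2-D embeddings standing in for encoder latents (made up for illustration).
datastore = [
    ("a dog barks", [1.0, 0.0]),
    ("rain falls on a roof", [0.0, 1.0]),
    ("a puppy yelps", [0.9, 0.1]),
]

retrieved = retrieve_top_k([1.0, 0.05], datastore, k=2)

# Hypothetical prompt format: retrieved captions are prepended as context.
prompt = "Similar captions: " + "; ".join(retrieved) + ". Describe the audio."
```

The same lookup can serve both training and inference; only the source of the query embedding would differ.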

Collaborator

Please use the audio_dataset.py at examples/drcap_zeroshot_aac/dataset and DO NOT modify this file. (This file may only be modified for stable features such as choosing mel/wav.)

Collaborator Author

Created examples/drcap_zeroshot_aac/dataset/zs_audio_dataset.py and kept audio_dataset.py unchanged.

ddlBoJack merged commit e7a03c3 into X-LANCE:main on Sep 26, 2024
0 of 2 checks passed
2 participants