About Datasets Pipeline detail #2

Open
BuJiaLaDiii opened this issue Nov 6, 2024 · 1 comment
Comments

@BuJiaLaDiii

I'm glad to see this great work! When you mention handling multiple sound sources during raw data processing, I’d like to ask: how do you detect multiple sound sources, or identify individual sound events? Could you share any methods for data preparation?

@PeiwenSun2000
Owner

PeiwenSun2000 commented Nov 15, 2024

Our dataset is enhanced by GPT and builds upon existing audio-caption datasets. Consequently, the problem you mention becomes much simpler, because we can start from the original captions.

  1. For strongly labeled data such as AudioSet_SL and AudioCaps, the sound sources can be identified reliably by feeding the original descriptions into GPT (see the sketch after this list).
  2. For weakly labeled data, we recommend consulting WavCaps, whose excellent work significantly aided ours.
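
Roughly, that first step looks like the following sketch. The prompt wording, model name, and output format here are placeholders for illustration, not the exact pipeline:

```python
# Minimal sketch: ask GPT to list the individual sound sources mentioned in an
# existing caption. Prompt, model, and output format are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_sources(caption: str) -> list[str]:
    """Return the distinct sound sources mentioned in an audio caption."""
    prompt = (
        "List the distinct sound sources mentioned in this audio caption, "
        "one per line, with no extra text:\n"
        f"{caption}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    lines = resp.choices[0].message.content.splitlines()
    return [line.strip() for line in lines if line.strip()]

# Example with an AudioCaps-style caption:
# extract_sources("A dog barks while a car passes by in the rain")
# -> ["dog barking", "car passing by", "rain"]
```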

If you mean temporal-level event detection: active event detection is applied after the single-source pool is constructed, to enhance temporal diversity.
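
As a rough illustration, an energy-based detector along these lines can mark the active spans of each single-source clip. The threshold and library choice are placeholders; the paper's appendix describes the actual procedure:

```python
# Sketch of energy-based active event detection on a single-source clip:
# mark non-silent intervals so events can later be placed with temporal diversity.
import librosa

def active_intervals(path: str, top_db: float = 30.0) -> list[tuple[float, float]]:
    """Return (start_sec, end_sec) spans where the clip exceeds the energy threshold."""
    y, sr = librosa.load(path, sr=None, mono=True)
    intervals = librosa.effects.split(y, top_db=top_db)  # non-silent regions in samples
    return [(start / sr, end / sr) for start, end in intervals]

# Example:
# active_intervals("dog_bark.wav") -> [(0.42, 1.15), (2.30, 2.95)]
```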

Although we cannot guarantee that the sound sources in our dataset are identified as accurately as in the strongly labeled data, the trade-off is acceptable given the larger scale. Detailed descriptions and a quality analysis are provided in our paper's appendix.

Thanks for your attention. I hope this answers your questions.
