I'm glad to see this great work! When you mention handling multiple sound sources during raw data processing, I’d like to ask: how do you detect multiple sound sources, or identify individual sound events? Could you share any methods for data preparation?
Our dataset is enhanced by GPT and builds upon existing audio-caption datasets. Consequently, the problem you mention becomes much simpler when starting from the original captions.
For strongly labeled data such as AudioSet_SL and AudioCaps, the sound sources can be identified effectively by feeding the original descriptions into GPT.
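As a rough illustration of this step, here is a minimal sketch of asking a GPT model to split a caption into individual sound sources. It assumes an OpenAI-style chat API; the prompt wording, model name, and the `extract_sources` helper are illustrative assumptions, not our actual pipeline or prompt.

```python
# Sketch: extract individual sound sources from an existing caption via GPT.
# Prompt wording and model choice are illustrative, not the paper's exact setup.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "List each distinct sound source mentioned in the following audio "
    "caption, one per line, with no extra text.\n\nCaption: {caption}"
)

def extract_sources(caption: str) -> list[str]:
    """Ask the model to split a caption into individual sound sources."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat-capable model would do here
        messages=[{"role": "user", "content": PROMPT.format(caption=caption)}],
        temperature=0,
    )
    text = response.choices[0].message.content
    # One source per line; strip list markers and blank lines.
    return [line.strip("- ").strip() for line in text.splitlines() if line.strip()]

print(extract_sources("A dog barks while rain falls and a car passes by"))
# e.g. ['dog barking', 'rain falling', 'car passing by']
```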
For weakly labeled data, we recommend consulting WavCaps, whose exemplary efforts have significantly aided our work.
If you mean temporal-level event detection: active-event detection is applied after the single-source pool is constructed, in order to enhance temporal diversity.
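For reference, a simple energy-threshold detector is one common way to locate active segments in a single-source clip. The sketch below is a generic baseline under that assumption (the `thresh_db` value and the `active_segments` helper are illustrative), not necessarily the exact detector used in our pipeline.

```python
# Sketch: energy-based active-segment detection for a single-source clip.
# Generic baseline, not necessarily the detector used in the paper.
import librosa
import numpy as np

def active_segments(path: str, hop: int = 512, thresh_db: float = -35.0):
    """Return (onset, offset) times, in seconds, where the clip is active."""
    y, sr = librosa.load(path, sr=None)
    rms = librosa.feature.rms(y=y, hop_length=hop)[0]
    db = librosa.amplitude_to_db(rms, ref=np.max(rms))
    active = db > thresh_db  # frame-wise activity mask
    # Rising/falling edges of the mask give segment boundaries.
    edges = np.diff(active.astype(int), prepend=0, append=0)
    onsets = np.where(edges == 1)[0]
    offsets = np.where(edges == -1)[0]
    times = librosa.frames_to_time(
        np.stack([onsets, offsets]), sr=sr, hop_length=hop
    )
    return list(zip(times[0], times[1]))

# e.g. active_segments("dog_bark.wav") -> [(0.3, 1.1), (2.4, 2.9)]
```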
Although we cannot guarantee that the sound sources in our dataset are as good as those in the strongly labeled data, the trade-off is acceptable given the larger scale. Detailed descriptions and quality analysis are provided in our paper's appendix.
Thanks for your attention; I hope these answers address your questions.