I have a few concerns and suggestions regarding the current training setup:
- On-the-fly Feature Extraction: MERT, mHuBERT, and DCAE are all frozen during training, yet their features are still extracted on-the-fly. This significantly slows down training and often causes out-of-memory (OOM) errors. It would be much more efficient to extract these features beforehand and load them during training. At the very least, the DCAE features should be pre-extracted: compared to MERT and mHuBERT features, they are much smaller on disk and easier to handle offline.
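The caching pattern could look roughly like this (a minimal numpy sketch; `extract_and_cache`, `load_cached`, and the `clip_id`-based file layout are hypothetical names, and `extractor` stands in for a frozen model's forward pass):

```python
import numpy as np
from pathlib import Path

def extract_and_cache(clips, extractor, cache_dir):
    """Run the frozen extractor once per clip and save the features to disk.

    `clips` is an iterable of (clip_id, audio) pairs; `extractor` stands in
    for a frozen model's forward pass (MERT / mHuBERT / DCAE).
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    for clip_id, audio in clips:
        out = cache_dir / f"{clip_id}.npy"
        if not out.exists():  # skip clips that were already cached
            np.save(out, extractor(audio))

def load_cached(clip_id, cache_dir):
    """Load pre-extracted features instead of re-running the frozen model."""
    return np.load(Path(cache_dir) / f"{clip_id}.npy")
```

The training dataloader would then just call `load_cached` in `__getitem__`, so none of the frozen models needs to sit in GPU memory during training.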
- Using Intermediate Layers from MERT: For MERT, it might be worth exploring embeddings from intermediate layers (e.g., layer 7) instead of the final layer. I discussed this with one of the MERT authors, and they suggested that using middle-layer representations might yield better results.
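To illustrate what this change amounts to downstream, here is a sketch with dummy numpy arrays standing in for MERT's per-layer outputs (with the Hugging Face `transformers` checkpoint one would pass `output_hidden_states=True` and index `outputs.hidden_states` the same way; layer index 7 is the suggestion above, not a tuned value):

```python
import numpy as np

# Dummy stand-ins for the per-layer outputs of a 12-layer model:
# 13 arrays (embedding output plus 12 transformer layers), each of
# shape (time_steps, hidden_dim).
rng = np.random.default_rng(0)
hidden_states = [rng.standard_normal((50, 768)) for _ in range(13)]

# Current behavior: take the final layer.
feat_last = hidden_states[-1]

# Proposed: take a middle layer instead (e.g., layer 7).
feat_mid = hidden_states[7]

# Same shape, so it is a drop-in replacement for everything downstream.
assert feat_mid.shape == feat_last.shape
```

Since the shapes match, trying layer 7 (or sweeping a few middle layers) requires no change to the rest of the pipeline.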