
Inefficient On-the-Fly Feature Extraction & Suggestions for Optimization #255

@fengshi-cherish

Description

I have a few concerns and suggestions regarding the current training setup:

  1. On-the-fly Feature Extraction: MERT, mHuBERT, and DCAE are all frozen during training, yet their features are still extracted on the fly. This significantly slows down training and often causes out-of-memory (OOM) errors. It would be much more efficient to extract these features once beforehand and simply load them during training. At the very least, the DCAE features should be pre-extracted: compared to MERT and mHuBERT features, they are much smaller on disk and easier to handle offline.
  2. Using Intermediate Layers from MERT: For MERT, it might be worth exploring embeddings from intermediate layers (e.g., layer 7) instead of the final layer. I discussed this with one of the MERT authors, and they suggested that using middle-layer representations might yield better results.
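To make both suggestions concrete, here is a minimal sketch of the cache-then-load pattern combined with intermediate-layer selection. `ToyEncoder` is a hypothetical placeholder for the frozen models above, not this repo's actual classes; with real MERT loaded via Hugging Face transformers, the per-layer states would come from `output_hidden_states=True` instead. All function names and the cache layout are illustrative.

```python
import os
import tempfile
import numpy as np

class ToyEncoder:
    """Hypothetical stand-in for a frozen pretrained encoder (MERT/mHuBERT/DCAE)."""
    def __init__(self, n_layers=12, dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = [rng.standard_normal((dim, dim)) for _ in range(n_layers)]

    def forward_all_layers(self, x):
        # Mirrors `output_hidden_states=True` in Hugging Face transformers:
        # returns the input plus one hidden state per layer.
        states = [x]
        for w in self.weights:
            x = np.tanh(x @ w)
            states.append(x)
        return states

def precompute_features(encoder, clips, cache_dir, layer=7):
    # Run the frozen encoder once per clip, keep only the chosen
    # intermediate layer (e.g. layer 7), and cache it to disk.
    os.makedirs(cache_dir, exist_ok=True)
    for clip_id, audio in clips.items():
        feats = encoder.forward_all_layers(audio)[layer]
        np.save(os.path.join(cache_dir, f"{clip_id}.npy"), feats)

def load_features(cache_dir, clip_id):
    # Training-time path: a cheap disk read instead of a forward pass.
    return np.load(os.path.join(cache_dir, f"{clip_id}.npy"))

enc = ToyEncoder()
clips = {"clip_a": np.random.default_rng(1).standard_normal((16, 8))}
with tempfile.TemporaryDirectory() as cache:
    precompute_features(enc, clips, cache, layer=7)
    feats = load_features(cache, "clip_a")
print(feats.shape)  # (frames, dim) of the cached intermediate-layer features
```

With this split, the training loop never instantiates the frozen encoders at all, so their parameters and activations no longer compete with the trainable model for GPU memory.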
