Currently, the Feature Extraction task covers models for both audio and text feature extraction (it is officially placed under the NLP modality). I think it would be nice to have a new Audio Feature Extraction task, just like Image Feature Extraction, to better label models.
Some audio feature extraction models currently listed under Feature Extraction: