We are using the 1.5 kbps codec model for now, even though the speech quality is terrible. Google's generation examples sound a lot better - one explanation is that they trained a special-purpose VQ codec on the speech-only LibriLight dataset, which gives better quality at lower bitrates than general-purpose audio codecs trained on speech, music, and other sounds.
Sound quality is something we can fix with more training later, after we prove the whole pipeline works, so for now we should cut everything non-essential and focus on making training as easy as possible.
We have a notebook that shows how to extract acoustic tokens.
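For reference, here's a minimal sketch of what the acoustic token extraction boils down to, assuming the 1.5 kbps model is Meta's EnCodec 24 kHz codec (the file name `sample.wav` is just a placeholder, and the notebook may differ in details):

```python
import torch
import torchaudio
from encodec import EncodecModel
from encodec.utils import convert_audio

# Load the pretrained 24 kHz EnCodec model and pick the 1.5 kbps bandwidth,
# which uses 2 RVQ codebooks per frame.
model = EncodecModel.encodec_model_24khz()
model.set_target_bandwidth(1.5)

# Load an audio file and resample/remix it to the model's expected format.
wav, sr = torchaudio.load("sample.wav")  # placeholder path
wav = convert_audio(wav, sr, model.sample_rate, model.channels)
wav = wav.unsqueeze(0)  # add batch dimension: [B, C, T]

# Encode to discrete acoustic tokens: a [B, n_q, T_frames] tensor of codebook indices.
with torch.no_grad():
    encoded_frames = model.encode(wav)
codes = torch.cat([frame[0] for frame in encoded_frames], dim=-1)
print(codes.shape)  # e.g. torch.Size([1, 2, T_frames]) at 1.5 kbps
```

At 1.5 kbps this yields only 2 codebooks at 75 frames per second, which is why the reconstructions sound so rough compared to the higher-bandwidth settings.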