
Whisper version #41

Answered by jpc
Chaanks asked this question in Q&A
Jan 10, 2024 · 1 comment · 1 reply

Hey, sorry for not replying earlier; I didn't remember the details and had to check for myself.

This model was based on Whisper base.en. It has 512 semantic codes and downsamples the Whisper encoder output by 2x (so it runs at 25 tokens/s).
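As a sanity check on the 25 tokens/s figure (assuming Whisper's usual encoder rate of 1500 frames per 30 s window, i.e. 50 frames/s):

```python
# Whisper's encoder produces 1500 frames per 30 s window, i.e. 50 frames/s.
WHISPER_FRAMES_PER_SECOND = 1500 // 30  # 50
DOWNSAMPLE_FACTOR = 2                   # the VQ model downsamples the encoder output by 2

tokens_per_second = WHISPER_FRAMES_PER_SECOND // DOWNSAMPLE_FACTOR
print(tokens_per_second)  # 25
```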

If you want to use the semantic tokens to build something, I would strongly recommend whisper-vq-stoks-medium-en+pl.model. This one is based on Whisper medium and has the same token parameters (512 codes, 25 toks/s), but it performed a lot better in every way I tested.

Btw, you can torch.load any of my .model files to check their configuration:

> torch.load('../../hub/whisper-vq-stoks-medium-en+pl.model')
{'config': {'codebook_dim': 64,
  'vq_codes': 512,
  'q_depth': 1,
  'n_head': 16,
  ...
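The same torch.load inspection works on any checkpoint saved this way. Here is a self-contained sketch using a stand-in config dict (the real .model file holds more keys than quoted above, plus the model weights, and the demo path is hypothetical):

```python
import os
import tempfile

import torch

# Stand-in config mirroring the keys quoted above; this is a demo dict,
# not the real whisper-vq-stoks-medium-en+pl.model checkpoint.
config = {"codebook_dim": 64, "vq_codes": 512, "q_depth": 1, "n_head": 16}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.model")
    torch.save({"config": config}, path)          # save the spec dict
    spec = torch.load(path, map_location="cpu")   # inspect it on CPU
    print(spec["config"])
```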

Answer selected by Chaanks