
Customizable language option #2

Open
tiagojulianoferreira opened this issue Aug 26, 2023 · 3 comments

Comments

@tiagojulianoferreira

tiagojulianoferreira commented Aug 26, 2023

Hello!

I believe the app is set to automatically translate everything to English. Would it make sense to make this option customizable via the frontend?

@XamHans
Owner

XamHans commented Aug 29, 2023

Hi, indeed that would be cool. I just found out how to achieve this with the help of the CLI. Do you have more info about how to achieve this in code?
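For reference, the Whisper CLI exposes both options directly. This is only a sketch assuming the `openai-whisper` package is installed and `audio.mp3` is the extracted audio track:

```shell
# Transcribe in the original spoken language
# (language is auto-detected if --language is omitted)
whisper audio.mp3 --model base --task transcribe --language Portuguese

# Translate the audio into English instead
whisper audio.mp3 --model base --task translate
```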

@tiagojulianoferreira
Author

In fact, I noticed that by simply changing the model_name variable to "base" in the line below, the app already transcribed my test video in its original language (Portuguese).
https://github.com/XamHans/video-2-text/blob/master/webserver/businessLogic.py#L14

By default, Whisper detects the language of the video, but I couldn't understand why the original code translated a Portuguese video into English.
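One way to make this customizable from the frontend would be to map the user's selection onto the `language` and `task` keyword arguments that Whisper's `transcribe()` accepts. The helper below is only a sketch: the function name and the frontend wiring are hypothetical, not part of this repo.

```python
def build_transcribe_options(language=None, translate=False):
    """Map a frontend language selection to Whisper transcribe() kwargs.

    language=None lets Whisper auto-detect the spoken language;
    task="translate" is what produces English output regardless of input,
    while task="transcribe" keeps the original language.
    """
    options = {"task": "translate" if translate else "transcribe"}
    if language is not None:
        options["language"] = language
    return options

# Keep the original (e.g. Portuguese) language:
print(build_transcribe_options(language="pt"))
# → {'task': 'transcribe', 'language': 'pt'}
```

In `businessLogic.py` this could then be used as `model.transcribe(file_path, **build_transcribe_options(language=user_choice))`, with `user_choice` coming from the frontend.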

@XamHans
Owner

XamHans commented Sep 8, 2023

I don't know either. You can try this approach provided by the original Whisper repo. Below is an example usage of whisper.detect_language() and whisper.decode(), which provide lower-level access to the model:


```python
import whisper

model = whisper.load_model("base")

# load audio and pad/trim it to fit 30 seconds
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)

# make log-Mel spectrogram and move to the same device as the model
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# decode the audio
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)

# print the recognized text
print(result.text)
```
