The OpenAI speech-to-text API provides two endpoints, transcriptions and translations, based on OpenAI's open-source large-v2 Whisper model.
Install with the built-in Node-RED Palette Manager or with npm:
npm install node-red-contrib-speech-to-text-ubos
To get the OPENAI_API_KEY for the node's properties, visit https://platform.openai.com/account/api-keys, click "+ Create new secret key", then copy the API key and paste it into the node's API_KEY property.
To get your Organization ID, visit https://platform.openai.com/account/org-settings, then copy the "Organization ID" and paste it into the node's Organization property.
Only the whisper-1 model is currently available.
- [Required] msg.OPENAI_API_KEY: The API key provided by OpenAI. It is required for authentication when making requests to the OpenAI API.
- When msg.type is set to "transcriptions":
  - [Required] msg.file: The audio file object (not the file name) to transcribe, in one of these formats: mp3, mp4, mpeg, mpga, m4a, wav, or webm. For example:
    msg.file = { "value": msg.req.files[0].buffer, "options": { "filename": msg.req.files[0].originalname } };
  - msg.prompt: An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.
  - msg.response_format: The format of the transcript output, one of: json, text, srt, verbose_json, or vtt.
  - msg.temperature: The sampling temperature, between 0 and 1. Higher values like 0.8 make the output more random, while lower values like 0.2 make it more focused and deterministic. If set to 0, the model uses log probability to automatically increase the temperature until certain thresholds are hit.
  - msg.language: The language of the input audio. Supplying the input language in ISO-639-1 format (for example, en) improves accuracy and latency.
- When msg.type is set to "translations":
  - [Required] msg.file: The audio file object (not the file name) to translate into English, in one of these formats: mp3, mp4, mpeg, mpga, m4a, wav, or webm. For example:
    msg.file = { "value": msg.req.files[0].buffer, "options": { "filename": msg.req.files[0].originalname } };
  - msg.prompt: An optional text to guide the model's style or continue a previous audio segment. The prompt should be in English.
  - msg.response_format: The format of the output, one of: json, text, srt, verbose_json, or vtt.
  - msg.temperature: The sampling temperature, between 0 and 1. Higher values like 0.8 make the output more random, while lower values like 0.2 make it more focused and deterministic. If set to 0, the model uses log probability to automatically increase the temperature until certain thresholds are hit.
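The inputs listed above are typically assembled in a Function node placed between an HTTP In node and this node. The sketch below assumes the HTTP In node has file uploads enabled, so the uploaded file arrives in msg.req.files, matching the msg.file example in the input list. buildSpeechMsg is a hypothetical helper, not part of this package; inside a real Function node you would apply the same assignments to msg directly and end with return msg;. Reading the key from an environment variable is also an assumption made for illustration.

```javascript
// Hypothetical helper: mirrors the body of a Node-RED Function node that
// prepares msg for the speech-to-text node. Not part of this package's API.
function buildSpeechMsg(msg, type, apiKey, options = {}) {
  if (type !== "transcriptions" && type !== "translations") {
    throw new Error(`msg.type must be "transcriptions" or "translations", got "${type}"`);
  }
  // Assumes an HTTP In node with file uploads enabled, so the first uploaded
  // file is available as msg.req.files[0] with buffer and originalname.
  const upload = msg.req && msg.req.files && msg.req.files[0];
  if (!upload) {
    throw new Error("no uploaded file found in msg.req.files");
  }

  msg.type = type;
  msg.OPENAI_API_KEY = apiKey;
  // Pass the file object (buffer plus filename), not just the file name.
  msg.file = {
    value: upload.buffer,
    options: { filename: upload.originalname },
  };

  // Optional parameters are copied only when provided (0 is kept for temperature).
  if (options.prompt !== undefined) msg.prompt = options.prompt;
  if (options.response_format !== undefined) msg.response_format = options.response_format;
  if (options.temperature !== undefined) msg.temperature = options.temperature;
  // msg.language is documented for transcriptions only, so it is skipped otherwise.
  if (type === "transcriptions" && options.language !== undefined) {
    msg.language = options.language;
  }
  return msg;
}

// Example with a fake upload; in a flow, msg.req comes from the HTTP In node.
const msg = buildSpeechMsg(
  { req: { files: [{ buffer: Buffer.from("fake audio"), originalname: "clip.mp3" }] } },
  "transcriptions",
  process.env.OPENAI_API_KEY || "sk-placeholder",
  { language: "en", response_format: "text", temperature: 0 }
);
console.log(msg.type, msg.file.options.filename, msg.language);
```

Switching the second argument to "translations" produces the same message shape without msg.language, which follows the per-type input lists above.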
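Before a message reaches this node, it can help to check the documented constraints locally so that bad input fails fast instead of costing an API call. The following is a minimal sketch under stated assumptions: validateSpeechMsg and its error messages are hypothetical, the rules simply restate this README (supported audio formats, response formats, the 0 to 1 temperature range, and two-letter ISO-639-1 language codes), and the package or the OpenAI API may be stricter or more lenient. Checking the file extension is a heuristic and does not inspect the audio itself.

```javascript
// Hypothetical validator: the rules restate the input list in this README.
// The package itself may validate differently (or not at all).
const AUDIO_EXTENSIONS = ["mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"];
const RESPONSE_FORMATS = ["json", "text", "srt", "verbose_json", "vtt"];

// Returns an array of error strings; an empty array means the message looks valid.
function validateSpeechMsg(msg) {
  const errors = [];
  if (!msg.OPENAI_API_KEY) errors.push("msg.OPENAI_API_KEY is required");
  if (msg.type !== "transcriptions" && msg.type !== "translations") {
    errors.push('msg.type must be "transcriptions" or "translations"');
  }

  // msg.file must be the file object, not just a file name.
  const filename = msg.file && msg.file.options && msg.file.options.filename;
  if (!msg.file || !msg.file.value || typeof filename !== "string") {
    errors.push("msg.file must be an object with value and options.filename");
  } else {
    // Heuristic: judge the format by the file extension only.
    const ext = filename.split(".").pop().toLowerCase();
    if (!AUDIO_EXTENSIONS.includes(ext)) errors.push(`unsupported audio format: ${ext}`);
  }

  if (msg.response_format !== undefined && !RESPONSE_FORMATS.includes(msg.response_format)) {
    errors.push(`unsupported response_format: ${msg.response_format}`);
  }
  if (
    msg.temperature !== undefined &&
    !(typeof msg.temperature === "number" && msg.temperature >= 0 && msg.temperature <= 1)
  ) {
    errors.push("msg.temperature must be a number between 0 and 1");
  }
  // ISO-639-1 codes are two lowercase letters, e.g. "en" or "de".
  if (msg.language !== undefined && !/^[a-z]{2}$/.test(msg.language)) {
    errors.push('msg.language should be an ISO-639-1 code such as "en"');
  }
  return errors;
}

// Example: every field is valid except the out-of-range temperature.
const errors = validateSpeechMsg({
  type: "transcriptions",
  OPENAI_API_KEY: "sk-placeholder",
  file: { value: Buffer.from("fake audio"), options: { filename: "meeting.m4a" } },
  temperature: 1.5,
});
console.log(errors);
```

In a flow, a Function node with two outputs could run this check and route messages with a non-empty error list to an error-handling branch instead of this node.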
Please report any issues or feature requests at GitHub.