
[Feature] use the Azure TTS API? #1553

Open
xiaoyifang opened this issue Jun 11, 2024 · 18 comments

@xiaoyifang
Owner

xiaoyifang commented Jun 11, 2024

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/get-started-text-to-speech?tabs=windows%2Cterminal&pivots=programming-language-cli#prerequisites

Microsoft's TTS offers very high quality audio; it may be worth implementing as a feature.

Users can select text and use the right-click menu to pronounce it with the above engine.

The implementation can wrap the CLI command or use the provided C++ SDK.

[screenshot]

[screenshot]

xiaoyifang added the “help wanted (PR welcomed)” label on Jun 11, 2024
@shenlebantongying
Collaborator

shenlebantongying commented Jun 13, 2024

The existing forvo/lingualibre are essentially the same.

We could merge them into "Online TTS".


Related art: this popular Anki add-on provides TTS from many services (Azure included):
https://ankiweb.net/shared/info/1436550454 (It has some cringey ads but works OK; I used it a few times in the past.)

Maybe we can copy its UI.

On the left side you can select a service and set various related parameters.

@xiaoyifang
Owner Author

xiaoyifang commented Jun 13, 2024

The existing forvo/lingualibre are essentially the same.

Not the same.
forvo/lingualibre are used for single words and are displayed as a separate dictionary.

Azure TTS can be used for articles or sentences, which means it can work across different dictionaries.

Maybe we can copy its UI.

We can keep the configuration to a minimum; speed and pitch can even be left out.
[screenshot]

@shenlebantongying
Collaborator

shenlebantongying commented Jun 13, 2024

We can keep the configuration to a minimum; speed and pitch can even be left out.

What I really mean is that we shouldn't limit this feature to one specific service provider.

The implementation should make adding new service providers easy 😅

Adding new parameters shouldn't be much harder either, because it is mostly just composing new query URLs.

[screenshot]

xiaoyifang changed the title from “[Feature] use the microsoft text to speech API?” to “[Feature] use the Azure TTS API?” on Jun 13, 2024
@xiaoyifang
Owner Author

Though a bit ambitious at first, I have no objection to this. :-)

@shenlebantongying
Collaborator

After some investigation, I find that this feature should not be implemented with the current dictionary.hh facilities.

Websites/Programs/TTS/Transliteration are inherently different from the other local-storage-based dictionaries.

It was a mistake to merge them into one. All implementations of those “dictionary but actually not” are messy AF. Websites/Programs/TTS/Transliteration were an afterthought in the design of dictionary.hh.

Two options:

  1. Have one single dedicated object that inherits nothing to handle this feature.
  2. Plug it into the current "dictionary.hh" monstrosity.

I find that doing 1 (i.e., writing from scratch) is 10x simpler than 2.


Leaky abstraction in action:

For example, how do you extend the properties of a dictionary with dictionary.hh? Instead of putting properties into the dictionary class, they all live in config.hh. Websites/Programs/TTS/Transliteration need extra properties, so we have the lines below.

dictionary.hh is abstract enough to have a "toHTML" method, but also concrete enough to have "dictionary files", which Websites/Programs/TTS/Transliteration don't have (so they all have to return empty).

goldendict-ng/src/config.hh, lines 448 to 824 at commit 6a91c6b:

```cpp
/// A MediaWiki network dictionary definition
struct MediaWiki
{
  QString id, name, url;
  bool enabled;
  QString icon;
  QString lang;
  MediaWiki():
    enabled( false )
  {
  }
  MediaWiki( QString const & id_,
             QString const & name_,
             QString const & url_,
             bool enabled_,
             QString const & icon_,
             QString const & lang_ = "" ):
    id( id_ ),
    name( name_ ),
    url( url_ ),
    enabled( enabled_ ),
    icon( icon_ ),
    lang( lang_ )
  {
  }
  bool operator==( MediaWiki const & other ) const
  {
    return id == other.id && name == other.name && url == other.url && enabled == other.enabled && icon == other.icon
      && lang == other.lang;
  }
};

/// Any website which can be queried though a simple template substitution
struct WebSite
{
  QString id, name, url;
  bool enabled;
  QString iconFilename;
  bool inside_iframe;
  WebSite():
    enabled( false )
  {
  }
  WebSite( QString const & id_,
           QString const & name_,
           QString const & url_,
           bool enabled_,
           QString const & iconFilename_,
           bool inside_iframe_ ):
    id( id_ ),
    name( name_ ),
    url( url_ ),
    enabled( enabled_ ),
    iconFilename( iconFilename_ ),
    inside_iframe( inside_iframe_ )
  {
  }
  bool operator==( WebSite const & other ) const
  {
    return id == other.id && name == other.name && url == other.url && enabled == other.enabled
      && iconFilename == other.iconFilename && inside_iframe == other.inside_iframe;
  }
};
/// All the WebSites
typedef QVector< WebSite > WebSites;

/// Any DICT server
struct DictServer
{
  QString id, name, url;
  bool enabled;
  QString databases;
  QString strategies;
  QString iconFilename;
  DictServer():
    enabled( false )
  {
  }
  DictServer( QString const & id_,
              QString const & name_,
              QString const & url_,
              bool enabled_,
              QString const & databases_,
              QString const & strategies_,
              QString const & iconFilename_ ):
    id( id_ ),
    name( name_ ),
    url( url_ ),
    enabled( enabled_ ),
    databases( databases_ ),
    strategies( strategies_ ),
    iconFilename( iconFilename_ )
  {
  }
  bool operator==( DictServer const & other ) const
  {
    return id == other.id && name == other.name && url == other.url && enabled == other.enabled
      && databases == other.databases && strategies == other.strategies && iconFilename == other.iconFilename;
  }
};
/// All the DictServers
typedef QVector< DictServer > DictServers;

/// Hunspell configuration
struct Hunspell
{
  QString dictionariesPath;
  typedef QVector< QString > Dictionaries;
  Dictionaries enabledDictionaries;
  bool operator==( Hunspell const & other ) const
  {
    return dictionariesPath == other.dictionariesPath && enabledDictionaries == other.enabledDictionaries;
  }
  bool operator!=( Hunspell const & other ) const
  {
    return !operator==( other );
  }
};
/// All the MediaWikis
typedef QVector< MediaWiki > MediaWikis;

/// Chinese transliteration configuration
struct Chinese
{
  bool enable;
  bool enableSCToTWConversion;
  bool enableSCToHKConversion;
  bool enableTCToSCConversion;
  Chinese();
  bool operator==( Chinese const & other ) const
  {
    return enable == other.enable && enableSCToTWConversion == other.enableSCToTWConversion
      && enableSCToHKConversion == other.enableSCToHKConversion
      && enableTCToSCConversion == other.enableTCToSCConversion;
  }
  bool operator!=( Chinese const & other ) const
  {
    return !operator==( other );
  }
};

struct CustomTrans
{
  bool enable = false;
  QString context;
  bool operator==( CustomTrans const & other ) const
  {
    return enable == other.enable && context == other.context;
  }
  bool operator!=( CustomTrans const & other ) const
  {
    return !operator==( other );
  }
};

/// Romaji transliteration configuration
struct Romaji
{
  bool enable;
  bool enableHepburn;
  bool enableNihonShiki;
  bool enableKunreiShiki;
  bool enableHiragana;
  bool enableKatakana;
  Romaji();
  bool operator==( Romaji const & other ) const
  {
    return enable == other.enable && enableHepburn == other.enableHepburn && enableNihonShiki == other.enableNihonShiki
      && enableKunreiShiki == other.enableKunreiShiki && enableHiragana == other.enableHiragana
      && enableKatakana == other.enableKatakana;
  }
  bool operator!=( Romaji const & other ) const
  {
    return !operator==( other );
  }
};

struct Transliteration
{
  bool enableRussianTransliteration;
  bool enableGermanTransliteration;
  bool enableGreekTransliteration;
  bool enableBelarusianTransliteration;
  CustomTrans customTrans;
#ifdef MAKE_CHINESE_CONVERSION_SUPPORT
  Chinese chinese;
#endif
  Romaji romaji;
  bool operator==( Transliteration const & other ) const
  {
    return enableRussianTransliteration == other.enableRussianTransliteration
      && enableGermanTransliteration == other.enableGermanTransliteration
      && enableGreekTransliteration == other.enableGreekTransliteration
      && enableBelarusianTransliteration == other.enableBelarusianTransliteration && customTrans == other.customTrans &&
#ifdef MAKE_CHINESE_CONVERSION_SUPPORT
      chinese == other.chinese &&
#endif
      romaji == other.romaji;
  }
  bool operator!=( Transliteration const & other ) const
  {
    return !operator==( other );
  }
  Transliteration():
    enableRussianTransliteration( false ),
    enableGermanTransliteration( false ),
    enableGreekTransliteration( false ),
    enableBelarusianTransliteration( false )
  {
  }
};

struct Lingua
{
  bool enable;
  QString languageCodes;
  bool operator==( Lingua const & other ) const
  {
    return enable == other.enable && languageCodes == other.languageCodes;
  }
  bool operator!=( Lingua const & other ) const
  {
    return !operator==( other );
  }
};

struct Forvo
{
  bool enable;
  QString apiKey;
  QString languageCodes;
  Forvo():
    enable( false )
  {
  }
  bool operator==( Forvo const & other ) const
  {
    return enable == other.enable && apiKey == other.apiKey && languageCodes == other.languageCodes;
  }
  bool operator!=( Forvo const & other ) const
  {
    return !operator==( other );
  }
};

struct Program
{
  bool enabled;
  enum Type {
    Audio,
    PlainText,
    Html,
    PrefixMatch,
    MaxTypeValue
  } type;
  QString id, name, commandLine;
  QString iconFilename;
  Program():
    enabled( false )
  {
  }
  Program( bool enabled_,
           Type type_,
           QString const & id_,
           QString const & name_,
           QString const & commandLine_,
           QString const & iconFilename_ ):
    enabled( enabled_ ),
    type( type_ ),
    id( id_ ),
    name( name_ ),
    commandLine( commandLine_ ),
    iconFilename( iconFilename_ )
  {
  }
  bool operator==( Program const & other ) const
  {
    return enabled == other.enabled && type == other.type && name == other.name && commandLine == other.commandLine
      && iconFilename == other.iconFilename;
  }
  bool operator!=( Program const & other ) const
  {
    return !operator==( other );
  }
};
typedef QVector< Program > Programs;

#ifndef NO_TTS_SUPPORT
struct VoiceEngine
{
  bool enabled;
  // engine name.
  QString engine_name;
  QString name;
  // voice name.
  QString voice_name;
  QString iconFilename;
  QLocale locale;
  int volume; // 0~1 allowed
  int rate;   // -1 ~ 1 allowed
  VoiceEngine():
    enabled( false ),
    volume( 50 ),
    rate( 0 )
  {
  }
  VoiceEngine( QString engine_nane_, QString name_, QString voice_name_, QLocale locale_, int volume_, int rate_ ):
    enabled( false ),
    engine_name( engine_nane_ ),
    name( name_ ),
    voice_name( voice_name_ ),
    locale( locale_ ),
    volume( volume_ ),
    rate( rate_ )
  {
  }
  bool operator==( VoiceEngine const & other ) const
  {
    return enabled == other.enabled && engine_name == other.engine_name && name == other.name
      && voice_name == other.voice_name && locale == other.locale && iconFilename == other.iconFilename
      && volume == other.volume && rate == other.rate;
  }
  bool operator!=( VoiceEngine const & other ) const
  {
    return !operator==( other );
  }
};
typedef QVector< VoiceEngine > VoiceEngines;
#endif
```

@xiaoyifang
Owner Author

After some investigation, I find this feature should not be implemented with the current dictionary.hh facilities.

I agree with this. Azure TTS can be used across dictionaries and act on its own. It can be exposed as a single function (for example, in the right-click context menu).

@shenlebantongying
Collaborator

Not sure about the user experience. Azure TTS's endpoint depends on the region, so a user needs to copy both the endpoint and the API key into a super condensed interface 😅


Using this hurl file (https://hurl.dev/):

```
POST {{endpoint}}/cognitiveservices/v1
Ocp-Apim-Subscription-Key: ${Your key here}
X-Microsoft-OutputFormat: ogg-48khz-16bit-mono-opus
Content-Type: application/ssml+xml
User-Agent: WhatEver
<speak version='1.0' xml:lang='en-US'>
    <voice name='en-US-LunaNeural'>
        {{sentence}}
    </voice>
</speak>
```

with

```
hurl ./voice.hurl --variable endpoint="https://eastus.api.cognitive.microsoft.com/" --variable sentence="This is nice!" --output nice.ogg
```

will yield an audio file.

The {{endpoint}} is obtained from the screenshot.
The voice names can be obtained from {{endpoint}}/cognitiveservices/voices/list.


It seems all cloud TTS services support the same "SSML" markup:

https://cloud.google.com/text-to-speech/docs/ssml
https://learn.microsoft.com/azure/ai-services/speech-service/speech-synthesis-markup
https://docs.aws.amazon.com/polly/latest/dg/ssml.html
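For illustration, the request above can be sketched in Python. The endpoint, key, and voice name are placeholders, and the real app would presumably issue the request via its own HTTP stack (e.g. QNetworkAccessManager) rather than Python; this sketch only assembles the payload and headers without sending anything:

```python
# Sketch of the Azure TTS request shown in the hurl file above.
# Endpoint, API key, and voice name are placeholder values.

def build_ssml(sentence: str, voice: str = "en-US-LunaNeural", lang: str = "en-US") -> str:
    """Build the SSML payload expected by POST {endpoint}/cognitiveservices/v1."""
    return (
        f"<speak version='1.0' xml:lang='{lang}'>"
        f"<voice name='{voice}'>{sentence}</voice>"
        "</speak>"
    )


def build_request(endpoint: str, api_key: str, sentence: str) -> dict:
    """Assemble the URL, headers, and body; actually sending it is left to the caller."""
    return {
        "url": endpoint.rstrip("/") + "/cognitiveservices/v1",
        "headers": {
            "Ocp-Apim-Subscription-Key": api_key,
            "X-Microsoft-OutputFormat": "ogg-48khz-16bit-mono-opus",
            "Content-Type": "application/ssml+xml",
            "User-Agent": "WhatEver",
        },
        "body": build_ssml(sentence),
    }
```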

@xiaoyifang
Owner Author

xiaoyifang commented Jun 19, 2024

A little UI improvement:

`POST {{endpoint}}/cognitiveservices/v1`

can be

`POST https://{{region}}.api.cognitive.microsoft.com/cognitiveservices/v1`

Users can then pick the region from a dropdown list of fixed, predefined values.

Voices can also be provided as fixed values in advance.
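A quick sketch of that idea; the region list here is a small illustrative subset, not the full set of Azure regions:

```python
# Sketch: deriving the regional endpoint from a fixed dropdown of regions.
# AZURE_REGIONS is an illustrative subset only.

AZURE_REGIONS = ["eastus", "westus", "westeurope", "southeastasia"]


def endpoint_for_region(region: str) -> str:
    """Map a dropdown selection to the region-specific API endpoint."""
    if region not in AZURE_REGIONS:
        raise ValueError(f"unknown region: {region}")
    return f"https://{region}.api.cognitive.microsoft.com"


def voices_list_url(region: str) -> str:
    """URL of the voice list; it can be fetched once to populate a voice dropdown."""
    return endpoint_for_region(region) + "/cognitiveservices/voices/list"
```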

@shenlebantongying shenlebantongying self-assigned this Jun 19, 2024
@shenlebantongying
Collaborator

I think we can add this one directly under the “Edit” menu instead of “Edit -> Dictionaries”.

Most things on the right side are only "somewhat a dictionary". It was a mistake for morphology and transliteration; they cannot even be shown as an article.


Furthermore, as a separate component, the config can also be separated into a new file beside the main config -> config_cloud_tts.xml.

The timing of saving/reading the config of different components is not entirely the same.

For example, MainWindowGeometry needs to be read/written at program shutdown while the dictionaries don't, so there is no point in putting them in a single config. The “ominous” commitData is overused. Opening/closing the editdictionaries dialog requires initializing/mutating excessive state (e.g., a crash of Qt TTS can bring down the entire dialog).

Putting it somewhere separate also makes adding/removing the feature entirely easy: there is no need for #if feature_x macros, and no need to carefully think about how to plug new feature things into existing ones. Right now it is not clear how to add a config option without jumping around and reading everything that came before.

Everything related to one component in one place

vs

orgy of features


TTS dialog -> read/write config
The TTS engine -> read config

(Side effect: this also makes it easy to build this component as a separate program.)

@xiaoyifang
Owner Author

Most things on the right side are only "somewhat a dictionary". It is a mistake for the morphology and transliteration, they cannot even be shown as an article.

Move them to Edit -> Preferences?

@xiaoyifang
Owner Author

xiaoyifang commented Jun 20, 2024

Furthermore, as a separate component, the config can also be separated into a new file beside config -> config_cloud_tts.xml.

This can be considered. Azure TTS can have its own config file.
It does not have to implement dictionary.hh.

@shenlebantongying
Collaborator

shenlebantongying commented Jun 20, 2024

It would not be hard to replicate AwesomeTTS's audio preview pane 😅

Progress for today: a little app, https://github.com/SourceReviver/temp_ctts_impl

@xiaoyifang
Owner Author

Do you have time to implement this feature?

@shenlebantongying
Collaborator

shenlebantongying commented Jun 25, 2024

I think https://github.com/SourceReviver/temp_ctts_impl is complete as the initial version of this feature.

However, I need to prepare for an exam on Friday, so I will prepare a PR this weekend 😅

@xiaoyifang
Owner Author

Exam first. The PR can wait.

@shenlebantongying shenlebantongying removed their assignment Jul 2, 2024
@shenlebantongying shenlebantongying self-assigned this Jul 13, 2024
@shenlebantongying shenlebantongying removed their assignment Jul 16, 2024
@shenlebantongying
Collaborator

shenlebantongying commented Jul 17, 2024

Offer the user a way to stop playing the current selection, especially when the selection will not finish playing soon.

Refactoring the current "Pronounce" button on the toolbar is needed, I think? Currently, the length of the selection that can be pronounced is limited to 60. We can implement "stop" later.

I don't have another chunk of time until at least August 13. #1685 is usable as of now. Not sure how we should proceed. 😅

@xiaoyifang
Owner Author

Offer the user a way to stop playing the current selection, especially when the selection will not finish playing soon.

Refactoring the current "Pronounce" button on the toolbar is needed, I think? Currently, the length of the selection that can be pronounced is limited to 60. We can implement "stop" later.

I don't have another chunk of time until at least August 13. #1685 is usable as of now. Not sure how we should proceed. 😅

That's OK; I can continue working on it when I'm available.
