
System.Speech.Synthesis.SpeechSynthesizer not implemented in core? #46730

Closed
jasonliaocn opened this issue Apr 13, 2020 · 51 comments

@jasonliaocn

I'm trying to migrate some WPF applications from .NET Framework to .NET Core, but I cannot find a replacement for System.Speech.Synthesis.SpeechSynthesizer, which exists in System.Speech.dll.
The MSDN docs say it is not supported in .NET Core.
Has this been removed, and how can we use the speech functionality on Windows 10 without Azure Speech Services?
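For context, a minimal sketch of the API being migrated, as it is typically used on .NET Framework (Windows-only, since SpeechSynthesizer wraps the Windows SAPI COM APIs; this assumes a reference to System.Speech.dll):

```csharp
using System.Speech.Synthesis;

class Program
{
    static void Main()
    {
        // SpeechSynthesizer wraps the Windows speech (SAPI) COM APIs,
        // so this only runs on Windows.
        using (var synthesizer = new SpeechSynthesizer())
        {
            synthesizer.SetOutputToDefaultAudioDevice();
            synthesizer.Speak("Hello, world");
        }
    }
}
```

On .NET Framework this compiles against the GAC's System.Speech.dll; at the time this issue was opened there was no equivalent assembly for .NET Core.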

@GrabYourPitchforks GrabYourPitchforks transferred this issue from dotnet/core Apr 14, 2020
@Dotnet-GitSync-Bot
Collaborator

I couldn't figure out the best area label to add to this issue. Please help me learn by adding exactly one area label.

@danmoseley
Member

@AlexGhiondea is this an Azure service that your team tracks?

@jkotas
Member

jkotas commented Apr 14, 2020

#30991

@danmoseley
Member

Thank you, I had somehow misread. @jasonliaocn does that answer your question? Have you considered the Azure service - is it an option?

@duaneking

duaneking commented Apr 17, 2020

I'm also blocked by not having this.

This System.Speech namespace is VITAL to the visually disabled community, as vital as System.Console is for everybody else. Not having this has been a HUGE DEAL as an Azure service is NOT AN OPTION.

The community NEEDS a LOCAL running system that can create this speech. Nothing non-local will work.

@terrajobst
Member

terrajobst commented Apr 18, 2020

Having an option that doesn't require online access makes sense.

Is cross-platform a requirement though? System.Speech is a wrapper around Windows COM APIs and thus isn't cross-platform, even if we were to port it to .NET Core.

@duaneking

I think it could be.

However, I think the full System.Speech API surface is what's really needed here, not just a Speech.Say method: we need both ends, so we can get input and output via speech if we do the work.

I think if it supports Win10 and is open source, that will be enough to get the community involved.

@vatsan-madhavan vatsan-madhavan transferred this issue from dotnet/runtime Apr 24, 2020
@danmoseley
Member

Moved to WPF since that is where the expertise lies for this API.

@birbilis

birbilis commented Apr 26, 2020

From what I remember from when I was developing SpeechLib (https://github.com/Zoomicon/SpeechLib), there was a separate Microsoft.Speech namespace and a System.Speech namespace with similar (but not exactly the same/compatible) functionality. I think the former was promoted for Kinect.

If you look at the homepage of that wrapper lib (which was trying to hide any differences between those APIs), it points to some projects that show how to use it: SpeechTurtle (simple speech-based turtle graphics), the more complex TrackingCam (tracking a presenter), and the forked/expanded Hotspotizer (Kinect gesture and speech recognition to simulate keypresses).
Those three projects also effectively serve as use cases for speech synthesis and recognition via defined command dictionaries, not free speech recognition, although one could define a streaming text API for that too, I guess. I'm not sure whether the Microsoft and System speech APIs supported continuous speech recognition, though one could check whether the Azure Cognitive APIs define an API for that.

@birbilis

birbilis commented Apr 27, 2020

@kolappannathan - moving discussion (sorry, long post, hope some of the links are useful) from #30991 (comment) here too as requested

From what I remember, Microsoft.Speech was in the Microsoft Speech Platform (probably also related to the older MS Speech Server, which had become Office Live Communications Server), which I think must have been intended for use in telephony-based services on servers (e.g. the recognizers may have been fine-tuned for such scenarios), and it also shipped with the Kinect v1 for Windows (maybe v2 too) installers, if I remember correctly.

That should be the reason I was supporting both in that SpeechLib library (I remember setting up recognition dictionaries had some differences, but I could abstract them away more or less).

From comments in the SpeechLib code, I think System.Speech.Recognition and Microsoft.Speech.Recognition were both working on Windows (and with the KinectRecognizer too), but Microsoft.Speech.Synthesis wasn't working on Windows (probably the Kinect installer didn't bother to install the Runtime Languages for speech synthesis; see link below). See the includes and comments at

Speaking of the comments in SpeechRecognitionKinectV1 (a descendant of my SpeechRecognition class), I see a pointer to
https://web.archive.org/web/20160202041952/http://kin-educate.blogspot.gr/2012/06/speech-recognition-for-kinect-easy-way.html (the original URL isn't available), and if you look at that code, it also uses the Microsoft.Speech namespace.


@birbilis

birbilis commented Apr 27, 2020

BTW, I know there was a trend to move all similar services to the cloud, but currently there's also a reverse trend to move them at least into IoT devices (edge computing), so why not back to one's computer/notebook/phone too?
What a dev needs is abstractions with pluggable implementations, so that they aren't bothered by the implementation details of the specific service chosen and/or can switch services on the fly based on network connectivity, available power (battery), CPU and available space on the client device, functionality provided by the client device OS or hardware, etc.

@birbilis

birbilis commented Apr 27, 2020

I know the thread is about SpeechSynthesis, but since SpeechRecognition (verbal commands with a predefined syntax, not free-speech recognition, are very useful for app control and accessibility) is under the Speech namespace (both the Microsoft and System ones), these may be useful too:

@duaneking

I consider both speech synthesis and speech recognition to be issues here; both are equally needed as standard parts of .NET/.NET Core.

This is not about just speaking. This is about making an app usable using only speech with no sight required. It asks you questions. You answer. It does things. The goal is to not need your eyes at all for what could otherwise be console apps.

If MSFT is true to its stated values of inclusiveness, this should be an easy thing.

This System.Speech namespace is VITAL to the visually disabled community, as vital as System.Console is for everybody else. Not having this has been a HUGE DEAL as it locks people out. Right now, the API's effectively show a preference for sighted people and I feel like that's a missed opportunity for inclusion, to say it as lightly as I can.

@TylerGubala

All accessibility functions should be available in all frameworks. Otherwise developers will either be shackled to one framework or will be forced to make tightly coupled implementations for themselves, like this one I found that calls PowerShell just to get the speech service.

I think it's possible to do better in this area. Azure should not be the end-all-be-all IMO. Sometimes my internet goes out.

@fredm73

fredm73 commented May 25, 2020

I'd like to add my voice: I have worked with blind people in various countries to bring chess to them (called "chessSpeak"). I'd like to convert it to Core 3.1 (Windows desktop only). I looked at an Azure solution, but that is not really viable for free software.

@victorvhpg

victorvhpg commented Jul 26, 2020

> All accessibility functions should be available in all frameworks. Otherwise developers will either be shackled to one framework or will be forced to make tightly coupled implementations for themselves, like this one I found that calls powershell just to get the speech service.
>
> I think it's possible to do better in this area. Azure should not be the end-all-be-all IMO. Sometimes my internet goes out.

Yes, I agree.
We need a local/offline solution like System.Speech.

@coderb

coderb commented Sep 16, 2020

+1 for local speech api on windows

@ocdtrekkie

This basically writes off any interest I have in moving into .NET Core/.NET 5, and that's pretty disappointing. Cloud isn't a viable answer.

@neodon

neodon commented Nov 8, 2020

I'm disappointed there isn't some alternative local speech synthesis and recognition solution in .NET Core. Don't get me wrong - .NET Core is incredible and this is just a small piece. But it's in the 80% of the most important things we need, especially to support those in our community with vision and hearing challenges.

@duaneking

Anything that is a server or client that goes over the network or a network interface does not meet basic accessibility requirements here as the network increases lag, costs, etc and is a burden on the user.

Many who are blind don't even have easy access to the internet, as a computer with the accessibility tools required is often out of their price range and open source options are limited and actively fought against by the big companies that make the most money from selling themselves via insurance claims.

Also, a braille terminal is EXPENSIVE; most people who need them have to use insurance to buy them, because they can cost thousands of dollars depending on the model and most of the time people who want them don't have that money.

.NET Core NEEDS System.Speech.* and a Console.Speak(string text); standard to be inclusive and support the disabled. An Azure server or service is directly at odds with the accessibility requirements here and will not work.
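For illustration, a Console.Speak(string) helper in the spirit of the proposal above could be layered on top of System.Speech once it is available; this is a hypothetical sketch (no such API exists in .NET itself, and all names here are illustrative):

```csharp
using System.Speech.Synthesis;

// Hypothetical helper in the spirit of the proposed Console.Speak(string);
// Windows-only, since it relies on System.Speech.
public static class SpeakingConsole
{
    private static readonly SpeechSynthesizer Synthesizer = CreateSynthesizer();

    private static SpeechSynthesizer CreateSynthesizer()
    {
        var synthesizer = new SpeechSynthesizer();
        synthesizer.SetOutputToDefaultAudioDevice();
        return synthesizer;
    }

    // Echoes the text for sighted users and speaks it for everyone else.
    public static void Speak(string text)
    {
        System.Console.WriteLine(text);
        Synthesizer.Speak(text);
    }
}
```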

@Carlos487

Adding support to Microsoft.Windows.Compatibility would be great, at least to port existing applications on Windows.

@duaneking

duaneking commented Nov 21, 2020

Microsoft publicly states it's committed to accessibility at https://www.microsoft.com/en-us/accessibility, yet something seems to be stopping the company from aligning as One Microsoft to support Diversity, Inclusion, and Accessibility in this way. May I ask what that is?

@lindomar-marques55

No matter how much they try to promote Azure, for some programmers Azure is definitely out of the question, as in my case. Killing System.Speech and System.Speech.Synthesis.SpeechSynthesizer will bring many difficulties for programmers with few resources, or for programs meant to be used in communities without real-time network access.

@danmoseley
Member

Hello everyone. Thank you for your patience, and apologies for being silent for a little while. You've made it really clear there's a lot of demand for this and we have recently been working to make this open source: we got the last approvals we needed today and I have pushed up a PR just now. When that goes through, we can push up a NuGet package and I will ask you folks to confirm for me that it works for you. As you know, this is a Windows-only legacy speech tech that will not receive new investment or features: we're porting it so that all the existing code and users you've told us about above can continue to work on .NET Core/.NET 5, Powershell Core etc.

cc @terrajobst @fabiant3

@ocdtrekkie

@danmosemsft That's fantastic to hear. It means I can see a path forward again for migrating to .NET 5!

@fredm73

fredm73 commented Dec 11, 2020 via email

@duaneking

The code drop: #45941

If this is the entire full stack, does that mean we can also recreate voices?

@ocdtrekkie

@duaneking They are open sourcing the System.Speech bindings to call the Windows Speech API. They aren't open sourcing the Windows speech components.

@birbilis

birbilis commented Dec 13, 2020

Cross-platform apps that want to work in disconnected mode can wrap this for Windows and wrap some other engine on Linux, etc.
For example, SpeechLib (https://github.com/Zoomicon/SpeechLib) was wrapping both Microsoft.Speech and System.Speech. One could use that as a basis to wrap more engines, and could even detect connected mode and use the Azure Speech service (similarly wrapped as a pluggable engine), though that can end up costing too much or eating up any free credits one has, I guess, so it's not much of an option for free and FOSS apps.
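The pluggable-engine idea described above could be expressed as a small abstraction; the following is a hypothetical sketch (the interface and names are illustrative, not part of SpeechLib, System.Speech, or any shipped API):

```csharp
// Hypothetical abstraction over speech engines: the System.Speech-backed
// implementation would be Windows-only, while other platforms could plug
// in festival, flite, or a cloud service when connectivity allows.
public interface ISpeechEngine
{
    bool IsAvailable { get; }   // e.g. an OS check or a network probe
    void Speak(string text);
}

public static class SpeechEngineSelector
{
    // Picks the first engine that reports itself as available,
    // e.g. preferring a local engine and falling back to a cloud one.
    public static ISpeechEngine Select(params ISpeechEngine[] engines)
    {
        foreach (var engine in engines)
        {
            if (engine.IsAvailable)
                return engine;
        }
        throw new System.InvalidOperationException("No speech engine available.");
    }
}
```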

@duaneking

duaneking commented Dec 13, 2020

@terrajobst Respectfully, that doesn't work for the community.

Any investment in voice on Azure or needing a network to work is antithetical to the needs of the community.

I had hoped all the code would be made available so that the community could port it as needed as open source.

@terrajobst
Member

I had hoped all the code would be made available so that the community could port it as needed as open source.

I’m sorry, I thought that was clear when I said that System.Speech simply calls Windows APIs. I should have made it more clear that the best we can do is release System.Speech itself, not the underlying OS implementation.

@duaneking

duaneking commented Dec 14, 2020

On Linux and MacOSX, festival and flite might be a simple plug-in option.

@Dotnet-GitSync-Bot
Collaborator

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the "untriaged" (New issue has not been triaged by the area owner) label Jan 8, 2021
@danmoseley danmoseley transferred this issue from dotnet/wpf Jan 8, 2021
@danmoseley
Member

Gonna transfer this back to where the code now is.

@danmoseley danmoseley added the "area-System.Speech" label and removed the "untriaged" (New issue has not been triaged by the area owner) label Jan 8, 2021
@danmoseley
Member

Note this hasn't shipped yet. #45941 needs to complete, and it needs shiproom approval, then I'm aiming to get it out in the Feb patch Tuesday if we can.

@danmoseley
Member

This is on track to go out Feb 9th both standalone and as part of the Windows compat pack.

@danmoseley
Member

Hello everyone, this shipped today. The NuGet package is System.Speech, or you can get it by updating the Windows Compatibility Pack package reference.

Could folks please post back here to confirm it works successfully for them? I'd appreciate that.
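For anyone consuming the package from a project file, the reference looks like this (the version number shown is the 5.0.0 release reported later in this thread):

```xml
<!-- Windows-only: System.Speech wraps the Windows speech APIs. -->
<ItemGroup>
  <PackageReference Include="System.Speech" Version="5.0.0" />
</ItemGroup>
```

Alternatively, `dotnet add package System.Speech` adds the same reference from the command line.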

@danmoseley
Member

As a .NET Standard 2.0 library, this will work on all supported versions of .NET Core (i.e., back to 2.1).

@duaneking

Is platform compatibility a goal for this release?

@danmoseley
Member

danmoseley commented Feb 9, 2021

@duaneking could you clarify? If you mean, is it a goal that code written against the .NET Framework library works against this one: yes. I do not know of cases where it would not. However, the main focus is the core synthesis/recognition capability, which is what led to the original asks, and not the broader range of scenarios supported by the API. That would influence whether we make any given fix.

@ocdtrekkie

@duaneking As previously discussed, this release is porting the Windows-only System.Speech API to work on .NET Core. It calls components built into Windows, and is not available outside of it.

@danmoseley
Member

@ocdtrekkie thanks, I understand the question now. Correct, there is no plan to make this work on any other OS. This is fundamentally a wrapper around OS functionality.

@lukeb1961

PowerShell 7.2.0-preview.3
Copyright (c) Microsoft Corporation.

https://aka.ms/powershell
Type 'help' to get help.

PS C:\Users\LukeB> find-module System.Speech
Find-Package: C:\program files\powershell\7-preview\Modules\PowerShellGet\PSModule.psm1:8879
Line |
8879 |     PackageManagement\Find-Package @PSBoundParameters | Microsoft …
     |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     | No match was found for the specified search criteria and module name 'System.Speech'. Try
     | Get-PSRepository to see all available registered module repositories.

PS C:\Users\LukeB> find-package System.Speech

Name            Version   Source         Summary
----            -------   ------         -------
System.Speech   5.0.0     nuGet.org v2   Provides types to perform speech synthesis and speech…
System.Speech   5.0.0     nuGet.org      Provides types to perform speech synthesis and speech…

PS C:\Users\LukeB> find-package System.Speech | install-package -Force
Install-Package: Dependency loop detected for package 'System.Speech'.
PS C:\Users\LukeB>

@danmoseley
Member

@lukeb1961 thanks for the report. Could you please open a fresh issue, and we'll take a look?

> for some reason, I get this far, then it stops.

PowerShell 7.2.0-preview.3
Copyright (c) Microsoft Corporation.

https://aka.ms/powershell
Type 'help' to get help.

PS C:\Windows\System32>  find-package system.speech

Name                           Version          Source           Summary
----                           -------          ------           -------
System.Speech                  5.0.0            nuget.org        Provides types to perform speech synthesis and speech…

PS C:\Windows\System32>  find-package system.speech | install-package -scope currentuser

The package(s) come(s) from a package source that is not marked as trusted.
Are you sure you want to install software from 'nuget.org'?
[Y] Yes  [A] Yes to All  [N] No  [L] No to All  [S] Suspend  [?] Help (default is "N"): A

@danmoseley
Member

I'm not very knowledgeable with PowerShell, but from an internet search, it seems that -skipdependencies can avoid this hang. The following produces speech for me:

PowerShell 7.2.0-preview.3
Copyright (c) Microsoft Corporation.

https://aka.ms/powershell
Type 'help' to get help.

PS C:\Windows\System32> cd \test
PS C:\test> find-package System.Speech | install-package -scope currentuser -skipdependencies -destination .
PS C:\test> $a = [System.Reflection.Assembly]::LoadFrom('C:\test\System.Speech.5.0.0\runtimes\win\lib\netcoreapp2.1\System.Speech.dll')
PS C:\test> $ss = [System.Speech.Synthesis.SpeechSynthesizer]::new()
PS C:\test> $ss.SetOutputToDefaultAudioDevice()
PS C:\test> $prompt = [System.Speech.Synthesis.Prompt]::new('hello world')
PS C:\test> $ss.Speak($prompt)
PS C:\test>

There's probably a more efficient way to do it; as I say, I'm not very knowledgeable about PowerShell, but this proves that System.Speech works in PowerShell.

Can you confirm this works for you @lukeb1961 ?

@lukeb1961

yes, -SkipDependencies worked immediately.

@danmoseley
Member

Ok good. That might be worth reporting to the nuget repo.

@lukeb1961

PowerShell 7.2.0-preview.3
Copyright (c) Microsoft Corporation.

https://aka.ms/powershell
Type 'help' to get help.

PS C:\Users\LukeB> Add-Type -LiteralPath ((Get-ChildItem -Filter *.dll -Recurse (Split-Path (Get-Package system.speech).Source)).FullName)[-1]
PS C:\Users\LukeB> $synth = New-Object -TypeName System.Speech.Synthesis.SpeechSynthesizer
PS C:\Users\LukeB> $synth

State   Rate   Volume   Voice
-----   ----   ------   -----
Ready   0      100      System.Speech.Synthesis.VoiceInfo

PS C:\Users\LukeB> $synth.GetInstalledVoices().voiceinfo.name
Microsoft David Desktop
Microsoft James
Microsoft Catherine
Microsoft Mark
Microsoft Zira Desktop
PS C:\Users\LukeB>

@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2021