feature: Add Azure.Mcp.Tools.Speech tool azmcp_speech_tts_synthesize #902
base: main
Pull Request Overview
This PR introduces a new Azure MCP tool azmcp_speech_tts_synthesize that enables text-to-speech synthesis using Azure AI Services Speech. The tool converts text to audio files with configurable language, voice, format, and custom voice model support.
Key Changes
- Added TTS synthesis command with comprehensive parameter validation
- Implemented streaming-based audio synthesis for efficient memory management
- Added extensive unit and live tests for various synthesis scenarios
Reviewed Changes
Copilot reviewed 14 out of 16 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tools/Azure.Mcp.Tools.Speech/src/Commands/Tts/TtsSynthesizeCommand.cs | New command implementation for TTS synthesis with validation and error handling |
| tools/Azure.Mcp.Tools.Speech/src/Services/SpeechService.cs | Core TTS synthesis logic with streaming support and error handling |
| tools/Azure.Mcp.Tools.Speech/src/Services/ISpeechService.cs | Interface extension for TTS synthesis method |
| tools/Azure.Mcp.Tools.Speech/src/Options/Tts/TtsSynthesizeOptions.cs | Options class for TTS synthesis parameters |
| tools/Azure.Mcp.Tools.Speech/src/Options/SpeechOptionDefinitions.cs | Command-line option definitions for TTS parameters |
| tools/Azure.Mcp.Tools.Speech/src/Models/SynthesisResult.cs | Result model for TTS synthesis output |
| tools/Azure.Mcp.Tools.Speech/src/Commands/SpeechJsonContext.cs | JSON serialization context updates |
| tools/Azure.Mcp.Tools.Speech/src/SpeechSetup.cs | Registration of TTS command group |
| tools/Azure.Mcp.Tools.Speech/tests/Azure.Mcp.Tools.Speech.UnitTests/Tts/TtsSynthesizeCommandTests.cs | Comprehensive unit tests for TTS command |
| tools/Azure.Mcp.Tools.Speech/tests/Azure.Mcp.Tools.Speech.LiveTests/SpeechCommandTests.cs | Live integration tests for TTS functionality |
| servers/Azure.Mcp.Server/docs/azmcp-commands.md | Documentation for TTS command usage |
| servers/Azure.Mcp.Server/docs/e2eTestPrompts.md | E2E test prompts for TTS scenarios |
| servers/Azure.Mcp.Server/README.md | README updates describing TTS capabilities |
| eng/tools/ToolDescriptionEvaluator/prompts.json | Test prompts for tool description evaluation |
Hi @joshfree, this PR is for Ignite. We can hold off merging until October 28th, but it would be best if you could start reviewing it now so we can merge after October 28th. Thanks!
Hi @joshfree @alzimmermsft, could you help review the PR? We'd like to publish before Ignite if possible. Thanks!
3284bfd to 8dcf2e1
Hi @joshfree, the PR has been updated to fix conflicts. Could you help review? Thanks.
d7a144f to e2fd2ea
Hi @joshfree @alzimmermsft, could you help review the PR? Thanks.
@dilin-MS2 please review your teammate's PR. Thanks.
This PR doesn't look like it was rebased onto main correctly. It shows many unrelated edits for AI Foundry tools.
joshfree left a comment
Left some comments.
Please double check you're not including unrelated edits
| ] | ||
| }, | ||
| { | ||
| "name": "list", |
Why is this included in the PR if you're adding 1 new speech tool?
This was changed by the ToolDescriptionEvaluator tool when I ran it locally.
| ```bash | ||
| # Synthesize speech from text and save to an audio file using Azure AI Services Speech | ||
| # ❌ Destructive | ✅ Idempotent | ❌ OpenWorld | ❌ ReadOnly | ❌ Secret | ✅ LocalRequired | ||
| azmcp speech tts synthesize --endpoint <endpoint> \ |
@xiangyan99 please review. This file doesn't look generated; it looks hand-edited, and I'm assuming it will break the next time the file is generated?
These are auto-generated.
| """ | ||
| Convert text to speech using Azure AI Services Speech. This command takes text input and generates an audio file using advanced neural text-to-speech capabilities. | ||
| You must provide an Azure AI Services endpoint (e.g., https://your-service.cognitiveservices.azure.com/), the text to convert, and an output file path. | ||
| Optional parameters include language specification (default: en-US), voice selection, audio output format (default: Riff24Khz16BitMonoPcm), and custom voice endpoint ID. |
Have you tested with other tool descriptions that teach the LLM the rest of the optional parameters? For example, more locale examples and more encoding examples.
Yes, tested with different locales and formats.
| var supportedExtensions = new HashSet<string> | ||
| { | ||
| ".wav", ".mp3", ".ogg", ".raw" | ||
| }; |
Turn this into a static field
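A minimal sketch of what this suggestion could look like, not the PR's actual code: the set is hoisted into a `static readonly` field so it is allocated once per type rather than on every validation call. The class name `OutputFileValidation`, the method `HasSupportedExtension`, and the case-insensitive comparer are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper; the names here are illustrative, not taken from the PR.
public static class OutputFileValidation
{
    // Built once when the type is initialized instead of on every call.
    // Case-insensitive matching is an assumption; the original set was case-sensitive.
    private static readonly HashSet<string> SupportedExtensions =
        new(StringComparer.OrdinalIgnoreCase) { ".wav", ".mp3", ".ogg", ".raw" };

    // True when the path ends in one of the supported audio extensions.
    public static bool HasSupportedExtension(string path) =>
        SupportedExtensions.Contains(Path.GetExtension(path));
}
```

Because the field is never mutated after initialization, concurrent reads from multiple command invocations are safe.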
| // Validate output file path | ||
| if (string.IsNullOrWhiteSpace(fileValue)) | ||
| { | ||
| commandResult.AddError("Output file path cannot be empty."); |
Should there also be a check for the file already existing? I don't want to support the ability to overwrite local files at this time.
Also, the metadata sets destructive=false, and overwriting a local file would require that to be true, which again I don't really want to support yet.
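One way the requested guard could look, as a sketch under assumptions rather than the code in this PR: reject the path when a file is already there, so the command never overwrites local data and destructive=false stays accurate. `OutputFileGuard` and `Validate` are hypothetical names; the empty-path message is the one shown in the snippet above, and the second message is invented for illustration.

```csharp
using System.IO;

// Hypothetical helper; not part of the PR.
public static class OutputFileGuard
{
    // Returns an error message, or null when the path is acceptable.
    public static string? Validate(string? outputPath)
    {
        if (string.IsNullOrWhiteSpace(outputPath))
        {
            return "Output file path cannot be empty.";
        }

        // Refuse to overwrite: an existing file is treated as a validation error.
        if (File.Exists(outputPath))
        {
            return $"Output file already exists: {outputPath}";
        }

        return null;
    }
}
```

A check followed by a later write leaves a small window in which the file can appear. Opening the output stream with `FileMode.CreateNew` closes that window, since it throws an `IOException` when the file already exists.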
| context.Response.Status = HttpStatusCode.OK; | ||
| context.Response.Message = "Speech synthesis completed successfully."; | ||
| context.Response.Results = ResponseResult.Create( | ||
| new TtsSynthesizeCommandResult(result), |
| new TtsSynthesizeCommandResult(result), | |
| new(result), |
| // Parse and validate the JSON result | ||
| var jsonResult = JsonDocument.Parse(resultText); | ||
| var resultObject = jsonResult.RootElement; | ||
| Assert.True(resultObject.TryGetProperty("result", out var resultProperty)); |
When requiring a JSON property, use AssertProperty.
| Assert.True(resultObject.TryGetProperty("result", out var resultProperty)); | |
| var resultProperty = resultObject.AssertProperty("result"); |
This will provide better debugging information if property retrieval fails.
| [Fact] | ||
| public void Constructor_WithValidLogger_ShouldCreateInstance() | ||
| { | ||
| var command = new TtsSynthesizeCommand(_logger); | ||
| Assert.NotNull(command); | ||
| Assert.Equal("synthesize", command.Name); | ||
| } | ||
|
| [Fact] | ||
| public void Properties_ShouldHaveExpectedValues() | ||
| { | ||
| Assert.Equal("synthesize", _command.Name); | ||
| Assert.Equal("Synthesize Speech from Text", _command.Title); | ||
| Assert.NotEmpty(_command.Description); | ||
| Assert.False(_command.Metadata.Destructive); | ||
| Assert.True(_command.Metadata.Idempotent); | ||
| Assert.False(_command.Metadata.OpenWorld); | ||
| Assert.False(_command.Metadata.ReadOnly); | ||
| Assert.True(_command.Metadata.LocalRequired); | ||
| Assert.False(_command.Metadata.Secret); | ||
| } |
Remove these tests; they aren't very useful and will make maintenance more painful.
What does this PR do?
Add Azure.Mcp.Tools.Speech tool azmcp_speech_tts_synthesize
https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech
GitHub issue number?
#852
Pre-merge Checklist
- servers/Azure.Mcp.Server/CHANGELOG.md and/or servers/Fabric.Mcp.Server/CHANGELOG.md for product changes (features, bug fixes, UI/UX, updated dependencies)
- servers/Azure.Mcp.Server/README.md and/or servers/Fabric.Mcp.Server/README.md documentation
- eng/scripts/Process-PackageReadMe.ps1. See Package README
- /servers/Azure.Mcp.Server/docs/azmcp-commands.md and/or /docs/fabric-commands.md
- ToolDescriptionEvaluator and obtained a score of 0.4 or more and a top 3 ranking for all related test prompts
- /servers/Azure.Mcp.Server/docs/e2eTestPrompts.md
- crypto mining, spam, data exfiltration, etc.)
- /azp run mcp - pullrequest - live to run Live Test Pipeline