Merged
79 commits
08fac52
checkout from speech-transcription folder
amber-yujueWang Nov 5, 2025
f60fa12
update after renaming
amber-yujueWang Nov 5, 2025
1004ec5
Merge branch 'main' into wangamber/transcription
amber-yujueWang Nov 5, 2025
2f3482d
refactoring
amber-yujueWang Nov 6, 2025
fbfa4bb
Merge branch 'wangamber/transcription' of https://github.com/amber-yu…
amber-yujueWang Nov 6, 2025
f86ca0d
Updated dependency versions
amber-yujueWang Nov 6, 2025
aaa54cb
fix spell check
amber-yujueWang Nov 6, 2025
96e6746
Add spell check dictionary entries and version configuration
amber-yujueWang Nov 6, 2025
4fe7b25
test cusomization
amber-yujueWang Nov 6, 2025
db7e5f4
renamed setmodels to clarify
amber-yujueWang Nov 6, 2025
5e910a9
map from int to duration
amber-yujueWang Nov 7, 2025
8d2e316
Merge branch 'main' into wangamber/transcription
amber-yujueWang Nov 7, 2025
6ce6128
add ci and pom.xml and retest customization
amber-yujueWang Nov 7, 2025
3484c06
Merge branch 'wangamber/transcription' of https://github.com/amber-yu…
amber-yujueWang Nov 7, 2025
61cd3c3
fix enable field
amber-yujueWang Nov 7, 2025
53bacf2
fix enable field
amber-yujueWang Nov 7, 2025
58106d6
Merge branch 'wangamber/transcription' of https://github.com/amber-yu…
amber-yujueWang Nov 8, 2025
b37ab49
Merge remote-tracking branch 'upstream/main' into wangamber/transcrip…
amber-yujueWang Nov 10, 2025
910939f
Merge branch 'main' into wangamber/transcription
amber-yujueWang Nov 12, 2025
4650f87
move AudioFileDetails into TranscriptionOptions and add 2 constructor…
amber-yujueWang Nov 13, 2025
372ebee
update test
amber-yujueWang Nov 13, 2025
4739db8
change constructor to transcribe(TranscriptionOptions options)
amber-yujueWang Nov 13, 2025
a7df821
Merge branch 'main' into wangamber/transcription
amber-yujueWang Nov 13, 2025
9b7989b
slve conflict version
amber-yujueWang Nov 13, 2025
37a1680
Merge upstream/main into wangamber/transcription
amber-yujueWang Nov 13, 2025
922d7c8
update test
amber-yujueWang Nov 13, 2025
fa0ed79
fix linting
amber-yujueWang Nov 13, 2025
997ee32
Merge branch 'main' into wangamber/transcription
amber-yujueWang Nov 13, 2025
ed37a17
Merge branch 'main' into wangamber/transcription
amber-yujueWang Nov 14, 2025
e94da0c
add codeowner
amber-yujueWang Nov 14, 2025
3221629
Merge branch 'wangamber/transcription' of https://github.com/amber-yu…
amber-yujueWang Nov 14, 2025
b90a526
add codeowner
amber-yujueWang Nov 14, 2025
5271b64
Merge branch 'main' into wangamber/transcription
amber-yujueWang Nov 14, 2025
4362af0
add release date
amber-yujueWang Nov 14, 2025
8dbe475
Merge branch 'main' into wangamber/transcription
amber-yujueWang Nov 14, 2025
eae16b2
Merge branch 'main' into wangamber/transcription
amber-yujueWang Nov 14, 2025
4c5d853
Merge branch 'wangamber/transcription' of https://github.com/amber-yu…
amber-yujueWang Nov 14, 2025
4db7356
update changelog
amber-yujueWang Nov 14, 2025
c597e0e
Merge branch 'main' into wangamber/transcription
amber-yujueWang Nov 14, 2025
8b7b17d
Merge branch 'main' into wangamber/transcription
amber-yujueWang Nov 17, 2025
e7d237c
add response
amber-yujueWang Nov 24, 2025
0c1248b
Merge branch 'wangamber/transcription' of https://github.com/amber-yu…
amber-yujueWang Nov 24, 2025
32914ff
Merge remote-tracking branch 'upstream/main' into wangamber/transcrip…
amber-yujueWang Nov 26, 2025
d853705
Merge remote-tracking branch 'upstream/main' into wangamber/transcrip…
amber-yujueWang Dec 4, 2025
821669f
Merge remote-tracking branch 'upstream/main' into wangamber/transcrip…
amber-yujueWang Dec 4, 2025
4ef97e0
update sample, readme, tests
amber-yujueWang Dec 5, 2025
7182118
update tsp files
amber-yujueWang Dec 5, 2025
ae53c9e
Merge remote-tracking branch 'upstream/main' into wangamber/transcrip…
amber-yujueWang Dec 5, 2025
822b629
update version_client
amber-yujueWang Dec 5, 2025
108ed83
regenerate sdk from typespec
amber-yujueWang Dec 5, 2025
f440c98
adding javadoc for customized function
amber-yujueWang Dec 6, 2025
d569e98
Merge branch 'wangamber/transcription' of https://github.com/amber-yu…
amber-yujueWang Dec 8, 2025
ae8be1d
Merge remote-tracking branch 'upstream/main' into wangamber/transcrip…
amber-yujueWang Dec 8, 2025
414e8af
update samples
amber-yujueWang Dec 12, 2025
fdb5660
Merge remote-tracking branch 'upstream/main' into wangamber/transcrip…
amber-yujueWang Dec 12, 2025
89c75ab
Merge branch 'wangamber/transcription' of https://github.com/amber-yu…
amber-yujueWang Dec 12, 2025
c0da94c
fix cspell
amber-yujueWang Dec 12, 2025
62418e8
fix cspell
amber-yujueWang Dec 12, 2025
6bf7a5d
fix codeownerlint
amber-yujueWang Dec 12, 2025
b9cdaed
Merge branch 'wangamber/transcription' of https://github.com/amber-yu…
amber-yujueWang Dec 12, 2025
d91ab16
fix cspell
amber-yujueWang Dec 12, 2025
aa71822
fix codeowner lint
amber-yujueWang Dec 12, 2025
49fea43
update tests
amber-yujueWang Dec 12, 2025
0551402
update broken links
amber-yujueWang Dec 12, 2025
74400e3
Merge branch 'main' into wangamber/transcription
amber-yujueWang Dec 13, 2025
f9cc844
checkout cspell
amber-yujueWang Dec 13, 2025
edd2285
Merge branch 'wangamber/transcription' of https://github.com/amber-yu…
amber-yujueWang Dec 13, 2025
82ed087
update readme and modify enabled property for EnhancedModeOptions
amber-yujueWang Dec 18, 2025
18de5f6
Merge branch 'wangamber/transcription' of https://github.com/amber-yu…
amber-yujueWang Dec 18, 2025
1d7ce7f
Merge remote-tracking branch 'upstream/main' into wangamber/transcrip…
amber-yujueWang Dec 18, 2025
3de96a9
modify enhanceed mode customization
amber-yujueWang Dec 18, 2025
2d0393e
modify enhanceed mode customization and update readme
amber-yujueWang Dec 19, 2025
cc7ef97
created a new service directory to put all the transcription SDK's under
amber-yujueWang Dec 19, 2025
4d11154
update tsp commit
amber-yujueWang Dec 19, 2025
3e9b8e7
fetch previous recording
amber-yujueWang Dec 19, 2025
3a93c14
Merge branch 'main' into wangamber/transcription
amber-yujueWang Dec 19, 2025
d760a51
redo recording test
amber-yujueWang Dec 19, 2025
ef2c1cf
undo changes to pom.xml in previous package service
amber-yujueWang Dec 19, 2025
181096e
update changelog
amber-yujueWang Dec 19, 2025
7 changes: 7 additions & 0 deletions .github/CODEOWNERS
@@ -262,6 +262,13 @@
# ServiceLabel: %Cognitive - Speech
# ServiceOwners: @rhurey

# PRLabel: %Speech Transcription
/sdk/transcription/azure-ai-speech-transcription/ @amber-yujueWang @rhurey @xitzhang @Azure/azure-java-sdk

# ServiceLabel: %Speech Transcription
# AzureSdkOwners: @amber-yujueWang @rhurey @xitzhang
# ServiceOwners: @rhurey @xitzhang @amber-yujueWang

# PRLabel: %Cognitive - Text Analytics
/sdk/textanalytics/ @samvaity @quentinRobinson @Azure/azure-java-sdk

1 change: 1 addition & 0 deletions eng/versioning/version_client.txt
@@ -53,6 +53,7 @@ com.azure:azure-ai-openai-realtime;1.0.0-beta.1;1.0.0-beta.1
com.azure:azure-ai-openai-stainless;1.0.0-beta.1;1.0.0-beta.1
com.azure:azure-ai-personalizer;1.0.0-beta.1;1.0.0-beta.2
com.azure:azure-ai-projects;1.0.0-beta.3;1.0.0-beta.4
com.azure:azure-ai-speech-transcription;1.0.0-beta.1;1.0.0-beta.1
com.azure:azure-ai-textanalytics;5.5.11;5.6.0-beta.1
com.azure:azure-ai-textanalytics-perf;1.0.0-beta.1;1.0.0-beta.1
com.azure:azure-ai-translation-text;1.1.7;2.0.0-beta.1
1 change: 1 addition & 0 deletions pom.xml
@@ -265,6 +265,7 @@
<module>sdk/timeseriesinsights</module>
<module>sdk/tools</module>
<module>sdk/trafficmanager</module>
<module>sdk/transcription</module>
<module>sdk/translation</module>
<module>sdk/trustedsigning</module>
<module>sdk/vision</module>
@@ -0,0 +1,7 @@
# Release History

## 1.0.0-beta.1 (2025-12-19)

### Features Added

- Initial release of Azure AI Speech Transcription client library for Java.
305 changes: 305 additions & 0 deletions sdk/transcription/azure-ai-speech-transcription/README.md
@@ -0,0 +1,305 @@
# Azure AI Speech Transcription client library for Java

The Azure AI Speech Transcription client library for Java converts audio to text using the Azure Speech service. It supports features such as speaker diarization, profanity filtering, and phrase lists that improve recognition of domain-specific terms.

## Documentation

Various documentation is available to help you get started:

- [API reference documentation][docs]
- [Product documentation][product_documentation]
- [Azure Speech Service documentation](https://learn.microsoft.com/azure/ai-services/speech-service/)

## Getting started

### Prerequisites

- [Java Development Kit (JDK)][jdk], version 8 or later
- [Azure Subscription][azure_subscription]
- An [Azure Speech resource](https://learn.microsoft.com/azure/ai-services/speech-service/overview#try-the-speech-service-for-free) or [Cognitive Services multi-service resource](https://learn.microsoft.com/azure/ai-services/multi-service-resource)

### Adding the package to your product

[//]: # ({x-version-update-start;com.azure:azure-ai-speech-transcription;current})
```xml
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-ai-speech-transcription</artifactId>
    <version>1.0.0-beta.1</version>
</dependency>
```
[//]: # ({x-version-update-end})

#### Optional: For Entra ID Authentication

If you plan to use Entra ID authentication (recommended for production), also add the `azure-identity` dependency:

```xml
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-identity</artifactId>
    <version>1.18.1</version>
</dependency>
```

### Authentication

Azure Speech Transcription supports two authentication methods:

#### Option 1: API Key Authentication (Subscription Key)

You can find your Speech resource's API key in the [Azure Portal](https://portal.azure.com) or by using the Azure CLI:

```bash
az cognitiveservices account keys list --name <your-resource-name> --resource-group <your-resource-group>
```

Once you have an API key, you can authenticate using `KeyCredential`:

```java
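import com.azure.ai.speech.transcription.TranscriptionClient;
import com.azure.ai.speech.transcription.TranscriptionClientBuilder;
import com.azure.core.credential.KeyCredential;

TranscriptionClient client = new TranscriptionClientBuilder()
    .endpoint("https://<your-resource-name>.cognitiveservices.azure.com/")
    .credential(new KeyCredential("<your-api-key>"))
    .buildClient();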
```

#### Option 2: Entra ID OAuth2 Authentication (Recommended for Production)

For production scenarios, it's recommended to use Entra ID authentication with managed identities or service principals. This provides better security and easier credential management.

```java
import com.azure.ai.speech.transcription.TranscriptionClient;
import com.azure.ai.speech.transcription.TranscriptionClientBuilder;
import com.azure.identity.DefaultAzureCredential;
import com.azure.identity.DefaultAzureCredentialBuilder;

// DefaultAzureCredential works with managed identities, service principals, the Azure CLI, and more
DefaultAzureCredential credential = new DefaultAzureCredentialBuilder().build();

TranscriptionClient client = new TranscriptionClientBuilder()
    .endpoint("https://<your-resource-name>.cognitiveservices.azure.com/")
    .credential(credential)
    .buildClient();
```

**Note:** To use Entra ID authentication, you need to:
1. Add the `azure-identity` dependency to your project
2. Assign the appropriate role (e.g., "Cognitive Services User") to your managed identity or service principal
3. Use your resource's custom subdomain endpoint (`https://<your-resource-name>.cognitiveservices.azure.com/`), which Entra ID authentication requires

For more information on Entra ID authentication, see:
- [Authenticate with Azure Identity](https://learn.microsoft.com/azure/developer/java/sdk/identity)
- [Azure Cognitive Services authentication](https://learn.microsoft.com/azure/ai-services/authentication)

## Key concepts

### TranscriptionClient

The `TranscriptionClient` is the primary interface for interacting with the Speech Transcription service. It provides methods to transcribe audio to text.

### TranscriptionAsyncClient

The `TranscriptionAsyncClient` provides the same transcription operations as non-blocking methods that return Project Reactor types such as `Mono`.

### Audio Formats

The service supports various audio formats including WAV, MP3, OGG, and more. Audio must be:

- Shorter than 2 hours in duration
- Smaller than 250 MB in size
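
You can check the two limits above on the client before uploading anything. The following is a minimal sketch in plain JDK code. `AudioLimits` is a hypothetical helper and is not part of this library. It assumes the caller already knows the audio duration, for example from file metadata. It also reads "250 MB" conservatively as 250,000,000 bytes, which is an assumption: the service may count megabytes differently.

```java
import java.time.Duration;

// Hypothetical pre-flight check; not part of the azure-ai-speech-transcription API.
final class AudioLimits {
    // Assumption: "250 MB" read conservatively as 250,000,000 bytes.
    static final long MAX_SIZE_BYTES = 250L * 1000 * 1000;
    // Audio must be shorter than 2 hours.
    static final Duration MAX_DURATION = Duration.ofHours(2);

    private AudioLimits() {
    }

    // Returns true only if the audio is strictly below both documented limits.
    static boolean isWithinLimits(long sizeBytes, Duration duration) {
        if (sizeBytes < 0 || duration == null || duration.isNegative()) {
            throw new IllegalArgumentException("size and duration must be non-negative");
        }
        return sizeBytes < MAX_SIZE_BYTES && duration.compareTo(MAX_DURATION) < 0;
    }
}
```

A check like this only catches the size and duration errors listed under Troubleshooting early. The service still makes the final decision.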

### Transcription Options

You can customize transcription with options like:

- **Profanity filtering**: Control how profanity is handled in transcriptions
- **Speaker diarization**: Identify different speakers in multi-speaker audio
- **Phrase lists**: Provide domain-specific phrases to improve accuracy
- **Language detection**: Automatically detect the spoken language
- **Enhanced mode**: Improve transcription quality with custom prompts, translation, and task-specific configurations

## Examples

### Transcribe an audio file

```java com.azure.ai.speech.transcription.readme
TranscriptionClient client = new TranscriptionClientBuilder()
    .endpoint("https://<your-resource-name>.cognitiveservices.azure.com/")
    .credential(new KeyCredential("<your-api-key>"))
    .buildClient();

try {
    // Read audio file
    byte[] audioData = Files.readAllBytes(Paths.get("path/to/audio.wav"));

    // Create audio file details
    AudioFileDetails audioFileDetails = new AudioFileDetails(BinaryData.fromBytes(audioData));

    // Create transcription options
    TranscriptionOptions options = new TranscriptionOptions(audioFileDetails);

    // Transcribe audio
    TranscriptionResult result = client.transcribe(options);

    // Process results
    System.out.println("Duration: " + result.getDuration().toMillis() + " ms");
    result.getCombinedPhrases().forEach(phrase -> {
        System.out.println("Channel " + phrase.getChannel() + ": " + phrase.getText());
    });
} catch (Exception e) {
    System.err.println("Error during transcription: " + e.getMessage());
}
```
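
If `getDuration()` returns a `java.time.Duration` (an assumption; check the API reference), you may prefer a clock-style display to raw milliseconds. The hypothetical `DurationFormat` helper below is plain JDK code and is not part of the SDK.

```java
import java.time.Duration;
import java.util.Locale;

// Hypothetical helper (not part of the SDK): formats a Duration as HH:MM:SS.mmm.
final class DurationFormat {
    private DurationFormat() {
    }

    static String format(Duration d) {
        if (d == null || d.isNegative()) {
            throw new IllegalArgumentException("duration must be non-negative");
        }
        long totalMillis = d.toMillis();
        long hours = totalMillis / 3_600_000;
        long minutes = (totalMillis / 60_000) % 60;
        long seconds = (totalMillis / 1_000) % 60;
        long millis = totalMillis % 1_000;
        // Locale.ROOT keeps ASCII digits regardless of the default locale.
        return String.format(Locale.ROOT, "%02d:%02d:%02d.%03d", hours, minutes, seconds, millis);
    }
}
```

For example, `DurationFormat.format(Duration.ofMillis(3_723_045))` yields `01:02:03.045`.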

### Transcribe using audio URL

You can transcribe audio directly from a URL without downloading the file first:

```java readme-sample-transcribeWithAudioUrl
TranscriptionClient client = new TranscriptionClientBuilder()
    .endpoint("https://<your-resource-name>.cognitiveservices.azure.com/")
    .credential(new KeyCredential("<your-api-key>"))
    .buildClient();

// Create transcription options with audio URL
TranscriptionOptions options = new TranscriptionOptions("https://example.com/audio.wav");

// Transcribe audio
TranscriptionResult result = client.transcribe(options);

// Process results
result.getCombinedPhrases().forEach(phrase -> {
    System.out.println(phrase.getText());
});
```

### Transcribe with multi-language support

The service can automatically detect and transcribe multiple languages within the same audio file.

```java com.azure.ai.speech.transcription.transcriptionoptions.multilanguage
byte[] audioData = Files.readAllBytes(Paths.get("path/to/audio.wav"));

AudioFileDetails audioFileDetails = new AudioFileDetails(BinaryData.fromBytes(audioData));

// Configure transcription WITHOUT specifying locales
// This allows the service to auto-detect and transcribe multiple languages
TranscriptionOptions options = new TranscriptionOptions(audioFileDetails);

TranscriptionResult result = client.transcribe(options);

result.getPhrases().forEach(phrase -> {
    System.out.println("Language: " + phrase.getLocale());
    System.out.println("Text: " + phrase.getText());
});
```
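
To gather the text for each detected language in one place, you can group phrases by locale. So that it runs with the JDK alone, this sketch uses a hypothetical `PhraseInfo` stand-in instead of the SDK's phrase type. With the real client, you would build each `PhraseInfo` from `phrase.getLocale()` and `phrase.getText()`, assuming both return `String`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a transcribed phrase (locale + text); not an SDK type.
final class PhraseInfo {
    final String locale;
    final String text;

    PhraseInfo(String locale, String text) {
        this.locale = locale;
        this.text = text;
    }
}

final class PhrasesByLocale {
    private PhrasesByLocale() {
    }

    // Groups phrase text by locale, keeping locales in first-seen order
    // and phrases in their original order within each locale.
    static Map<String, List<String>> group(List<PhraseInfo> phrases) {
        Map<String, List<String>> byLocale = new LinkedHashMap<>();
        for (PhraseInfo p : phrases) {
            byLocale.computeIfAbsent(p.locale, k -> new ArrayList<>()).add(p.text);
        }
        return byLocale;
    }
}
```

`LinkedHashMap` is used so that the output order follows the audio rather than hash order.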

### Transcribe with enhanced mode

Enhanced mode provides advanced features to improve transcription accuracy with custom prompts. Enhanced mode is automatically enabled when you create an `EnhancedModeOptions` instance.

```java com.azure.ai.speech.transcription.transcriptionoptions.enhancedmode
byte[] audioData = Files.readAllBytes(Paths.get("path/to/audio.wav"));

AudioFileDetails audioFileDetails = new AudioFileDetails(BinaryData.fromBytes(audioData));

// Enhanced mode is automatically enabled
EnhancedModeOptions enhancedMode = new EnhancedModeOptions()
    .setTask("transcribe")
    .setPrompts(java.util.Arrays.asList("Output must be in lexical format."));

TranscriptionOptions options = new TranscriptionOptions(audioFileDetails)
    .setEnhancedModeOptions(enhancedMode);

TranscriptionResult result = client.transcribe(options);

System.out.println("Transcription: " + result.getCombinedPhrases().get(0).getText());
```

### Transcribe with phrase list

You can use a phrase list to improve recognition accuracy for specific terms.

```java com.azure.ai.speech.transcription.transcriptionoptions.phraselist
byte[] audioData = Files.readAllBytes(Paths.get("path/to/audio.wav"));

AudioFileDetails audioFileDetails = new AudioFileDetails(BinaryData.fromBytes(audioData));

PhraseListOptions phraseListOptions = new PhraseListOptions()
    .setPhrases(java.util.Arrays.asList("Azure", "Cognitive Services"))
    .setBiasingWeight(5.0);

TranscriptionOptions options = new TranscriptionOptions(audioFileDetails)
    .setPhraseListOptions(phraseListOptions);

TranscriptionResult result = client.transcribe(options);

result.getCombinedPhrases().forEach(phrase -> {
    System.out.println(phrase.getText());
});
```

### Service API versions

The client library targets the latest service API version by default.
The service client builder accepts an optional service API version parameter that specifies which API version the client communicates with.

#### Select a service API version

You can explicitly select a supported service API version when you initialize a service client through the service client builder.
The client then communicates with the service using the specified API version.

When selecting an API version, it is important to verify that there are no breaking changes compared to the latest API version.
If there are significant differences, API calls may fail due to incompatibility.

Always ensure that the chosen API version is fully supported and operational for your specific use case and that it aligns with the service's versioning policy.

## Troubleshooting

### Enable client logging

You can enable logging to debug issues with the client library. The Azure client libraries for Java use the SLF4J logging facade. You can configure logging by adding a logging dependency and configuration file. For more information, see the [logging documentation](https://learn.microsoft.com/azure/developer/java/sdk/logging-overview).

### Common issues

#### Authentication errors

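- Verify that your API key is correct, or, for Entra ID, that your identity has the required role assignment
- Ensure the endpoint URL belongs to the same Speech resource as your key or role assignment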
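
A malformed endpoint is an easy mistake to rule out locally. This is a minimal sketch in plain JDK code, and `EndpointCheck` is a hypothetical helper that is not part of the SDK. It only recognizes the `https://<your-resource-name>.cognitiveservices.azure.com/` form used throughout this README, so other valid endpoint forms would be reported as not matching.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical pre-flight check; not part of the azure-ai-speech-transcription API.
final class EndpointCheck {
    private static final String SUFFIX = ".cognitiveservices.azure.com";

    private EndpointCheck() {
    }

    // Builds the endpoint URL for a resource name, in the form used by the examples above.
    static String endpointFor(String resourceName) {
        return "https://" + resourceName + SUFFIX + "/";
    }

    // True if the endpoint is an https URL whose host is <name>.cognitiveservices.azure.com.
    static boolean looksLikeResourceEndpoint(String endpoint) {
        if (endpoint == null) {
            return false;
        }
        try {
            URI uri = new URI(endpoint);
            String host = uri.getHost();
            return "https".equalsIgnoreCase(uri.getScheme())
                && host != null
                && host.toLowerCase(Locale.ROOT).endsWith(SUFFIX);
        } catch (URISyntaxException e) {
            // A string that is not a valid URI cannot be a valid endpoint.
            return false;
        }
    }
}
```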

#### Audio format errors

- Verify your audio file is in a supported format
- Ensure the audio file size is under 250 MB and duration is under 2 hours

### Getting help

If you encounter issues:

- Check the [troubleshooting guide](https://learn.microsoft.com/azure/ai-services/speech-service/troubleshooting)
- Search for existing issues or create a new one on [GitHub](https://github.com/Azure/azure-sdk-for-java/issues)
- Ask questions on [Stack Overflow](https://stackoverflow.com/questions/tagged/azure-java-sdk) with the `azure-java-sdk` tag

## Next steps

- Explore the [samples](https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/transcription/azure-ai-speech-transcription/src/samples) for more examples
- Learn more about [Azure Speech Service](https://learn.microsoft.com/azure/ai-services/speech-service/)
- Review the [API reference documentation][docs] for detailed information about classes and methods

## Contributing


For details on contributing to this repository, see the [contributing guide](https://github.com/Azure/azure-sdk-for-java/blob/main/CONTRIBUTING.md).

1. Fork it
1. Create your feature branch (`git checkout -b my-new-feature`)
1. Commit your changes (`git commit -am 'Add some feature'`)
1. Push to the branch (`git push origin my-new-feature`)
1. Create new Pull Request

<!-- LINKS -->
[product_documentation]: https://learn.microsoft.com/azure/ai-services/speech-service/
[docs]: https://azure.github.io/azure-sdk-for-java/
[jdk]: https://learn.microsoft.com/azure/developer/java/fundamentals/
[azure_subscription]: https://azure.microsoft.com/free/

@@ -0,0 +1 @@
{"AssetsRepo":"Azure/azure-sdk-assets","AssetsRepoPrefixPath":"java","TagPrefix":"java/transcription/azure-ai-speech-transcription","Tag": "java/transcription/azure-ai-speech-transcription_c82ca4aec0"}
16 changes: 16 additions & 0 deletions sdk/transcription/azure-ai-speech-transcription/cspell.json
@@ -0,0 +1,16 @@
{
  "version": "0.2",
  "language": "en",
  "words": [
    "azuread",
    "BYOD",
    "BYOS",
    "dexec",
    "diarization",
    "doméstica",
    "empleada",
    "habitación",
    "misrecognized",
    "Mundo"
  ]
}