Common types for text-to-speech #1496

Open
habuma opened this issue Oct 8, 2024 · 2 comments · May be fixed by #1518
Comments

@habuma
Member

habuma commented Oct 8, 2024

Similar to what I suggested in #1478, it would be great if text-to-speech had a set of common types. The SpeechModel interface, as well as SpeechPrompt, SpeechResponse, and StreamingSpeechModel, look like what I'd expect from such types, but they are currently delivered in the OpenAI module. Even though OpenAI is the only implementation, it feels like those types should be in core, with the implementations and OpenAI-specific extensions in the OpenAI module.

Moreover, while SpeechPrompt feels like it should be in core, it carries OpenAiAudioSpeechOptions. Perhaps there should be a more generic SpeechOptions that is carried by SpeechPrompt, with OpenAiAudioSpeechOptions being an extension of SpeechOptions.

Altogether, this would not only make the types more consistent with how the types for chat and other models are structured, but also set the stage for additional text-to-speech implementations, should Spring AI add support for more APIs that offer it.
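
To make the proposed split concrete, here is a minimal sketch of how it could look. It is not the actual Spring AI API: the types below are simplified stand-ins that only borrow the names from this issue, the `getVoice()` and `getResponseFormat()` accessors are assumptions chosen for illustration, and `StubOpenAiSpeechModel` is a hypothetical class that fabricates bytes instead of calling OpenAI.

```java
import java.nio.charset.StandardCharsets;
import java.util.Objects;

// ---- Would live in core (hypothetical, simplified stand-ins) ----

// Provider-neutral options carried by SpeechPrompt.
interface SpeechOptions {
    String getVoice(); // assumed common option; null means "provider default"
}

// Prompt type that depends only on the generic SpeechOptions.
final class SpeechPrompt {
    private final String text;
    private final SpeechOptions options; // may be null

    SpeechPrompt(String text, SpeechOptions options) {
        this.text = Objects.requireNonNull(text, "text must not be null");
        this.options = options;
    }

    String getText() { return text; }
    SpeechOptions getOptions() { return options; }
}

// Response type wrapping the generated audio bytes.
final class SpeechResponse {
    private final byte[] audio;

    SpeechResponse(byte[] audio) {
        this.audio = Objects.requireNonNull(audio, "audio must not be null").clone();
    }

    byte[] getAudio() { return audio.clone(); }
}

interface SpeechModel {
    SpeechResponse call(SpeechPrompt prompt);
}

// ---- Would live in the OpenAI module (hypothetical) ----

// Provider-specific options extend the generic contract.
final class OpenAiAudioSpeechOptions implements SpeechOptions {
    private final String voice;
    private final String responseFormat; // assumed OpenAI-only option

    OpenAiAudioSpeechOptions(String voice, String responseFormat) {
        this.voice = voice;
        this.responseFormat = responseFormat;
    }

    @Override
    public String getVoice() { return voice; }
    String getResponseFormat() { return responseFormat; }
}

// Stub implementation: no HTTP call, just encodes what it would have requested.
final class StubOpenAiSpeechModel implements SpeechModel {
    @Override
    public SpeechResponse call(SpeechPrompt prompt) {
        SpeechOptions options = prompt.getOptions();
        String voice = (options != null && options.getVoice() != null) ? options.getVoice() : "default";
        String format = "mp3"; // assumed fallback format for this sketch
        if (options instanceof OpenAiAudioSpeechOptions) {
            String requested = ((OpenAiAudioSpeechOptions) options).getResponseFormat();
            if (requested != null) {
                format = requested;
            }
        }
        String fakeAudio = format + ":" + voice + ":" + prompt.getText();
        return new SpeechResponse(fakeAudio.getBytes(StandardCharsets.UTF_8));
    }
}
```

In this arrangement, calling code can depend on `SpeechModel`, `SpeechPrompt`, and `SpeechOptions` alone, and only needs `OpenAiAudioSpeechOptions` when it wants provider-specific settings. A second provider would add its own `SpeechOptions` implementation without touching the core types.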

@ThomasVitale
Contributor

I like the suggestion and I'm available to work on this. I'll have a PR ready soon.

@mudabirhussain
Contributor

What is the reason for having SpeechResponse in the core, since it is already an implementation of ModelResponse, which belongs to the core family? Most of the classes or interfaces mentioned above are already implementations or extensions of existing core family classes or interfaces.

ThomasVitale added a commit to ThomasVitale/spring-ai that referenced this issue Oct 9, 2024
* Consolidate SpeechModel APIs into the spring-ai-core module, make it null-safe and covered by unit tests.
* Refactor OpenAiSpeechModel APIs to implement the new consolidated APIs.
* Delete leftover ImageResponseMetadata class in the spring-ai-openai module.

Fixes spring-projectsgh-1496
@ThomasVitale ThomasVitale linked a pull request Oct 9, 2024 that will close this issue

3 participants