
Consolidate SpeechModel APIs #1518

Open
wants to merge 1 commit into main

Conversation

ThomasVitale
Contributor

  • Consolidate SpeechModel APIs into the spring-ai-core module, make them null-safe, and cover them with unit tests (a sketch of the resulting contract follows below).
  • Refactor OpenAiSpeechModel APIs to implement the new consolidated APIs.
  • Delete leftover ImageResponseMetadata class in the spring-ai-openai module.

Fixes gh-1496
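
For context, a minimal sketch of what the consolidated contract could look like, assuming it follows the generic `Model`/`ModelRequest`/`ModelResponse` contracts that spring-ai-core uses for its other model types; the `Speech*` names below are illustrative, not quoted from the diff:

```java
import org.springframework.ai.model.Model;
import org.springframework.ai.model.ModelRequest;
import org.springframework.ai.model.ModelResponse;
import org.springframework.ai.model.ModelResult;

// Request: the text to synthesize (options and metadata omitted for brevity).
interface SpeechPrompt extends ModelRequest<String> {}

// A single synthesized-audio result.
interface SpeechResult extends ModelResult<byte[]> {}

// Response: one or more audio results plus response metadata.
interface SpeechResponse extends ModelResponse<SpeechResult> {}

// The consolidated contract, mirroring the shape of ChatModel/ImageModel in core;
// OpenAiSpeechModel would then implement this interface.
interface SpeechModel extends Model<SpeechPrompt, SpeechResponse> {

    @Override
    SpeechResponse call(SpeechPrompt prompt);
}
```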

@ThomasVitale
Contributor Author

Once this PR is merged, I'll have two follow-up PRs: one to update the documentation and one to add observability instrumentation to the OpenAiSpeechModel.

@markpollack markpollack self-assigned this Oct 11, 2024
@markpollack markpollack added this to the 1.0.0-M4 milestone Oct 11, 2024
@markpollack
Member

markpollack commented Oct 11, 2024

The biggest concern for me is how portable the abstractions are; we need at least two implementations to ensure that we have the correct abstraction before moving it into main. Also, most of the changes seem sort of "shallow", e.g. just changing the name of the class and the types, but aside from the name change they have the same functionality as the 'non-speech' core API.

Thoughts?
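
For comparison, a rough sketch of the image contract already in spring-ai-core, recalled from the codebase rather than quoted from it (check the repository for the authoritative definition):

```java
import org.springframework.ai.image.ImagePrompt;
import org.springframework.ai.image.ImageResponse;
import org.springframework.ai.model.Model;

// The existing core image abstraction: the proposed speech contract has the same
// shape with different request/response types, which is the "shallow" similarity
// noted above.
public interface ImageModel extends Model<ImagePrompt, ImageResponse> {

    ImageResponse call(ImagePrompt request);
}
```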

@ThomasVitale
Contributor Author

I also thought about that, so I was conservative with the changes. The main benefit I see in having the Speech APIs in Core is that it's simpler to demonstrate and try out new implementations, or even for users to implement their own custom ones, mainly because of the semantics given by the interfaces and the few conventions they come with.
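
To make that last point concrete, a user-side implementation only needs to satisfy the core contract. A minimal sketch, assuming the hypothetical `Speech*` types from the description above; the class and helpers here are invented for illustration:

```java
// Hypothetical custom implementation against the consolidated core contract;
// LocalTtsSpeechModel and its helpers are invented for illustration.
class LocalTtsSpeechModel implements SpeechModel {

    @Override
    public SpeechResponse call(SpeechPrompt prompt) {
        // Synthesize audio with some local engine and wrap it in the core response type.
        byte[] audio = synthesize(prompt.getInstructions());
        return wrap(audio);
    }

    private byte[] synthesize(String text) {
        // Engine-specific text-to-speech would go here.
        return new byte[0];
    }

    private SpeechResponse wrap(byte[] audio) {
        // Construct the core response type around the audio payload.
        throw new UnsupportedOperationException("illustrative stub");
    }
}
```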

But I don't have strong feelings about it, so I'd be fine parking this for now. I was looking into adding observability for speech models, but we can wait for that (I wouldn't do it inside the OpenAI module, so as to keep consistency with the rest of the observability implementation).

@habuma do you have any thoughts about this change?

Successfully merging this pull request may close these issues: Common types for text-to-speech