The official Python API for Deeptune. Deeptune brings the most human-like text to speech and voice cloning to your project in only a few lines of code.
Check out our documentation.
pip install deeptune
Instantiate and use the client with the following:
from deeptune.client import Deeptune
from deeptune.utils import play
client = Deeptune(
api_key="YOUR_API_KEY",
)
audio = client.text_to_speech.generate(
text="Wow, Deeptune's text to speech API is amazing!",
voice="d770a0d0-d7b0-4e52-962f-1a41d252a5f6",
)
play(audio)
If you prefer to manage voices on your own, you can use your own audio file as a reference for the voice clone.
from deeptune.client import Deeptune
from deeptune.utils import play
client = Deeptune(
api_key="YOUR_API_KEY",
)
audio = client.text_to_speech.generate_from_prompt(
text="Wow, Deeptune's text to speech API is amazing!",
prompt_audio="https://deeptune-demo.s3.amazonaws.com/Michael.wav",
)
play(audio)
The reference audio can also be a local file, passed as a base64-encoded data URI.
import base64
from deeptune.client import Deeptune
from deeptune.utils import play
client = Deeptune(
api_key="YOUR_API_KEY",
)
# Open the file and read its contents as bytes
with open("Michael.wav", "rb") as audio_file:
    audio_bytes = audio_file.read()
# Encode the bytes to base64
audio_base64 = base64.b64encode(audio_bytes).decode("utf-8")
audio = client.text_to_speech.generate_from_prompt(
text="Wow, Deeptune's text to speech API is amazing!",
prompt_audio=f"data:audio/wav;base64,{audio_base64}",
)
play(audio)
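The encoding step above can be wrapped in a small helper. This is a minimal sketch, not part of the SDK: the function name `audio_data_uri` and its `mime_type` parameter are hypothetical, and the default MIME type simply mirrors the `audio/wav` prefix used in the example.

```python
import base64
from pathlib import Path


def audio_data_uri(path: str, mime_type: str = "audio/wav") -> str:
    """Return the file at `path` as a base64 data URI string.

    Hypothetical helper (not provided by the SDK). The caller is
    responsible for passing a MIME type that matches the file.
    """
    # Read the raw bytes and encode them as ASCII-safe base64 text.
    encoded = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    # Same "data:<mime>;base64,<payload>" shape as the example above.
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can then be passed as the `prompt_audio` argument in place of the inline f-string.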
You can also store and manage voices inside of Deeptune.
# Get all available voices
voices = client.voices.list()
print(voices)
# Get a specific voice
voice = client.voices.get(voice_id="d770a0d0-d7b0-4e52-962f-1a41d252a5f6")
print(voice)
# Create a new cloned voice
voice = client.voices.create(
name="Cool Name",
file=open("./Michael.wav", "rb")
)
print(voice)
# Update an existing voice
voice = client.voices.update(
voice_id=voice.id,
name="Updated Name",
file=open("./Michael.wav", "rb"),
)
print(voice)
# Delete an existing voice
client.voices.delete(voice.id)
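The examples above pass `open(...)` directly, which leaves the file handle open until it is garbage collected. A minimal sketch of one way to close it deterministically follows. The helper name `create_voice_from_file` is hypothetical, and it assumes the synchronous `client.voices.create` call has finished reading the file by the time it returns.

```python
def create_voice_from_file(client, name: str, path: str):
    """Create a cloned voice from a local file, then close the file.

    Hypothetical convenience wrapper; `client` is expected to be a
    Deeptune client exposing `voices.create(name=..., file=...)`
    as shown in the examples above.
    """
    # The context manager closes the handle even if the call raises.
    with open(path, "rb") as audio_file:
        return client.voices.create(name=name, file=audio_file)
```

Usage would look like `voice = create_voice_from_file(client, "Cool Name", "./Michael.wav")`.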
The generate and generate_from_prompt endpoints return an iterator of bytes. Make sure to collect all of the bytes before writing, as demonstrated below.
audio = client.text_to_speech.generate(
text="Wow, Deeptune's text to speech API is amazing!",
voice="d770a0d0-d7b0-4e52-962f-1a41d252a5f6",
)
audio_bytes = b"".join(audio)
# Now, you can save however you'd like
with open("output.mp3", "wb") as audio_file:
    audio_file.write(audio_bytes)
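One reason to join the chunks first is that a Python iterator can only be consumed once. The sketch below illustrates this with a stand-in generator; `fake_audio_stream` is hypothetical and only imitates the shape of the SDK's return value (an iterator of bytes), not its contents.

```python
def fake_audio_stream():
    """Hypothetical stand-in for the SDK's iterator of bytes."""
    yield b"ID3"
    yield b"\x00\x01"
    yield b"\x02"


audio = fake_audio_stream()

# The first pass drains the iterator and keeps everything in memory.
audio_bytes = b"".join(audio)

# A second pass finds nothing left, so reuse `audio_bytes` instead.
leftover = b"".join(audio)

print(len(audio_bytes), len(leftover))
```

After the join, `audio_bytes` can be written, played, or saved as many times as needed.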
The SDK also has built-in play, save, and stream utility methods. Under the hood, these methods use ffmpeg and mpv to play audio streams.
from deeptune.utils import play, save, stream
# plays audio using ffmpeg
play(audio)
# streams audio using mpv
stream(audio)
# saves audio to file
save(audio, "my-file.mp3")
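Because these utilities rely on external programs, it can help to check that they are installed before calling them. This is a minimal sketch using only the standard library; the helper name `missing_tools` is hypothetical, and it assumes the SDK looks up `ffmpeg` and `mpv` on the system PATH.

```python
import shutil


def missing_tools(tools=("ffmpeg", "mpv")):
    """Return the names from `tools` that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


missing = missing_tools()
if missing:
    # Assumption: play needs ffmpeg and stream needs mpv, per the comments above.
    print("Install before using play/stream:", ", ".join(missing))
```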
The SDK also exports an async client so that you can make non-blocking calls to our API.
import asyncio
from deeptune.client import AsyncDeeptune
from deeptune.utils import play
client = AsyncDeeptune(
api_key="YOUR_API_KEY",
)
async def main():
    # await is only valid inside a coroutine
    audio = await client.text_to_speech.generate(
        text="Wow, Deeptune's text to speech API is amazing!",
        voice="d770a0d0-d7b0-4e52-962f-1a41d252a5f6",
    )
    play(audio)
asyncio.run(main())
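The practical benefit of non-blocking calls is that several requests can be in flight at once. The sketch below shows the pattern with `asyncio.gather`. The coroutine `fake_generate` is a hypothetical stand-in for an awaitable SDK call, so nothing here should be read as the SDK's actual return type or behavior.

```python
import asyncio


async def fake_generate(text: str) -> bytes:
    """Hypothetical stand-in for an awaitable text-to-speech call."""
    await asyncio.sleep(0.01)  # simulates network latency
    return text.encode("utf-8")


async def main() -> list:
    # Both coroutines are scheduled together; results keep argument order.
    return await asyncio.gather(
        fake_generate("first"),
        fake_generate("second"),
    )


# A plain script needs an entry point such as asyncio.run to drive the loop.
results = asyncio.run(main())
print(results)
```

With the real client, the two `fake_generate` calls would be replaced by awaitable calls on an `AsyncDeeptune` instance.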
While we value open-source contributions to this SDK, this library is generated programmatically. Additions made directly to this library would have to be moved over to our generation code; otherwise, they would be overwritten upon the next generated release. Feel free to open a PR as a proof of concept, but know that we will not be able to merge it as-is. We suggest opening an issue first to discuss with us!
On the other hand, contributions to the README are always very welcome!