An Open Source API alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI
podcastfy.mp4
Paper | Python Package | CLI | REST API | Web App | Feedback
Podcastfy is an open-source Python package that transforms multimodal content (text, images) into engaging, multilingual audio conversations using GenAI. Supported inputs include websites, PDFs, images, YouTube videos, and user-provided topics.
Unlike closed-source UI-based tools focused primarily on research synthesis (e.g. NotebookLM ❤️), Podcastfy focuses on open source, programmatic and bespoke generation of engaging, conversational content from a multitude of multi-modal sources, enabling customization and scale.
This sample collection was generated using this Python Notebook.
Sample 1: Senecio, 1922 (Paul Klee) and Connection of Civilizations (2017) by Gheorghe Virtosu
senecio.mp4
Sample 2: The Great Wave off Kanagawa, 1831 (Hokusai) and Takiyasha the Witch and the Skeleton Spectre, c. 1844 (Kuniyoshi)
japan.mp4
Sample 3: Pop culture icon Taylor Swift and Mona Lisa, 1503 (Leonardo da Vinci)
taylor.mp4
Audio | Description | Source |
---|---|---|
souza.mp4 | Personal Website | Website |
Audio (longform=True) | Lex Fridman Podcast: 5h interview with Dario Amodei, Anthropic's CEO | YouTube |
Audio (longform=True) | Benjamin Franklin's Autobiography | Book |
Language | Content Type | Description | Audio | Source |
---|---|---|---|---|
French | Website | Agroclimate research information | Audio | Website |
Portuguese-BR | News Article | Election polls in São Paulo | Audio | Website |
- Python 3.11 or higher
- $ pip install ffmpeg (for audio processing)
- Install from PyPI: $ pip install podcastfy
- Set up your API keys
from podcastfy.client import generate_podcast
audio_file = generate_podcast(urls=["<url1>", "<url2>"])
python -m podcastfy.client --url <url1> --url <url2>
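The Python call above can be wrapped in a small helper when the same workflow is run repeatedly. The following is a minimal sketch, not part of Podcastfy itself: `clean_sources` and `make_podcast` are hypothetical names, and the `longform` keyword is assumed from the `longform=True` label in the samples table, so check the package documentation before relying on it.

```python
from typing import Iterable, List


def clean_sources(sources: Iterable[str]) -> List[str]:
    """Strip whitespace, drop blank entries, and remove duplicates, keeping order."""
    seen = set()
    cleaned = []
    for source in sources:
        source = source.strip()
        if source and source not in seen:
            seen.add(source)
            cleaned.append(source)
    return cleaned


def make_podcast(sources: Iterable[str], longform: bool = False):
    """Hypothetical wrapper around generate_podcast.

    Actually generating audio requires podcastfy to be installed and
    API keys to be configured.
    """
    urls = clean_sources(sources)
    if not urls:
        raise ValueError("at least one source is required")

    # Imported lazily so clean_sources works without podcastfy installed.
    from podcastfy.client import generate_podcast

    # Assumption: generate_podcast accepts a `longform` keyword argument.
    return generate_podcast(urls=urls, longform=longform)
```

Calling `make_podcast(["<url1>", "<url2>"])` would then return whatever `generate_podcast` returns, which the example above assigns to `audio_file`.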
Podcastfy offers a range of customization options to tailor your AI-generated podcasts:
- Customize podcast conversation (e.g. format, style, voices)
- Choose to run Local LLMs (156+ HuggingFace models)
- Set other Configuration Settings
- Generate conversational content from multiple sources and formats (images, text, websites, YouTube, and PDFs).
- Generate shorts (2-5 minutes) or longform (30+ minutes) podcasts.
- Customize transcript and audio generation (e.g., style, language, structure).
- Generate transcripts using 100+ LLM models (OpenAI, Anthropic, Google etc).
- Leverage local LLMs for transcript generation for increased privacy and control.
- Integrate with advanced text-to-speech models (OpenAI, Google, ElevenLabs, and Microsoft Edge).
- Provide multi-language support for global content creation.
- Integrate seamlessly with CLI and Python packages for automated workflows.
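For automated workflows, the CLI can be driven from a script. The sketch below is illustrative and assumes only the `--url` flag shown in the CLI example; `build_cli_command` and `run_batch` are hypothetical helper names, and `run_batch` defaults to a dry run so that nothing is executed unless requested.

```python
import subprocess
import sys
from typing import List, Sequence


def build_cli_command(urls: Sequence[str], python: str = sys.executable) -> List[str]:
    """Build the argument list for `python -m podcastfy.client --url ...`."""
    if not urls:
        raise ValueError("at least one URL is required")
    command = [python, "-m", "podcastfy.client"]
    for url in urls:
        # One --url flag per source, as in the CLI example.
        command += ["--url", url]
    return command


def run_batch(jobs: Sequence[Sequence[str]], dry_run: bool = True) -> List[List[str]]:
    """Build one command per job; execute them only when dry_run is False."""
    commands = [build_cli_command(job) for job in jobs]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError if a run fails.
            subprocess.run(command, check=True)
    return commands
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a URL contains characters such as `&` or `?`.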
"Loving this initiative and the best I have seen so far especially for a 'non-techie' user."
"Love that you casually built an open source version of the most popular product Google built in the last decade"
"Your library was very straightforward to work with. You did Amazing work brother"
"I think it's awesome that you were inspired/recognize how hard it is to beat NotebookLM's quality, but you did an incredible job with this! It sounds incredible, and it's open-source! Thank you for being amazing!"
- Released new Multi-Speaker TTS model (is it the one NotebookLM uses?!?)
- Generate short or longform podcasts
- Generate podcasts from input topic using grounded real-time web search
- Integrate with 100+ LLM models (OpenAI, Anthropic, Google etc) for transcript generation
See CHANGELOG for more details.
This software is licensed under Apache 2.0. See instructions if you would like to use podcastfy in your software.
We welcome contributions! See Guidelines for more details.
- Content Creators can use Podcastfy to convert blog posts, articles, or multimedia content into podcast-style audio, enabling them to reach broader audiences. By transforming content into an audio format, creators can cater to users who prefer listening over reading.
- Educators can transform lecture notes, presentations, and visual materials into audio conversations, making educational content more accessible to students with different learning preferences. This is particularly beneficial for students with visual impairments or those who have difficulty processing written information.
- Researchers can convert research papers, visual data, and technical content into conversational audio. This makes it easier for a wider audience, including those with disabilities, to consume and understand complex scientific information. Researchers can also create audio summaries of their work to enhance accessibility.
- Accessibility Advocates can use Podcastfy to promote digital accessibility by providing a tool that converts multimodal content into auditory formats. This helps individuals with visual impairments, dyslexia, or other disabilities that make it challenging to consume written or visual content.