
A Telegram bot that summarizes videos from youtube.com and podcasts from castro.fm.


vasiliadi/ai-summarizer-telegram-bot


AI Summarizer - telegram bot


Usage

General settings

  1. Get API keys from @BotFather, Gemini, Replicate, Sentry, and Modal.
  2. Set up a database and Redis, for example Supabase x Postgres and Aiven for Valkey.
  3. Edit .env.
  4. Create a Modal Secret named resetlimit-secrets. It needs only REDIS_URL from .env.

Dockerfile

Build and run the Dockerfile, or start the bot with compose.yaml.

Without docker

  1. Apply migrations.
  2. Run python main.py.
  3. To reset the daily rate limit, deploy the cron job with modal deploy cron/cron.py. Otherwise, the daily limit may become inaccurate.

After start

After a user sends /start, set approved to True for each user ID you want to allow. Depending on your database, use the SQL Editor in Supabase x Postgres or any other SQL client.
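To approve several users at once, you can paste a single UPDATE statement into the SQL Editor. The sketch below builds one. The table name users and the ID column user_id are assumptions (only the approved column is mentioned above), so check the schema created by the migrations before running it. build_approve_sql is a hypothetical helper, not part of the bot.

```python
def build_approve_sql(user_ids, table="users", id_column="user_id"):
    """Build an UPDATE statement that sets approved = TRUE for the given IDs.

    The table and column names are hypothetical defaults; adjust them to the
    real schema. They are inlined as-is, so only pass trusted values.
    """
    # Telegram user IDs are integers; int() rejects anything else, so the
    # IDs can be inlined without an SQL injection risk. Duplicates are removed.
    ids = sorted({int(uid) for uid in user_ids})
    if not ids:
        raise ValueError("at least one user ID is required")
    id_list = ", ".join(str(uid) for uid in ids)
    return f"UPDATE {table} SET approved = TRUE WHERE {id_column} IN ({id_list});"


print(build_approve_sql([987654321, 123456789]))
```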

.env

Example of .env file:

TG_API_TOKEN="your_api_key"
GEMINI_API_KEY="your_api_key"
REPLICATE_API_TOKEN="your_api_key"
DB_URL="postgresql+driver://user:password@host:port/database"
REDIS_URL="rediss://default:password@host:port"
SENTRY_DSN="your_sentry_dsn"
PROXY=""
WEB_SCRAPE_PROXY=""
LOG_LEVEL="ERROR"
MODAL_TOKEN_ID="your_token"
MODAL_TOKEN_SECRET="your_token_secret"
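A missing key is easier to catch at startup than at the first user request. This sketch uses a deliberately simplified KEY="value" parser (not a full dotenv implementation, and not necessarily how the bot loads its settings) to report which keys from the example above are absent or empty. Treating PROXY, WEB_SCRAPE_PROXY and LOG_LEVEL as optional is an assumption.

```python
from pathlib import Path

# Keys from the example .env. Treating PROXY, WEB_SCRAPE_PROXY and LOG_LEVEL
# as optional is an assumption of this sketch.
REQUIRED_KEYS = (
    "TG_API_TOKEN", "GEMINI_API_KEY", "REPLICATE_API_TOKEN", "DB_URL",
    "REDIS_URL", "SENTRY_DSN", "MODAL_TOKEN_ID", "MODAL_TOKEN_SECRET",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY="value" lines; blank lines and # comments are skipped."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def missing_keys(env: dict[str, str]) -> list[str]:
    """Return required keys that are absent or set to an empty string."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        problems = missing_keys(parse_env(env_file.read_text()))
        print("Missing:", ", ".join(problems) if problems else "none")
```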

Set PROXY to an empty string for a direct connection, or use the format scheme://username:password@proxy_address:port, for example http://user:password@proxy.com:1234.
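Requests, listed under Docs, takes a proxies mapping keyed by URL scheme. The following is a minimal sketch, assuming PROXY is handed to Requests this way. It turns an empty value into a direct connection and checks the scheme://username:password@proxy_address:port shape. build_proxies is a hypothetical name, and the set of allowed schemes is an assumption; SOCKS also needs the requests[socks] extra.

```python
from typing import Optional
from urllib.parse import urlsplit

# Proxy schemes Requests can use; socks5/socks5h need the requests[socks] extra.
ALLOWED_SCHEMES = {"http", "https", "socks5", "socks5h"}


def build_proxies(proxy: str) -> Optional[dict]:
    """Return a Requests-style proxies mapping, or None for a direct connection."""
    proxy = proxy.strip()
    if not proxy:
        return None  # empty PROXY means connect directly
    parts = urlsplit(proxy)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported proxy scheme: {parts.scheme!r}")
    if not parts.hostname or parts.port is None:
        raise ValueError("proxy must look like scheme://user:password@host:port")
    # Route both plain HTTP and HTTPS traffic through the same proxy.
    return {"http": proxy, "https": proxy}
```

A call could then look like requests.get(url, proxies=build_proxies(os.environ.get("PROXY", "")), timeout=30). Requests treats proxies=None as "no explicit proxy".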

Don't forget to enable Row Level Security (RLS) if you use Supabase x Postgres.

Once these steps are complete, you can send youtube.com and castro.fm links to the bot and receive summaries.

List of commands for BotFather

set_summarizing_model - Choose which model you want to use for summary
set_prompt_strategy - Choose which prompt strategy to use for summary
toggle_transcription - Toggle transcription for summary (fallback on failure)
toggle_yt_transcription - Toggle YouTube transcription
toggle_translation - Toggle translator
set_target_language - Choose which language you want to translate into
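BotFather expects the list above as one "command - description" pair per line. The Telegram Bot API's BotCommand type limits a command to 1-32 characters made of lowercase English letters, digits and underscores, and a description to 1-256 characters. The sketch below parses the list and checks those limits before you paste it. pyTelegramBotAPI can also register commands from code with bot.set_my_commands and telebot.types.BotCommand, but this sketch does not call them.

```python
import re

COMMANDS_TEXT = """\
set_summarizing_model - Choose which model you want to use for summary
set_prompt_strategy - Choose which prompt strategy to use for summary
toggle_transcription - Toggle transcription for summary (fallback on failure)
toggle_yt_transcription - Toggle YouTube transcription
toggle_translation - Toggle translator
set_target_language - Choose which language you want to translate into
"""

# Command name rule from the Bot API BotCommand type.
COMMAND_RE = re.compile(r"[a-z0-9_]{1,32}")


def parse_commands(text: str) -> list[tuple[str, str]]:
    """Split 'command - description' lines and validate them against the limits."""
    commands = []
    for line in text.splitlines():
        if not line.strip():
            continue
        command, sep, description = line.partition(" - ")
        command, description = command.strip(), description.strip()
        if not sep or not COMMAND_RE.fullmatch(command):
            raise ValueError(f"invalid command line: {line!r}")
        if not 1 <= len(description) <= 256:
            raise ValueError(f"description length out of range: {command}")
        commands.append((command, description))
    return commands


for name, description in parse_commands(COMMANDS_TEXT):
    print(f"/{name}: {description}")
```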

Deploy

  • Using the Dockerfile on any cloud host
  • Using Dokploy or a similar tool on a cost-efficient cloud service such as Hetzner

For development

Migrations

Apply migrations before the first run:

python db.py
alembic upgrade head

JavaScript rendering (deprecated [1])

Many websites protect themselves against bots, and some content is visible only after JavaScript has been rendered. To render JavaScript, I use Selenium WebDriver, which requires Chrome and ChromeDriver to be installed on the operating system.

Selenium WebDriver supports only proxies without authentication. For an authenticated proxy, run a local proxy server such as tinyproxy or pproxy (or any other proxy server) that forwards traffic to the authenticated upstream proxy, or use a hosted solution such as Browserbase.

The other approach, used by default, is to send requests through a dedicated web scraping proxy set in WEB_SCRAPE_PROXY, provided by a service such as ScrapingBee, ScrapingAnt, WebScrapingAPI, scraperapi, or similar.

Remote functions

To avoid maintaining multiple Docker images, I use Modal to run the cron job that resets the Gemini rate limit. The Modal Secret must include REDIS_URL.
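The contents of cron/cron.py are not shown here. To give a rough idea of what such a reset can do, the sketch below deletes daily counters from Redis. It assumes the counters live under a hypothetical rate_limit:* key pattern and that the client is a redis-py Redis instance, whose scan_iter and delete methods are the only ones used. In a deployment, a function like this would run inside the scheduled Modal function, with REDIS_URL read from the resetlimit-secrets Secret. The FakeRedis class is only an in-memory stand-in for trying the logic without a server.

```python
from __future__ import annotations

import fnmatch
from typing import Iterator, Protocol


class RedisLike(Protocol):
    """Subset of the redis-py client API used below."""

    def scan_iter(self, match: str | None = None, count: int | None = None) -> Iterator: ...

    def delete(self, *names) -> int: ...


def reset_daily_limits(client: RedisLike, pattern: str = "rate_limit:*", batch: int = 500) -> int:
    """Delete every key matching pattern and return how many were removed.

    SCAN (scan_iter) is used instead of KEYS so a large keyspace does not
    block Redis; keys are deleted in batches to limit round trips.
    """
    removed = 0
    pending = []
    for key in client.scan_iter(match=pattern, count=batch):
        pending.append(key)
        if len(pending) >= batch:
            removed += client.delete(*pending)
            pending.clear()
    if pending:
        removed += client.delete(*pending)
    return removed


class FakeRedis:
    """Minimal in-memory stand-in for local experiments (not redis-py)."""

    def __init__(self, keys: list[str]):
        self.data = {k.encode(): b"1" for k in keys}

    def scan_iter(self, match=None, count=None):
        # Iterate over a snapshot so deleting during iteration is safe.
        for key in list(self.data):
            if match is None or fnmatch.fnmatchcase(key, match.encode()):
                yield key

    def delete(self, *names):
        return sum(self.data.pop(name, None) is not None for name in names)


fake = FakeRedis(["rate_limit:1", "rate_limit:2", "settings:1"])
print(reset_daily_limits(fake))  # → 2
```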

Docstrings

Docstrings are AI-generated and follow the Google Python Style Guide docstring format.
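In that format, a docstring starts with a one-line summary and then adds Args, Returns and Raises sections as needed. The helper below is hypothetical and does not come from the bot. It splits text to fit Telegram's 4096-character limit for a sendMessage text and is here only to show the layout.

```python
TELEGRAM_MESSAGE_LIMIT = 4096  # max characters in a Bot API sendMessage text


def split_message(text: str, limit: int = TELEGRAM_MESSAGE_LIMIT) -> list[str]:
    """Split a long summary into chunks that fit into Telegram messages.

    Chunks are cut at the last newline before the limit when possible so
    paragraphs stay intact; otherwise the text is cut exactly at the limit.

    Args:
        text: The summary to send.
        limit: Maximum number of characters per chunk.

    Returns:
        A list of non-empty chunks, each at most ``limit`` characters long.

    Raises:
        ValueError: If ``limit`` is not positive.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks = []
    while text:
        if len(text) <= limit:
            chunks.append(text)
            break
        cut = text.rfind("\n", 0, limit)
        if cut <= 0:
            cut = limit  # no usable newline: hard cut
        chunks.append(text[:cut])
        text = text[cut:].lstrip("\n")
    return chunks
```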

Audio vs Text Summaries (AI answer)

There are a few reasons why providing an audio file might lead to a more detailed and comprehensive summary compared to a text transcript:

  1. Contextual Understanding: When processing audio, I can leverage the nuances of speech, such as intonation, emphasis, and pauses, to better understand the speaker's intent and the overall context of the conversation. This contextual understanding helps me identify the main points and supporting arguments more accurately.

  2. Speaker Identification and Role: In audio files, I can often distinguish between different speakers and their roles in the conversation. This allows me to attribute specific statements and opinions to the correct individuals, which can be crucial for understanding the dynamics of the discussion.

  3. Non-verbal Cues: While text transcripts provide the words spoken, they lack the non-verbal cues that often accompany speech, such as laughter, sighs, or changes in tone. These cues can convey additional information and emotions that can significantly impact the overall meaning of the conversation.

  4. Advanced Audio Processing Techniques: My underlying technology can analyze audio files for various features, including speaker identification, sentiment analysis, and topic modeling. These techniques can help me identify key points, summarize the content, and even extract specific information, such as names, dates, or locations.

While text transcripts can provide a solid foundation for understanding the content, they lack the richness and depth of information that can be gleaned from audio files. By incorporating advanced audio processing techniques and considering the broader context of the conversation, I can provide more detailed and insightful summaries when working with audio files.

Docs

pyTelegramBotAPI
SQLAlchemy 2.0
Alembic
Google Gen AI SDK
Requests
yt-dlp
beautifulsoup4
Replicate
telegramify_markdown
youtube-transcript-api
Tenacity
Sentry
rush

Telegram Bot API
Docker | Set build-time variables (--build-arg)
Logging Levels, LogRecord attributes
Google Python Style Guide | Docstrings
Conventional Commits, Conventional Commits cheatsheet, gitmoji
Renovate bot, Renovate Configuration Options
crontab guru
Gemini API Cookbook
Uptime stats: Gemini Pro 1.5, Gemini Flash 1.5, Gemini Flash 2.0, Gemini 2.0 Flash Thinking Experimental, Gemini Experimental 1206

Cloud DBs

PostgreSQL: PostgreSQL on Render, Supabase x Postgres, EdgeDB Cloud
Redis: Redis.io, Upstash x Redis, Aiven for Valkey

SQL Clients

TablePlus, DBeaver Community, Valentina Studio

Linters and Checkers

isort, black, mypy, pylint, ruff

Error suppression

ruff, pylint, mypy

Easy deploy

Coolify, Appliku, CapRover, Dokku

Logs

Logtail, Papertrail

Known problems

  • Limits. Different models have different rate limits, but after switching models, the limits of the default model still apply.

Possible improvements

Footnotes

  1. Use web scraping solutions instead (WEB_SCRAPE_PROXY).