- Get API keys: @BotFather, Gemini, Replicate, Sentry, Modal
- Set up a database and Redis, for example Supabase x Postgres and Aiven for Valkey
- Edit `.env`
- Set up the Modal Secrets with the name `resetlimit-secrets`. Only `REDIS_URL` from `.env` is needed.
- Run the `Dockerfile` or `compose.yaml`
- Apply migrations
- Run `python main.py`
- To reset the daily rate limit, you must run `modal deploy cron/cron.py`. Otherwise, the daily limit may become inaccurate.
After `/start`, you need to set `approved` to `True` for the desired user IDs. Depending on your database, you can use the SQL Editor for Supabase x Postgres or any other SQL client for a different database.
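If you prefer to do it from Python, a minimal sketch with SQLAlchemy could look like this (the `users` table and `user_id` column names are assumptions; adjust them to the actual schema):

```python
import os

from sqlalchemy import create_engine, text

# DB_URL comes from .env; the "users" table and column names are assumptions.
engine = create_engine(os.environ["DB_URL"])

with engine.begin() as conn:
    conn.execute(
        text("UPDATE users SET approved = TRUE WHERE user_id = :uid"),
        {"uid": 123456789},
    )
```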
Example of a `.env` file:

```
TG_API_TOKEN="your_api_key"
GEMINI_API_KEY="your_api_key"
REPLICATE_API_TOKEN="your_api_key"
DB_URL="postgresql+driver://user:password@host:port/database"
REDIS_URL="rediss://default:password@host:port"
SENTRY_DSN="your_sentry_dsn"
PROXY=""
WEB_SCRAPE_PROXY=""
LOG_LEVEL="ERROR"
MODAL_TOKEN_ID="your_token"
MODAL_TOKEN_SECRET="your_token_secret"
```
Pass an empty string to `PROXY` for a direct connection, or use the format `schema://username:password@proxy_address:port`, for example `http://user:password@proxy.com:1234`.
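For illustration, this is roughly how a proxy from `.env` might be passed to an HTTP client such as Requests (a sketch, not the project's actual code):

```python
import os

import requests

# PROXY comes from .env; an empty string means a direct connection.
proxy = os.environ.get("PROXY", "")
proxies = {"http": proxy, "https": proxy} if proxy else None

response = requests.get("https://example.com", proxies=proxies, timeout=30)
print(response.status_code)
```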
Don't forget to enable RLS if you use Supabase x Postgres.
After completing these steps, you are ready to send youtube.com and castro.fm links to the bot and receive summaries.
set_summarizing_model - Choose which model you want to use for summary
set_prompt_strategy - Choose which prompt strategy to use for summary
toggle_transcription - Toggle transcription for summary (fallback on failure)
toggle_yt_transcription - Toggle YouTube transcription
toggle_translation - Toggle translator
set_target_language - Choose which language you want to translate into
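These commands can also be registered from code with pyTelegramBotAPI so they appear in the Telegram client menu; this is only an illustrative sketch, the actual bot may register them differently:

```python
import os

import telebot
from telebot.types import BotCommand

bot = telebot.TeleBot(os.environ["TG_API_TOKEN"])

# Publish the command list so it shows up in the Telegram client menu.
bot.set_my_commands(
    [
        BotCommand("set_summarizing_model", "Choose which model to use for summaries"),
        BotCommand("set_prompt_strategy", "Choose which prompt strategy to use"),
        BotCommand("toggle_transcription", "Toggle transcription fallback for summaries"),
        BotCommand("toggle_yt_transcription", "Toggle YouTube transcription"),
        BotCommand("toggle_translation", "Toggle the translator"),
        BotCommand("set_target_language", "Choose the target translation language"),
    ]
)
```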
- Using the `Dockerfile` on any cloud hosting
- Using Dokploy or a similar tool and a cost-efficient cloud service like Hetzner
Apply migrations before the first run:

```sh
python db.py
alembic upgrade head
```
JavaScript rendering (deprecated)[^1]
Many websites have protections against bots, and some content requires JavaScript to be rendered before it becomes visible. To enable JavaScript rendering, I am using Selenium WebDriver, which requires the Chrome browser and ChromeDriver to be installed on the operating system.
Selenium WebDriver supports only proxies without authentication. For authenticated access, use a local proxy server such as tinyproxy or pproxy, or any other proxy server. Or even a solution like Browserbase.
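A minimal sketch of pointing Selenium at a local, unauthenticated proxy (the address and port are assumptions; use whatever tinyproxy or pproxy is listening on):

```python
from selenium import webdriver

# Route Chrome through a local, unauthenticated proxy (e.g. tinyproxy or pproxy),
# which in turn forwards traffic to the authenticated upstream proxy.
options = webdriver.ChromeOptions()
options.add_argument("--headless=new")
options.add_argument("--proxy-server=http://127.0.0.1:8888")

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://example.com")
    html = driver.page_source
finally:
    driver.quit()
```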
Another approach (used by default) is to rely on a dedicated web-scraping proxy (`WEB_SCRAPE_PROXY`), such as ScrapingBee, ScrapingAnt, WebScrapingAPI, ScraperAPI, or others.
To avoid maintaining multiple Docker images, I use Modal for the cron job that resets the Gemini rate limit. The Modal Secret should include `REDIS_URL`.
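For reference, a Modal scheduled function for this kind of reset could look roughly like the sketch below; the app name, schedule, and Redis key are assumptions, and the real logic lives in `cron/cron.py`:

```python
import os

import modal

app = modal.App("resetlimit")
image = modal.Image.debian_slim().pip_install("redis")


@app.function(
    image=image,
    schedule=modal.Cron("0 0 * * *"),  # assumed schedule: daily at midnight UTC
    secrets=[modal.Secret.from_name("resetlimit-secrets")],
)
def reset_daily_limit() -> None:
    import redis

    # REDIS_URL is injected by the Modal Secret; the key name below is an assumption.
    r = redis.Redis.from_url(os.environ["REDIS_URL"])
    r.delete("gemini:daily_limit")
```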
Docstrings are generated by AI using the Google docstring style.
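For example, a Google-style docstring looks like this (an illustrative function, not one from the project):

```python
def summarize(text: str, max_sentences: int = 3) -> str:
    """Summarizes a block of text.

    Args:
        text: The text to summarize.
        max_sentences: Maximum number of sentences to keep.

    Returns:
        The truncated summary.

    Raises:
        ValueError: If ``text`` is empty.
    """
    if not text:
        raise ValueError("text must not be empty")
    return ". ".join(text.split(". ")[:max_sentences])
```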
There are a few reasons why providing an audio file might lead to a more detailed and comprehensive summary compared to a text transcript:
- Contextual Understanding: When processing audio, I can leverage the nuances of speech, such as intonation, emphasis, and pauses, to better understand the speaker's intent and the overall context of the conversation. This contextual understanding helps me identify the main points and supporting arguments more accurately.
- Speaker Identification and Role: In audio files, I can often distinguish between different speakers and their roles in the conversation. This allows me to attribute specific statements and opinions to the correct individuals, which can be crucial for understanding the dynamics of the discussion.
- Non-verbal Cues: While text transcripts provide the words spoken, they lack the non-verbal cues that often accompany speech, such as laughter, sighs, or changes in tone. These cues can convey additional information and emotions that can significantly impact the overall meaning of the conversation.
- Advanced Audio Processing Techniques: My underlying technology can analyze audio files for various features, including speaker identification, sentiment analysis, and topic modeling. These techniques can help me identify key points, summarize the content, and even extract specific information, such as names, dates, or locations.
While text transcripts can provide a solid foundation for understanding the content, they lack the richness and depth of information that can be gleaned from audio files. By incorporating advanced audio processing techniques and considering the broader context of the conversation, I can provide more detailed and insightful summaries when working with audio files.
pyTelegramBotAPI
SQLAlchemy 2.0
Alembic
Google Gen AI SDK
Requests
yt-dlp
beautifulsoup4
Replicate
telegramify_markdown
youtube-transcript-api
Tenacity
Sentry
rush
Telegram Bot API
Docker | Set build-time variables (--build-arg)
Logging Levels, LogRecord attributes
Google Python Style Guide | Docstrings
Conventional Commits, Conventional Commits cheatsheet, gitmoji
Renovate bot, Renovate Configuration Options
crontab guru
Gemini API Cookbook
Uptime stats: Gemini Pro 1.5, Gemini Flash 1.5, Gemini Flash 2.0, Gemini 2.0 Flash Thinking Experimental, Gemini Experimental 1206
PostgreSQL: PostgreSQL on Render, Supabase x Postgres, EdgeDB Cloud
Redis: Redis.io, Upstash x Redis, Aiven for Valkey
TablePlus, DBeaver Community, Valentina Studio
isort, black, mypy, pylint, ruff
Coolify, Appliku, CapRover, Dokku
- Limits. Different models have different limits; when switching models, the limits of the default model still apply.
- Another model, Claude 3.5 Sonnet, produces the same 8,192-token output but accepts only 200k input tokens; GPT-4o offers 16,384 output tokens and a 128k input window. Prices for Claude 3.5 Sonnet and GPT-4o.
- Gitflow workflow.
- NumPy Docstrings Style Guide | Docstrings.
[^1]: Use solutions for web scraping.