Welcome to HeyGenClone, an open-source analogue of the HeyGen system.
I am a developer from Moscow 🇷🇺 who devotes his free time to studying new technologies. The project is in an active development phase, but I hope it will help you achieve your goals!
Currently, translation is supported only from English 🇬🇧!
- Clone this repo
- Install conda
- Create an environment with Python 3.10 (for macOS, refer to the link)
- Activate environment
- Install requirements:
cd path_to_project
sh install.sh
- In the config.json file, set the HF_TOKEN argument to your HuggingFace token. Visit speaker-diarization and segmentation and accept the user conditions
- Download the weights from drive and unzip the downloaded file into the weights folder
- Install ffmpeg
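For reference, a config.json along these lines should result from the steps above. The values here are illustrative placeholders, and the exact set of keys is described in the table below:

```json
{
  "DET_TRESH": 0.3,
  "DIST_TRESH": 0.85,
  "HF_TOKEN": "hf_your_token_here",
  "USE_ENHANCER": false
}
```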
Key | Description |
---|---|
DET_TRESH | Face detection threshold [0.0:1.0] |
DIST_TRESH | Face embedding distance threshold [0.0:1.0] |
HF_TOKEN | Your HuggingFace token (see Installation) |
USE_ENHANCER | Whether to enhance faces with GFPGAN |
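To illustrate how DIST_TRESH is typically used: reidentification compares face embedding vectors and treats two detections as the same person when their distance falls below the threshold. A minimal sketch using cosine distance (the project's actual deepface-based matching may differ):

```python
import math

def cosine_distance(a, b):
    """Cosine distance between two embedding vectors: 0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def same_person(emb1, emb2, dist_thresh=0.85):
    # dist_thresh plays the role of DIST_TRESH from config.json:
    # smaller distance means more similar faces
    return cosine_distance(emb1, emb2) < dist_thresh

# Toy embeddings (real face embeddings have hundreds of dimensions)
print(same_person([0.1, 0.9, 0.2], [0.12, 0.88, 0.19]))  # → True
print(same_person([0.1, 0.9, 0.2], [0.9, -0.1, 0.4]))    # → False
```

Lowering DIST_TRESH makes matching stricter (fewer false merges of different people), raising it makes it more permissive.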
Albanian, Amharic, Arabic, Armenian, Azerbaijani, Basque, Bengali, Bulgarian, Catalan, Cebuano, Chichewa, Chinese, Dutch, English, Finnish, French, German, Greek, Gujarati, Haitian Creole, Hausa, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Javanese, Kannada, Kazakh, Khmer, Korean, Kyrgyz, Lao, Latin, Latvian, Malagasy, Malay, Malayalam, Marathi, Mongolian, Odia, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Samoan, Shona, Somali, Spanish, Swahili, Swedish, Tajik, Tamil, Telugu, Thai, Turkish, Ukrainian, Vietnamese, Welsh, Yoruba
- Activate your environment:
conda activate your_env_name
- cd to the project path:
cd path_to_project
At the root of the project there is a translate script that translates the video you specify.
- video_filename - the filename of your input video (.mp4)
- output_language - the language to translate into, from the supported list above (you can also find it in the code)
- output_filename - the filename of output video (.mp4)
python translate.py video_filename output_language -o output_filename
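The command line above follows the standard argparse pattern: two positional arguments plus an optional -o flag. A hypothetical sketch of how such an interface is parsed (the real translate.py may implement this differently):

```python
import argparse

parser = argparse.ArgumentParser(description="Translate a video into another language")
parser.add_argument("video_filename", help="input video (.mp4)")
parser.add_argument("output_language", help="target language from the supported list")
parser.add_argument("-o", "--output_filename", default="output.mp4",
                    help="output video (.mp4)")

# Parse an example command line instead of sys.argv
args = parser.parse_args(["input.mp4", "russian", "-o", "translated.mp4"])
print(args.video_filename, args.output_language, args.output_filename)
```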
I also added a script that overlays a voice on a video with lip sync, which allows you to create a video of a person pronouncing your speech. Currently it works only for videos with one person.
- voice_filename - the filename of your speech (.wav)
- video_filename - the filename of your input video (.mp4)
- output_filename - the filename of output video (.mp4)
python speech_changer.py voice_filename video_filename -o output_filename
- Detecting scenes (PySceneDetect)
- Face detection (yolov8-face)
- Reidentification (deepface)
- Speech enhancement (MDXNet)
- Speaker transcription and diarization (whisperX)
- Text translation (googletrans)
- Voice cloning (TTS)
- Lip sync (lipsync)
- Face restoration (GFPGAN)
- [Need to fix] Searching for talking faces and determining what each person is saying
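At a high level, the stages above form a per-scene pipeline. A heavily simplified sketch with stub stages (the function names and data shapes here are illustrative, not the project's actual API; the real stages use PySceneDetect, whisperX, googletrans, TTS, lipsync, and GFPGAN as listed above):

```python
def detect_scenes(video):
    # Stub for scene detection (PySceneDetect in the real pipeline)
    return [{"start": 0.0, "end": 4.2}, {"start": 4.2, "end": 9.7}]

def transcribe_and_diarize(video):
    # Stub for transcription + diarization (whisperX in the real pipeline)
    return [{"speaker": "SPEAKER_00", "text": "hello"}]

def translate_text(text, lang):
    # Stub for text translation (googletrans in the real pipeline)
    return f"[{lang}] {text}"

def process(video, output_language):
    segments = transcribe_and_diarize(video)
    results = []
    for scene in detect_scenes(video):
        for seg in segments:
            translated = translate_text(seg["text"], output_language)
            # Voice cloning (TTS), lip sync, and face restoration would follow here
            results.append((scene["start"], seg["speaker"], translated))
    return results

print(process("input.mp4", "russian"))
```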
Note that this example was created without GFPGAN enhancement!
Destination language | Source video | Output video |
---|---|---|
🇷🇺 (Russian) | | |
Contributions are welcome! I am very glad that so many people are interested in my project. I will be happy to see your pull requests. In the future, all contributors will be included in a list displayed here!
- Full GPU support
- Multithreading support (optimizations)
- Detecting talking faces (improvement)
- Tested on macOS
⚠️ The project is under development!