PSchroedl/lipsync pipeline feature #120

Open: wants to merge 11 commits into main

pschroedl (Collaborator)

This PR adds a new route at `/lipsync` that takes a static image together with either a simple text input or an audio file, and produces an MP4 of lip-synced audio and video.
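For reviewers who want to hit the route from a script instead of the Swagger UI, here is a minimal client-side sketch using only the Python standard library. It builds, but does not send, a multipart POST carrying the image plus exactly one audio source. The form field names (`image`, `audio`, `text_input`, `return_frames`) and the helper `build_lipsync_request` are hypothetical; the real names are whatever the route's OpenAPI docs list.

```python
import uuid
import urllib.request


def build_lipsync_request(base_url, image_bytes, audio_bytes=None, text=None,
                          return_frames=False):
    """Build (but do not send) a multipart POST to the /lipsync route.

    Field names are hypothetical placeholders; check /docs for the real ones.
    """
    # The route takes exactly one audio source: an audio file, or text for TTS.
    if (audio_bytes is None) == (text is None):
        raise ValueError("provide exactly one of audio_bytes or text")

    boundary = uuid.uuid4().hex
    parts = []

    def add_field(name, value):
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode() + value.encode() + b"\r\n")

    def add_file(name, filename, data, content_type):
        header = (f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; '
                  f'filename="{filename}"\r\nContent-Type: {content_type}\r\n\r\n')
        parts.append(header.encode() + data + b"\r\n")

    add_file("image", "image.png", image_bytes, "image/png")
    if audio_bytes is not None:
        add_file("audio", "audio.wav", audio_bytes, "audio/wav")
    else:
        add_field("text_input", text)
    add_field("return_frames", "true" if return_frames else "false")
    body = b"".join(parts) + f"--{boundary}--\r\n".encode()

    return urllib.request.Request(
        base_url.rstrip("/") + "/lipsync",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(req)` would send it; with `return_frames` left off, the response body is presumably the MP4 itself.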

An optional `return_frames` parameter makes the route return individual frames, following the schema used in the image-to-video pipeline.
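To make the frame output concrete, the sketch below wraps already-encoded PNG frames in a response shaped like the image-to-video output. The exact schema is an assumption here: it takes the shape to be `{"frames": [[{"url": ...}, ...]]}` (a list of batches, each a list of frame objects carrying a base64 data URL) and omits any other per-frame fields the real schema may define. `frames_to_response` is a hypothetical helper name.

```python
import base64
from typing import Dict, List


def frames_to_response(png_frames: List[bytes]) -> Dict[str, list]:
    """Wrap PNG-encoded frames in an image-to-video style response (assumed schema)."""
    batch = [
        # Each frame is inlined as a data URL so the client needs no second request.
        {"url": "data:image/png;base64," + base64.b64encode(frame).decode("ascii")}
        for frame in png_frames
    ]
    # Assumed shape: a list of batches; a single lipsync run yields one batch.
    return {"frames": [batch]}
```

Inlining frames as data URLs keeps the route stateless, at the cost of a response roughly a third larger than the raw PNG bytes.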

If text is supplied instead of an audio file, FastSpeech2Conformer is used for text-to-speech (TTS).
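On the server side, the text-or-audio choice reduces to a small dispatch step that runs before the lipsync model. The sketch below is illustrative only: `resolve_audio` is a hypothetical name, and the `tts` callable stands in for whatever wraps FastSpeech2Conformer in this PR, injected so the logic can be tested without loading a model.

```python
from typing import Callable, Optional


def resolve_audio(text: Optional[str], audio: Optional[bytes],
                  tts: Callable[[str], bytes]) -> bytes:
    """Return the driving audio for the lipsync step.

    `tts` is a stand-in for the FastSpeech2Conformer wrapper (hypothetical).
    """
    if audio is not None and text is not None:
        raise ValueError("send either text or an audio file, not both")
    if audio is not None:
        # An uploaded audio file is used as-is.
        return audio
    if text is not None and text.strip():
        # Text input is synthesized to speech first.
        return tts(text)
    raise ValueError("either non-empty text or an audio file is required")
```

Injecting `tts` also means the TTS model only needs to be loaded for requests that actually send text.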

The text input and MP4 output options differ from the bounty requirements solely to ease demoing (and debugging) and can quickly be removed if desired.

At the time of writing, a demo server is running at http://204.12.245.134:8002/docs#/default/lipsync

(Disclaimer: long audio or text sequences will cause the GPU to run out of memory, and the server may not recover gracefully.)

Real3DPortrait (https://github.com/yerfor/Real3DPortrait) is used for the audio-to-video synchronization pipeline, and a purpose-built Conda environment configured on the host isolates most of the requirements.
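One possible mitigation for the OOM issue, not something this PR implements, is to reject over-long audio before it reaches the GPU. The sketch below assumes WAV input and reads its duration with the standard-library `wave` module; the 30-second `MAX_AUDIO_SECONDS` value is a hypothetical placeholder that would need tuning to the GPU in use.

```python
import io
import wave

MAX_AUDIO_SECONDS = 30.0  # hypothetical limit; tune to the available GPU memory


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Duration of a WAV payload, computed from its header (frames / sample rate)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def check_audio_length(wav_bytes: bytes, limit: float = MAX_AUDIO_SECONDS) -> float:
    """Raise ValueError if the audio exceeds `limit` seconds; otherwise return its duration."""
    duration = wav_duration_seconds(wav_bytes)
    if duration > limit:
        raise ValueError(f"audio is {duration:.1f}s; the limit is {limit:.1f}s")
    return duration
```

A comparable cap on text length would cover the TTS path, since longer text yields longer synthesized audio.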
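As a rough illustration of how a separate Conda environment keeps these dependencies apart from the API server, the sketch below shells out through `conda run -n <env>`, which executes a command inside a named environment without activating it in the calling process. The environment name, the script path `inference/real3d_infer.py`, and the `--src_img`, `--drv_aud`, `--out_name` flags are assumptions modeled on the Real3DPortrait README example and should be checked against the actual checkout and this PR's code.

```python
import subprocess
from typing import List


def build_infer_command(env_name: str, src_img: str, drv_aud: str,
                        out_name: str) -> List[str]:
    """Command that runs Real3DPortrait inference inside its own Conda env.

    Script path and flags are assumptions based on the upstream README example.
    """
    return [
        # `conda run -n ENV` keeps Real3DPortrait's packages out of the
        # API server's Python environment.
        "conda", "run", "-n", env_name,
        "python", "inference/real3d_infer.py",
        "--src_img", src_img,
        "--drv_aud", drv_aud,
        "--out_name", out_name,
    ]


def run_infer(cmd: List[str], repo_dir: str, timeout_s: float = 600.0) -> None:
    """Run the command from the Real3DPortrait checkout; raise on failure or timeout."""
    subprocess.run(cmd, cwd=repo_dir, check=True, timeout=timeout_s)
```

Because the boundary is a subprocess, a crash inside the model environment surfaces as `CalledProcessError` in the server instead of taking the server process down with it.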

One requirement, however, had to be installed at the OS level. Rather than bumping our Ubuntu base image from 20.04 to 22.04, I've created a separate Dockerfile that builds the necessary version from source.

Instructions for running and debugging the lipsync pipeline can be found in `cmd/lipsync/README.md`.

The approach taken here is a bit atypical (pulling in an entire repo to back a pipeline), but a personal goal of mine was to improve developer velocity on the AI Pipeline. The changes in this PR establish a pattern that lets devs prototype new pipelines with, and test, existing open-source implementations without running into conflicting or hard-to-resolve dependencies.

Further work would include implementing lower-level inference logic from scratch to gain finer control over model selection and over model loading, unloading, and caching.
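As one possible shape for that loading/unloading/caching control, here is a small least-recently-used model cache. It is a generic sketch, not existing code in this repo: `ModelCache`, `loader`, and `unloader` are hypothetical names, and the unloader is where GPU memory would actually be released. It evicts before loading so two large models never sit in memory at the same moment.

```python
from collections import OrderedDict
from typing import Any, Callable, List


class ModelCache:
    """Keep at most `capacity` models loaded, evicting the least recently used."""

    def __init__(self, capacity: int, loader: Callable[[str], Any],
                 unloader: Callable[[str, Any], None] = lambda name, model: None):
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self.capacity = capacity
        self._loader = loader
        self._unloader = unloader
        self._models: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, name: str) -> Any:
        if name in self._models:
            self._models.move_to_end(name)  # mark as most recently used
            return self._models[name]
        # Evict first so peak memory never holds capacity + 1 models.
        if len(self._models) >= self.capacity:
            old_name, old_model = self._models.popitem(last=False)
            self._unloader(old_name, old_model)  # e.g. move off GPU / free memory
        model = self._loader(name)
        self._models[name] = model
        return model

    def loaded(self) -> List[str]:
        """Names of loaded models, least recently used first."""
        return list(self._models)
```

With a capacity of 1 this degenerates to "load on demand, unload the previous model", which is roughly the behavior a single-GPU worker serving both TTS and lipsync would need.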
