PSchroedl/lipsync pipeline feature #120

pschroedl · 2024-07-11T03:47:30Z

This new route at ‘/lipsync’ takes either a simple text input or an audio file, along with a static image, and produces an mp4 of lip-synced audio and video.

The optional return_frames parameter returns individual frames, following the schema used by the image-to-video pipeline.
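Below is a client-side sketch of how a request to the route might be assembled. The form field names (`text_input`, `audio`, `image`, `return_frames`) and the helper `build_lipsync_form` are hypothetical and do not come from the PR. The real names are listed in the route's generated OpenAPI docs. The sketch also assumes that text and audio are mutually exclusive.

```python
from pathlib import Path
from typing import Dict, Optional, Tuple


def build_lipsync_form(
    image_path: str,
    text: Optional[str] = None,
    audio_path: Optional[str] = None,
    return_frames: bool = False,
) -> Tuple[Dict[str, str], Dict[str, Path]]:
    """Return (form_fields, file_fields) for a multipart POST to /lipsync.

    All field names here are hypothetical placeholders.
    """
    # Assumption: exactly one speech source, either text (for TTS) or an audio file.
    if (text is None) == (audio_path is None):
        raise ValueError("provide exactly one of text or audio_path")

    # Multipart form values are strings, so the boolean is sent as "true"/"false".
    form = {"return_frames": str(return_frames).lower()}
    files = {"image": Path(image_path)}

    if text is not None:
        form["text_input"] = text
    else:
        files["audio"] = Path(audio_path)
    return form, files
```

To send the request, open each `Path` in binary mode and pass the results to `requests.post(url, data=form, files=...)`.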

If text is supplied instead of an audio file, FastSpeech2Conformer is used for TTS.
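The following is a minimal sketch of that fallback. It assumes the uploaded audio takes precedence when both inputs are supplied. `resolve_driving_audio` is a hypothetical helper, and `synthesize` stands in for the FastSpeech2Conformer call, which lets the sketch show the dispatch logic without loading a model.

```python
from typing import Callable, Optional


def resolve_driving_audio(
    text: Optional[str],
    audio_path: Optional[str],
    synthesize: Callable[[str, str], str],
    tts_output_path: str = "tts_output.wav",
) -> str:
    """Return the path of the audio that will drive the lipsync model."""
    # Assumption: an uploaded audio file wins if both inputs are present.
    if audio_path is not None:
        return audio_path
    if text:
        # `synthesize(text, out_path)` stands in for FastSpeech2Conformer TTS;
        # it is expected to write a waveform to out_path and return that path.
        return synthesize(text, tts_output_path)
    raise ValueError("either text or an audio file is required")
```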

The text input and mp4 output options differ from the bounty requirements solely for ease of demoing (and debugging), and they can easily be removed if desired.

At the time of writing, a demo server is running at http://204.12.245.134:8002/docs#/default/lipsync

(Disclaimer: long audio or text sequences will OOM on the GPU, and the server may not recover gracefully.)

Real3DPortrait (https://github.com/yerfor/Real3DPortrait) is used for the audio-to-video synchronization pipeline. A purpose-built Conda environment configured on the host isolates the majority of its requirements.
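Below is a sketch of one way the server process could call into the isolated environment. It uses `conda run`, so the environment never needs to be activated in the server's own shell. The env name `real3dportrait`, the script path `inference/real3d_infer.py`, and the `--src_img`/`--drv_aud`/`--out_name` flags follow Real3DPortrait's upstream README at the time of writing. They are assumptions here, not necessarily what this PR does, and should be checked against the checked-out version.

```python
import subprocess
from typing import List

# Assumed values, taken from Real3DPortrait's README conventions rather than this PR.
CONDA_ENV = "real3dportrait"
INFER_SCRIPT = "inference/real3d_infer.py"


def build_infer_command(image_path: str, audio_path: str, out_path: str) -> List[str]:
    """Build a command that runs Real3DPortrait inside its own Conda env."""
    # `conda run -n ENV ...` executes the command with ENV's interpreter and
    # packages, leaving the server's own Python environment untouched.
    return [
        "conda", "run", "-n", CONDA_ENV,
        "python", INFER_SCRIPT,
        "--src_img", image_path,   # static source image
        "--drv_aud", audio_path,   # driving audio (uploaded or TTS output)
        "--out_name", out_path,    # rendered mp4
    ]


def run_inference(image_path: str, audio_path: str, out_path: str, repo_dir: str) -> str:
    """Run inference from the Real3DPortrait checkout; raises on a non-zero exit."""
    cmd = build_infer_command(image_path, audio_path, out_path)
    subprocess.run(cmd, cwd=repo_dir, check=True)
    return out_path
```

Passing the command as a list (without `shell=True`) avoids shell-quoting problems with user-supplied file paths. `check=True` surfaces a failed run as a `subprocess.CalledProcessError`.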

One requirement, however, had to be installed at the OS level. Rather than bumping our Ubuntu base image from 20.04 to 22.04, I’ve created a separate Dockerfile (Dockerfile.lipsync) that builds the necessary version from source.

Lipsync-pipeline-specific instructions for running and debugging can be found in cmd/lipsync/README.md.

The approach taken here is a bit atypical (pulling in an entire repo for use in a pipeline), but one of my personal goals was to improve developer velocity on the AI Pipeline. The changes in this PR establish a pattern that lets devs prototype new pipelines and test existing open-source implementations without running into conflicting or hard-to-resolve dependencies.

Further work would include implementing lower-level inference logic from scratch, to allow finer control over model selection and over model loading, unloading, and caching.
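The caching part of that work could take the following shape: a small least-recently-used cache that evicts a model before loading a new one, so that at most `capacity` models occupy GPU memory at a time. `ModelCache` and its `loader`/`unloader` callbacks are hypothetical. In practice the unloader might move weights to the CPU, or delete them and clear the CUDA cache.

```python
from collections import OrderedDict
from typing import Any, Callable, List


class ModelCache:
    """Keep at most `capacity` loaded models, evicting the least recently used."""

    def __init__(
        self,
        capacity: int,
        loader: Callable[[str], Any],
        unloader: Callable[[str, Any], None],
    ) -> None:
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self.capacity = capacity
        self.loader = loader
        self.unloader = unloader
        self._models: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, model_id: str) -> Any:
        if model_id in self._models:
            # Cache hit: mark as most recently used.
            self._models.move_to_end(model_id)
            return self._models[model_id]
        # Evict before loading, so the new model never coexists with the evicted one.
        if len(self._models) >= self.capacity:
            old_id, old_model = self._models.popitem(last=False)
            self.unloader(old_id, old_model)
        model = self.loader(model_id)
        self._models[model_id] = model
        return model

    def loaded(self) -> List[str]:
        """Model ids currently held, least recently used first."""
        return list(self._models)
```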

pschroedl added 11 commits July 9, 2024 04:42

add lipsync pipeline and dependencies (7e6ab09)

revert dockerfile changes, add Dockerfile.lipsync (4ab2768)

fix conda activation (193a032)

make text input optional, add file upload (7a3b969)

handle optional audio file (3e47d6b)

fix parameter mismatch, add error handling (8300e24)

fix audio file path (16a571e)

update README.md with usage/debug info (0e93931)

add return_frames option, move helpers to utils (4b53a11)

remove unused checkpoints from dl.sh (2cdfe31)

cleanup (0c1d613)

pschroedl requested a review from rickstaa as a code owner July 11, 2024 03:47

rickstaa mentioned this pull request Jul 12, 2024

Add lip sync pipeline [50 LPT] livepeer/bounties#35

Closed

rickstaa force-pushed the main branch 3 times, most recently from cd1feb4 to 0d03040, July 16, 2024 13:10
