
This project simulates a video- and audio-aware model using existing LLM vision models. It takes images and text as input and generates text as output. Using models like Whisper, the text can "speak".
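One way to picture the approach described above is to sample still frames from a video at a fixed interval and pass them, together with a text prompt, to a vision model. The sketch below is only an illustration under that assumption; it is not taken from this repository. The names `sample_frame_indices` and `build_prompt` are hypothetical, and the transcript is assumed to come from a separate speech-to-text step (for example, Whisper).

```python
# Hypothetical sketch: approximate "video awareness" by sampling frames
# and describing them to an image+text model. Not the repository's code.

def sample_frame_indices(total_frames: int, fps: float, every_seconds: float = 1.0) -> list[int]:
    """Return the indices of frames spaced roughly `every_seconds` apart."""
    if total_frames <= 0 or fps <= 0 or every_seconds <= 0:
        return []
    # Number of frames to skip between samples; never less than one frame.
    step = max(1, round(fps * every_seconds))
    return list(range(0, total_frames, step))


def build_prompt(frame_indices: list[int], fps: float, transcript: str) -> str:
    """Build the text part of the request that accompanies the sampled frames."""
    timestamps = ", ".join(f"{i / fps:.1f}s" for i in frame_indices)
    return (
        f"The attached images are video frames taken at {timestamps}.\n"
        f"Audio transcript: {transcript}\n"
        "Describe what is happening."
    )


if __name__ == "__main__":
    # A 3-second clip at 30 frames per second, sampled once per second.
    indices = sample_frame_indices(total_frames=90, fps=30.0, every_seconds=1.0)
    print(indices)
    print(build_prompt(indices, fps=30.0, transcript="hello"))
```

The sampled frames and the prompt would then be sent to whichever vision model is in use; that call is left out here because it depends on the provider.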


AlexD4110/AI-Project

