Stream AI content to the world
Before running the application, make sure you have the following prerequisites installed:
- Node.js (version 18.16 or higher); run `nvm use` if you use NVM or similar tools
- Twitch account with a stream key for streaming
- OBS 28.0.0 or higher, because it includes the obs-websocket plugin
Then set up the project:
- Clone the repo to your computer:
  git clone https://github.com/failfa-st/strai.git
- Go into the strai folder:
  cd strai
- Install dependencies:
  npm i
- Create the .env based on .env.example:
  cp .env.example .env
- Fill out the .env:
  # You get this password in the next step "OBS Setup"
  OBS_WEBSOCKET_SERVER_PASSWORD=
  # The full path to the video that gets looped as a default when nothing is visible
  OBS_DEFAULT_VIDEO=
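Both values are required before the application can talk to OBS, so it can help to fail early when one is missing. The following is a minimal sketch of such a check. `readObsConfig` is a hypothetical helper and not part of strai; it only assumes the two variable names from the .env above.

```javascript
const path = require("node:path");

// Hypothetical helper: validates the two variables from the .env file.
// `env` is a plain object such as `process.env`.
function readObsConfig(env) {
  const password = (env.OBS_WEBSOCKET_SERVER_PASSWORD ?? "").trim();
  const defaultVideo = (env.OBS_DEFAULT_VIDEO ?? "").trim();

  // Collect every missing variable so the error names all of them at once.
  const missing = [];
  if (!password) missing.push("OBS_WEBSOCKET_SERVER_PASSWORD");
  if (!defaultVideo) missing.push("OBS_DEFAULT_VIDEO");
  if (missing.length > 0) {
    throw new Error(`Missing values in .env: ${missing.join(", ")}`);
  }

  // The .env comment asks for the full path, so reject relative paths.
  if (!path.isAbsolute(defaultVideo)) {
    throw new Error("OBS_DEFAULT_VIDEO must be a full (absolute) path");
  }

  return { password, defaultVideo };
}
```

Calling `readObsConfig(process.env)` after the .env has been loaded returns both values, or throws with the names of the variables that still need to be filled out.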
In OBS, you need to create the following base setup:
- A scene named `default` with a Sources > Media Source named `defaultVideo` with the following configuration:
  - Local file: true
  - Loop: true
  - Restart playback when source becomes active: true
  - Use hardware decoding when available: true
  - Show nothing when playback ends: false
  - Close file when inactive: false
  - Leave all other settings on their defaults
- A scene named `queue`
- A scene named `stream` with:
  - Sources > Group named `setup`
  - Sources > Scene and select `default` and put it into the group `setup`
  - Sources > Scene and select `queue` and put it into the group `setup`
  - Make sure that the scenes inside the group are in this order:
    - `queue`
    - `default`
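The setup above is done by hand in the OBS UI. For readers who would rather script the `defaultVideo` source, the checkbox list can be written as the payload of an obs-websocket v5 `CreateInput` request. This is only a sketch: `buildDefaultVideoRequest` is a hypothetical helper, and the setting keys are an assumption based on the settings of OBS's `ffmpeg_source` (Media Source) input kind, so verify them against your OBS version.

```javascript
// Hypothetical helper: builds a `CreateInput` payload for the Media Source.
// The keys inside `inputSettings` are assumed names of the `ffmpeg_source`
// settings that correspond to the checkboxes listed in the setup.
function buildDefaultVideoRequest(videoPath) {
  return {
    sceneName: "default",
    inputName: "defaultVideo",
    inputKind: "ffmpeg_source",
    inputSettings: {
      is_local_file: true,        // Local file
      local_file: videoPath,      // full path from OBS_DEFAULT_VIDEO
      looping: true,              // Loop
      restart_on_activate: true,  // Restart playback when the source activates
      hw_decode: true,            // Use hardware decoding when available
      clear_on_media_end: false,  // Show nothing when playback ends
      close_when_inactive: false, // Close file when inactive
    },
  };
}
```

With a connected obs-websocket-js client `obs`, the payload could be sent as `await obs.call("CreateInput", buildDefaultVideoRequest(process.env.OBS_DEFAULT_VIDEO))`.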
Then configure the WebSocket server:
- Tools > WebSocket Server Settings
- Set the server port to 4455, which is the default
- Enable Authentication: true
- Click "Show Connect Info" to get the server password, and enter it as OBS_WEBSOCKET_SERVER_PASSWORD in the .env
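To check that the port and password work, a connection attempt is usually enough. The sketch below assumes a client object with a `connect(url, password)` method, which is the shape of an `OBSWebSocket` instance from obs-websocket-js. `connectToObs` and its option names are hypothetical and not part of strai.

```javascript
// Hypothetical helper: connects a client to the OBS WebSocket server.
// `client` is anything with `connect(url, password)`, for example an
// `OBSWebSocket` instance from obs-websocket-js.
async function connectToObs(client, password, { host = "127.0.0.1", port = 4455 } = {}) {
  const url = `ws://${host}:${port}`;
  // Rejects if OBS is not running, the port differs, or the password is wrong.
  await client.connect(url, password);
  return url;
}
```

Passing the client in as an argument keeps the helper easy to try out with a stub before OBS is running.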
graph TB
A[Stable Diffusion WebUI] -->|Generates Picture| D[SadTalker]
B[Twitch Chat] -->|Extracts Messages| E{System Prompt to OpenAI GPT}
E -->|Generates Text| F[Bark]
F -->|Generates WAV File| D
D -->|Generates MP4 Video| G[OBS via obs-websocket-js]
G -->|Streams Video| H[Twitch]
This diagram describes the following steps:
- `Stable Diffusion WebUI` generates a picture of a person, which is used by `SadTalker`.
- Messages are extracted from `Twitch Chat` and transformed into an API call to `OpenAI GPT` using a strong system prompt that represents a specific persona.
- `OpenAI GPT` generates a response to the chat message, which is fed to `Bark`.
- `Bark` generates a WAV file based on the provided text.
- `SadTalker` combines the picture from `Stable Diffusion WebUI` and the WAV file from `Bark` to generate an MP4 video, which contains a face that speaks.
- The generated MP4 video is then input into `OBS` using `obs-websocket-js`.
- `OBS` streams the video to `Twitch`.
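The data flow in the diagram can be sketched as a chain of asynchronous steps. Every stage below is injected as a function, so the sketch does not call any real service. All names (`respondToChatMessage`, `generateText`, `textToSpeech`, `animateFace`, `addVideoToObs`, `portraitPath`) are hypothetical stand-ins for the tools in the diagram, not strai's actual API.

```javascript
// Hypothetical sketch of the pipeline for a single chat message.
// `stages` bundles the portrait image and one function per tool.
async function respondToChatMessage(message, stages) {
  const { portraitPath, generateText, textToSpeech, animateFace, addVideoToObs } = stages;

  // OpenAI GPT: persona system prompt plus chat message in, reply text out.
  const replyText = await generateText(message);

  // Bark: reply text in, path to a WAV file out.
  const wavPath = await textToSpeech(replyText);

  // SadTalker: portrait picture plus WAV file in, path to an MP4 video out.
  const videoPath = await animateFace(portraitPath, wavPath);

  // OBS via obs-websocket-js: hand the finished video over for playback.
  await addVideoToObs(videoPath);
  return videoPath;
}
```

The portrait is passed in once because, in the diagram, the picture from Stable Diffusion WebUI feeds SadTalker independently of the chat messages.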
- Enable the `defaultVideo`
  - This makes sure that we have a loop constantly running when nothing is happening
- When `OBSRemoteControl.addVideo` is called:
  - Add the video to a video queue
    - This creates a new Media Source inside of `queue` using the same transform settings that were used to position the `defaultVideo`. It will also set the audio output to "Monitor and Output" so that the video's audio is audible in the stream
- The oldest video from the video queue will be played
- Once the video is playing in OBS, it will be removed from the video queue
- Once the video playback has ended in OBS, the video will be removed from OBS
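The queue behavior described above can be sketched as a small first-in, first-out class. This is not the code of `OBSRemoteControl`: `VideoQueueSketch` and the three injected callbacks (`createSource`, `play`, `removeSource`) are hypothetical, and the sketch assumes that only one queued video plays at a time.

```javascript
// Hypothetical sketch of the queue logic, with the OBS calls injected.
class VideoQueueSketch {
  constructor(obsActions) {
    this.obsActions = obsActions; // { createSource, play, removeSource }
    this.pending = [];            // videos waiting to be played, oldest first
    this.playing = null;          // video currently playing in OBS, if any
  }

  // Mirrors the described addVideo step: create the source, then enqueue.
  addVideo(videoPath) {
    this.obsActions.createSource(videoPath);
    this.pending.push(videoPath);
    this.playNext();
  }

  playNext() {
    // Assumption: never start a second video while one is still playing.
    if (this.playing !== null || this.pending.length === 0) return;
    // The oldest video leaves the queue as soon as it starts playing.
    this.playing = this.pending.shift();
    this.obsActions.play(this.playing);
  }

  // Call this when OBS reports that playback of the current video ended.
  onPlaybackEnded() {
    if (this.playing === null) return;
    this.obsActions.removeSource(this.playing);
    this.playing = null;
    this.playNext();
  }
}
```

With obs-websocket v5, `onPlaybackEnded` could be wired to the `MediaInputPlaybackEnded` event. While both `pending` is empty and `playing` is `null`, only the looping `defaultVideo` is visible.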