WIP: this project is in progress. Remaining work:
- AWS Lambda for deployment
- Kubeflow and Accelerate for distributed training
A neural net that generates audio from predefined genres.
The remixer model is inspired by Spectrogram Diffusion: a VQ-VAE is applied to capture the repetitive patterns in music (e.g. chord progressions), and the class_id of the target genre is provided to the Transformer2D as conditioning.
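As a minimal sketch of the class conditioning, assuming diffusers' Transformer2DModel in its DiT-style configuration (all hyperparameters below, such as latent channels, sample size, and the number of genres, are illustrative assumptions, not this repo's actual config):

```python
import torch
from diffusers import Transformer2DModel

# DiT-style class conditioning: num_embeds_ada_norm + ada_norm_zero lets the
# transformer blocks modulate on (timestep, class label) embeddings.
model = Transformer2DModel(
    in_channels=4,              # VQ-VAE latent channels (assumption)
    sample_size=32,             # latent height/width (assumption)
    patch_size=2,
    num_attention_heads=8,
    attention_head_dim=64,
    num_layers=6,
    num_embeds_ada_norm=10,     # number of genre classes (assumption)
    norm_type="ada_norm_zero",
)

latents = torch.randn(1, 4, 32, 32)   # noisy VQ-VAE latents
timestep = torch.tensor([10])         # diffusion timestep
class_id = torch.tensor([3])          # genre class_id

noise_pred = model(latents, timestep=timestep, class_labels=class_id).sample
```

The autoencoder itself is trained with the VAE objective below.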
$$
\begin{aligned}
L_{\text{reconstruction}} &= \frac{1}{N} \sum_{i=1}^{N} \left| x^{(i)} - \hat{x}^{(i)} \right|^2 \\
L_{\text{KL}} &= D_{\text{KL}}\left(q(z|x) \parallel p(z)\right) = -\frac{1}{2} \sum_{j=1}^{J} \left(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\right) \\
L_{\text{VAE}} &= L_{\text{reconstruction}} + \beta \, L_{\text{KL}}
\end{aligned}
$$
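In PyTorch, this objective can be sketched as follows (the β value and variable names are illustrative assumptions):

```python
import torch

def vae_loss(x, x_hat, mu, log_var, beta=0.25):
    # L_reconstruction: mean squared error between input and reconstruction.
    recon = torch.mean((x - x_hat) ** 2)
    # L_KL: closed-form KL between q(z|x) = N(mu, sigma^2) and p(z) = N(0, I).
    kl = -0.5 * torch.sum(1 + log_var - mu ** 2 - log_var.exp())
    # L_VAE = L_reconstruction + beta * L_KL
    return recon + beta * kl
```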
For remixing, the encoded latents of the style song are passed to the Transformer2D as guidance. The loss function is composed of a content loss, a style loss, and a variation loss, inspired by Neural Style Transfer (NST); a sketch of this composite objective follows.
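As a rough illustration (the feature extractor, loss weights, and the reading of the variation term as total variation are assumptions, not this repo's exact formulation):

```python
import torch
import torch.nn.functional as F

def gram_matrix(feats):
    # feats: (batch, channels, height, width) feature maps
    b, c, h, w = feats.shape
    flat = feats.view(b, c, h * w)
    return flat @ flat.transpose(1, 2) / (c * h * w)

def remix_loss(gen_feats, content_feats, style_feats, gen_spec,
               w_content=1.0, w_style=10.0, w_tv=1e-4):
    # Content loss: stay close to the original song's features.
    content = F.mse_loss(gen_feats, content_feats)
    # Style loss: match feature correlations (Gram matrices) of the style song.
    style = F.mse_loss(gram_matrix(gen_feats), gram_matrix(style_feats))
    # (Total) variation loss: encourage smoothness in the generated spectrogram.
    tv = (gen_spec[..., 1:, :] - gen_spec[..., :-1, :]).abs().mean() + \
         (gen_spec[..., :, 1:] - gen_spec[..., :, :-1]).abs().mean()
    return w_content * content + w_style * style + w_tv * tv
```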
The two inference pipelines:
- Genre generation: sample Gaussian noise -> class label -> Diffusion -> Decode
- Remixing: encode the original and style audio -> pass to Transformer2D -> Diffusion -> Decode
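The genre-generation pipeline can be sketched with a standard diffusers sampling loop (the DDPM scheduler, step count, latent shape, and the `vqvae.decode` call are assumptions):

```python
import torch
from diffusers import DDPMScheduler

def generate(model, vqvae, genre_id, steps=50, latent_shape=(1, 4, 32, 32)):
    # Genre generation: Gaussian noise -> class label -> diffusion -> decode.
    scheduler = DDPMScheduler(num_train_timesteps=1000)
    scheduler.set_timesteps(steps)
    latents = torch.randn(latent_shape)        # start from pure Gaussian noise
    class_id = torch.tensor([genre_id])
    with torch.no_grad():
        for t in scheduler.timesteps:
            noise_pred = model(latents, timestep=t.unsqueeze(0),
                               class_labels=class_id).sample
            latents = scheduler.step(noise_pred, t, latents).prev_sample
    return vqvae.decode(latents)               # latents -> spectrogram/audio
```

Here `model` is the class-conditioned Transformer2D and `vqvae` stands in for the trained VQ-VAE decoder.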
To generate audio, simply select a pre-defined genre from the 'Genre' dropdown list. To remix, provide an original song and a style song at the same time.
See deploy.ipynb. To set up the server for inference, we create an AWS Lambda function for the backend:

```bash
python deploy/aws_lambda.py
```
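A minimal sketch of what such a Lambda handler could look like (the handler name, payload fields, and the run_inference stub are hypothetical; see deploy/aws_lambda.py for the actual setup):

```python
import base64
import json

def run_inference(genre_id):
    # Placeholder: load the diffusion model and return generated audio bytes.
    return b""  # hypothetical stub

def handler(event, context):
    # Hypothetical JSON payload, e.g. {"genre_id": 3}.
    body = json.loads(event.get("body", "{}"))
    audio_bytes = run_inference(body.get("genre_id"))
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"audio": base64.b64encode(audio_bytes).decode()}),
    }
```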