going forward + question #28
Comments
Hi @tig3rmast3r, this is insane!! I wanted to say this looks and sounds super cool! I'm really happy to see vampnet in a full creative interface like this one. Training from scratch requires a large dataset (50k hours of audio, more or less) and enough GPUs to fit a batch size of ~32 for the duration of audio context you'd like to train for. You can have a look at the settings used to train the model in
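As a rough back-of-envelope sketch (hypothetical numbers, not the actual training config), gradient accumulation can reach an effective batch size of ~32 when fewer GPUs are available:

```python
# Hypothetical numbers -- adjust per_gpu_batch to whatever fits in memory
# for your audio context length.
n_gpus = 4            # number of GPUs available
per_gpu_batch = 2     # samples that fit on one GPU at once
target_batch = 32     # effective batch size the model expects

# Accumulate gradients over this many steps before each optimizer update.
grad_accum_steps = target_batch // (n_gpus * per_gpu_batch)
print(grad_accum_steps)  # -> 4
```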
Hi Hugo,
Yeah, doubling that value could work. You could also try changing the number of layers and heads, though that might require a bit more finetuning to get it working.
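As a rough illustration of "doubling that value" (the key names and base values below are placeholders, not necessarily the real vampnet config keys): keep the embedding dimension divisible by the number of heads, and scale width and depth together for larger jumps.

```python
# Placeholder hyperparameters for a vampnet-like transformer; the actual
# config keys and base values may differ.
base = {"embedding_dim": 1280, "n_heads": 20, "n_layers": 16}

bigger = {
    "embedding_dim": base["embedding_dim"] * 2,  # 2x wider model
    "n_heads": base["n_heads"] * 2,              # keeps per-head dim constant
    "n_layers": base["n_layers"] + 4,            # modest extra depth
}

# Attention requires the embedding to split evenly across heads.
assert bigger["embedding_dim"] % bigger["n_heads"] == 0
```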
Is it normal that training with identical parameters and dataset on Linux gives different results than on Windows? Here's a comparison between the Linux and Windows trainings: test wavs.zip. Thanks
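Some divergence is expected across platforms unless every RNG and kernel is pinned, and even then bitwise identity across OSes isn't guaranteed, since kernel implementations can differ between builds. A minimal, generic PyTorch reproducibility sketch (not vampnet-specific) looks like this:

```python
import os
import random

import numpy as np
import torch

def seed_everything(seed: int = 0) -> None:
    """Pin every RNG the training loop touches so runs are comparable."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Required by some CUDA ops (e.g. cuBLAS) for deterministic behavior.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    # Prefer deterministic kernels; warn instead of erroring when an op
    # has no deterministic implementation.
    torch.use_deterministic_algorithms(True, warn_only=True)
    torch.backends.cudnn.benchmark = False

seed_everything(42)
```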
I definitely have a problem on Linux.
I did a quick test and it looks like the problem is with the torch.compile command. EDIT: I did more tests and unfortunately the problem is not with torch.compile; I've noticed that I get different results with torch.compile, but both results are bad.
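A quick way to isolate torch.compile is an A/B run on a fixed input. The stand-in module below is a placeholder (substitute the actual vampnet model); tiny float error is normal, while a large gap points at the compiled path:

```python
import torch
import torch.nn as nn

# Stand-in module; swap in the real model to test.
model = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64)).eval()
compiled = torch.compile(model)

torch.manual_seed(0)
x = torch.randn(8, 64)

with torch.no_grad():
    y_eager = model(x)
    y_compiled = compiled(x)

# Differences around ~1e-6 are expected numerical noise; anything large
# suggests a compile-related problem rather than the environment.
print((y_eager - y_compiled).abs().max().item())
```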
I've finally found a working combination. Honestly, I haven't found the root issue, but I can use Linux now!
I've trained several combinations to understand the impact of torch.compile and Python versions on speed and quality.
I've finally found a working torch.compile config for multi-GPU, using Python 3.10 and the latest PyTorch!! Tested on 4 x RTX 4090.

About the trick to fix bad audio: I have attached a zip containing the following file. Basically, one of the modules inside the requirements file is causing bad audio; I still haven't identified which one. EDIT: I have updated my fork with installation instructions to get this working on Windows and Linux (single and multi GPU) successfully. I will no longer update this thread, and I've removed the installation instructions from it. I've used the compare trick on clean conda envs, both locally and in vast.ai containers. Hopefully you will find the root cause so we can pin the version during pip install -e ./vampnet (or update the code to work with the latest version of whatever it is). Here's the zip: files.zip

Lastly, while testing I found the time to record some "TimeToTrain" values; they may help in finding the right server to train on and save some (or a lot of) $$.
EDIT May 9: updated info and the zipped file; will update more as soon as I have more info.
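One way to narrow down which requirement is the culprit: dump exact package versions from a "good audio" env and a "bad audio" env, then diff the two files; the mismatched pin is the package to lock in requirements. A small sketch:

```python
# Dump installed package versions so two conda envs (one producing good
# audio, one producing bad audio) can be diffed file-against-file.
from importlib.metadata import distributions

versions = sorted(
    f"{d.metadata['Name']}=={d.version}" for d in distributions()
)
with open("env_versions.txt", "w") as f:
    f.write("\n".join(versions) + "\n")
```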
Hi Hugo, this project rocks. I'm having a lot of fun; this is the best AI-based audio tool available right now, at least for what I'm looking for.
I'd like to share where I'm going with this awesome project.
I've made an app in C# that takes care of keeping loops over time and sends generation presets to Gradio, like you did with unloop. I'm losing hours just listening to generations, and I'm making tons of brand-new audio loops too!!
I'm also working on a VST plug-in that sends generated wavs into the DAW, combined with Demucs for separation. I also made a very primitive live set using 3 C# apps simultaneously in real time, sending the Demucs-separated streams to the VST -> Reaktor and mixing them. What a blast!!
Not sure if there's interest in what I'm doing; I may share the projects, but I'm nothing special at coding, I just know how to use ChatGPT properly :)
This is the C# app:
This is the live set setup using the VST + Bidule + Reaktor (+ iPad and MIDI controller):
930707408-VID-20231230-WA0001.mp4
Now for the question: is there a way to start a new training run and make a bigger model?
Is it an easy task? I mean, I have no idea what I'd have to change in the code to set the model size for training, say 2x or 3x bigger; I've only done (huge) fine-tunings so far. I'd like to try training from scratch and making a bigger model too, to see what happens :)
Thanks a lot!