2024 branch info #32

Open
tig3rmast3r opened this issue Jun 5, 2024 · 2 comments
@tig3rmast3r

Hello,
really happy to see this project going on! I'm having some problems, and I have some questions about this new branch (I still have to test training):
1 - the beat mask has no effect now; no matter how I set it, it isn't functional.
2 - is onset_mask_width gone for good, or will it be back?
3 - no c2f anymore? Do all the tokens go to the main PTH now?
4 - I tried to load a model made with the previous version and the results are not great. Do I need to train again?
5 - could you provide some info about the new pre-trained model(s)? (particularly training settings like the noam factor/warmup, batch size, number of chunks, and total number of iterations) Is it the same dataset as last year's?
6 - it would be great if you could recommend a ready-to-go Python/PyTorch combination so we can start a container with those settings right away; I spent a lot of time finding working combinations, particularly for multi-GPU training + torch.compile. I also had issues with bad audio encoding during training, as described in the other issue; I don't know whether it has been addressed.

thanks

@hugofloresgarcia (Owner) commented Jun 30, 2024

Hi @tig3rmast3r!

apologies for the slow reply!
the 2024 branch is a work-in-progress dev branch where I'm working on a couple of things, namely:

  • making it easy to install and run the interface without much of a hassle
  • getting rid of c2f, so that you just have to train a single model
  • switching focus to sound in general, as opposed to instrumental/vocal pop music.

> 1 - the beat mask has no effect now; no matter how I set it, it isn't functional.

That's a bug! I just opened #35 to address it.

> 4 - I tried to load a model made with the previous version and the results are not great. Do I need to train again?

If you're trying to use the old model, use the ismir-2023 branch, which should be stable: https://github.com/hugofloresgarcia/vampnet/tree/ismir-2023

> 5 - could you provide some info about the new pre-trained model(s)? (particularly training settings like the noam factor/warmup, batch size, number of chunks, and total number of iterations) Is it the same dataset as last year's?

I will provide these details in a config file once I've settled on one! At the moment I'm experimenting with different configs, though so far they haven't differed much from the original (except the number of iterations, which is lower: 250k-500k instead of 1M).

> 6 - it would be great if you could recommend a ready-to-go Python/PyTorch combination so we can start a container with those settings right away; I spent a lot of time finding working combinations, particularly for multi-GPU training + torch.compile. I also had issues with bad audio encoding during training, as described in the other issue; I don't know whether it has been addressed.

I will open an issue and look into this as well (#36) ! I am using Python 3.9 + torch 2.1.2 at the moment. If you'd like to take a stab at containerizing the repo and making a Dockerfile, I'd happily accept a PR!
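For anyone assembling such a container, a small script can sanity-check the Python/PyTorch combination before kicking off a multi-GPU run. This is a sketch of my own (the `env_report` helper is not part of the repo; it only reports versions, it doesn't validate any particular combination):

```python
# Hypothetical environment sanity check for a training container.
# Reports the version facts that tend to matter for multi-GPU + torch.compile.
import sys

import torch


def env_report() -> dict:
    """Collect version/runtime info relevant to multi-GPU training."""
    return {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "torch": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
        "gpu_count": torch.cuda.device_count(),
        "compile_supported": hasattr(torch, "compile"),  # True on torch >= 2.0
    }


if __name__ == "__main__":
    for key, value in env_report().items():
        print(f"{key}: {value}")
```

Running this inside the container at build time (or as the entrypoint's first step) makes version mismatches visible before a long training job starts.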

cheers :)

@tig3rmast3r (Author) commented Jul 17, 2024

For training I'm using Python 3.10 + torch 2.3.1, and it works fine in most cases; I only have issues sometimes with multi-GPU, where the best combo I've found is 3.10 + torch 2.0.1.
The best I've done is an sh file that quickly populates an empty Ubuntu 22.04 Docker container and sets everything up to work correctly (for the old ismir branch); it's available on my fork.
I haven't tested the 3.9 + 2.1.2 combo; I'll give it a try. 3.9 + PyTorch 2.2.x gives errors with multi-GPU. The only working combos I've found so far with parallel GPUs are 3.10 with 2.0.1 and with 2.3.x (sometimes).
Are you using CUDA 11.8?
thanks

I've even managed to upgrade flash_attn to v2, but I'm getting gradient explosion issues with it, so it's not really usable. Have you experimented with v1? Any clue?
Even with a very low lr like 0.0002, the grad norm goes crazy after a while;
if I try a lower lr, it gets stuck around a loss of 6.7, just a waste of time...
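For what it's worth, grad-norm blow-ups like this are often tamed with gradient clipping via `torch.nn.utils.clip_grad_norm_`. Below is a minimal sketch with a toy model standing in for the transformer; the `max_norm` value is an illustration, not a vampnet setting, and this is not the repo's actual training loop:

```python
import torch
import torch.nn as nn

# Toy stand-in for the transformer; placeholder shapes and lr.
model = nn.Linear(16, 16)
opt = torch.optim.AdamW(model.parameters(), lr=2e-4)

x = torch.randn(8, 16)
loss = model(x).pow(2).mean()

opt.zero_grad()
loss.backward()

# Rescale all gradients so their global L2 norm is at most max_norm.
# Returns the norm *before* clipping, which is handy for logging spikes.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
opt.step()
```

Logging `total_norm` each step also shows whether the explosions start gradually or all at once, which can help decide between clipping and a lower warmup peak.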

FYI, if you are trying different values for training: I ran a lot of tests and found that increasing layers while keeping heads lower gives better results. Some good configurations:
dim 1536, layers 32, heads 24
dim 1920, layers 24, heads 20
dim 1440, layers 30, heads 20
All of the above are flash_attn v2 friendly.
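One likely reason those shapes play well with flash_attn v2 is that the per-head dimension (dim / heads) comes out to a multiple of 8 in each case; the "multiple of 8" constraint is an assumption on my part, so verify it against the flash_attn release notes for your version. A quick check:

```python
# Sanity-check the per-head dimension for the configurations listed above.
# The multiple-of-8 requirement for flash_attn v2 kernels is assumed here,
# not taken from the vampnet repo.
def head_dim(dim: int, heads: int) -> int:
    assert dim % heads == 0, "embedding dim must split evenly across heads"
    return dim // heads


configs = [
    (1536, 32, 24),  # (dim, layers, heads)
    (1920, 24, 20),
    (1440, 30, 20),
]
for dim, layers, heads in configs:
    hd = head_dim(dim, heads)
    print(f"dim={dim} layers={layers} heads={heads} -> "
          f"head_dim={hd} (multiple of 8: {hd % 8 == 0})")
```

The three configurations give head dims of 64, 96, and 72, all multiples of 8.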
Lastly, removing dropout also helps, especially near the end of training when the lr is low (like 0.00005). To remove dropout completely I had to edit transformer.py and change all the float defaults to 0, because setting it to 0 in the yml wasn't working.
Everything else is at the defaults from vampnet.yml.
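An alternative to editing the float defaults in transformer.py: dropout can usually be disabled at runtime by zeroing every `nn.Dropout` module after the model is built. A sketch, with a helper name of my own; note it won't catch dropout applied through the functional API (`F.dropout`), which would still need a code edit:

```python
import torch.nn as nn


def zero_dropout(model: nn.Module) -> int:
    """Set p=0 on every nn.Dropout submodule; return how many were changed.

    Hypothetical helper (not from the vampnet repo). With p=0, each Dropout
    becomes the identity even in train mode.
    """
    changed = 0
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = 0.0
            changed += 1
    return changed


# Example on a toy stack with two Dropout layers.
toy = nn.Sequential(
    nn.Linear(8, 8), nn.Dropout(0.1),
    nn.Linear(8, 8), nn.Dropout(0.1),
)
print(zero_dropout(toy))  # 2
```

Calling this once after model construction (or after loading a checkpoint) avoids maintaining a patched transformer.py across branch updates.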

hope this helps
