2024 branch info #32

Open
tig3rmast3r opened this issue Jun 5, 2024 · 2 comments
@tig3rmast3r

Hello,
really happy to see this project going on! I'm having some problems, and I have some questions about this new branch (I still have to test training):
1 - the beat mask has no effect now; no matter how I set it, it isn't functional.
2 - is onset_mask_width gone for good, or will it be back?
3 - no c2f anymore? Do all the tokens go to the main PTH now?
4 - I tried to load a model made with the previous version and the results are not great. Do I need to train again?
5 - could you provide some info about the new pre-trained model(s)? (particularly training settings like the noam factor/warmup, batch size, number of chunks, and total number of iterations) Is it the same dataset as last year's?
6 - it would be great if you could recommend a ready-to-go Python/PyTorch combination so we can start a container with those settings right away; I spent a lot of time finding working combinations, particularly for multi-GPU training + torch.compile. I also had issues with bad audio encoding during training, as described in the other issue; I don't know whether it has been addressed.

thanks

@hugofloresgarcia (Owner) commented Jun 30, 2024

Hi @tig3rmast3r!

apologies for the slow reply!
the 2024 branch is a work-in-progress dev branch where I'm working on a couple of things, namely:

  • making it easy to install and run the interface without much of a hassle
  • getting rid of c2f, so that you just have to train a single model
  • switching focus to sound in general, as opposed to instrumental/vocal pop music.

> 1 - the beat mask has no effect now; no matter how I set it, it isn't functional.

That's a bug! I just opened #35 to address it.

> 4 - I tried to load a model made with the previous version and the results are not great. Do I need to train again?

If you're trying to use the old model, use the ismir-2023 branch, which should be stable: https://github.com/hugofloresgarcia/vampnet/tree/ismir-2023

> 5 - could you provide some info about the new pre-trained model(s)? (particularly training settings like the noam factor/warmup, batch size, number of chunks, and total number of iterations) Is it the same dataset as last year's?

I will provide these details in a config file once I've settled on one! At the moment I'm experimenting with different configs, though so far they haven't differed much from the original (except the number of iterations, which is lower: 250k-500k instead of 1M).

> 6 - it would be great if you could recommend a ready-to-go Python/PyTorch combination so we can start a container with those settings right away; I spent a lot of time finding working combinations, particularly for multi-GPU training + torch.compile. I also had issues with bad audio encoding during training, as described in the other issue; I don't know whether it has been addressed.

I will open an issue and look into this as well (#36) ! I am using Python 3.9 + torch 2.1.2 at the moment. If you'd like to take a stab at containerizing the repo and making a Dockerfile, I'd happily accept a PR!
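For anyone assembling such a container, a small script can sanity-check the Python/PyTorch combination before kicking off a multi-GPU run. This is a sketch of my own (the `env_report` helper is not part of the repo; it only reports versions, it doesn't validate any particular combination):

```python
# Hypothetical environment sanity check for a training container.
# Reports the version facts that tend to matter for multi-GPU + torch.compile.
import sys

import torch


def env_report() -> dict:
    """Collect version/runtime info relevant to multi-GPU training."""
    return {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "torch": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
        "gpu_count": torch.cuda.device_count(),
        "compile_supported": hasattr(torch, "compile"),  # True on torch >= 2.0
    }


if __name__ == "__main__":
    for key, value in env_report().items():
        print(f"{key}: {value}")
```

Running this inside the container at build time (or as the entrypoint's first step) makes version mismatches visible before a long training job starts.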

cheers :)

@tig3rmast3r (Author) commented Jul 17, 2024

For training I'm using Python 3.10 + torch 2.3.1, and it works fine in most cases; I only have issues sometimes with multi-GPU, where the best combo I've found is 3.10 + torch 2.0.1.
The best I've done is an sh file that quickly populates an empty Ubuntu 22.04 Docker container and sets everything up to work correctly (for the old ismir branch); it's available on my fork.
I haven't tested the 3.9 + 2.1.2 combo; I'll give it a try. 3.9 + PyTorch 2.2.x gives errors with multi-GPU. The only working combos I've found so far with parallel GPUs are 3.10 with 2.0.1 and with 2.3.x (sometimes).
Are you using CUDA 11.8?
thanks

I've even managed to upgrade flash_attn to v2, but I'm getting gradient explosion issues with it, so it's not really usable. Have you experimented with v1? Any clue?
Even with a very low lr like 0.0002, the grad norm goes crazy after a while;
if I try a lower lr, it gets stuck around a loss of 6.7, just a waste of time...
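For what it's worth, grad-norm blow-ups like this are often tamed with gradient clipping via `torch.nn.utils.clip_grad_norm_`. Below is a minimal sketch with a toy model standing in for the transformer; the `max_norm` value is an illustration, not a vampnet setting, and this is not the repo's actual training loop:

```python
import torch
import torch.nn as nn

# Toy stand-in for the transformer; placeholder shapes and lr.
model = nn.Linear(16, 16)
opt = torch.optim.AdamW(model.parameters(), lr=2e-4)

x = torch.randn(8, 16)
loss = model(x).pow(2).mean()

opt.zero_grad()
loss.backward()

# Rescale all gradients so their global L2 norm is at most max_norm.
# Returns the norm *before* clipping, which is handy for logging spikes.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
opt.step()
```

Logging `total_norm` each step also shows whether the explosions start gradually or all at once, which can help decide between clipping and a lower warmup peak.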

FYI, if you are trying different values for training: I ran a lot of tests and found that increasing layers while keeping heads lower gives better results. Some good configurations:
dim 1536, layers 32, heads 24
dim 1920, layers 24, heads 20
dim 1440, layers 30, heads 20
All of the above are flash_attn v2 friendly.
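One likely reason those shapes play well with flash_attn v2 is that the per-head dimension (dim / heads) comes out to a multiple of 8 in each case; the "multiple of 8" constraint is an assumption on my part, so verify it against the flash_attn release notes for your version. A quick check:

```python
# Sanity-check the per-head dimension for the configurations listed above.
# The multiple-of-8 requirement for flash_attn v2 kernels is assumed here,
# not taken from the vampnet repo.
def head_dim(dim: int, heads: int) -> int:
    assert dim % heads == 0, "embedding dim must split evenly across heads"
    return dim // heads


configs = [
    (1536, 32, 24),  # (dim, layers, heads)
    (1920, 24, 20),
    (1440, 30, 20),
]
for dim, layers, heads in configs:
    hd = head_dim(dim, heads)
    print(f"dim={dim} layers={layers} heads={heads} -> "
          f"head_dim={hd} (multiple of 8: {hd % 8 == 0})")
```

The three configurations give head dims of 64, 96, and 72, all multiples of 8.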
Lastly, removing dropout also helps, especially near the end of training when the lr is low (like 0.00005). To remove dropout completely I had to edit transformer.py and change all the float defaults to 0, because setting it to 0 in the yml wasn't working.
Everything else is at the defaults from vampnet.yml.
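An alternative to editing the float defaults in transformer.py: dropout can usually be disabled at runtime by zeroing every `nn.Dropout` module after the model is built. A sketch, with a helper name of my own; note it won't catch dropout applied through the functional API (`F.dropout`), which would still need a code edit:

```python
import torch.nn as nn


def zero_dropout(model: nn.Module) -> int:
    """Set p=0 on every nn.Dropout submodule; return how many were changed.

    Hypothetical helper (not from the vampnet repo). With p=0, each Dropout
    becomes the identity even in train mode.
    """
    changed = 0
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = 0.0
            changed += 1
    return changed


# Example on a toy stack with two Dropout layers.
toy = nn.Sequential(
    nn.Linear(8, 8), nn.Dropout(0.1),
    nn.Linear(8, 8), nn.Dropout(0.1),
)
print(zero_dropout(toy))  # 2
```

Calling this once after model construction (or after loading a checkpoint) avoids maintaining a patched transformer.py across branch updates.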

hope this helps
