Training process #28
Has anyone looked into this yet? I am also interested in this, since training Enformer from scratch using your implementation doesn't reproduce the same Pearson correlation values (the max I am getting is ~0.4).
@fransilvionGenomica @maksimallist I tried a while ago using TPUs (didn't have access to a large cluster of GPUs at the time) and didn't hit the mark (got around 0.5-0.6). This was before Ziga officially released their model over at DeepMind. The training script I used is all open sourced here. The original reason for making the repo was a contracting project for a local startup.
@fransilvionGenomica are you planning on training it on proprietary data with your own GPU cluster?
@lucidrains I am training your PyTorch implementation on a single A100 GPU node with the original Basenji dataset and gradient accumulation. I was using the following DeepMind notebook as the reference: https://github.com/google-deepmind/deepmind-research/blob/master/enformer/enformer-training.ipynb. I do believe that it is possible to train the model on GPUs, since in the recent Borzoi paper from the Enformer co-authors they did not use TPUs (https://www.biorxiv.org/content/10.1101/2023.08.30.555582v1). Unfortunately, they don't provide any training script (https://github.com/calico/borzoi).
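For anyone following along, a single-GPU training loop with gradient accumulation roughly like the one described above can be sketched as follows. This is a minimal illustration, not the repository's actual training code: the tiny `nn.Sequential` model and the random in-memory `loader` are hypothetical stand-ins for the Enformer model and the Basenji data pipeline.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins: any nn.Module and any iterable of
# (sequence, target) micro-batches will do for this sketch.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)

accum_steps = 16  # 2 sequences per micro-batch * 16 steps = effective batch of 32
loader = [(torch.randn(2, 8), torch.rand(2, 1)) for _ in range(32)]

optimizer.zero_grad()
for step, (seq, target) in enumerate(loader):
    pred = model(seq)
    # Enformer trains with a Poisson loss; log_input=True treats the
    # model output as a log-rate.
    loss = nn.functional.poisson_nll_loss(pred, target, log_input=True)
    # Scale so accumulated gradients average over the effective batch.
    (loss / accum_steps).backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Note that gradient accumulation does not fully substitute for a large per-device batch here, because BatchNorm statistics are still computed per micro-batch.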
@fransilvionGenomica ahh, I have not checked out Borzoi yet, although someone else told me it is the successor to Enformer. Why are you still using this repository if Borzoi is the new SOTA? Without reading the paper: did Borzoi set a new SOTA?
@fransilvionGenomica where do you work btw?
Oh I see. That makes sense. Even Borzoi mentioned it took them ~25 days on 2 GPUs, and I am training on a single GPU. I guess I will just have to wait then. Thanks!
@fransilvionGenomica it is strange that they waited that long. I thought Calico had Google-level resources.
@fransilvionGenomica I'll revisit genomics maybe at the end of the month and read the Borzoi paper in detail. I'm knee-deep in other projects at the moment.
Ahh ok, I was told that Borzoi is nothing more than Enformer applied to RNA-seq data. OK then, using this repository is fine in that case.
Yes, architecture-wise they are very similar. Borzoi is actually less complex.
@fransilvionGenomica ok, I'll just copy/paste the existing code and remove that complexity for Borzoi later this month after I read the paper. Hopefully they got rid of the annoying gamma positions.
Just curious, have you noticed anything about the batch size while training Enformer from scratch? Does it have to be relatively big (at least 32), or can you train decently even if the batch size is 1 or 2?
@fransilvionGenomica it has to be big (32 or 64). Managing the data and the long sequences was also a huge pain.
@fransilvionGenomica the code in this repository isn't even set up for distributed training. I didn't set up synchronized batchnorm, which is required for it to train well.
@fransilvionGenomica actually, let me just throw that in there for now.
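For reference, converting a model's BatchNorm layers to synchronized batchnorm is a one-liner in PyTorch. The `nn.Sequential` below is a hypothetical stand-in for the Enformer conv tower; the only point is that it contains BatchNorm layers for the conversion to replace.

```python
import torch
from torch import nn

# Hypothetical stand-in for the model's conv tower.
model = nn.Sequential(
    nn.Conv1d(4, 32, kernel_size=5, padding=2),
    nn.BatchNorm1d(32),
    nn.GELU(),
)

# Swap every BatchNorm layer for SyncBatchNorm so running statistics
# are computed across all DDP ranks rather than each GPU's small
# micro-batch.
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
```

The converted model is then wrapped in `DistributedDataParallel` as usual; SyncBatchNorm only actually synchronizes when a process group has been initialized, and it falls back to ordinary batchnorm behavior in single-process runs.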
Have you tried to run your Enformer implementation with PyTorch Lightning?
@fransilvionGenomica no I haven't. As I said above, my training was done in TensorFlow Sonnet with TPUs, since I had access to a large cluster of TPUs in collaboration with EleutherAI back then.
@fransilvionGenomica if you ever wire up a working training script, a pull request is always welcome, in the spirit of open source science.
As for what the paper says: let's wait until reviewers ask this question =)
@lucidrains do you have the training/validation loss curves left by any chance? For your TensorFlow training code, I mean.
@fransilvionGenomica hey, yes, I actually still have it lying around (thanks wandb): https://api.wandb.ai/links/lucidrains/9ac4x106
Hello, may I ask how to fix this? My training time is several times higher when I train with DDP than with a single GPU (with the same
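One common cause of DDP being much slower than single-GPU training is enabling `find_unused_parameters=True`, which forces an extra traversal of the autograd graph every step. The sketch below shows the basic DDP wiring with it disabled; it is a hedged single-process illustration (world_size=1, gloo backend, file-based rendezvous so it runs without networking), and `nn.Linear` is a hypothetical stand-in for the Enformer model. A real run would launch one process per GPU via `torchrun` with the `nccl` backend.

```python
import os
import tempfile
import torch
import torch.distributed as dist
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP

# File-based rendezvous: works offline and in a single process.
store_path = os.path.join(tempfile.mkdtemp(), "ddp_store")
dist.init_process_group(
    backend="gloo",  # use "nccl" when training on GPUs
    init_method=f"file://{store_path}",
    rank=0,
    world_size=1,
)

model = nn.Linear(8, 1)  # hypothetical stand-in for the real model

# find_unused_parameters=True adds per-step overhead and is a frequent
# cause of unexpectedly slow DDP; keep it False unless the model truly
# has parameters that receive no gradient.
ddp_model = DDP(model, find_unused_parameters=False)

out = ddp_model(torch.randn(2, 8))
dist.destroy_process_group()
```

Other things worth checking are that each process is pinned to its own GPU (`torch.cuda.set_device`) and that the dataloader uses a `DistributedSampler`, so ranks aren't all reading the full dataset.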
Hello. Can you share the details of the model training? Did you train it yourself? Did you collect the training data from the Basenji dataset files? I am unable to reproduce the claimed results during training.