The full training data is missing #2

Open
karamveerverma37 opened this issue Jun 18, 2024 · 11 comments

Comments

@karamveerverma37

Hi, I would like to train a model for genome-wide predictions, but the example dataset contains only a subset of the training data (chr19). Could you please share the full dataset used for training?

@yupenghe
Owner

Sure. See Tables S3 and S4 in the REPTILE paper:
https://www.pnas.org/doi/full/10.1073/pnas.1618353114

@karamveerverma37
Author

Thanks for sharing the manuscript. As far as I can see, this is raw data. Could you share the preprocessed data used for training, such as the regions and their labels/states?

@yupenghe
Owner

I will try, but no promises, since the data is quite a few years old. Also, the training data is on mm10, which probably won't help in your case. Downloading the raw data and reprocessing it on the mm39 genome would be the best approach.

@karamveerverma37
Author

Hi, thanks for the suggestion. My aim is to use the trained models, or to train a model in REPTILE, to infer enhancers. I am not familiar with all of the data preprocessing steps, but I have scripts to lift over from mm10 to mm39. Please share the preprocessed data used in the manuscript if it is available, and I can work from that.

@yupenghe
Owner

OK, I think I found the training data. I will organize it a bit before sharing. Can you read Perl scripts? That is what I used to run the training commands.

@karamveerverma37
Author

Hi, thanks for the update. Yes, I understand Perl.

@karamveerverma37
Author

karamveerverma37 commented Jun 26, 2024

I would also like to ask about the pretrained models provided in the models directory. Were these models trained on full-genome data or only on chr19? When I use them to compute scores, I get values only for chr19 and 0 for the other chromosomes. If they were trained only on chr19, could you share pretrained models for the full genome, if available?
Thank you.

@yupenghe
Owner

yupenghe commented Jun 26, 2024

> I try to use them to compute score I get the values only for chr19 and 0 for others chromosomes.

This is likely due to input files. Do you mind checking that the data of all chromosomes were used as input?
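
A minimal sketch of such a check in Python. It assumes the input is a tab-delimited table with one header row and the chromosome name in the first column; that layout is an assumption here and should be verified against the actual files. The function names `count_rows_per_chrom` and `missing_chroms` are hypothetical helpers, not part of REPTILE.

```python
from collections import Counter
from typing import Iterable, List


def count_rows_per_chrom(lines: Iterable[str], has_header: bool = True) -> Counter:
    """Count data rows per chromosome.

    Assumes (not confirmed by the REPTILE docs) a tab-delimited table
    whose first column holds the chromosome name.
    """
    counts: Counter = Counter()
    for i, line in enumerate(lines):
        if has_header and i == 0:
            continue  # skip the header row
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        chrom = line.split("\t", 1)[0]
        counts[chrom] += 1
    return counts


def missing_chroms(counts: Counter, expected: List[str]) -> List[str]:
    """Return the expected chromosomes that have no rows at all."""
    return [c for c in expected if counts.get(c, 0) == 0]


# Example with in-memory lines; for a real file, pass an open file handle:
#     with open("my_input.tsv") as handle:   # hypothetical file name
#         counts = count_rows_per_chrom(handle)
sample = [
    "chr\tstart\tend\tid\n",
    "chr19\t0\t100\ta\n",
    "chr19\t100\t200\tb\n",
    "chr1\t0\t100\tc\n",
]
counts = count_rows_per_chrom(sample)
print(dict(counts))
print(missing_chroms(counts, ["chr1", "chr2", "chr19"]))
```

If only chr19 appears in the counts, the zero scores on other chromosomes most likely come from the input rather than from the models.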

@yupenghe
Owner

yupenghe commented Jun 26, 2024

Added the training and test data:

https://github.com/yupenghe/REPTILE/tree/master/all_data

@karamveerverma37
Author

Thanks for sharing the training data. Does it also need bigWig files for the full genome, or can it take the data from the epimark file?

@yupenghe
Owner

It starts from epimark files.
