Prediction result only ranging from [0.4,0.5] #12
Comments
Hello, I have run into the same problem. How did you solve it?
We will see if the lead author gives a different comment, but let me give my perspective here. When you use DNA sequence alone, that information is not supposed to tell whether a TF binds or not. A good model should therefore not give extremely large or small values, because there is not sufficient confidence. I am surprised that the AUPRC is high; I would not expect that, since the baseline is so low given the extremely limited number of positive examples. I think only AUROC would be high in this case.
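To make the point about metrics concrete, here is a minimal sketch in plain Python (not Leopard code; the toy labels and scores are made up for illustration). It shows two things: AUROC and AUPRC depend only on the ranking of the scores, so squeezing predictions into [0.4, 0.5] does not change them, and the chance-level AUPRC is roughly the fraction of positives, which is tiny for TF binding.

```python
def auroc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision, a common AUPRC estimate (ties are not handled specially)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical toy data: 100 positions, 5 of them bound (5% positives).
scores = [i / 100 for i in range(100)]
positive_idx = {99, 97, 90, 60, 30}
labels = [1 if i in positive_idx else 0 for i in range(100)]

# Same ranking, but every value squeezed into the narrow range [0.4, 0.5).
squeezed = [0.4 + 0.1 * s for s in scores]

print("prevalence (chance-level AUPRC):", sum(labels) / len(labels))
print("AUROC original / squeezed:", auroc(labels, scores), auroc(labels, squeezed))
print("AUPRC original / squeezed:",
      average_precision(labels, scores), average_precision(labels, squeezed))
```

So a narrow output range by itself does not rule out good ranking metrics; whether the values are usable as calibrated peak calls is a separate question.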
Sorry, I realize it's due to the bigwig I used for training: I used the signal bigwig directly instead of the peak bigwig. It would be helpful to add instructions on how to generate the bigwig for training to the README. Thank you :>
Hi, the model based on DNA only (without DNase-seq) will not be very informative: for the same TF, it cannot distinguish different binding profiles in different cell types. As Yuanfang mentioned, the model will be more "conservative" in its predictions and the values could be around 0.5. The key information from DNase-seq is missing, so the model cannot generate high-confidence predictions. That is also why, for example, traditional motif-based models have many false positive peaks.
To generate peak bigwig files, you usually need two steps: (1) call peaks using whatever software you prefer and (2) convert the peak files into bigwig format. Once I have the peak values, I convert them into bigwig using some in-house code. You can check out lines 54-72 in this code for your reference. Thank you!
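For readers without access to the in-house code, step (2) could look roughly like the sketch below. This is not the author's script. It assumes the peak caller wrote a BED-like file (chrom, start, end in the first three columns), that the third-party `pyBigWig` package is installed, and that a binary value of 1.0 inside peaks is what you want; the function names, file names, and chromosome sizes are all hypothetical.

```python
def read_peaks(lines):
    """Parse BED-like lines (chrom, start, end, ...) into {chrom: [(start, end), ...]}."""
    peaks = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        peaks.setdefault(chrom, []).append((int(start), int(end)))
    return peaks


def merge_intervals(intervals):
    """Sort intervals and merge overlapping or touching ones (bigWig entries must not overlap)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def write_peak_bigwig(peaks, chrom_sizes, out_path, value=1.0):
    """Write merged peaks as constant-valued intervals; chrom_sizes is {chrom: length}."""
    import pyBigWig  # third-party; imported here so the helpers above work without it

    bw = pyBigWig.open(out_path, "w")
    bw.addHeader(list(chrom_sizes.items()))
    for chrom in chrom_sizes:  # write chromosomes in header order
        merged = merge_intervals(peaks.get(chrom, []))
        if not merged:
            continue
        bw.addEntries(
            [chrom] * len(merged),
            [s for s, _ in merged],
            ends=[e for _, e in merged],
            values=[float(value)] * len(merged),
        )
    bw.close()


# Example with inline lines instead of a real peak file:
example = ["chr1\t10\t20\tpeak1", "chr1\t15\t30\tpeak2", "chr1\t50\t60\tpeak3"]
print(merge_intervals(read_peaks(example)["chr1"]))
# To write a file (hypothetical path and sizes):
# write_peak_bigwig(read_peaks(open("peaks.bed")), {"chr1": 248956422}, "peaks.bigwig")
```

Positions outside the written intervals have no value in the resulting file, so check how your training code treats missing regions (for example, whether it reads them as 0 or NaN).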
Hi! I have tried to train the Leopard model on only the DNA sequence of the reference genome, removing the DNase-seq / delta DNase features from the input. However, the predictions only range over [0.4, 0.5] and do not capture any peaks, even though the AUPRC score is high. Has anyone else experienced this issue?