negative values in grampa.csv file #8
Comments
@UnixJunkie I have the same question. Did you find out why? @zswitten and @jswitten, thanks for your great contribution. Can you explain the MIC values? If I want to turn this into a binary classification problem (AMP vs. non-AMP), how should I choose the threshold value?
I don't know exactly.
It's log(MIC in uM), so any MIC < 1 uM will have a negative value. Regarding the threshold value, it's totally arbitrary; there's no one absolutely correct way, except that I guess everything in the database is a positive in some sense. Because thresholding is inherently arbitrary, we used regression in our paper.
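To make the sign convention concrete, here is a minimal sketch. It assumes a base-10 logarithm (the comment above only says "log"), and the helper name `log_mic` is hypothetical, not something taken from the repository.

```python
import math


def log_mic(mic_um: float) -> float:
    """Return the log of an MIC given in micromolar.

    Base 10 is an assumption; the sign behaviour is the same for any base > 1.
    """
    if mic_um <= 0:
        raise ValueError("MIC must be a positive concentration")
    return math.log10(mic_um)


# MICs below 1 uM map to negative values, 1 uM maps to zero,
# and MICs above 1 uM map to positive values.
for mic in (0.1, 1.0, 10.0):
    print(f"MIC = {mic} uM -> log MIC = {log_mic(mic):+.2f}")
```

So a negative entry in the file does not mean a negative concentration; it only means the reported MIC was below 1 uM.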
We did convert it to a classification problem in order to benchmark our results, and I believe we used both totally random peptides and random peptides from UniProt. I kind of forget the details; you can read our paper.
From the AMP literature, I would say that an MIC value <= 32 ug/mL might be a reasonable threshold.
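Note that a threshold in ug/mL does not correspond to a single value on the log(uM) scale, because the conversion depends on each peptide's molecular weight. A rough sketch of the conversion, where the function name and the 2000 g/mol molecular weight are made up purely for illustration, and base 10 is again an assumption:

```python
import math


def ug_per_ml_to_um(conc_ug_per_ml: float, mol_weight_g_per_mol: float) -> float:
    """Convert a mass concentration (ug/mL) to a molar concentration (uM)."""
    # ug/mL is the same as mg/L; (mg/L) * 1000 / (g/mol) gives umol/L, i.e. uM.
    return conc_ug_per_ml * 1000.0 / mol_weight_g_per_mol


# Hypothetical peptide with a molecular weight of 2000 g/mol.
threshold_um = ug_per_ml_to_um(32.0, 2000.0)
print(threshold_um)  # 16.0

# The same 32 ug/mL cutoff on the log scale (base 10 assumed).
print(round(math.log10(threshold_um), 3))
```

A heavier peptide would give a lower uM threshold for the same 32 ug/mL, so the cutoff would have to be computed per sequence.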
@jswitten Thanks for your reply. I just drew a histogram of the MIC values:
In the section of your paper entitled "Ensemble model", you mention:
Is log MIC (in uM) <= 3.5 your threshold boundary for the active peptides? In that case, the number of non-AMP (inactive peptide) samples for training becomes very small compared to the number of active AMPs (class imbalance). Sorry again for the long question, but I could not find a clear answer in other literature, and your dataset is the only one that I can use for my use case.
@UnixJunkie Thanks for your reply,
Can you please give the title of one of the papers that have mentioned this? In some literature the
What I did was declare all peptides in the dataset to be positives and generate negatives, either by generating completely random peptides or by taking random peptides from UniProt; see Table 2, Table S3, and the related discussion. So the negatives were synthetically generated, and every peptide in the dataset is a positive because every peptide in GRAMPA has been reported to be antimicrobial against something. In other words, the threshold I used was "in GRAMPA vs. not in GRAMPA".
@jswitten Thank you for the clarification.
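A rough sketch of the "completely random peptides" variant, for anyone who wants to try it. This is not the code from the paper: sampling residues uniformly, matching the length of each positive, and the toy sequences below are all assumptions made for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def random_peptide(length: int, rng: random.Random) -> str:
    """Draw a peptide of the given length, sampling residues uniformly."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def make_negatives(positives: list[str], rng: random.Random) -> list[str]:
    """Return one random peptide per positive, with the same length.

    Candidates that happen to match a known positive are redrawn.
    """
    known = set(positives)
    negatives = []
    for seq in positives:
        if not seq:
            raise ValueError("empty sequence in positives")
        candidate = random_peptide(len(seq), rng)
        while candidate in known:
            candidate = random_peptide(len(seq), rng)
        negatives.append(candidate)
    return negatives


# Made-up placeholder sequences standing in for dataset entries.
positives = ["KKLLKKLLKKLL", "GLFKKLAKKIAG"]
negatives = make_negatives(positives, random.Random(0))
print(negatives)
```

With negatives built this way, the label is simply "in the dataset" (1) versus "generated" (0), and no MIC threshold is needed.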
Some authors from a US lab generate negatives by randomizing the order of the amino acids in the sequences of known actives. There is a rationale for this procedure: it destroys the hydrophobic moment of the known actives, which means such peptides can no longer perturb the membrane of microbes (the assumed mode of action for many antimicrobial peptides).
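The shuffling procedure can be sketched in a few lines. This is only an illustration of the idea, not those authors' code; the function name, the retry limit, and the example sequence are hypothetical.

```python
import random


def shuffled_decoy(sequence: str, rng: random.Random, max_tries: int = 100) -> str:
    """Return a random permutation of the residues that differs from the input."""
    residues = list(sequence)
    for _ in range(max_tries):
        rng.shuffle(residues)
        decoy = "".join(residues)
        if decoy != sequence:
            return decoy
    # Happens for sequences such as "KKKK", where every permutation is identical.
    raise ValueError(f"could not produce a distinct shuffle of {sequence!r}")


active = "KKLLKKLLKKLL"  # made-up placeholder, not a dataset entry
decoy = shuffled_decoy(active, random.Random(0))
print(active, "->", decoy)
```

Because shuffling keeps the length and amino acid composition (and therefore the net charge) unchanged, a classifier trained against such decoys cannot separate the classes on composition alone and has to rely on residue order.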
Some concentration values are negative.
I don't think this is possible, so there is a problem somewhere that introduced those negative values.