Check the new positive data #6

Open
visze opened this issue Dec 6, 2021 · 4 comments
visze commented Dec 6, 2021

Just to see whether there is an improvement. This will not be part of the manuscript.

visze self-assigned this Dec 6, 2021

visze commented Dec 9, 2021

Well, they are a bit better:

new positives:

metric  value
AUROC   0.995
AUPRC   0.605

[Figure: PRC and ROC curves, curves_prc_roc_global_mean_new_positives]

old positives:

metric  value
AUROC   0.996
AUPRC   0.585

[Figure: PRC and ROC curves, curves_prc_roc_global_mean]

But I have to rerun everything 100 times to see the average increase.


visze commented Dec 10, 2021

After rerunning parSMURF 100 times with different seeds, I get the following on hg38, with global means:

additional positives:

metric	mean	max	min
AUROC	0.99501	0.996	0.995
AUPRC	0.59688	0.609	0.578

standard positives:

metric	mean	max	min
AUROC	0.9959	0.996	0.995
AUPRC	0.58186	0.598	0.563

So we get a slight increase in mean AUPRC of 0.01502 and a small decrease in mean AUROC of 0.00089.
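The arithmetic behind these differences can be sketched as follows. The means are copied from the two tables above. The helper names `summarize` and `mean_difference` are hypothetical, and the three per-seed values passed to `summarize` are made-up placeholders, not results from the actual runs.

```python
from statistics import mean

def summarize(values):
    """Return mean, max and min of per-seed metric values (hypothetical helper)."""
    return {"mean": mean(values), "max": max(values), "min": min(values)}

def mean_difference(new, old, digits=5):
    """Per-metric difference of means, new minus old, rounded for display."""
    return {metric: round(new[metric] - old[metric], digits) for metric in new}

# Means over 100 seeds, copied from the tables above (hg38, global means).
additional = {"AUROC": 0.99501, "AUPRC": 0.59688}
standard = {"AUROC": 0.9959, "AUPRC": 0.58186}

diff = mean_difference(additional, standard)
print(diff)

# Placeholder per-seed values, only to show the shape of the summary.
toy_summary = summarize([0.58, 0.60, 0.59])
print(toy_summary)
```

A positive value means the additional positives scored higher on that metric, a negative value means they scored lower.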

I don't think it is worth including the new data, because the increase is small (and we are tuning on imbalance, so a better AUPRC is somewhat expected). In theory, a separate test set (not cross-validation) is needed to really show whether this helps.
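One general reason why a small AUPRC difference between two positive sets is hard to interpret: unlike AUROC, the chance level of AUPRC equals the fraction of positives, so it shifts whenever positives are added. The sketch below is not part of the analysis in this issue. It uses a hand-written average precision and a synthetic, uninformative ranking (both hypothetical) to illustrate that property only.

```python
def average_precision(labels_ranked):
    """Average precision for 0/1 labels already sorted by descending score."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(labels_ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    return precision_sum / hits if hits else 0.0

def evenly_spread_ranking(n_total, every):
    """Uninformative ranking: one positive at every `every`-th rank."""
    return [1 if rank % every == 0 else 0 for rank in range(1, n_total + 1)]

# Same uninformative classifier, two different positive fractions.
ap_sparse = average_precision(evenly_spread_ranking(10_000, every=100))  # 1% positives
ap_denser = average_precision(evenly_spread_ranking(10_000, every=50))   # 2% positives

print(ap_sparse, ap_denser)
```

With the ranking carrying no information in both cases, the average precision simply tracks the positive fraction, which is why a held-out test set with a fixed composition gives a cleaner comparison.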

Using global means for some features works much better than adding the new positives (see #12).


visze commented Dec 10, 2021

Redo the same for the feature set of ReMM v1.4 and with both genome releases.


visze commented Jan 3, 2022

Results

ReMM v1.4 hg38

For 100 repetitions with random seeds

positives	metric	mean	max	min
standard	AUPRC	0.599	0.615	0.584
standard	AUROC	0.996	0.996	0.995
additional	AUPRC	0.609	0.62	0.595
additional	AUROC	0.995	0.996	0.995

ReMM v1.4 hg19

TODO
