Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Human 1.17 #700

Merged
merged 211 commits into from
Sep 30, 2023
Merged

Human 1.17 #700

merged 211 commits into from
Sep 30, 2023

Conversation

haowang-bioinfo
Copy link
Member

@haowang-bioinfo haowang-bioinfo commented Sep 25, 2023

Main improvements in this PR:

JonathanRob and others added 30 commits September 15, 2022 14:13
… but remove the FAD-dependent version of the SDH reaction
- This commit includes fixing typos, and reverting back to original way of extracting subsystem values
Copy link
Collaborator

@JonathanRob JonathanRob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, a lot of changes in this update - great to see the ongoing progress and improvements!

Copy link
Member

@mihai-sysbio mihai-sysbio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice!
@haowang-bioinfo will we see the essentiality results, like last time?

Copy link
Collaborator

@feiranl feiranl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@haowang-bioinfo
Copy link
Member Author

@haowang-bioinfo will we see the essentiality results, like #645 (comment)?

yes, indeed!

@haowang-bioinfo
Copy link
Member Author

updated Hart2015 essentiality results:

'cellLine' 'TP' 'TN' 'FP' 'FN' 'accuracy' 'sensitivity' 'specificity' 'F1' 'MCC' 'Penr' 'logPenr' 'PenrAdj' 'logPenrAdj'
'DLD1' 105 2077 166 211 0.852676826885502 0.332278481012658 0.925991975033437 0.357751277683135 0.276134233976690 4.77088648129624e-33 32.3214009169421 1.43126594438887e-32 31.8442796622224
'GBM' 105 2061 166 226 0.846755277560594 0.317220543806647 0.925460260440054 0.348837209302326 0.264662129889304 5.62054079140837e-31 30.2502218959211 1.12410815828167e-30 29.9491919002571
'HCT116' 135 2117 143 220 0.861185468451243 0.380281690140845 0.936725663716814 0.426540284360190 0.352278370608797 1.14499825836663e-52 51.9411951739200 6.86998955019978e-52 51.1630439235364
'HELA' 89 2143 189 195 0.853211009174312 0.313380281690141 0.918953687821612 0.316725978647687 0.234526274235307 5.49947761890298e-25 24.2596785610516 8.24921642835447e-25 24.0835873019959
'RPE1' 70 2084 201 203 0.842064112587959 0.256410256410256 0.912035010940919 0.257352941176471 0.168991753181488 3.40109542793128e-14 13.4683811824623 4.08131451351753e-14 13.3891999364147
'all' 41 2263 237 75 0.880733944954129 0.353448275862069 0.905200000000000 0.208121827411168 0.172768250444715 2.39047070704732e-13 12.6215165738028 2.39047070704732e-13 12.6215165738028

@haowang-bioinfo
Copy link
Member Author

haowang-bioinfo commented Sep 28, 2023

Updated essentiality evaluation using combined (all) Hart2015 datasets:

version TP TN FP FN accuracy sensitivity specificity F1 MCC
v1.12 40 2333 175 77 0.904000000000000 0.341880341880342 0.930223285486443 0.240963855421687 0.204768560393159
v1.13 40 2334 174 77 0.904380952380952 0.341880341880342 0.930622009569378 0.241691842900302 0.205504558241293
v1.14 40 2334 175 77 0.904036557501904 0.341880341880342 0.930251096054205 0.240963855421687 0.204787829413435
v1.15 40 2342 168 77 0.906737723639132 0.341880341880342 0.933067729083665 0.246153846153846 0.210053956889428
v1.16 41 2308 202 76 0.89417587 0.35042735 0.91952191 0.22777778 0.19220101
v1.17 41 2263 237 75 0.880733944954129 0.353448275862069 0.905200000000000 0.208121827411168 0.172768250444715

The trend of MCC is not very promising

@mihai-sysbio
Copy link
Member

The trend of MCC is not very promising

Indeed. How about we pause this release and investigate deeper?

@JonathanRob
Copy link
Collaborator

It looks like the number of false positives is increasing quite a bit. This could be the result of e.g., removal of duplicate pathways, removal of incorrect isozymes, or closing of some infeasible loops - all of which are generally good things.

So I agree with @mihai-sysbio that it would be worth investigating, but I also wouldn't be too alarmed.

@haowang-bioinfo
Copy link
Member Author

it's a good idea to investigate further for the MCC declining, any attempt is welcome.

while the pause of this release doesn't make sense, because:

  1. Hart2015 is a limited dataset and essentiality is only one dimension of assessment;
  2. Every single change has been evidence-based, openly discussed, and transparently tracked; this means we should move on
  3. we may be choking on food, but still have to eat

@mihai-sysbio
Copy link
Member

For the sake of the debate, I'm going to argue for the pause. Don't get me wrong, I am not entirely convinced pausing is the best way forward. I am just willing to engage in the debate.

A pause is not a cancellation, it's a delay, and the model can still be used. The model is openly available, even though not in the main branch, and it an still be uniquely identified with the commit hash, as visible in the permalink. However, it's only the yml format.

  1. Sure, Hart2015 is a limited dataset, but maybe it's time to make the time investment to set the essentiality to run with Depmap. In my mind this is an unavoidable progressive step, and the latest essentiality results are enough of a motivation.

  2. If I am to extrapolate, for me it would not be acceptable to have an evidence-based model with MCC of -0.3. That's obviously an extreme (implausible?) scenario. What I'm trying to say is that the condition of evidence-based is necessary but not sufficient.

  3. Sorry, I'm not sure what this means.

Moreover, now the information is still relatively fresh, so digging into this now might be more comfortable than in say in 6 months. And what we will learn from this process will also be benefiting the next release.

@mihai-sysbio
Copy link
Member

Some more thoughts:

The way I've mapped the MCC score in my head is like this: the interval [-1, 0] is "worse than useless," and the interval [0, 1] corresponds to 0% to 100% usefulness. With this perspective, a reduction of 5% in usefulness from 21% usefulness is not good. Of course, this thinking really abuses the numbers.

Another thing: if a new release would be affecting the MCC by say 0.2 (instead of 0.05 like now) we would likely take it very seriously. So then the question is not if but how much of a MCC change is acceptable with a release.

@haowang-bioinfo
Copy link
Member Author

Debate is good, it often makes things clearer. Probably my previous message is a bit ambiguous, so try again.

My key point is investigating MCC and making this release can be separated, and proceed in parallel. They are not conflicting with each other, no need to pause one for another.

  1. Sure, Hart2015 is a limited dataset, but maybe it's time to make the time investment to set the essentiality to run with Depmap. In my mind this is an unavoidable progressive step, and the latest essentiality results are enough of a motivation.

yes of course, I totally agree to check this out. Actually this and all previous releases, as well as all involving transparent changes, enable this kind of check.

  1. If I am to extrapolate, for me it would not be acceptable to have an evidence-based model with MCC of -0.3. That's obviously an extreme (implausible?) scenario. What I'm trying to say is that the condition of evidence-based is necessary but not sufficient.

Agree, literature evidence, MCC, and some other indicators may still not be sufficient. But this shouldn't stop rational changes with critical review.

Moreover, now the information is still relatively fresh, so digging into this now might be more comfortable than in say in 6 months. And what we will learn from this process will also be benefiting the next release.

yes, just go ahead please

Another thing: if a new release would be affecting the MCC by say 0.2 (instead of 0.05 like now) we would likely take it very seriously. So then the question is not if but how much of a MCC change is acceptable with a release.

I'm not exactly sure what cutoff should be used for evaluating MCC, which is only one dimension of assessment. So any follow-up analysis in clarifying this is welcome

@haowang-bioinfo haowang-bioinfo merged commit 8643327 into main Sep 30, 2023
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants