Analysis: get a list of candidate GO terms #28

leandroradusky · 2023-05-12T12:24:33Z

Now that we have a method to compute candidate GO terms, we should investigate which protein-term pairs to make predictions for: the competition limits us to 15k predictions, while the test set contains >140k proteins and there are tens of thousands of GO terms.

For a first analysis, let's start with the direct child terms of the terms already assigned to the proteins in the test set. Each term has a score based on its rarity across the whole protein universe (Information Accretion; here is a full explanation of this term). Let's call this score IA(term).
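Information accretion is usually defined as IA(term) = -log2 P(term | all parents of term), i.e. how surprising a term is given that all of its parents are already present. As a rough illustration of how it can be estimated from annotation counts, here is a minimal Python sketch. It assumes a hypothetical data layout (a dict from term to its direct parents, and a dict from protein ID to its GO terms, already propagated to all ancestors); it is not the package's ia() function.

```python
import math
from typing import Dict, Set


def information_accretion(
    term: str,
    parents: Dict[str, Set[str]],
    annotations: Dict[str, Set[str]],
) -> float:
    """Estimate IA(term) = -log2 P(term | all direct parents of term).

    Assumes `annotations` maps protein IDs to GO term sets that are already
    propagated to all ancestors (hypothetical data layout).
    """
    term_parents = parents.get(term, set())
    # Proteins annotated with every direct parent of the term
    # (all proteins when the term has no parents, i.e. a root).
    n_with_parents = sum(1 for terms in annotations.values() if term_parents <= terms)
    # Among those, proteins also annotated with the term itself.
    n_with_term = sum(
        1 for terms in annotations.values() if term_parents <= terms and term in terms
    )
    if n_with_term == 0:
        # Term never observed: IA is undefined here, so return 0.0 (assumption).
        return 0.0
    # Equal to -log2(n_with_term / n_with_parents), written to avoid -0.0.
    return math.log2(n_with_parents / n_with_term)


# Toy ontology (made-up IDs): GO:A is a root with children GO:B and GO:C.
parents = {"GO:A": set(), "GO:B": {"GO:A"}, "GO:C": {"GO:A"}}
annotations = {
    "p1": {"GO:A", "GO:B"},
    "p2": {"GO:A", "GO:B"},
    "p3": {"GO:A", "GO:C"},
    "p4": {"GO:A"},
}
print(information_accretion("GO:B", parents, annotations))  # 1.0
print(information_accretion("GO:C", parents, annotations))  # 2.0
```

The rarer child (GO:C, seen in 1 of the 4 proteins that have its parent) gets the higher IA, which is what makes it worth more in the ranking.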

We should create an analysis that:

1. Computes all direct child GO terms over the test set of proteins, saving for each candidate term the number of proteins it is a candidate for (let's call this #proteins(term)).
2. Ranks the candidate terms naively by the product #proteins(term) * IA(term).
3. Computes the protein-GO term pairs to be predicted, cutting off at 15k predictions.
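The three steps above can be sketched in Python as follows. This is a minimal, naive version under assumed inputs: `assigned` maps each test protein to its already-assigned GO terms, `children` maps each term to its direct child terms, and `ia` maps each term to its IA score. These names and the function names are hypothetical, not the package's API.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def candidate_children(
    assigned: Dict[str, Set[str]], children: Dict[str, Set[str]]
) -> Dict[str, Set[str]]:
    """Step 1: map each candidate term to the proteins it is a candidate for.

    A term is a candidate for a protein if it is a direct child of one of the
    protein's assigned terms and is not itself already assigned.
    """
    proteins_by_term: Dict[str, Set[str]] = defaultdict(set)
    for protein, terms in assigned.items():
        for term in terms:
            for child in children.get(term, set()):
                if child not in terms:
                    proteins_by_term[child].add(protein)
    return dict(proteins_by_term)


def top_predictions(
    assigned: Dict[str, Set[str]],
    children: Dict[str, Set[str]],
    ia: Dict[str, float],
    limit: int = 15_000,
) -> List[Tuple[str, str]]:
    proteins_by_term = candidate_children(assigned, children)
    # Step 2: naive score #proteins(term) * IA(term), highest first.
    # Terms missing from `ia` score 0 (assumption).
    ranked = sorted(
        proteins_by_term,
        key=lambda t: len(proteins_by_term[t]) * ia.get(t, 0.0),
        reverse=True,
    )
    # Step 3: emit (protein, term) pairs in rank order up to the cutoff.
    pairs: List[Tuple[str, str]] = []
    for term in ranked:
        for protein in sorted(proteins_by_term[term]):
            if len(pairs) >= limit:
                return pairs
            pairs.append((protein, term))
    return pairs


# Toy example with made-up IDs.
assigned = {"p1": {"GO:A"}, "p2": {"GO:A"}, "p3": {"GO:B"}}
children = {"GO:A": {"GO:B", "GO:C"}, "GO:B": {"GO:D"}}
ia = {"GO:B": 1.0, "GO:C": 0.5, "GO:D": 3.0}
print(top_predictions(assigned, children, ia, limit=3))
# [('p3', 'GO:D'), ('p1', 'GO:B'), ('p2', 'GO:B')]
```

Because the cutoff counts pairs rather than terms, a single high-scoring term shared by many proteins can use up a large part of the 15k budget on its own.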

Jupyter notebooks are usually preferred over scripts for analyses, since they let you describe each step in markdown, plot results, etc., which will help us communicate the decisions behind the final predictions. GitHub renders notebooks well: it formats the markdown, displays the plots, etc. Let's put the generated notebook in a folder called analyses and have it "consume" the functionality already developed in the package, which will also serve as a first example of its use.


nthiad · 2023-06-01T16:41:41Z

Partly added in #33, but the Jupyter notebook still needs to be written.

nthiad self-assigned this May 23, 2023

nthiad added a commit that referenced this issue May 30, 2023

ia() and get_parents() for #28

94d09fa

nthiad mentioned this issue May 30, 2023

ia() and get_parents() #33

Merged

nthiad added a commit that referenced this issue Jun 1, 2023

ia() and get_parents() (#33)

354a64e

* ia() and get_parents() for #28
* filtering to ensure children and parents are candidate terms, not actual terms
* ancestors_within_distance for max_distance param to get_parents()
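The commit mentions an ancestors_within_distance helper behind a max_distance parameter of get_parents(), plus filtering so that returned terms are candidates rather than terms already assigned. The real implementation lives in #33; what follows is only a hypothetical Python sketch of the idea, assuming the ontology is a dict from each term to its direct parents: a breadth-first search up the graph that stops after max_distance edges, followed by dropping already-assigned terms.

```python
from collections import deque
from typing import Dict, Set


def ancestors_within_distance(
    term: str, parents: Dict[str, Set[str]], max_distance: int
) -> Dict[str, int]:
    """Return {ancestor: shortest distance} for ancestors at most max_distance edges up.

    Breadth-first search over a hypothetical term -> direct-parents mapping;
    BFS reaches each ancestor first along a shortest path.
    """
    distances: Dict[str, int] = {}
    queue = deque([(term, 0)])
    while queue:
        current, dist = queue.popleft()
        if dist >= max_distance:
            continue  # do not expand past the distance limit
        for parent in parents.get(current, set()):
            if parent not in distances:
                distances[parent] = dist + 1
                queue.append((parent, dist + 1))
    return distances


# Toy DAG (made-up IDs): GO:D -> GO:B -> GO:A and GO:C -> GO:A
parents = {"GO:A": set(), "GO:B": {"GO:A"}, "GO:C": {"GO:A"}, "GO:D": {"GO:B"}}
print(ancestors_within_distance("GO:D", parents, 1))  # {'GO:B': 1}

# Keep only candidate parents: drop terms already assigned to the protein.
assigned_terms = {"GO:D", "GO:B"}
candidates = set(ancestors_within_distance("GO:D", parents, 2)) - assigned_terms
print(candidates)  # {'GO:A'}
```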

nthiad closed this as completed Jun 1, 2023

nthiad reopened this Jun 1, 2023
