Questions about hetionet: metabolomics / side effects versus diseases #15

Open
gcsh86 opened this issue Mar 12, 2019 · 1 comment

Comments


gcsh86 commented Mar 12, 2019

Hi Daniel,

I have two brief questions about Hetionet and would like to hear your insights:

  1. On the metabolomics side, why didn't you use the HMDB database for linking metabolites, diseases, variants, genes, etc.?

  2. Why are sepsis and chronic fatigue categorized as side effects rather than diseases?

@dhimmel dhimmel changed the title About hetionet Questions about hetionet: metabolomics / side effects versus diseases Mar 12, 2019
Owner

dhimmel commented Mar 12, 2019

Thanks @gcsh86 for the questions.

On the metabolomics side, why didn't you use the HMDB database for linking metabolites, diseases, variants, genes, etc.?

There is no reason other than that I wasn't aware of an omics-wide resource for metabolite nodes and edges. I also wasn't sure whether metabolites would be redundant with compounds. Metabolomics is an area I don't know much about, so there may well be an opportunity I overlooked.

When considering an additional data resource, I recommend first drawing out which node and edge types that resource would contribute to the metagraph (Figure 1A of the manuscript). @gcsh86 if you have a specific proposal for the node/edge types that could be generated from HMDB, I could provide more tailored feedback.
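
As a rough sketch of what "drawing out" a proposal could look like, the snippet below lists candidate edges as (source type, edge type, target type) triples and reports which node types would be new to the metagraph. The existing node types are the eleven in Hetionet v1.0. The `Metabolite` type and both proposed edges are hypothetical placeholders for illustration, not something taken from HMDB or from Hetionet.

```python
# Node types in the Hetionet v1.0 metagraph.
EXISTING_NODE_TYPES = {
    "Anatomy", "Biological Process", "Cellular Component", "Compound",
    "Disease", "Gene", "Molecular Function", "Pathway",
    "Pharmacologic Class", "Side Effect", "Symptom",
}

# Hypothetical proposal: (source node type, edge type, target node type).
# These triples are illustrative assumptions, not actual HMDB content.
proposed_edges = [
    ("Metabolite", "associates", "Disease"),
    ("Metabolite", "interacts", "Gene"),
]

# Collect every node type the proposal touches.
proposed_node_types = set()
for source, _, target in proposed_edges:
    proposed_node_types.update((source, target))

# Node types that would have to be added to the metagraph.
new_node_types = proposed_node_types - EXISTING_NODE_TYPES
# Node types the proposal would reuse.
reused_node_types = proposed_node_types & EXISTING_NODE_TYPES

print("new node types:", sorted(new_node_types))
print("reused node types:", sorted(reused_node_types))
```

Writing a proposal down this way makes it easier to see whether a new type (here the assumed `Metabolite`) overlaps with an existing one such as `Compound`.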

Why are sepsis and chronic fatigue categorized as side effects rather than diseases?

Some entities can conceptually belong to multiple node types. This is especially true for diseases, side effects, and symptoms: a single concept, such as sepsis or chronic fatigue, could plausibly be all three. For Hetionet, we created separate nodes for diseases, side effects, and symptoms, so fatigue can appear as up to three separate nodes depending on its context. Run the following query at https://neo4j.het.io/browser/ and you will see that "fatigue" shows up in the names of both side effects and symptoms:

MATCH (node)
WHERE node.name =~ '(?i).*fatigue.*'
RETURN node
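
For readers without access to the Neo4j browser, here is a minimal Python sketch of the same idea: apply the case-insensitive pattern to node names and group the matches by node type. The `nodes` list is made-up toy data, not actual Hetionet records. Cypher's `=~` has to match the whole string, which is why the sketch uses `re.fullmatch`.

```python
import re
from collections import defaultdict

# Hypothetical toy nodes as (node_type, name) pairs; not actual Hetionet records.
nodes = [
    ("Side Effect", "Fatigue"),
    ("Symptom", "Fatigue"),
    ("Side Effect", "Chronic fatigue syndrome"),
    ("Disease", "asthma"),
]

# Same pattern as in the Cypher query above.
pattern = re.compile(r"(?i).*fatigue.*")

# Group matching names by their node type.
matches_by_type = defaultdict(list)
for node_type, name in nodes:
    if pattern.fullmatch(name):
        matches_by_type[node_type].append(name)

for node_type, names in sorted(matches_by_type.items()):
    print(node_type, names)
```

In this toy data the same name appears under two node types, which is the situation the query is meant to surface.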


You can learn more about how we created our disease catalog in this discussion. Briefly quoting from the manuscript:

We selected 137 terms from the Disease Ontology (which we refer to as DO Slim) as our disease set. Our goal was to identify complex diseases that are distinct and specific enough to be clinically relevant yet general enough to be well annotated. To this end, we included diseases that have been studied by GWAS and cancer types from TopNodes_DOcancerslim. We ensured that no DO Slim disease was a subtype of another DO Slim disease.

So fatigue and sepsis did not make this cut.

More generally, one could allow a single node to have multiple types (or labels, in neo4j parlance). For simplicity, we did not allow this when building Hetionet. The number of nodes that this affects is relatively low, and the one-type-per-node assumption helped simplify the metagraph and computations.
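
To make the trade-off concrete, here is a small sketch contrasting the two modeling choices. It is not how Hetionet is implemented; the identifiers and dictionary layouts are hypothetical and chosen only for illustration.

```python
from collections import Counter

# One type per node: a node is identified by (type, identifier),
# so the same concept appears once per type. Identifiers are hypothetical.
one_type_nodes = {
    ("Side Effect", "SE:fatigue"): {"name": "Fatigue"},
    ("Symptom", "SY:fatigue"): {"name": "Fatigue"},
}

# Multiple labels per node: one node carries a set of labels (hypothetical layout).
multi_label_nodes = {
    "fatigue": {"name": "Fatigue", "labels": {"Side Effect", "Symptom"}},
}

# With one type per node, each node contributes to exactly one type count.
one_type_counts = Counter(node_type for node_type, _ in one_type_nodes)

# With multiple labels, a single node contributes to several type counts.
multi_label_counts = Counter(
    label for node in multi_label_nodes.values() for label in node["labels"]
)

# Per-type counts sum to the node count only under the one-type assumption.
print(sum(one_type_counts.values()) == len(one_type_nodes))
print(sum(multi_label_counts.values()) == len(multi_label_nodes))
```

Under the one-type assumption every node maps to exactly one place in the metagraph, which keeps per-type bookkeeping simple; with multiple labels that mapping is no longer one-to-one.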
