support a variety of non-gene sequence features using 'type' and a generic parent class #1210

Closed
sierra-moxon opened this issue Jan 12, 2023 · 9 comments

Comments

@sierra-moxon
Member

from @hitz

What are your thoughts on objects like:
A “putative cis-regulatory element” defined by a consensus chromatin accessibility peak, with or without supporting histone mark data
Or
“A segment of human genome DNA tested in an MPRA experiment to determine its effect on gene expression”
Or
“A region targeted by guide RNAs in a CRISPR activation screen”
I suspect we would have to make some extension for biolink:GenomicEntity to cover non-Gene features.

In addition, if we want to use biolink:GenomicEntity directly, we need to figure out how to move it out of the mixin hierarchy.

@hitz

hitz commented Jan 18, 2023

@sierra-moxon What about: http://www.sequenceontology.org/browser/current_release/term/SO:0005836
"regulatory_region". Is there a way (or even a need) to qualify this with "proposed" or "putative"? I suppose you could just assign evidentiary properties (e.g., determined by chromatin accessibility / ATAC-seq in a cell line like GM12878).

This would be a Class that uses biolink:GenomicEntity as a mixin.

@hitz

hitz commented Jan 18, 2023

There is also: http://www.sequenceontology.org/browser/current_release/term/SO:0002331 (accessible_dna_region)
and http://www.sequenceontology.org/browser/current_release/term/SO:0000235 (tf_binding_site)

tf_binding_site ISA http://www.sequenceontology.org/browser/current_release/term/SO:0000235 which ISA regulatory_region (SO:0005836)

accessible_dna_region ISA epigenetically_modified_region (this seems wrong btw) which ISA regulatory_region (SO:0005836)

When modeling a KG like this, do you think it's better to use the more general parent (so "all" items are findable without computing the closure) or to use the most specific term and rely on the ontology graph to connect them?
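One way to get both behaviours is to store the most specific term on each node and answer general queries through the is_a closure. A minimal Python sketch, assuming a hand-written parent table built from the chains listed above rather than a loaded copy of SO (the unnamed intermediate term between tf_binding_site and regulatory_region is collapsed into a direct link); the node ids and helper names are hypothetical:

```python
from collections import deque

# Child -> parents, hand-copied from the SO chains described above.
# Assumption: the intermediate term between tf_binding_site and
# regulatory_region is not named in the thread, so that link is collapsed.
IS_A = {
    "tf_binding_site": ["regulatory_region"],
    "accessible_dna_region": ["epigenetically_modified_region"],
    "epigenetically_modified_region": ["regulatory_region"],
    "regulatory_region": [],
}


def ancestors(term: str) -> set[str]:
    """Reflexive is_a closure: the term itself plus every ancestor (BFS)."""
    seen = {term}
    queue = deque([term])
    while queue:
        for parent in IS_A.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical nodes, each typed with its most specific term.
nodes = [
    {"id": "mydb:1", "type": "accessible_dna_region"},
    {"id": "mydb:2", "type": "tf_binding_site"},
    {"id": "mydb:3", "type": "regulatory_region"},
]


def find_by_term(nodes: list[dict], term: str) -> list[str]:
    """Ids of nodes typed with `term` or with any descendant of it."""
    return [n["id"] for n in nodes if term in ancestors(n["type"])]


print(find_by_term(nodes, "regulatory_region"))  # ['mydb:1', 'mydb:2', 'mydb:3']
print(find_by_term(nodes, "epigenetically_modified_region"))  # ['mydb:1']
```

Materializing the closure once at load time (for example, storing each node's ancestor set as an extra property) keeps lookups by a general term cheap while the node still records the specific term.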

@hitz

hitz commented Jan 18, 2023

What if we just use BioCypher to implicitly subclass BiologicalEntity? Then I wouldn't actually have to submit a PR for this ticket...

@hitz

hitz commented Feb 7, 2023

@sierra-moxon does my comment make any sense?

@sierra-moxon
Member Author

It does, but if you extend it in BioCypher, that's a precursor to a PR in Biolink, right?

What if you just used the Biolink class 'biolink:NucleicAcidEntity' and added the node property 'biolink:type' to hold a more specific SO term of your choosing for each sequence feature you need (this was @cmungall's original idea)?

I took another look at 'biolink:GenomicEntity' and I hesitate to change it from a mixin to a class, because it is how we currently bridge the biology and chemistry perspectives on a gene as a biological entity vs. a chemical entity (it's both). But I'm willing to explore options here if 'biolink:NucleicAcidEntity' does not make sense.

> @sierra-moxon What about: http://www.sequenceontology.org/browser/current_release/term/SO:0005836 "regulatory_region". Is there a way (or even a need) to qualify this with "proposed" or "putative"? I suppose you could just assign evidentiary properties (e.g., determined by chromatin accessibility / ATAC-seq in a cell line like GM12878).

Right, I think you could handle the predictive nature of this with evidence and provenance properties.

@sierra-moxon
Member Author

sierra-moxon commented Feb 7, 2023

Your subject and object nodes might look something like this:

category: biolink:NucleicAcidEntity
type: SO:0005836
id: mydb:12345

category: biolink:NucleicAcidEntity
type: SO:soterm_for_chromosome
id: NC_007112.7
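The same two nodes as plain Python dictionaries, as one might emit them before loading into a KG. A minimal sketch: `sequence_feature_node` and `nodes_of_type` are hypothetical helpers, not part of Biolink Model or its tooling, and `SO:soterm_for_chromosome` is the placeholder from the example above, kept as-is:

```python
def sequence_feature_node(node_id: str, so_term: str) -> dict:
    """Hypothetical helper: generic Biolink class, specific SO term in `type`."""
    if not so_term.startswith("SO:"):
        raise ValueError(f"expected an SO CURIE, got {so_term!r}")
    return {
        "id": node_id,
        "category": "biolink:NucleicAcidEntity",
        "type": so_term,
    }


nodes = [
    sequence_feature_node("mydb:12345", "SO:0005836"),  # regulatory_region
    # Placeholder copied from the example above; substitute the real SO term.
    sequence_feature_node("NC_007112.7", "SO:soterm_for_chromosome"),
]


def nodes_of_type(nodes: list[dict], so_term: str) -> list[str]:
    """Exact match on `type`; no ontology closure is applied here."""
    return [n["id"] for n in nodes if n["type"] == so_term]


print(nodes_of_type(nodes, "SO:0005836"))  # ['mydb:12345']
```

Because `nodes_of_type` matches `type` exactly, a query for a general term would miss nodes typed with a more specific SO term unless the ontology closure is applied.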

Your edge might look like this (these would all be properties of the edge between the NucleicAcidEntity and the chromosome, or whatever reference sequence you want to locate it on):

subject: mydb:12345
predicate: biolink:has_sequence_location
object: NC_007112.7
category: biolink:GenomicSequenceLocation
start_interbase_coordinate: 123
end_interbase_coordinate: 456
genome_build: xyz
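A minimal sketch of this edge as a Python dataclass that checks its coordinates on construction. The class name is hypothetical; the field names mirror the keys above, and `genome_build` keeps the placeholder value `xyz`:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceLocationEdge:
    """Hypothetical container; field names mirror the edge keys above."""
    subject: str
    object: str
    start_interbase_coordinate: int
    end_interbase_coordinate: int
    genome_build: str
    predicate: str = "biolink:has_sequence_location"
    category: str = "biolink:GenomicSequenceLocation"

    def __post_init__(self) -> None:
        # Interbase coordinates are zero-based positions between bases,
        # so a valid interval needs 0 <= start <= end.
        start, end = self.start_interbase_coordinate, self.end_interbase_coordinate
        if not 0 <= start <= end:
            raise ValueError(f"invalid interbase interval {start}..{end}")

    @property
    def length(self) -> int:
        # With interbase coordinates the feature length is simply end - start.
        return self.end_interbase_coordinate - self.start_interbase_coordinate


edge = SequenceLocationEdge(
    subject="mydb:12345",
    object="NC_007112.7",
    start_interbase_coordinate=123,
    end_interbase_coordinate=456,
    genome_build="xyz",  # placeholder value from the example
)
print(edge.length)  # 333
```

With interbase coordinates the length is `end - start`, so 123 to 456 spans 333 bases, the same interval written as 124 to 456 in one-based, fully closed coordinates.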

I think we could add an edge property to represent the predictive nature of some of these locations - we've been discussing adding "prediction" or "statistical correlation" or "hypothesis" keywords as evidence types to further qualify an edge.
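If those keywords were adopted, an edge could carry one of them as an extra property. A minimal sketch, assuming a hypothetical `evidence_type` property and an enum holding the three proposed keywords; neither is part of Biolink Model as discussed here:

```python
from enum import Enum


class EvidenceKeyword(str, Enum):
    """The three keywords proposed above; hypothetical, not Biolink enum values."""
    PREDICTION = "prediction"
    STATISTICAL_CORRELATION = "statistical correlation"
    HYPOTHESIS = "hypothesis"


def qualify_edge(edge: dict, keyword: EvidenceKeyword) -> dict:
    """Return a copy of `edge` with a hypothetical `evidence_type` property."""
    return {**edge, "evidence_type": keyword.value}


edge = {
    "subject": "mydb:12345",
    "predicate": "biolink:has_sequence_location",
    "object": "NC_007112.7",
}
qualified = qualify_edge(edge, EvidenceKeyword.PREDICTION)
print(qualified["evidence_type"])  # prediction
```

Using an enum means `EvidenceKeyword("hypothesis")` resolves to a member while an unlisted keyword raises `ValueError`, so typos are caught before the edge is written.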

@hitz

hitz commented Feb 8, 2023

```yaml
regulatory region:
  description: >-
    A region (or regions) of the genome that contains known or putative regulatory elements
    that act in cis or in trans to affect the transcription of genes
  is_a: biological entity
  mixins:
    - genomic entity
    - chemical entity or gene or gene product
    - physical essence
    - ontology class
  aliases: ['regulatory element']
  slots:
    - xref
  exact_mappings:
    - SO:0005836
    - SIO:001225
    - WIKIDATA:Q3238407

accessible dna region:
  description: >-
    A region (or regions) of a chromatinized genome that has been measured to be more
    accessible to an enzyme such as DNase I or Tn5 transposase
  is_a: regulatory region
  mixins:
    - genomic entity
    - chemical entity or gene or gene product
    - physical essence
    - ontology class
  aliases: ['dnase-seq accessible region', 'atac-seq accessible region']
  slots:
    - xref
  exact_mappings:
    - SO:0002331

transcription factor binding site:
  description: >-
    A region (or regions) of the genome that contains a region of DNA known or predicted
    to bind a protein that modulates gene transcription
  is_a: regulatory region
  mixins:
    - genomic entity
    - chemical entity or gene or gene product
    - physical essence
    - ontology class
  aliases: ['tf binding site', 'binding site']
  slots:
    - xref
  exact_mappings:
    - SO:0000235
```

Not sure whether I should make a PR to BioCypher instead?

@sierra-moxon
Member Author

@hitz - would you consider these new classes to be children of NucleicAcidEntity as well?

@hitz

hitz commented Feb 8, 2023

@sierra-moxon I think I would, but the Gene class isn't a child of NucleicAcidEntity? That seems like a mistake.


3 participants