Use-case in the later comments of #20: can we have CDSs be children of exons?
More generally, can we define additional relationships that are not defined in the original GFF/GTF?
There are two different ways of "adding a relation", depending on what you want to do with it later:
1. Add a relation in the attributes, without touching the database relations, by adding an entry like Parent="gene1" in the attributes for a feature. This will only affect the output of GFF lines, and will not allow child/parent queries via the gffutils API. So it's more like a cosmetic change.
2. Add a relation to the database itself, by adding an entry in the relations table. On its own, this does not affect the output of GFF lines, since the info is not stored in the attributes field of a feature. But it does allow children/parents to be accessed from the db via the gffutils API.
For 1), this should just be left to the user to manipulate the attribute dictionaries however they want to.
For 2), this should be implemented as a new method on a FeatureDB -- say, FeatureDB.add_relation(). It should also [optionally] make the change in the attributes as in 1).
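To make the difference between the two options concrete, here is a minimal sketch. It uses a plain dictionary as a stand-in for a feature's attributes and an in-memory SQLite table as a stand-in for the database. The `relations` layout shown (parent, child, level) and the IDs `gene1` and `exon1` are assumptions for illustration, not taken from a real database.

```python
import sqlite3

# Option 1: cosmetic change, attributes only.
# A plain dict stands in for a feature's attributes (hypothetical data).
exon_attributes = {"ID": ["exon1"]}
exon_attributes["Parent"] = ["gene1"]  # shows up in GFF output, not in queries

# Option 2: change the relations stored in the database.
# Assumed table layout: one row per (parent, child, level).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE relations ("
    "parent text, child text, level int, "
    "PRIMARY KEY (parent, child, level))"
)
conn.execute(
    "INSERT INTO relations (parent, child, level) VALUES (?, ?, ?)",
    ("gene1", "exon1", 1),
)
conn.commit()

# Only option 2 makes the child discoverable by a parent/child query.
children = [
    row[0]
    for row in conn.execute(
        "SELECT child FROM relations WHERE parent = ? AND level = ?",
        ("gene1", 1),
    )
]
print(children)
```

The attribute edit and the table insert are independent of each other, which is why a method doing 2) would need an explicit option to also do 1).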
However . . .
There's the issue of database IDs, which are primary keys, vs the IDs that you'd want to include in the final output. The database ID needs to be added to the relations table, but you might want some other ID to be added to the attributes.
A concrete example, showing just attributes to save space:
Imagine we imported these features into a db using the idspec fancy_id, such that exon:chr1:1-100 and CDS:chr1:1-100 are the primary keys in the db. We want to make CDSs be children of exons. So we add an entry in the relations table, using the hypothetical FeatureDB.add_relation() method:
exon:chr1:1-100 CDS:chr1:1-100 1
But in the output -- and therefore the attributes as stored in the db -- we want to specify the parent in terms of another attribute, like ID. So when we print out the GFF lines, they look like this:
This implies a method signature like FeatureDB.add_relation(parent, child, level, parent_func, child_func), where parent_func is a custom function with signature parent_func(parent, child) that returns the modified parent, and child_func(parent, child) returns the modified child.
In this example, parent_func would be lambda parent, child: parent (the parent is returned unchanged) and child_func would be:
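As a minimal sketch of how these pieces could fit together, the block below writes the primary keys to a relations table and lets the handlers decide what goes into the attributes. Everything here is hypothetical: `Feature` is a small stand-in class rather than the gffutils one, `add_relation` is a free function approximating the proposed method, and the `ID` values `exon1` and `cds1` are made up for illustration.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Hypothetical stand-in for a feature stored in the db."""
    id: str  # primary key in the db, e.g. built by the idspec
    attributes: dict = field(default_factory=dict)


def add_relation(conn, parent, child, level, parent_func=None, child_func=None):
    """Sketch of the proposed method: db IDs go in the table, handlers edit attributes."""
    conn.execute(
        "INSERT INTO relations (parent, child, level) VALUES (?, ?, ?)",
        (parent.id, child.id, level),
    )
    if parent_func is not None:
        parent = parent_func(parent, child)
    if child_func is not None:
        child = child_func(parent, child)
    return parent, child


def child_func(parent, child):
    # Record the parent by its ID attribute, not by its primary key.
    child.attributes["Parent"] = list(parent.attributes["ID"])
    return child


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relations (parent text, child text, level int)")

exon = Feature("exon:chr1:1-100", {"ID": ["exon1"]})
cds = Feature("CDS:chr1:1-100", {"ID": ["cds1"]})

exon, cds = add_relation(
    conn, exon, cds, level=1,
    parent_func=lambda parent, child: parent,  # parent left unchanged
    child_func=child_func,
)

rows = conn.execute("SELECT parent, child, level FROM relations").fetchall()
print(rows)
print(cds.attributes)
```

A real implementation would presumably also have to write the modified attributes back to the stored features; this sketch only returns the modified objects.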
In ENSEMBL, exons have different ENSE IDs if they are of different frames. Ideally, the same exon locus exon:chr1:100-200:+:. would be a parent to both CDS:chr1:100-200:+:2 and CDS:chr1:100-150:+:0. So somehow the parent exon locus would have to match up with the different ENSE IDs of the CDSs.
Assuming the existence of FeatureDB.add_relation, would this work?

    from gffutils import FeatureDB

    db = FeatureDB('filename.db')
    exon = db["exon:chr1:100-200:+:."]

    def child_func(parent, child):
        # Store the parent exon's ENSE ID on the child CDS
        child.attributes['parent_exon'] = parent['ENSE']
        return child

    # Get CDSs that fall within this exon
    for cds in db.region(exon, featuretype='CDS', completely_within=True):

        # Maybe some filtering here to confirm that these are the
        # CDSs you're looking for...

        # add_relation() is the hypothetical method proposed above
        db.add_relation(
            parent=exon, child=cds, level=1,
            parent_func=lambda parent, child: parent,
            child_func=child_func)