retrofitting a db to add additional relationships #28

Closed
daler opened this issue Jan 2, 2014 · 4 comments

Comments

@daler
Owner

daler commented Jan 2, 2014

Use case from the later comments of #20: can we make CDSs children of exons?

More generally, can we define additional relationships that are not defined in the original GFF/GTF?

There are two different ways of "adding a relation", depending on what you want to do with it later:

  1. Add a relation in the attributes, without touching the database relations, by adding an entry like Parent="gene1" to a feature's attributes. This only affects the GFF lines that are written out; it does not enable child/parent queries via the gffutils API, so it is essentially a cosmetic change.
  2. Add a relation to the database itself, by adding an entry to the relations table. On its own, this does not affect the GFF output, since the information is not stored in the feature's attributes field. But it does allow children/parents to be accessed from the db via the gffutils API.
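The difference between the two options can be made concrete with a toy model built on Python's standard sqlite3 module. The `relations` table with `(parent, child, level)` columns mirrors the relations-table entry format used in this issue, and `children()` is a hypothetical helper; this is an illustrative sketch, not gffutils' real schema or query code.

```python
import sqlite3

# Toy stand-in for a gffutils-style database: just a relations table
# holding (parent, child, level) rows. Not gffutils' actual schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relations (parent TEXT, child TEXT, level INTEGER)")


def children(conn, parent_id, level=1):
    """Hypothetical helper: child IDs recorded in the relations table."""
    rows = conn.execute(
        "SELECT child FROM relations WHERE parent = ? AND level = ?",
        (parent_id, level),
    )
    return [row[0] for row in rows]


# 1) Cosmetic: edit only the attributes (modeled as a dict of lists).
#    The GFF output would change, but the relations table is untouched.
cds_attributes = {"ID": ["CDS1"]}
cds_attributes["Parent"] = ["exon1"]
print(children(conn, "exon1"))  # []

# 2) Structural: add a row to the relations table. The link is now
#    queryable, but this step by itself changes no attributes.
conn.execute("INSERT INTO relations VALUES (?, ?, ?)", ("exon1", "CDS1", 1))
print(children(conn, "exon1"))  # ['CDS1']
```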

For 1), it should be left to users to manipulate the attribute dictionaries however they want.

For 2), this should be implemented as a new FeatureDB method, say FeatureDB.add_relation(). It should also (optionally) make the corresponding change to the attributes, as in 1).

However . . .

There's the issue of database IDs, which are primary keys, versus the IDs you'd want to include in the final output. The database ID needs to be added to the relations table, but you might want some other ID to be added to the attributes.

A concrete example, showing just attributes to save space:

ID="exon1"; fancy_id="exon:chr1:1-100"
ID="CDS1"; fancy_id="CDS:chr1:1-100"

Imagine we imported these features into a db using id_spec="fancy_id", so that exon:chr1:1-100 and CDS:chr1:1-100 are the primary keys in the db. We want to make CDSs children of exons, so we add an entry to the relations table using the hypothetical FeatureDB.add_relation() method:

exon:chr1:1-100    CDS:chr1:1-100    1

But in the output -- and therefore the attributes as stored in the db -- we want to specify the parent in terms of another attribute, like ID. So when we print out the GFF lines, they look like this:

ID="exon1"; fancy_id="exon:chr1:1-100";
ID="CDS1"; fancy_id="CDS:chr1:1-100"; parent_exon="exon1";
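The split between database keys and output IDs can be sketched in plain Python: features are stored under their fancy_id primary keys, the relation row refers to those keys, and the attribute written out uses the parent's ID instead. The dict layout and the `format_attributes` helper are hypothetical, chosen only to reproduce the output lines above.

```python
# Features keyed by their database primary key (the fancy_id), each with
# attributes stored as a dict of lists (a simplified, assumed layout).
features = {
    "exon:chr1:1-100": {"ID": ["exon1"], "fancy_id": ["exon:chr1:1-100"]},
    "CDS:chr1:1-100": {"ID": ["CDS1"], "fancy_id": ["CDS:chr1:1-100"]},
}

# The relations row uses primary keys: (parent, child, level).
relations = [("exon:chr1:1-100", "CDS:chr1:1-100", 1)]

# For output, refer to the parent by its ID attribute, not its primary key.
for parent_key, child_key, _level in relations:
    features[child_key]["parent_exon"] = list(features[parent_key]["ID"])


def format_attributes(attrs):
    """Hypothetical helper: render attributes in the style shown above."""
    parts = []
    for key, values in attrs.items():
        parts.append('%s="%s";' % (key, ",".join(values)))
    return " ".join(parts)


for attrs in features.values():
    print(format_attributes(attrs))
```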

This implies a method signature like:

FeatureDB.add_relation(parent, child, level, parent_func, child_func)

where parent_func is a custom function with signature parent_func(parent, child) that returns the modified parent, and child_func(parent, child) returns the modified child.

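In this example, parent_func would leave the parent unchanged (lambda parent, child: parent), and child_func would be:

def child_func(parent, child):
    # Use the parent's ID attribute ("exon1"), not its primary key (parent.id)
    child.attributes['parent_exon'] = parent.attributes['ID']
    return child
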
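The proposed hook mechanism can be modeled with a small in-memory class. `Feature`, `ToyFeatureDB`, and this `add_relation` body are hypothetical stand-ins that only show the intended control flow: record the relation under primary keys, then let parent_func/child_func rewrite the features before they are stored. The real method added in 715a615 may behave differently.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Hypothetical minimal feature: primary key plus attributes (dict of lists)."""
    id: str
    attributes: dict = field(default_factory=dict)


class ToyFeatureDB:
    """Hypothetical in-memory stand-in for a FeatureDB."""

    def __init__(self, features):
        self.features = {f.id: f for f in features}
        self.relations = set()  # (parent_id, child_id, level) tuples

    def add_relation(self, parent, child, level, parent_func=None, child_func=None):
        # The relation itself is always stored under database primary keys.
        self.relations.add((parent.id, child.id, level))
        # Optional hooks take (parent, child) and return the modified
        # feature, which replaces the stored copy.
        if parent_func is not None:
            parent = parent_func(parent, child)
            self.features[parent.id] = parent
        if child_func is not None:
            child = child_func(parent, child)
            self.features[child.id] = child

    def children(self, parent_id, level=1):
        return sorted(c for p, c, lvl in self.relations if p == parent_id and lvl == level)


def child_func(parent, child):
    # Point at the parent's output ID, not its primary key.
    child.attributes["parent_exon"] = list(parent.attributes["ID"])
    return child


exon = Feature("exon:chr1:1-100", {"ID": ["exon1"]})
cds = Feature("CDS:chr1:1-100", {"ID": ["CDS1"]})
db = ToyFeatureDB([exon, cds])
db.add_relation(exon, cds, 1, parent_func=lambda parent, child: parent, child_func=child_func)
print(db.children("exon:chr1:1-100"))            # ['CDS:chr1:1-100']
print(db.features["CDS:chr1:1-100"].attributes)  # {'ID': ['CDS1'], 'parent_exon': ['exon1']}
```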
@daler
Owner Author

daler commented Jan 2, 2014

By the way, keep in mind #6 when adding infrastructure for this issue.

@daler
Copy link
Owner Author

daler commented Jan 2, 2014

From #20:

In ENSEMBL, exons have different ENSE IDs if they are in different frames. Ideally, the same exon locus exon:chr1:100-200:+:. would be a parent to both CDS:chr1:100-200:+:2 and CDS:chr1:100-150:+:0. So somehow the parent exon locus would have to match up with the different ENSE IDs of the CDSs.

Assuming the existence of FeatureDB.add_relation, would this work?

from gffutils import FeatureDB

db = FeatureDB('filename.db')
exon = db["exon:chr1:100-200:+:."]


def child_func(parent, child):
    # 'ENSE' is a placeholder for whichever attribute holds the exon's ID
    child.attributes['parent_exon'] = parent['ENSE']
    return child


# Get CDSs that fall within this exon
for cds in db.region(exon, featuretype='CDS', completely_within=True):

    # Maybe some filtering here to confirm that these are the
    # CDSs you're looking for...

    db.add_relation(
        parent=exon, child=cds, level=1,
        parent_func=lambda parent, child: parent,
        child_func=child_func)
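The completely_within=True filter is meant to keep only CDSs lying entirely inside the exon. The selection logic for the frame example quoted from #20 can be sketched in plain Python; the `Interval` type and `completely_within` helper are hypothetical, 1-based inclusive GFF coordinates are assumed, and the third, overhanging CDS is made up purely for contrast.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical minimal feature: primary key, seqid, 1-based inclusive coords."""
    id: str
    seqid: str
    start: int
    end: int


def completely_within(inner, outer):
    """True if inner lies entirely inside outer on the same seqid."""
    return (
        inner.seqid == outer.seqid
        and outer.start <= inner.start
        and inner.end <= outer.end
    )


exon = Interval("exon:chr1:100-200:+:.", "chr1", 100, 200)
cdss = [
    Interval("CDS:chr1:100-200:+:2", "chr1", 100, 200),
    Interval("CDS:chr1:100-150:+:0", "chr1", 100, 150),
    Interval("CDS:chr1:150-250:+:0", "chr1", 150, 250),  # made up: overhangs the exon
]

# One (parent, child, level) relations row per contained CDS.
relations = [(exon.id, cds.id, 1) for cds in cdss if completely_within(cds, exon)]
print(relations)  # two rows; the overhanging CDS is excluded
```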

@olgabot

olgabot commented Jan 2, 2014

Yes I think this would work!

daler closed this as completed in 715a615 on Jan 2, 2014
@daler
Owner Author

daler commented Jan 2, 2014

As of 715a615 there's now an add_relation method. Can you give it a shot when you get a chance? Usage should be what I sketched out above.
