change fuzzy position operators for RHS clipped features #847

Merged
merged 2 commits into from
Jul 16, 2018


tomkinsc

This reverses the changes from:
1271307#diff-1fff82e70c2795767d937ffd93ccc616


dpark01 left a comment


So... I understand the thinking, but I'll note that for me, tbl2asn errors with the following:

[tbl2asn 25.6] Bad feature table at line 9 of file LASV_NGA_2018_1392-2.tbl
[tbl2asn 25.6] Bad feature table at line 8 of file LASV_NGA_2018_0026-1.tbl
[tbl2asn 25.6] Bad feature table at line 9 of file LASV_NGA_2018_0998-2.tbl
[tbl2asn 25.6] Bad feature table at line 9 of file LASV_NGA_2018_0998-1.tbl

I ran it as tbl2asn -t authors-nga_lasv.sbt -p . -X C -j "organism=Lassa mammarenavirus mol_type=cRNA" and I can provide you the sbt and cmt files offline for testing.


dpark01 commented Jul 16, 2018

testfiles.zip

To reproduce: unpack this zip file into an empty directory, copy in all of the tbl files from viral-ngs/test/input/TestFeatureTransfer/lasv/expected (from the ct-feature-clipping-fix branch), and run tbl2asn -t authors-nga_lasv.sbt -p . -X C -j "organism=Lassa mammarenavirus mol_type=cRNA"


dpark01 commented Jul 16, 2018

Okay, according to the fine print in the documentation:

Locations of partial (incomplete) features are indicated with a ">" or "<" next to the number. In this example, the first gene, CDS, and mRNA all begin upstream of the start of the nucleotide sequence. The "<" symbol indicates that they are 5' partial features. Furthermore, for the protein to translate correctly, the correct reading frame must be indicated with the qualifier "codon_start" on the first CDS. There is no need to indicate the codon_start on complete CDSs, as it is assumed that the translation starts at the first nucleotide of the interval if no codon_start is provided.

So this implies that "<" does not mean "to the left of" (i.e., coordinates that are lower integers than this one). It means "5' partial feature", even if it sits at the right end of the feature. So the original encoding was correct.
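To make that reading concrete, here is a minimal Python sketch of how a feature table interval could be written under this interpretation. It is not the viral-ngs implementation; the function name format_interval and its parameters are hypothetical, and it assumes the first column holds the 5' end and the second column the 3' end, so a minus-strand feature has a first coordinate larger than its second.

```python
def format_interval(five_prime_pos, three_prime_pos,
                    five_prime_partial=False, three_prime_partial=False):
    """Format one feature-table interval as two tab-separated columns.

    Hypothetical helper for illustration only (not viral-ngs code).
    Assumption: column 1 is the 5' end and column 2 is the 3' end, so
    "<" and ">" mark which END is partial, not which direction on the
    coordinate axis the feature runs off.
    """
    # "<" always attaches to the 5' end, even when that coordinate is
    # the numerically larger one (minus-strand feature).
    start = ("<" if five_prime_partial else "") + str(five_prime_pos)
    # ">" always attaches to the 3' end.
    stop = (">" if three_prime_partial else "") + str(three_prime_pos)
    return start + "\t" + stop


# Plus-strand feature that begins upstream of the sequence start.
plus_strand = format_interval(1, 1050, five_prime_partial=True)

# Minus-strand feature clipped at the right-hand side of the sequence:
# the clipped end is the 5' end, so it still gets "<".
minus_strand = format_interval(3000, 2000, five_prime_partial=True)
```

Under these assumptions the minus-strand case yields "<3000" in the first column rather than ">3000", which matches the conclusion that the symbol encodes 5' versus 3' partiality rather than numeric direction.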

tomkinsc merged commit 6cb945e into master on Jul 16, 2018