replace GARead with GAFlattenedAlignment #47

lh3 · 2014-05-16T16:44:19Z

Major difference: a GARead may consist of multiple linear alignments but a GAFlattenedAlignment describes at most one linear alignment and is equivalent to a SAM line. Also extend to allow >2 reads per fragment.

This PR is the opposite of #38. It stuffs fragment, read and alignment attributes in one object.

PS: I don't like the name "GAFlattenedAlignment". Open to better ones.
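For readers who want a concrete picture, the flattened shape described above can be sketched roughly as follows. This is an illustrative Python dataclass, not the Avro definition from the PR. Apart from `fragmentId` and `otherAlignments`, which come up later in this thread, every name here (`LinearAlignment`, `read_number`, `number_reads`, `records_for_fragment`) is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LinearAlignment:
    """Coordinates of one linear alignment (hypothetical field names)."""
    reference_name: str
    position: int            # start coordinate on the reference
    cigar: str
    reverse_strand: bool = False


@dataclass
class FlattenedAlignment:
    """One record per linear alignment, roughly comparable to one SAM line."""
    fragment_id: str                      # shared by all reads of a fragment
    read_number: int                      # index of this read in the fragment
    number_reads: int                     # may be larger than 2
    alignment: Optional[LinearAlignment]  # None when the read is unaligned
    other_alignments: List[LinearAlignment] = field(default_factory=list)


def records_for_fragment(records, fragment_id):
    """Collect every flattened record that belongs to one fragment."""
    return [r for r in records if r.fragment_id == fragment_id]
```

Fragment, read and alignment attributes all sit on the same record, so a chimeric read produces several records that repeat the fragment-level and read-level fields.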

dglazer · 2014-05-16T17:39:44Z

Interesting -- thanks for exploring the opposite end of the design spectrum. This addresses my concern about too many moving parts, and it keeps the method signature nice and clean. But of course there are tradeoffs. Two initial comments:

- major: to implement range search efficiently, I think the backend has to be indexed by linear alignment (since those are the things that have coordinates), and wants to store all LAs that are near each other in the genome near each other on disk. But then, before returning results, the backend has to find all the related otherAlignments[] and read their data into the response. We can check with folks who have built these backends, but I'm worried those goals are in tension.
- minor: what is fragmentId? I don't see it used in any methods or in any other objects.
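The tension in the first point can be illustrated with a small in-memory sketch. It assumes records are plain `(reference_name, start, read_id)` tuples and that a hit is any alignment whose start falls in the query range; real backends would test overlap and use on-disk indexes. The names `build_indexes` and `range_search` are hypothetical.

```python
import bisect
from collections import defaultdict


def build_indexes(records):
    """Build a position-sorted list plus a per-read lookup table."""
    by_position = sorted(records)          # ordered by (reference, start, read_id)
    by_read = defaultdict(list)
    for rec in records:
        by_read[rec[2]].append(rec)
    return by_position, by_read


def range_search(by_position, by_read, reference, start, end):
    """Return each hit starting in [start, end) with its other alignments."""
    # The coordinate lookup is one contiguous slice of the sorted index.
    lo = bisect.bisect_left(by_position, (reference, start, ""))
    hi = bisect.bisect_left(by_position, (reference, end, ""))
    results = []
    for hit in by_position[lo:hi]:
        # Filling in the related alignments needs a second lookup per hit,
        # and those records may live far away in coordinate order.
        others = [r for r in by_read[hit[2]] if r != hit]
        results.append((hit, others))
    return results
```

The slice is cheap because neighbouring alignments are stored together, while the per-hit lookup is the part that would turn into extra seeks in a disk-backed store.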

lh3 · 2014-05-16T18:19:17Z

> to implement range search efficiently, I think the backend has to be indexed by linear alignment (since those are the things that have coordinates), and wants to store all LA's that are near each other in the genome near each other on disk. But then before returning results, the backend has to find all the related otherAlignments[] and read their data into the response. We can check with folks who have built these backends, but I'm worried those goals are in tension.

For an efficient implementation, we'd better duplicate information. In SAM, the position of a read in a pair appears twice: in its own record and in its mate's record. This way we don't need to seek to the mate to get the basic mate information. Similarly, SAM has an SA tag that keeps the other alignments of the same read, to avoid seeking. Sequence and quality are also partially duplicated for a chimeric alignment that consists of multiple linear alignments.
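As a rough illustration of the duplication described above, the sketch below reads the mate position (columns RNEXT and PNEXT) and the SA tag from a single SAM line, without touching any other record. The function name `parse_duplicated_fields` and the returned dictionary keys are assumptions made for the example; the column order and the SA entry layout (`rname,pos,strand,CIGAR,mapQ,NM;`) follow the SAM specification.

```python
def parse_duplicated_fields(sam_line):
    """Extract mate info and SA-tag alignments from one SAM record."""
    cols = sam_line.rstrip("\n").split("\t")

    # RNEXT "=" means the mate is on the same reference as this record.
    rnext = cols[2] if cols[6] == "=" else cols[6]
    mate = {"rnext": rnext, "pnext": int(cols[7])}

    others = []
    for tag in cols[11:]:                  # optional fields start at column 12
        if not tag.startswith("SA:Z:"):
            continue
        for entry in tag[5:].split(";"):
            if not entry:                  # the list ends with a trailing ';'
                continue
            rname, pos, strand, cigar, mapq, nm = entry.split(",")
            others.append({
                "rname": rname,
                "pos": int(pos),
                "strand": strand,
                "cigar": cigar,
                "mapq": int(mapq),
                "nm": int(nm),
            })
    return mate, others
```

Everything returned here is a copy of data that also lives in the mate's or the supplementary alignment's own record.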

The downside of this approach is obvious: it increases file/data size, can lead to inconsistencies across records, and allows only the duplicated information to be retrieved efficiently. I think we should not require all information to be present. An implementation may choose to fill the available fields and set the rest to missing values. It is up to the user to decide what to do next.

Generally, there are no perfect solutions. One advantage of this PR is that it is very close to the current practice with the SAM/BAM format.

lh3 added 3 commits May 16, 2014 12:38

replace GARead with GAFlattenedAlignment

f0d451b

Major difference: a GARead may consist of multiple linear alignments but a GAFlattenedAlignment describes at most one linear alignment and is equivalent to a SAM line. Also extend to allow >2 reads per fragment.

keep the number of reads in a fragment

acb890a

fixed a syntax error

bc3c6d0

lh3 added the ReadTaskTeam label May 16, 2014

fixed compiling errors in readmethods.avdl

01e8ab6

... because there is no GARead any more.

cassiedoll mentioned this pull request May 17, 2014

Alternative fragment proposal - read field reorganization #51

Closed

massie closed this May 19, 2014

massie deleted the flattenedAlignment branch May 19, 2014 19:39

lh3 mentioned this pull request May 21, 2014

Replace GARead with GAReadAlignment #60

Merged

pgrosu mentioned this pull request Mar 13, 2015

look up all reads in fragment #212

Open

dcolligan pushed a commit to dcolligan/ga4gh-schemas that referenced this pull request Jul 20, 2016

Merge pull request ga4gh#47 from hjellinek/compliance_redux

f587d61

Make Maven enforce minimum Maven version (3.3.3) and Java version (JDK 1.8)
