Update Fragment type to support more general "read-bucketing" #54

Closed
laserson opened this issue Apr 13, 2015 · 1 comment

The `Fragment` type should support more general read-bucketing, for use cases such as random barcodes and Drop-seq.

Follow-up to #44.

laserson added a commit to laserson/bdg-formats that referenced this issue May 5, 2015
Storing reads after a "groupby" operation has occurred is, I think, an
excellent idea, as the reads will likely always be analyzed as a group. Having
multiple reads from a single fragment of DNA is one such use case, but there
are others: Drop-seq is one that I am interested in, and incorporating random
barcodes is another.
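
As a rough illustration of the "groupby" idea above, the sketch below buckets reads by a shared key. The `Read` type, its `barcode` field, and the `bucket_by` helper are hypothetical names chosen for this example; they are not part of bdg-formats.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal read; not the bdg-formats schema."""
    name: str
    barcode: str   # assumed grouping key, e.g. a cell or molecular barcode
    sequence: str


def bucket_by(reads, key):
    """Group reads into lists keyed by key(read), preserving input order."""
    buckets = defaultdict(list)
    for read in reads:
        buckets[key(read)].append(read)
    return dict(buckets)


reads = [
    Read("r1", "ACGT", "TTGACA"),
    Read("r2", "GGCA", "CCATGA"),
    Read("r3", "ACGT", "TTGACC"),
]

# Reads sharing a barcode end up in the same bucket.
buckets = bucket_by(reads, key=lambda r: r.barcode)
```

Grouping by fragment would use the same helper with a different key, for example the read name shared by two mates.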

Here is a summary of my proposal, based in part on @ilveroluca's work:

* Support a fastq-like object `Sequence`, though I don't think this is strictly necessary.
* Rename `Fragment` to `Bucket`, as it sounds more general to my ears.
* Move run-specific or instrument-specific metadata into separate objects, as they don't necessarily make sense as top-level objects.
* Remove `fragmentSize`, as it's specific to one use case and it's rather easily computable.
* Support multiple types of grouped objects. What's the best way to deal with this? A `union` somehow? I envision adding more types in the future that we'll want to persist as grouped objects. At the moment, there is just a set of arrays, one for each type of object that can be grouped. This can be extended as we need to group other object types.
* Sequence and quality information from alignments should be retrieved from `AlignmentRecord`s.
* I don't think platform-specific information should be propagated through the entire chain of data types.  Why don't we include it in `Genotype`, then?  In my mind, any platform-specific analysis happens very early on, generally even before the fastq stage.  Therefore, I've moved platform-specific metadata into the `Sequence` object.

Fixes bigdatagenomics#54.
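
To make the proposal above more concrete, here is a minimal sketch of how the pieces might fit together. It is an assumption-laden illustration in Python, not the actual Avro schema: all field names, the simplified `AlignmentRecord` stand-in, and the `fragment_size` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sequence:
    """Hypothetical fastq-like record."""
    name: str
    bases: str
    qualities: str
    platform: Optional[str] = None  # platform-specific metadata lives here


@dataclass
class AlignmentRecord:
    """Simplified stand-in; the real record has many more fields."""
    read_name: str
    contig: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # exclusive (assumed convention)
    sequence: str
    qualities: str


@dataclass
class Bucket:
    """One array per groupable type, as described in the proposal."""
    bucket_id: str
    sequences: List[Sequence] = field(default_factory=list)
    alignments: List[AlignmentRecord] = field(default_factory=list)


def fragment_size(bucket: Bucket) -> Optional[int]:
    """Span covered by the bucket's alignments, computed on demand.

    Returns None when there are no alignments or they hit several contigs.
    """
    if not bucket.alignments:
        return None
    if len({a.contig for a in bucket.alignments}) != 1:
        return None
    return (max(a.end for a in bucket.alignments)
            - min(a.start for a in bucket.alignments))


pair = Bucket(
    bucket_id="frag1",
    alignments=[
        AlignmentRecord("frag1/1", "chr1", 100, 200, "A" * 100, "I" * 100),
        AlignmentRecord("frag1/2", "chr1", 350, 450, "C" * 100, "I" * 100),
    ],
)
size = fragment_size(pair)
```

Deriving the size from the alignments, rather than storing it, is one way to read the "easily computable" remark; adding another groupable type would mean adding another list to `Bucket`.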
laserson added a commit to laserson/bdg-formats that referenced this issue Sep 14, 2015
heuermh added this to the 0.14.0 milestone Jul 2, 2019

heuermh commented Jul 2, 2019

Closing as WontFix

heuermh closed this as completed Jul 2, 2019