Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make AlignmentRecordConverter public so that it can be used from other projects #1157

Conversation

tomwhite
Copy link
Member

@tomwhite tomwhite commented Sep 9, 2016

AlignmentRecordConverter used to be a public class, but was made private in 9764b2c. Unfortunately, this causes a problem for GATK, which uses AlignmentRecordConverter to read ADAM formatted files. This PR reinstates public visibility. There are other converters in the same package that are private that could be changed back, but I haven't done that in this PR.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1476/
Test PASSed.

@fnothaft
Copy link
Member

fnothaft commented Sep 9, 2016

Hi @tomwhite! Thanks for sending along the patch! It'd be my preference to keep the AlignmentRecordConverter class private (so that we can change underlying interfaces, etc, without breaking user code), so I'm wondering if there might be another way that we could expose this functionality. Are you all looking for the ability to convert an RDD of SAMRecord(Writable) to AlignmentRecords? If so, I think it would be preferable to expose this conversion from inside of the org.bdgenomics.adam.rdd package. I'll take a gander over the GATK code and see what I find. That being said, if there's no way around making AlignmentRecordConverter public, we'll go with that route.

@heuermh
Copy link
Member

heuermh commented Sep 12, 2016

I believe early on the formats schema and autogenerated classes were meant as a point of collaboration but now I see them more as an internal implementation detail. Thus I'm in agreement that public APIs for conversion at the level of RDDs seem more appropriate.

@tomwhite
Copy link
Member Author

AlignmentRecord is still a public class though, so it would be useful to have a utility to convert to and from SAMRecord. SAMRecordConverter is still public, so it's odd that its counterpart AlignmentRecordConverter is not. Perhaps the private methods on AlignmentRecordConverter could be moved elsewhere?

Also, I think it's more useful to have the conversion utility at the record level, rather than the RDD level, since then it can be used outside Spark. Also, there are some issues to do with how headers are transmitted that can be handled differently by the downstream client.

@fnothaft
Copy link
Member

I believe early on the formats schema and autogenerated classes were meant as a point of collaboration but now I see them more as an internal implementation detail.

I would strongly disagree here; I still very much so see the schemas as an interchange format.

SAMRecordConverter is still public, so it's odd that its counterpart AlignmentRecordConverter is not.

TBH, this was just an oversight. Both of them were meant to be moved to private when I went through and cleaned up this package a bit ago.

Also, I think it's more useful to have the conversion utility at the record level, rather than the RDD level, since then it can be used outside Spark. Also, there are some issues to do with how headers are transmitted that can be handled differently by the downstream client.

I think these are two reasonable points. I am OK with merging this PR to resolve them. That being said, what I might want to do to follow up is to:

  • Do a review pass and make some of the AlignmentRecordConverter methods private
  • Make SAMFileHeaderWritable private to ADAM, and change the AlignmentRecordConverter.convert signature to take a SAMFileHeader

I'm going to leave this open for another 24hrs, and unless there is a -1 I will merge and create a ticket for my two bullet points.

@fnothaft
Copy link
Member

I didn't hear any more peeps, so I've merged this as fd2c27b. Thanks, @tomwhite!

@fnothaft fnothaft closed this Sep 13, 2016
@heuermh
Copy link
Member

heuermh commented Sep 13, 2016

@fnothaft please do create a follow up issue as you describe above.

Also, I think it's more useful to have the conversion utility at the record level, rather than the RDD level, since then it can be used outside Spark.

If that were the case, it would be preferable to me to have the conversion stuff written in Java instead of Scala and be a library external to ADAM.

It appears though that there are different ideas of what our bdg-formats are for. :)

@tomwhite
Copy link
Member Author

Thanks @fnothaft! I agree with your suggested cleanups.

@tomwhite tomwhite deleted the public_alignment_record_converter branch September 15, 2016 15:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants