
Changes to RDD utility files for new variant schema #114

Merged: 5 commits merged Feb 20, 2014

nealsid commented Feb 18, 2014

After this I will work on cleaning the build.

@@ -151,15 +145,14 @@ class AdamRecordRDDFunctions(rdd: RDD[ADAMRecord]) extends Serializable with Log
 * @param r Read to map.
 * @return List containing one or two mapping key/value pairs.
 */
-def mapToBucket (r: ADAMRecord): List[(ReferencePosition, ADAMRecord)] = {
+def mapToBucket (r: ADAMRecord): List[(Long, ADAMRecord)] = {
Member

Are these changes intended? It looks like they revert to older code.

Member

Agreed; was that intentional? If not, this is the other cause of the build failure.

Author

Fixed, thanks!

AmplabJenkins

One or more automated tests failed
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/130/

@@ -31,7 +31,7 @@ class ParquetFileTraversable[T <: IndexedRecord](sc: SparkContext, file: Path) e
 }
 val status = fs.getFileStatus(file)
 var paths = List[Path]()
-if (status.isDir) {
+if (status.isDirectory) {
Member

Just as a heads-up, this is not backward compatible with Hadoop 1.

Member

This should probably be discussed in some detail.

At the least, this change should also add a note to the POM saying that only Hadoop 2.2 is supported (or equivalent).

This won't affect us here. However, I feel we could hold this change until 0.7.0, or possibly even a 0.6.2 release.

Contributor

It won't affect us here at Sinai, either. Do we have anyone using Hadoop 1.x?

Member

I'd vote for removing the comment about pre-2.2 from the POM instead.

Member

We've been running Hadoop 1.0.4. I'll see if we can move to Hadoop 2 for all our work.

Member

Never mind, we're fine with Hadoop 2. Let's just make the move, then.

Member

Wait. Why would we break Hadoop 1 compatibility if we can easily avoid it?

Can we just use "status.isDir" here and then open a discussion with the broader group on the mailing list? If everyone is OK with dropping Hadoop 1.x support, that's fine, but we shouldn't decide that here.

Author

I reverted this change.

fnothaft
Member

Looks good to me, @nealsid!

fnothaft
Member

Do you want this to merge in now, or want to wait for more reviews?

tdanford
Contributor

Wait a few more hours, please.

On Tue, Feb 18, 2014 at 5:09 PM, Frank Austin Nothaft <notifications@github.com> wrote:

Do you want this to merge in now, or want to wait for more reviews?


@@ -0,0 +1,56 @@
+/*
+ * Copyright (c) 2013. Mount Sinai School of Medicine
Member

This should probably be updated to 2014.

Author

Fixed.

AmplabJenkins

One or more automated tests failed
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/134/

carlyeks
Member

@nealsid: Is this ready to be merged, or did you want to address your comment first?

AmplabJenkins

One or more automated tests failed
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/142/

nealsid
Author

nealsid commented Feb 20, 2014

It's ready to merge; sorry, I should have clarified that.

fnothaft added a commit that referenced this pull request Feb 20, 2014
Changes to RDD utility files for new variant schema
fnothaft merged commit 37bb148 into bigdatagenomics:vcf-work Feb 20, 2014
fnothaft
Member

Thanks @nealsid! Merged.

nealsid deleted the vcf-work-rdd branch February 20, 2014 15:53