Move/rename some stray Spark datasource classes, and delete BaseRecalibratorSparkSharded #5192

droazen · 2018-09-14T20:38:34Z

-Move the Spark reference datasource classes from the engine.datasources package into the
engine.spark.datasources package, and rename them to make it clear that they are for use
on Spark. This fixes a longstanding problem where they were getting confused with the
walker ReferenceDataSource/ReferenceFileSource classes.

-Delete the unused/unmaintained experimental tool BaseRecalibratorSparkSharded, which has
fallen out of date relative to BaseRecalibratorSpark, as well as its unused companion AddContextDataToReadSparkOptimized.

-Delete an extra "VariantSource" class that is now unused (note: this is not the same as
VariantSparkSource, which is used extensively and retained here)


droazen · 2018-09-14T20:38:51Z

@jamesemery please review

jamesemery

I suggest you also move the non-Spark datasources from org.broadinstitute.hellbender.engine into org.broadinstitute.hellbender.engine.datasources, for consistency's sake.

Also AddContextDataToReadSpark is orphaned

jamesemery · 2018-09-14T20:50:47Z

...n/java/org/broadinstitute/hellbender/engine/spark/datasources/ReferenceMultiSparkSource.java

@@ -21,33 +20,33 @@
 *
 * This class needs to subclassed by test code, so it cannot be declared final.
 */
-public class ReferenceMultiSource implements ReferenceSource, Serializable {
+public class ReferenceMultiSparkSource implements ReferenceSparkSource, Serializable {


I don't like this name. This should really be ReferenceSparkSource, as the "multi" is confusing (are there multiple sparks?).

I don't like it either, but ReferenceSparkSource is taken already as the interface name. I think it's better to have a consistent naming convention across these classes for now. We can improve the names if we ever merge the walker and Spark datasource classes.
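To make the naming constraint concrete, here is a minimal, self-contained sketch of the shape being discussed. It is not the actual GATK code: the `getReferenceBases(String, int, int)` signature, the in-memory `bases` field, and the `FixedBasesSparkSource` and `ReferenceNamingSketch` classes are all assumptions made up for illustration. The only details taken from the diff are that the interface is named ReferenceSparkSource, that ReferenceMultiSparkSource implements it along with Serializable, and that the class stays non-final so test code can subclass it.

```java
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

// Simplified stand-in for the Spark-side interface. The method signature is
// hypothetical; the real interface has its own API.
interface ReferenceSparkSource {
    byte[] getReferenceBases(String contig, int start, int end);
}

// The implementation cannot reuse the interface's name, hence the "Multi" infix.
// It is deliberately not final (test code subclasses it) and it is Serializable.
class ReferenceMultiSparkSource implements ReferenceSparkSource, Serializable {
    private static final long serialVersionUID = 1L;

    // Hypothetical in-memory reference, used only to keep the sketch runnable.
    private final String bases;

    ReferenceMultiSparkSource(String bases) {
        this.bases = bases;
    }

    @Override
    public byte[] getReferenceBases(String contig, int start, int end) {
        // Coordinates are treated as 1-based and inclusive; contig is ignored here.
        return bases.substring(start - 1, end).getBytes(StandardCharsets.US_ASCII);
    }
}

// A test-style subclass, which is only possible because the parent is not final.
class FixedBasesSparkSource extends ReferenceMultiSparkSource {
    FixedBasesSparkSource() {
        super("ACGTACGT");
    }
}

class ReferenceNamingSketch {
    // Callers depend on the interface type, not on the concrete class.
    static String bases(ReferenceSparkSource source, int start, int end) {
        return new String(source.getReferenceBases("chr1", start, end), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        ReferenceSparkSource source = new FixedBasesSparkSource();
        System.out.println(bases(source, 2, 4));
    }
}
```

Because callers are written against the interface, renaming the implementing class later (for example, if the walker and Spark datasource classes are ever merged) would only touch the places that construct it.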

jamesemery · 2018-09-14T20:54:21Z

...java/org/broadinstitute/hellbender/engine/spark/datasources/VariantsSparkSourceUnitTest.java


        Assert.assertTrue(CollectionUtils.isEqualCollection(rddParallelVariants.collect(), variantsList));
    }

+    /**
+     * getVariantsListAs grabs the variants from local files (or perhaps eventually buckets), applies


Do we really need javadocs on a private method used by a handful of tests in this class?

Not really, but it doesn't hurt.

jamesemery · 2018-09-14T20:59:44Z

src/main/java/org/broadinstitute/hellbender/engine/spark/datasources/ReferenceSparkSource.java

@@ -10,15 +9,15 @@
 /**
 * Internal interface to load a reference sequence.
 */
-public interface ReferenceSource {
+public interface ReferenceSparkSource {


Suggested alternatives: SparkReferenceSource or ReferenceDataSparkSource.

These seem inconsistent with the convention already in use across the engine.spark.datasources package. Again, I think it's better to stick to a consistent naming convention for now, since the goal of this PR is to disambiguate these classes vs. their walker counterparts.

* Move/rename ReferenceMultiSparkSourceUnitTest
* Delete unused AddContextDataToReadSparkOptimized

droazen · 2018-09-14T21:27:06Z

Added a second commit to delete the now-unused AddContextDataToReadSparkOptimized

* AddContextDataToReadSpark (and AddContextDataToReadSparkUnitTest) implemented the different JoinStrategy options for BQSR; it has been replaced with the Spark Files mechanism (see #5127)
* BroadcastJoinReadsWithRefBases and JoinReadsWithRefBasesSparkUnitTest were only used by AddContextDataToReadSpark
* BroadcastJoinReadsWithVariants and JoinReadsWithVariantsSparkUnitTest were only used by AddContextDataToReadSpark
* ShuffleJoinReadsWithRefBases and ShuffleJoinReadsWithVariants were only used by AddContextDataToReadSpark
* JoinStrategy was only used for BQSR (HC always uses the overlaps partitioner), but is no longer used since #5127
* KnownSitesCache was replaced with Spark Files
* ReferenceMultiSourceAdapter in HaplotypeCallerSpark was replaced with the regular ReferenceDataSource
* BaseRecalibratorEngineSparkWrapper was only used by BaseRecalibratorSparkSharded, which was removed in #5192

droazen requested a review from jamesemery September 14, 2018 20:38

droazen assigned jamesemery Sep 14, 2018

jamesemery reviewed Sep 14, 2018


Additional changes

efdb6f4

* Move/rename ReferenceMultiSparkSourceUnitTest
* Delete unused AddContextDataToReadSparkOptimized

droazen assigned droazen and unassigned jamesemery Sep 14, 2018

droazen merged commit c935dc7 into master Sep 14, 2018

droazen deleted the dr_move_spark_datasources branch September 14, 2018 22:45

tomwhite mentioned this pull request Oct 8, 2018

Remove unused classes following #5127. #5292

Merged
