
block compressed interval streams #7142

Merged: 6 commits into master, Jul 20, 2021

Conversation

@tedsharpe (Contributor):

No description provided.

@tedsharpe requested review from mwalker174 and droazen on March 15, 2021.
@mwalker174 (Contributor) left a comment:

Thank you @tedsharpe, this is looking great. Before I do a deep dive on this, I was having trouble getting some basic test cases to work with the existing tools. Can you take a shot at my comments below? I realize the Feature class design might make some of this complicated to implement. If headerless files aren't feasible, we could look into forking gatk right now so we can use LocalizeSVEvidence in the current gatk-sv pipeline.

@@ -286,12 +290,13 @@ public FeatureDataSource(final FeatureInput<T> featureInput, final int queryLook
BucketUtils.getPrefetchingWrapper(cloudIndexPrefetchBuffer),
genomicsDBOptions);

if (IOUtils.isGenomicsDBPath(featureInput)) {
if (IOUtils.isGenomicsDBPath(featureInput) ||
featureInput.getFeaturePath().endsWith(BCI_FILE_EXTENSION)) {
Contributor:

Let's make all extension checks case-insensitive (also in the *BCICodec classes)
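For illustration, a minimal sketch of a case-insensitive check (hasBciExtension is a hypothetical helper name, not code from this PR):

// Hypothetical helper: lower-case the path before testing the extension so
// "foo.BCI" and "foo.bci" are both recognized.
static boolean hasBciExtension( final String featurePath ) {
    return featurePath.toLowerCase(Locale.ROOT).endsWith(BCI_FILE_EXTENSION.toLowerCase(Locale.ROOT));
}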

Comment on lines 177 to 184
private static BufferedWriter createOutputFile( final String fileName ) {
final GATKPath path = new GATKPath(fileName);
if ( fileName.endsWith(".gz") ) {
return new BufferedWriter(
new OutputStreamWriter(
new BlockCompressedOutputStream(path.getOutputStream(), path.toPath(), 6)));
}
return new BufferedWriter(new OutputStreamWriter(path.getOutputStream()));
Contributor:

Let's go ahead and add bci output functionality here

}

@VisibleForTesting
public final static class LocusDepth implements Feature {
Contributor:

Can we make this a public external class? That way other tools can use it, just like with SplitReadEvidence, DiscordantPairEvidence, etc.

Comment on lines 647 to 648
private final int refIdx;
private final int altIdx;
Contributor:

Add comment that these are the ordinal values from Nucleotide

}

public String toString() {
return getContig() + "\t" + getStart() + "\t" + "ACGT".charAt(refIdx) + "\t" +
Contributor:

Maybe change "ACGT".charAt(refIdx) -> Nucleotide.values()[refIdx]? In the unlikely event that the ordering of Nucleotide changes.


public String toString() {
return getContig() + "\t" + getStart() + "\t" + "ACGT".charAt(refIdx) + "\t" +
"ACGT".charAt(altIdx) + "\t" + totalDepth + "\t" + altDepth;
Contributor:

Same here
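A sketch of the suggested change, assuming refIdx/altIdx hold Nucleotide ordinals as noted above:

public String toString() {
    // Look the bases up through the enum instead of a hard-coded "ACGT" string,
    // so the output stays consistent with Nucleotide's ordering.
    return getContig() + "\t" + getStart() + "\t" + Nucleotide.values()[refIdx] + "\t" +
            Nucleotide.values()[altIdx] + "\t" + totalDepth + "\t" + altDepth;
}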

}

private void writeDictionary( final SAMSequenceDictionary dictionary ) throws IOException {
dos.writeInt(dictionary.size());
Contributor:

When converting an evidence file from text to bci format using PrintSVEvidence, dictionary is null here, and I am also getting null errors for the sample metadata. Can you add an optional header to the text formats?

For the current version of the pipeline, we still need to be able to consume a headerless text format with PrintSVEvidence (writing to headerless text). For the next version, we will require headers. In other words, we need to have the following conversion functionality:

  1. headered text -> headered text, bci
  2. unheadered text -> unheadered text
  3. bci -> headered text, bci

Contributor Author:

Let's leave the text files alone, and not create versions with headers. To make PrintSVEvidence work with headerless text files, you simply supply a reference with -R.

oneLineSummary = "Prints allele counts.",
programGroup = StructuralVariantDiscoveryProgramGroup.class
)
public class PrintAlleleCounts extends FeatureWalker<LocusDepth> {
Contributor:

Can't remember, but I may have asked for this and wasn't clear about what I meant. Instead of this tool, can you add functionality to PrintSVEvidence to convert between locus depth bci and text formats? (consistent with the other evidence types)

Comment on lines 232 to 240
final String strandA = r.isReadReverseStrand() ? "-" : "+";
final String strandB = r.isMateReverseStrand() ? "-" : "+";

final DiscordantPairEvidence discordantPair = new DiscordantPairEvidence(sampleName, r.getContig(), r.getStart(), strandA, r.getMateContig(), r.getMateStart(), strandB);
peWriter.add(discordantPair);
try {
// subtract 1 from positions to match pysam output
peWriter.write(r.getContig() + "\t" + (r.getStart() - 1) + "\t" + strandA + "\t" + r.getMateContig() + "\t" + (r.getMateStart() - 1) + "\t" + strandB + "\t" + sampleName + "\n");
} catch (IOException e) {
throw new GATKException("Could not write to PE file", e);
}
Contributor:

This looks like recently changed code from commit b68eb8778d65f84c46f2e1fe44683e861bfd67d2 on master (and elsewhere in this file). Can you check for accidental reversions?

@tedsharpe force-pushed the tws_BlockCompressedIntervalStreams branch from 304d73f to c2ef909 on April 29, 2021.
@lbergelson (Member):

> could look into forking gatk

This seems problematic. How can we avoid this?

@lbergelson (Member) left a comment:

@tedsharpe This is cool and useful. Self-indexing files are great. I think there might be a few issues with the changes to FeatureWalker, but I don't remember the details of why it was set up the way it was, so I'm summoning @droazen, who probably does. I also despair at the introduction of yet another copy of IntervalTree (and the addition of yet another interval class). Can we avoid this? Should we refactor the IntervalTree class in htsjdk to allow it to be parameterized with a comparator? It would be useful if the new classes had some javadoc explaining how they differ from the existing, similarly named classes.

import java.io.IOException;
import java.util.function.Supplier;

public class CollatingInterval implements Feature, Comparable<CollatingInterval> {
Member:

These classes need javadoc explaining what these are and how they're different from existing classes of similar name/function.

Member:

Am I correct in my understanding that this is just an interval that uses the contig order in the sequence dictionary to sort? We do already have IntervalUtils.getDictionaryOrderComparator() to do this sort of thing. Do we really need yet another interval class?
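For reference, a minimal sketch of the existing alternative mentioned here, assuming the standard GATK IntervalUtils and SimpleInterval APIs:

// Sort plain SimpleIntervals in dictionary (reference) order without a dedicated interval class.
static void sortByDictionaryOrder( final List<SimpleInterval> intervals,
                                   final SAMSequenceDictionary dictionary ) {
    intervals.sort(IntervalUtils.getDictionaryOrderComparator(dictionary));
}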

Contributor Author:

I've backed all but a couple of trivial changes out of FeatureWalker (as you'll see at the next check-in).

Contributor Author:

Sorry to introduce another copy of IntervalTree. But this is the good one. Let's get rid of the others. ;-)

drivingFeatures = new FeatureDataSource<>(new FeatureInput<>(drivingPath), FeatureDataSource.DEFAULT_QUERY_LOOKAHEAD_BASES, null, cloudPrefetchBuffer, cloudIndexPrefetchBuffer, referenceArguments.getReferencePath());

final FeatureInput<F> drivingFeaturesInput = new FeatureInput<>(drivingPath, "drivingFeatureFile");
drivingFeaturesInput = new FeatureInput<>(drivingPath, "drivingFeatureFile");
Member:

Does this no longer respect cloud settings and pass through a reference?

progressMeter.update(feature);
});
final Iterator<F> featureItr =
features.getFeatureIterator(drivingFeaturesInput, userIntervals);
Member:

It seems like this change breaks some existing features like setting the appropriate caching behavior and cloud lookahead settings.

I believe there was an explicit reason to use a separate FeatureDataSource for the driving features instead of the one in the feature manager, but I don't remember what the issue is exactly. @droazen can you comment?

Contributor:

@lbergelson I've added a comment in a separate review below.

import java.util.NoSuchElementException;
import java.util.Objects;

/**
Member:

It's depressing that we need a 3rd copy of this code. Can we abstract it slightly so we can have 1 instead of 3?

IntervalTree<Long> index;
boolean usedByIterator;

public Reader( final FeatureInput<T> inputDescriptor, final FeatureCodec<T, Reader<T>> codec ) {
Member:

It would be nice if these could all route through 1 copy of the setup code instead of 3 slight variants but maybe that's not feasible.

this.codec = codec;
final SeekableStream ss;
try {
ss = SeekableStreamFactory.getInstance().getStreamFor(path);
Member:

@cmnbroad HtsPath -> string -> SeekableStream seems very roundabout. Do we have a preferred way of doing this? I didn't see one other than HtsStream -> Path -> SeekablePathStream.

@tedsharpe (Contributor Author):

@lbergelson I think the idea of the BlockCompressedIntervalStream is a sound one, and will be quite useful. The code around it -- trying to integrate it with the Feature system -- is a stinky mess, and I'm trying to clean it up.
I'm close to having a major update. Let's think about it more at that point.

@lbergelson (Member):

@tedsharpe Yes, the Feature / codec system is indeed a stinky mess.

@gatk-bot commented May 13, 2021:

Travis reported job failures from build 34176
Failures in the following jobs:

Test Type JDK Job ID Logs
integration openjdk11 34176.12 logs
integration openjdk8 34176.2 logs

@droazen (Contributor) left a comment:

@tedsharpe I've commented on the FeatureWalker changes, as requested. I think we'll need to adjust them slightly, but I'm sure we can work something out that won't be cumbersome. I just need some more info from you before I can give you my suggestion.

@@ -20,7 +21,7 @@
*/
public abstract class FeatureWalker<F extends Feature> extends WalkerBase {

private FeatureDataSource<F> drivingFeatures;
private FeatureInput<F> drivingFeaturesInput;
Contributor:

@tedsharpe @lbergelson The "historical reason" why we create a special separate additional FeatureDataSource for the driving input is twofold:

  • There are some tools that need to view the records from the primary input in two ways simultaneously: record-by-record as part of the main traversal, and via windowed queries around the current locus against the FeatureContext. This is why we create both a standalone drivingFeatures data source, and also add an additional FeatureInput for the driving features to features (so it will also show up in the FeatureContext).

  • To get reasonable performance in the above use case, we need different caching settings for the drivingFeatures for the main traversal vs. the FeatureInput added to the feature sources.

Now, that's the historical reason. The truth is that this only affects certain VariantWalkers (and if you read the source for VariantWalker, you'll find some comments there on this issue). But FeatureWalker was intended to be a generalization of VariantWalker, so it would be good if we could keep them consistent if possible.

@tedsharpe Could you briefly explain why you needed to make these changes to FeatureWalker? Once I understand your goals here better, I can suggest a way forward. I'm sure we can easily work something out that would require minimal changes on your end!
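For reference, a minimal sketch of the two-view setup described above, pieced together from the snippets quoted in this review (argument lists beyond what is quoted are assumptions):

// 1) Standalone data source that drives the linear traversal; lookahead caching helps here.
drivingFeatures = new FeatureDataSource<>(new FeatureInput<>(drivingPath),
        FeatureDataSource.DEFAULT_QUERY_LOOKAHEAD_BASES, null, cloudPrefetchBuffer,
        cloudIndexPrefetchBuffer, referenceArguments.getReferencePath());

// 2) The same file registered as a FeatureInput so it also appears in the FeatureContext;
//    lookahead is 0 because windowed queries interact poorly with forward-only caching.
final FeatureInput<F> drivingFeaturesInput = new FeatureInput<>(drivingPath, "drivingFeatureFile");
features.addToFeatureSources(drivingFeaturesInput,
        new FeatureDataSource<>(drivingFeaturesInput, 0, null, cloudPrefetchBuffer,
                cloudIndexPrefetchBuffer, referenceArguments.getReferencePath()));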

Contributor Author:

@droazen I've been able to back out almost all my changes to FeatureWalker. Once I saw that the Tabix reader implemented the famous fake-iterable pattern, I understood what you documented above.
I'm going to rip through the engine classes that I've changed and see what else I can revert.

@tedsharpe (Contributor Author):

@mwalker174 I think I've addressed almost all of your comments. I have one little piece of work to go, as I mentioned at standup.

@tedsharpe (Contributor Author):

@lbergelson I'm going to try to convince you that implementing Intervals (and therefore Features) by holding a SAMSequenceRecord rather than a String contig name is a really good idea. Comparability. Speed. Memory use. More to follow...

@lbergelson (Member):

@tedsharpe You make good points in theory, but in practice we intern the chromosome names, so we only ever end up holding one copy of each string and comparison mostly short-circuits to object comparison. Previous iterations held a contig index instead, which was a horribly bug-ridden mess. Your solution seems fine, but I'm not sure you're actually going to see much in the way of benefits in practice. It also introduces some new potential issues when dealing with intervals that come from different sources with slightly different references that still represent the same data.

@lbergelson (Member):

If you show me that your code actually wastes a lot of time and effort on contig name comparison, I'll be happy to eat my words...

@gatk-bot commented May 18, 2021:

Travis reported job failures from build 34203
Failures in the following jobs:

Test Type JDK Job ID Logs
integration openjdk11 34203.12 logs
integration openjdk8 34203.2 logs

@tedsharpe (Contributor Author):

CollatingInterval does the equivalent of intern() without the overhead: It always returns the dictionary's copy of the contig name. (And it does this perforce. I think we're actually pretty inconsistent about interning.)
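As an illustration of the idea, a minimal sketch of the comparison CollatingInterval makes possible (the field names sequenceRecord/start/end are assumed, not the actual implementation):

// Dictionary order falls out of the sequence index; no contig-name String comparison needed.
@Override
public int compareTo( final CollatingInterval that ) {
    int result = Integer.compare(this.sequenceRecord.getSequenceIndex(),
                                 that.sequenceRecord.getSequenceIndex());
    if ( result == 0 ) result = Integer.compare(this.start, that.start);
    if ( result == 0 ) result = Integer.compare(this.end, that.end);
    return result;
}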

I've tested sorting (which I would expect to be dominated by the cost of interval comparison) with 1) CollatingIntervals, and then, using the DictionaryOrderComparator, 2) SimpleIntervals with effectively interned contig Strings, and 3) SimpleIntervals with novel Strings. CollatingInterval is (disappointingly to me) only twice as fast with interned Strings, and three times as fast with novel Strings on my pathetically ancient Ubuntu machine. So is 2x and 3x "a lot of time and effort"?

Seems to me like the potential issues with slightly different references are already a can of worms, and I don't see that this creates new worms.

You know what? It really isn't worth the struggle. It seemed kind of cool and innovative to me, but nobody but me seems to find it compelling. I'll get rid of it all.

@lbergelson (Member):

@tedsharpe It's not a bad idea. I'm just very resistant to change... 2x speedup is significant.

It sort of just has different worms than the existing solution does, which might catch people out who aren't expecting them.

@tedsharpe (Contributor Author):

I'm supposed to be providing block-compressed streams for interval-oriented data. This is a side-trip. I'll rip it back to the bare minimum and we can have another look.
Thanks very much for your time in reviewing.

@tedsharpe (Contributor Author):

@lbergelson @mwalker174 I think this is ready for a re-review, if you could, please.

@gatk-bot commented May 24, 2021:

Travis reported job failures from build 34279
Failures in the following jobs:

Test Type JDK Job ID Logs
integration openjdk11 34279.12 logs
integration openjdk8 34279.2 logs

@gatk-bot commented May 25, 2021:

Travis reported job failures from build 34296
Failures in the following jobs:

Test Type JDK Job ID Logs
integration openjdk11 34296.12 logs
integration openjdk8 34296.2 logs

@mwalker174 (Contributor) left a comment:

Thank you @tedsharpe, this is looking great. I have comments mainly pertaining to documentation and usability. Also, what would you think about moving some of the SV utilities (SVIntervalTree, SVInterval) to the engine/htsjdk? I assume they perform better than using OverlapDetector/IntervalTree and SimpleInterval.

@@ -69,6 +73,10 @@
public static final String PAIRED_END_FILE_ARGUMENT_LONG_NAME = "pe-file";
public static final String SPLIT_READ_FILE_ARGUMENT_SHORT_NAME = "SR";
public static final String SPLIT_READ_FILE_ARGUMENT_LONG_NAME = "sr-file";
public static final String ALLELE_COUNT_OUTPUT_ARGUMENT_SHORT_NAME = "AC";
Contributor:

Need to add tool documentation for this above

Contributor:

I think we need to rename the tool since it does AC as well now - I would suggest CollectSVEvidence. @cwhelan any opinions on that?

final List<String> sampleNames = Collections.singletonList(sampleName);
peWriter = createPEWriter();
srWriter = createSRWriter();
if ( alleleCountInputFilename != null && alleleCountOutputFilename != null ) {
Contributor:

Should throw an exception if one is specified but not the other.

private FeatureOutputStream<DepthEvidence> rdStream;
private FeatureCodec<? extends Feature, ?> featureCodec;
private Class<? extends Feature> evidenceClass;
@Argument(doc = "List of sample names", fullName = "sample-names", optional = true)
Contributor:

Can you make this into a sample list file? Also need documentation for when this is required (seems to be when ingesting a headerless feature file)

Contributor Author:

If the argument is of type List, Barclay will automatically read a file named "*.list" to populate it. It's now documented that this is necessary when the input is tab-delimited text, except for read depth.
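For illustration, a sketch of the kind of declaration this relies on (the field name is assumed; the annotation mirrors the one quoted above):

// Barclay fills a List-typed @Argument from a "*.list" file (one entry per line)
// when such a file is passed on the command line, e.g. --sample-names samples.list
@Argument(doc = "List of sample names", fullName = "sample-names", optional = true)
private List<String> sampleNames = new ArrayList<>();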

@@ -59,7 +58,7 @@
)
@ExperimentalFeature
@DocumentedFeature
public final class PrintSVEvidence extends FeatureWalker<Feature> {
public final class PrintSVEvidence <F extends Feature> extends FeatureWalker<F> {
Contributor:

Will need documentation about what inputs are required when the input feature files don't have headers

throw new UserException("vcf contains a SNP with a non-standard reference base " +
refCall + " at locus " + snp.getContig() + ":" + snp.getStart());
}
final byte[] altSeq = snp.getAlternateAllele(0).getBases();
Contributor:

We only want to collect bi-allelic sites for gatk-sv, so at a minimum multi-allelics should be skipped. It would be great, if it's not too much trouble, to have the option to collect multi-allelics on the allele with the highest count (as with the CN tools) to allow for some experimentation.

Contributor Author:

Omitting multi-allelics for now.
Would you like me to change the output format to always show counts for A, C, G, and T? I.e., change columns from
contig, position, refAllele, altAllele, totalCount, altCount
to
contig, position, refAllele, aCount, cCount, gCount, tCount

Contributor:

Actually that might be a very good solution. Any file size or performance hits for that? Let's discuss at standup.

Comment on lines +21 to +27
if ( !versionChecked ) {
if ( !DepthEvidence.BCI_VERSION.equals(reader.getVersion()) ) {
throw new UserException("baf.bci file has wrong version: expected " +
DepthEvidence.BCI_VERSION + " but found " + reader.getVersion());
}
versionChecked = true;
}
Contributor:

We will have tools that consume multiple evidence files, and it looks like this would only check the first one. It may be better to leave it up to the implementer to check versions. Maybe add a checkVersion(String version) function to the FeatureOutputCodec interface?
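A rough sketch of what such an addition could look like; the interface shape, getVersion(), and the message text are assumptions, not existing GATK API:

public interface VersionedFeatureCodec {
    // Hypothetical accessor for the version string the codec expects to read.
    String getVersion();

    // Hypothetical default hook each consumer could call once per file.
    default void checkVersion( final String fileVersion ) {
        if ( !getVersion().equals(fileVersion) ) {
            throw new UserException("wrong file version: expected " + getVersion() +
                    " but found " + fileVersion);
        }
    }
}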

@tedsharpe (Contributor Author) commented Jul 6, 2021:

Dealing with codecs is all completely hidden from the user in the engine classes. However, the engine classes do the right thing and create a new codec instance for each FeatureReader. So this will be fine: Each bci file read will have its version checked just once.

Comment on lines +17 to +21
this.className = className;
this.version = version;
Contributor:

Should check that these aren't null
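A minimal sketch of the suggested checks, using the usual GATK Utils.nonNull idiom (the messages are illustrative):

// Fail fast with a clear message if either constructor argument is null.
this.className = Utils.nonNull(className, "className must not be null");
this.version = Utils.nonNull(version, "version must not be null");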

final DataInputStream dis;
final FeaturesHeader header;
final long dataFilePointer;
SVIntervalTree<Long> index;
Contributor:

I think it would be ideal to move this and SVInterval out of the SV packages and into htsjdk or the engine utils.

Comment on lines +94 to +95
return contig + "\t" + position + "\t" + (char)refCall + "\t" +
depths[0] + "\t" + depths[1] + "\t" + depths[2] + "\t" + depths[3];
Contributor:

Need to update the tool documentation with the new columns

Comment on lines 149 to 203
private void dumpHeader( final FeaturesHeader header ) {
System.out.println("Dictionary:");
for ( final SAMSequenceRecord rec : header.getDictionary().getSequences() ) {
System.out.println(rec.getSAMString());
}
System.out.println();
System.out.println("Samples:");
for ( final String sampleName : header.getSampleNames() ) {
System.out.println(sampleName);
}
}
Contributor:

I should have been more specific about this in my review. Can you make --sequence-dict-dump and --sample-list-dump arguments (or the like) that specify file to write the header info to? This is just to be a little more machine-friendly.

@mwalker174 (Contributor) left a comment:

This is great! Looks good to me!

@tedsharpe force-pushed the tws_BlockCompressedIntervalStreams branch from 6782b95 to acf83ba on July 13, 2021.
@tedsharpe (Contributor Author):

@droazen I think this has addressed your concerns and is ready to merge. Could you take a quick look?

@droazen (Contributor) commented Jul 13, 2021:

@tedsharpe Great, will take a last look now!

@droazen (Contributor) left a comment:

@tedsharpe Re-reviewed the changes to the classes in the engine package. Looks fine overall, but I did have one or two minor comments that will be easy to resolve (and I provided a suggested patch for one of them). Once these are addressed, I think this can be merged if the other reviewers have also approved.

if ( intervals != null ) {
dataSource.setIntervalsForTraversal(intervals);
}
return dataSource.iterator();
Contributor:

As currently written, this method has the side effect of permanently setting the traversal intervals on the underlying data source, which will affect future calls to getFeatureIterator() as well as other methods. I recommend that you get rid of this side effect by clearing the traversal intervals on the data source after you obtain the iterator, e.g.:

if ( intervals != null ) {
    dataSource.setIntervalsForTraversal(intervals);
}
final Iterator<T> iter = dataSource.iterator();
dataSource.setIntervalsForTraversal(null);
return iter;

Would be less clunky if there were an iterator(List<SimpleInterval>) method we could use here, but there isn't, unfortunately.

Contributor Author:

I opted to always setIntervalsForTraversal before getting the iterator, and always set them to null afterwards. Fewer assumptions about the state of the source. OK?
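A minimal sketch of the approach described, assuming setIntervalsForTraversal tolerates a null argument (as in the suggested patch above):

// Always set the traversal intervals (possibly null), grab the iterator, then clear them
// so no traversal state leaks into later calls on the data source.
dataSource.setIntervalsForTraversal(intervals);
final Iterator<T> featureIterator = dataSource.iterator();
dataSource.setIntervalsForTraversal(null);
return featureIterator;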

final FeatureInput<F> featureInput = new FeatureInput<>(drivingPath, "drivingFeatureFile");
featureInput.setFeatureCodecClass((Class<FeatureCodec<F, ?>>)codec.getClass());
features.addToFeatureSources(featureInput,
new FeatureDataSource<>(featureInput, FeatureDataSource.DEFAULT_QUERY_LOOKAHEAD_BASES, null,
Contributor:

The previous implementation passed in 0 here for queryLookaheadBases rather than FeatureDataSource.DEFAULT_QUERY_LOOKAHEAD_BASES, and used FeatureDataSource.DEFAULT_QUERY_LOOKAHEAD_BASES only for drivingFeatures itself (above). This was because we know that drivingFeatures is going to be used for a full linear single-pass traversal, so the caching makes sense, whereas the secondary "drivingFeatureFile" FeatureInput might be used for things like windowed queries around individual loci that would interact poorly with the very simplistic forward-looking caching scheme.

Now, I don't object to turning the caching on in both places if you've done some benchmarking and concluded that doing so will result in better performance for typical FeatureWalkers -- is that in fact the case?

Contributor Author:

Nah. Good catch. The change was inadvertent.

import java.nio.file.Path;
import java.util.*;

public class BlockCompressedIntervalStream {
Contributor:

Shouldn't BlockCompressedIntervalStream be covered by direct unit tests?

Contributor Author:

Yes. I thought the indirect tests covered everything, but they didn't. Added a test for the query feature (as well as reduplicated the round-trip test). Thanks for catching this, too.

@tedsharpe (Contributor Author):

I think maybe we're ready to go this time?

@gatk-bot commented Jul 14, 2021:

Travis reported job failures from build 34955
Failures in the following jobs:

Test Type JDK Job ID Logs
integration openjdk11 34955.12 logs
integration openjdk8 34955.2 logs

@droazen (Contributor) commented Jul 15, 2021:

@tedsharpe Looks like you have some test failures in FuncotateSegmentsIntegrationTest (see latest gatk-bot comment above for the test logs). Could you resolve these before I re-review?

@gatk-bot commented Jul 16, 2021:

Travis reported job failures from build 34987
Failures in the following jobs:

Test Type JDK Job ID Logs
cloud openjdk8 34987.1 logs
cloud openjdk11 34987.14 logs

@droazen (Contributor) commented Jul 16, 2021:

@tedsharpe Some additional failures in VariantWalkerGCSSupportIntegrationTest -- are these related to this branch?

@tedsharpe (Contributor Author):

Tests all passing, now.

@droazen (Contributor) left a comment:

@tedsharpe Two quick remaining TODOs for you, then we can hit merge

final FeatureInput<F> featureInput = new FeatureInput<>(drivingPath, "drivingFeatureFile");
featureInput.setFeatureCodecClass((Class<FeatureCodec<F, ?>>)codec.getClass());
features.addToFeatureSources(featureInput,
new FeatureDataSource<>(featureInput, 0, null,
Contributor:

The third argument to this constructor should be codec.getFeatureType(), not null -- unless you changed it to null deliberately?

import java.util.ArrayList;
import java.util.Collections;

public class BlockCompressIntervalStreamUnitTests {
Contributor:

Unit test classes should extend GATKBaseTest, and the class name should end in *UnitTest, not *UnitTests -- I believe this matters when we parallelize the test suite by test type in Travis.
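A minimal sketch of the suggested rename and base class (test bodies elided):

// Singular "*UnitTest" suffix plus the GATKBaseTest base class, per GATK test conventions.
public class BlockCompressIntervalStreamUnitTest extends GATKBaseTest {
    // ... test methods ...
}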

@droazen (Contributor) left a comment:

👍

@droazen merged commit c042e8f into master on Jul 20, 2021.
@droazen deleted the tws_BlockCompressedIntervalStreams branch on July 20, 2021.