Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support reference bundles. #1713

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Support reference bundles. #1713

wants to merge 1 commit into from

Conversation

cmnbroad
Copy link
Collaborator

@cmnbroad cmnbroad commented Aug 6, 2024

The main feature implemented here is that a bundle can be used in place of a reference, without executing any code paths that assume that the individual resources are located in the same parent directory (there are a LOT of such code paths, since most of the existing code base makes these assumptions). There are a number of tests added that exist purely to catch any regressions where such assumptions are reintroduced, at lease in the code paths where bundles are enabled.

@cmnbroad cmnbroad marked this pull request as ready for review September 3, 2024 19:50
Copy link
Member

@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cmnbroad This looks good to me. I have a few style questions and typos, feel free to do what you want with those though.

@@ -110,4 +117,47 @@ public HaploidReferenceDecoder getHaploidReferenceDecoder(
return (HaploidReferenceDecoder) resolveForDecoding(inputBundle).getDecoder(inputBundle, HaploidReferenceDecoderOptions);
}

/**
* Create q reference bundle given only a fasta path, including an index and a dictionary
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: q

@@ -110,4 +117,47 @@ public HaploidReferenceDecoder getHaploidReferenceDecoder(
return (HaploidReferenceDecoder) resolveForDecoding(inputBundle).getDecoder(inputBundle, HaploidReferenceDecoderOptions);
}

/**
* Create q reference bundle given only a fasta path, including an index and a dictionary
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Create q reference bundle given only a fasta path, including an index and a dictionary
* Create a reference bundle given only a fasta path, including an index and a dictionary

@@ -64,6 +65,21 @@ public FastaSequenceFile(final Path path, final boolean truncateNamesAtWhitespac
this.in = new FastLineReader(IOUtil.openFileForReading(path));
}

/**
* Constructs a FastaSequenceFile that reads from the specified fasta and dictionary file. Makes no
* assumptions that the fata and dict file are in the same directory.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* assumptions that the fata and dict file are in the same directory.
* assumptions that the fasta and dict file are in the same directory.

* Open the given indexed fasta sequence file. Throw an exception if the file cannot be opened.
*
* @param path The file to open.
* @param dictPath the dictionar path (may be null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @param dictPath the dictionar path (may be null)
* @param dictPath the dictionary path (may be null)

if (IOUtil.isBlockCompressed(path.toPath(), true)) {
throw new SAMException("Indexed block-compressed FASTA file cannot be handled: " + path);
}
this.channel = Files.newByteChannel(path.toPath());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should add getChannel methods to IOPath. We have getStream ones.

if (!truncateNamesAtWhitespace) {
throw new RuntimeException("preferIndexed option requires truncateNamesAtWhitespace");
}
final Optional<BundleResource> indexPathResource = referenceBundle.get(BundleResourceType.CT_REFERENCE_INDEX);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These chains of optional if-present-get are kind of gross. What do you think of flatmap chains instead? I think they're a little more readable, at least once you know what you're looking at.

    IOPath indexPath = referenceBundle.get(BundleResourceType.CT_REFERENCE_INDEX)
             .flatMap(BundleResource::getIOPath)
            .orElse(null);

    if (preferIndexed && indexPath != null !Files.exists(indexPath.toPath())) {
        throw new RuntimeException(String.format("FASTA index file %s does not exist", indexPath));
    }

final IOPath remoteIndex = new HtsPath(
Files.copy(
indexFile.toPath(),
nioFastaDir.resolve(indexFile.getBaseName().get() + indexFile.getExtension().get())).toUri().toString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you meant indexDir here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These copy blocks seem like they could be extracted into a function, it might help avoid this copy paste sorts of errors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants