This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Update tests and javadoc.
cmnbroad committed Aug 20, 2024

1 parent ad48eed commit 0184bb5
Showing 2 changed files with 137 additions and 56 deletions.
112 changes: 90 additions & 22 deletions src/main/java/org/broadinstitute/hellbender/tools/CreateBundle.java
@@ -3,7 +3,6 @@
import htsjdk.beta.io.bundle.*;
import htsjdk.beta.plugin.registry.HaploidReferenceResolver;
import htsjdk.beta.plugin.variants.VariantsBundle;
import htsjdk.io.HtsPath;
import htsjdk.samtools.util.FileExtensions;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
@@ -23,44 +22,113 @@
/**
* Create a bundle (JSON) file for use with a GATK tool.
*
* Since most bundles will contain a primary resource plus at least one secondary resource (typically an index),
* Since most bundles need to contain a primary resource plus at least one secondary resource (typically an index),
* the tool will attempt to infer standard secondary resource(s) for a given primary resource if no secondary resource
* is explicitly provided on the command line. Inferred secondary resources are automatically added to the resulting
* bundle. Secondary resource inference can be suppressed by using the --suppress-resource-resolution argument.
*
* Each resource in a bundle must have an associated content type tag. Content types for each resource are either
* specified on the command line via argument tags, or inferred by the tool. For the primary and secondary resources,
* when no content type argument tag is provided, the tool will attempt to infer the content type from the file
* extension. However, the content type for "other" resources (resources that are neither primary nor secondary resources)
* are NEVER inferred, and must always include a content type argument tag.
* extension. However, the content types for "other" resources (resources that are neither primary nor secondary
* resources) are NEVER inferred; such resources must always include a content type argument tag.
*
* Bundle output file names must end with the suffix ".json".
*
* Common examples:
* In general, content types can be any string, but there are well-known content types that must be used when creating
* bundles for tools that expect well-known resource types, such as a VCF, a VCF index, a .fasta file, or a reference
* dictionary file. The common well-known content types are:
*
* - "CT_VARIANT_CONTEXTS": a VCF file
* - "CT_VARIANTS_INDEX": a VCF index file
*
* - "CT_HAPLOID_REFERENCE": fasta reference file
* - "CT_HAPLOID_REFERENCE_INDEX": fasta index file
* - "CT_HAPLOID_REFERENCE_DICTIONARY": fasta dictionary file
*
* Common bundle creation examples:
*
* VCF Bundles:
*
* 1) Create a resource bundle for a VCF. Let the tool determine the content types, and resolve the secondary resources
* (which for vcfs is the companion index) automatically by finding a sibling index file. If the sibling file cannot
* be found, an exception will be thrown:
* 1) Create a resource bundle for a VCF from just the VCF, letting the tool resolve the secondary (index) resource by
* automatically finding the sibling index file, and letting the tool determine the content types. If the sibling index
* file cannot be found, an exception will be thrown. Resulting bundle contains the VCF and associated index.
*
* CreateBundle \
* --primary path/to/my.vcf \
* --output mybundle.json
*
* The exact same bundle could be created manually by specifying both the resources and the content types explicitly:
*
* CreateBundle \
* --primary:CT_VARIANT_CONTEXTS path/to/my.vcf \
* --secondary:CT_VARIANTS_INDEX path/to/my.vcf.idx \
* --output mybundle.json
*
* 2) Create a resource bundle for a VCF from just the VCF, but suppress automatic resolution of the secondary
* resources. Let the tool determine the content types. The resulting bundle will contain only the vcf resource:
*
* CreateBundle \
* --primary path/to/my.vcf \
* --suppress-resource-resolution \
* --output mybundle.json
*
* 3) Create a resource bundle for a VCF, but specify the VCF AND the secondary index resource explicitly (which
* suppresses automatic secondary resolution). This is useful when the VCF and index are not in the same directory.
* Let the tool determine the content types. The resulting bundle will contain the VCF and index resources:
*
* CreateBundle \
* --primary path/to/my.vcf \
* --secondary some/other/path/to/my.vcf.idx \
* --output mybundle.json
*
* 4) Create a resource bundle for a VCF, but specify the VCF AND the secondary index resource explicitly (this
* is useful when the VCF and index are not in the same directory), and specify the content types explicitly via
* command line argument tags. The resulting bundle will contain the VCF and index resources.
*
* CreateBundle \
* --primary:CT_VARIANT_CONTEXTS path/to/my.vcf \
* --secondary:CT_VARIANTS_INDEX some/other/path/to/my.vcf.idx \
* --output mybundle.json
*
* Reference bundles:
*
* 1) Create a resource bundle for a reference from just the .fasta, letting the tool resolve the secondary
* (index and dictionary) resources by automatically finding the sibling files, and determine the content types.
* If the sibling index file cannot be found, an exception will be thrown. The resulting bundle will contain the
* reference, index, and dictionary.
*
* CreateBundle --primary path/to/my.vcf --output mybundle.json
* CreateBundle \
* --primary path/to/my.fasta \
* --output mybundle.json
*
* 2) Create a resource bundle for a VCF. Let the tool determine the content types, but suppress resolution of the secondary
* resources (which for vcfs is the companion index). The resulting bundle will contain only the vcf resource:
* 2) Create a resource bundle for a reference from just the .fasta, but suppress resolution of the secondary (index and
* dictionary) resources. Let the tool determine the content type. The resulting bundle will contain only the .fasta
* resource:
*
* CreateBundle --primary path/to/my.vcf --output mybundle.json
* CreateBundle \
* --primary path/to/my.fasta \
* --suppress-resource-resolution \
* --output mybundle.json
*
* 3) Create a resource bundle for a VCF. Let the tool determine the content type, but specify the secondary
* index resource explicitly (which suppresses secondary resolution). The resulting bundle will contain the vcf
* and index resources:
* 3) Create a resource bundle for a fasta, but specify the fasta AND the secondary index and dictionary resources
* explicitly (which suppresses automatic secondary resolution). Let the tool determine the content types. The
* resulting bundle will contain the fasta, index and dictionary resources:
*
* CreateBundle --primary path/to/my.vcf --secondary some/other/path/to/my.vcf.idx --output mybundle.json
* CreateBundle \
* --primary path/to/my.fasta \
* --secondary some/other/path/to/my.fai \
* --secondary some/other/path/to/my.dict \
* --output mybundle.json
*
* Reference bundles: create a bundle using explicitly provided values and content types for the primary and
* secondary resources:
* 4) Create a resource bundle for a fasta, but specify the fasta, index and dictionary resources and the content
* types explicitly. The resulting bundle will contain the fasta, index and dictionary resources:
*
* CreateBundle --primary: path/to/my.fa
* CreateBundle \
* --primary:CT_HAPLOID_REFERENCE path/to/my.fasta \
* --secondary:CT_HAPLOID_REFERENCE_INDEX some/other/path/to/my.fai \
* --secondary:CT_HAPLOID_REFERENCE_DICTIONARY some/other/path/to/my.dict \
* --output mybundle.json
*/
@DocumentedFeature
@CommandLineProgramProperties(
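Reviewer sketch (not part of the commit): the javadoc above says that a missing secondary resource is resolved by looking for a sibling file, that resolution fails with an exception if the sibling is absent, and that --suppress-resource-resolution skips the lookup. A minimal standalone illustration of that behavior for the VCF case might look like the following. The class name, the `resolveResources` helper, the ".idx" suffix rule, and the exception type are all assumptions made for illustration; none of this is the htsjdk or GATK API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the CreateBundle implementation.
final class SiblingIndexSketch {

    // Returns the primary resource plus, unless suppressed, its sibling index.
    // Assumption: the sibling index is named <primary file name> + ".idx".
    static List<Path> resolveResources(final Path primary, final boolean suppressResolution) {
        final List<Path> resources = new ArrayList<>();
        resources.add(primary);
        if (suppressResolution) {
            // Mirrors --suppress-resource-resolution: primary resource only.
            return resources;
        }
        final Path index = primary.resolveSibling(primary.getFileName() + ".idx");
        if (!Files.exists(index)) {
            // Assumed exception type; the javadoc only says "an exception will be thrown".
            throw new IllegalArgumentException("No sibling index found for " + primary);
        }
        resources.add(index);
        return resources;
    }

    public static void main(final String[] args) throws IOException {
        final Path dir = Files.createTempDirectory("bundle-sketch");
        final Path vcf = Files.createFile(dir.resolve("my.vcf"));
        Files.createFile(dir.resolve("my.vcf.idx"));
        System.out.println(resolveResources(vcf, false)); // primary and index
        System.out.println(resolveResources(vcf, true));  // primary only
    }
}
```

Specifying the secondary resource explicitly on the command line corresponds to never calling a lookup like this at all, which is why the javadoc describes it as suppressing automatic resolution.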
@@ -110,7 +178,7 @@ public class CreateBundle extends CommandLineProgram {
private enum BundleType {
VCF,
REFERENCE,
OTHER
CUSTOM
}
private BundleType outputBundleType;

@@ -129,7 +197,7 @@ protected Object doWork() {
final Bundle bundle = switch (outputBundleType) {
case VCF -> createVCFBundle();
case REFERENCE -> createHaploidReferenceBundle();
case OTHER -> createOtherBundle();
case CUSTOM -> createOtherBundle();
};
writer.write(BundleJSON.toJSON(bundle));
} catch (final IOException e) {
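Reviewer sketch (not part of the commit): the two hunks above rename the enum constant OTHER to CUSTOM, and the switch in doWork() has to change with it because a switch expression over an enum with no default branch must name every constant. The called method keeps its old name, createOtherBundle(). The standalone example below illustrates the exhaustiveness rule; the class name, the `describe` method, and the returned strings are hypothetical.

```java
// Hypothetical illustration only; mirrors the shape of the switch in doWork().
final class BundleTypeSwitchSketch {

    enum BundleType { VCF, REFERENCE, CUSTOM }

    // No default branch: the compiler rejects this switch expression if any
    // enum constant is missing, so renaming a constant forces an update here.
    static String describe(final BundleType type) {
        return switch (type) {
            case VCF -> "VCF bundle";
            case REFERENCE -> "haploid reference bundle";
            case CUSTOM -> "custom bundle built from argument tags";
        };
    }

    public static void main(final String[] args) {
        for (final BundleType type : BundleType.values()) {
            System.out.println(type + " -> " + describe(type));
        }
    }
}
```

Leaving out the default branch is a deliberate trade-off: adding a new bundle type later becomes a compile error at every such switch rather than a silent fall-through.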
@@ -153,7 +221,7 @@ private BundleType determinePrimaryContentType() {
logger.info(String.format("Primary input content type %s for %s not recognized. A bundle will be created using content types from the provided argument tags.",
primaryContentTag,
primaryResource));
bundleType = BundleType.OTHER;
bundleType = BundleType.CUSTOM;
}
} else {
logger.info(String.format("A content type for the primary input was not provided. Attempting to infer the content type from the %s extension.", primaryResource));
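Reviewer sketch (not part of the commit): the log messages in this hunk describe two paths. With a content type tag, an unrecognized tag yields a CUSTOM bundle; without a tag, the type is inferred from the file extension. The standalone example below follows that decision order under stated assumptions: the extension list (.vcf, .vcf.gz, .fasta, .fa), the exception thrown when inference fails, and all names other than the content type strings from the javadoc are hypothetical.

```java
// Hypothetical illustration only; not the body of determinePrimaryContentType().
final class PrimaryContentTypeSketch {

    enum BundleType { VCF, REFERENCE, CUSTOM }

    // contentTag may be null, meaning no argument tag was given on the command line.
    static BundleType determine(final String contentTag, final String resourceName) {
        if (contentTag != null) {
            // An explicit tag wins; unrecognized tags produce a CUSTOM bundle.
            return switch (contentTag) {
                case "CT_VARIANT_CONTEXTS" -> BundleType.VCF;
                case "CT_HAPLOID_REFERENCE" -> BundleType.REFERENCE;
                default -> BundleType.CUSTOM;
            };
        }
        // No tag: infer from the extension (assumed extension list).
        final String lower = resourceName.toLowerCase();
        if (lower.endsWith(".vcf") || lower.endsWith(".vcf.gz")) {
            return BundleType.VCF;
        }
        if (lower.endsWith(".fasta") || lower.endsWith(".fa")) {
            return BundleType.REFERENCE;
        }
        // Assumed failure mode when nothing can be inferred.
        throw new IllegalArgumentException("Cannot infer a content type for " + resourceName);
    }

    public static void main(final String[] args) {
        System.out.println(determine(null, "path/to/my.vcf"));
        System.out.println(determine("CT_HAPLOID_REFERENCE", "path/to/my.fasta"));
        System.out.println(determine("MY_CONTENT_TYPE", "path/to/my.dat"));
    }
}
```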