Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add option to allow mapping to pangenome #274

Merged
merged 64 commits into from
Dec 5, 2024
Merged
Show file tree
Hide file tree
Changes from 56 commits
Commits
Show all changes
64 commits
Select commit Hold shift + click to select a range
f353a70
add option to allow mapping to pangenome
huzuner Nov 16, 2023
97c098a
fix lambda function for input files
huzuner Nov 17, 2023
8f455cb
fix vg input problem
huzuner Nov 17, 2023
ce395ab
sort vg_mapped bam file
huzuner Nov 17, 2023
d593932
format chr names and index
huzuner Nov 20, 2023
cc11c4a
add read group information
huzuner Nov 20, 2023
d8cb96e
remove nonclassical chromosomes from headers convert mitochrondiral n…
huzuner Dec 6, 2023
45ef707
correct chr names for extractiong of primary chromosomes
huzuner Dec 6, 2023
7ba7c54
extract only properly paired reads
huzuner Dec 28, 2023
4c27790
Merge branch 'master' into allow-pangenome-mapping
huzuner Mar 25, 2024
9251141
Merge branch 'master' of github.com:snakemake-workflows/dna-seq-varlo…
huzuner Apr 2, 2024
5a45b64
modify input-output paths of vg related rules and fix 'and' operator bug
huzuner Apr 2, 2024
5953c14
add index as input to filter_by_annotation rule to avoid index not fo…
huzuner May 8, 2024
af6ecaf
upload example config and fix typo in vembrane expression
huzuner May 8, 2024
0f6b011
provide example event, scenario and sample
huzuner May 8, 2024
d1173ac
Merge branch 'master' into allow-pangenome-mapping
huzuner May 8, 2024
e78e396
fmt
huzuner May 8, 2024
e5dd722
use default workflow config and scenario and add purity value to samp…
huzuner May 28, 2024
4563490
define and use get_aligner func
huzuner May 29, 2024
083dd58
commit suggestion to use partial
huzuner Jun 3, 2024
f101344
renamed pangenome keyword and link to download was added to config
huzuner Jun 7, 2024
57c66f5
placed pangenome download path to common.smk
huzuner Jun 7, 2024
8042ff7
Merge branch 'master' into allow-pangenome-mapping
huzuner Jun 7, 2024
41b926f
fmt
huzuner Jun 7, 2024
4e54138
Merge branch 'master' into allow-pangenome-mapping
huzuner Jun 7, 2024
e8ed359
Merge branch 'master' into allow-pangenome-mapping
huzuner Jun 21, 2024
e918792
Update config/config.yaml
huzuner Oct 9, 2024
badb200
Merge branch 'master' into allow-pangenome-mapping
huzuner Oct 9, 2024
6072f28
refactor: add version for vg
huzuner Oct 10, 2024
5825209
fix: add sample and read group information during vg giraffe mapping
huzuner Oct 10, 2024
316332b
fix: fix vg shell command for the input in case of single end reads
huzuner Oct 11, 2024
d2f88fa
chore: remove unnecessary format method in f-string
huzuner Oct 11, 2024
5fdaaca
chore: fmt
huzuner Oct 11, 2024
e9ef436
Merge branch 'master' into allow-pangenome-mapping
huzuner Oct 15, 2024
e22a33c
fix: fix the link to the pangenome index
huzuner Oct 15, 2024
206a170
chore: do not hardcode pangenome index path in common.smk
huzuner Oct 15, 2024
31309e0
fix vg ref output and fusion support
FelixMoelder Oct 15, 2024
a5f77c8
fix: include pangenome entry in the config of tests
huzuner Oct 15, 2024
793aa9f
fix: include pangenome entry in the config of tests
huzuner Oct 15, 2024
82b0d82
fix: include pangenome entry in the config of tests
huzuner Oct 15, 2024
5681854
fix: include pangenome entry in the config of tests
huzuner Oct 15, 2024
0834cdb
add empty index, fix file ext
FelixMoelder Oct 15, 2024
95f4976
fix umi annotation, fmt
FelixMoelder Oct 16, 2024
031639e
fix assign_primers
FelixMoelder Oct 17, 2024
e5b7037
fix readgroup
FelixMoelder Oct 17, 2024
d699c4b
cleanup
FelixMoelder Oct 17, 2024
1d679f6
refactoring, vg preprocessing steps
FelixMoelder Oct 18, 2024
d009717
Merge branch 'master' into allow-pangenome-mapping
FelixMoelder Oct 18, 2024
07e6d60
create pangenome indexes, simplifications, cleanup
FelixMoelder Nov 12, 2024
e47c9c3
update test configs
FelixMoelder Nov 12, 2024
a36ed84
Merge branch 'master' into allow-pangenome-mapping
FelixMoelder Nov 12, 2024
971d747
typo
FelixMoelder Nov 12, 2024
1167fc8
Merge branch 'master' into allow-pangenome-mapping
FelixMoelder Nov 28, 2024
632e136
adjust input and versions
FelixMoelder Nov 28, 2024
3578af1
remove trailing spaces
FelixMoelder Nov 28, 2024
9e3e718
update config
FelixMoelder Nov 28, 2024
864e50f
cleanup
FelixMoelder Nov 28, 2024
3fc3448
Merge branch 'master' into allow-pangenome-mapping
FelixMoelder Dec 3, 2024
9ff8128
Update varlociraptor.yaml
FelixMoelder Dec 4, 2024
4c3c9d7
update env, temp output, memory usage
FelixMoelder Dec 4, 2024
322811f
rename haplotypes by python expr
FelixMoelder Dec 4, 2024
b1da7de
fix yaml typo
FelixMoelder Dec 4, 2024
7c2c80f
fix minor bug
FelixMoelder Dec 4, 2024
235fc63
Update workflow/rules/ref.smk
johanneskoester Dec 5, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion .test/config-chm-eval/config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,9 @@ ref:
release: 100
# Genome build
build: GRCh38
pangenome:
activate: false
vcf: ""

primers:
trimming:
Expand Down Expand Up @@ -181,4 +184,4 @@ report:
max_read_depth: 250
stratify:
activate: false
by-column: condition
by-column: condition
5 changes: 4 additions & 1 deletion .test/config-giab/config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,9 @@ ref:
# Genome build
build: GRCh38
chromosome: 1
pangenome:
activate: false
vcf: ""

primers:
trimming:
Expand Down Expand Up @@ -154,4 +157,4 @@ params:
min_alternate_fraction: 0.05 # Reduce for calling variants with lower VAFs

gene_coverage:
min_avg_coverage: 5
min_avg_coverage: 5
5 changes: 4 additions & 1 deletion .test/config-no-candidate-filtering/config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,10 @@ ref:
release: 100
# Genome build
build: R64-1-1

pangenome:
activate: false
vcf: ""

primers:
trimming:
activate: false
Expand Down
3 changes: 3 additions & 0 deletions .test/config-simple/config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,9 @@ ref:
release: 100
# Genome build
build: R64-1-1
pangenome:
activate: false
vcf: ""

primers:
trimming:
Expand Down
3 changes: 3 additions & 0 deletions .test/config-sra/config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,9 @@ ref:
release: 110
# Genome build
build: R64-1-1
pangenome:
activate: false
vcf: ""

primers:
trimming:
Expand Down
3 changes: 3 additions & 0 deletions .test/config-target-regions/config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,9 @@ ref:
release: 100
# Genome build
build: R64-1-1
pangenome:
activate: false
vcf: ""

primers:
trimming:
Expand Down
3 changes: 3 additions & 0 deletions .test/config-target-regions/config_multiple_beds.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,9 @@ ref:
release: 100
# Genome build
build: R64-1-1
pangenome:
activate: false
vcf: ""

primers:
trimming:
Expand Down
3 changes: 3 additions & 0 deletions .test/config_primers/config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,9 @@ ref:
snpeff_release: 86
# Genome build
build: R64-1-1
pangenome:
activate: false
vcf: ""

primers:
trimming:
Expand Down
10 changes: 10 additions & 0 deletions config/config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,15 @@ ref:
release: 111
# Genome build
build: GRCh38
pangenome:
# if active, reads will be aligned to given pangenome instead of to the linear reference genome
# Important: this is only supported for homo_sapiens so far
activate: True
FelixMoelder marked this conversation as resolved.
Show resolved Hide resolved
# URL to pangenome haplotypes (vcf-file)
# Graph resources v1.1: https://github.com/human-pangenomics/hpp_pangenome_resources/
# Graph resources v1.0: https://github.com/human-pangenomics/hpp_pangenome_resources/blob/main/hprc-v1.0-mc.md
# Important: ensure that the haplotype vcf is built against the same reference genome as specified above under build
vcf: https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/freeze1/minigraph-cactus/hprc-v1.1-mc-grch38/hprc-v1.1-mc-grch38.raw.vcf.gz
# Optionally, instead of downloading the whole reference from Ensembl via the
# parameters above, specify a specific chromosome below and uncomment the line.
# This is usually only relevant for testing.
Expand Down Expand Up @@ -157,6 +166,7 @@ calling:
# Add varlociraptor events to aggregated over.
# The probability for the union of these events is used for controlling
# the FDR.
- present
- somatic_tumor_high
- somatic_tumor_medium
filter: # myfilter
Expand Down
1 change: 1 addition & 0 deletions config/samples.tsv
Original file line number Diff line number Diff line change
@@ -1 +1,2 @@
sample_name alias group platform purity panel umi_read umi_read_structure datatype calling
SRR702070 tumor SRR702070_group ILLUMINA 1.0 dna variants
2 changes: 1 addition & 1 deletion config/scenario.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -31,4 +31,4 @@ events:
somatic_tumor_high: "tumor:[0.3,1.0] & normal:0.0"
somatic_normal: "normal:]0.0,0.5["
germline_hom: "normal:1.0"
germline_het: "normal:0.5"
germline_het: "normal:0.5"
1 change: 1 addition & 0 deletions config/units.tsv
Original file line number Diff line number Diff line change
@@ -1 +1,2 @@
sample_name unit_name fq1 fq2 sra adapters
SRR702070 lane1 /projects/koesterlab/orthanq/HapMap_data/SRR702070_1.fastq.gz /projects/koesterlab/orthanq/HapMap_data/SRR702070_2.fastq.gz
2 changes: 1 addition & 1 deletion workflow/envs/varlociraptor.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,6 @@ channels:
- bioconda
- nodefaults
dependencies:
- varlociraptor >=8.4.11,<8.5
- varlociraptor >=8.4.12,<8.5
- vega-lite-cli =5.16
- bcftools =1.19
5 changes: 5 additions & 0 deletions workflow/envs/vg.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
name: vg
channels:
- bioconda
dependencies:
- vg =1.60
3 changes: 2 additions & 1 deletion workflow/rules/benchmarking.smk
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
ruleorder: chm_eval_sample > map_reads
# TODO Is this ruleorder of any use?!
ruleorder: chm_eval_sample > map_reads_bwa


rule gather_benchmark_calls:
Expand Down
88 changes: 58 additions & 30 deletions workflow/rules/common.smk
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,8 @@ genome_prefix = f"resources/{genome_name}"
genome = f"{genome_prefix}.fasta"
genome_fai = f"{genome}.fai"
genome_dict = f"{genome_prefix}.dict"
pangenome_name = f"pangenome.{species}.{build}"
pangenome_prefix = f"resources/{pangenome_name}"

# cram variables
use_cram = config.get("use_cram", False)
Expand Down Expand Up @@ -260,20 +262,13 @@ def get_control_fdr_input(wildcards):
return "results/final-calls/{group}.{calling_type}.annotated.bcf"


def get_recalibrate_quality_input(wildcards, bai=False):
ext = "bai" if bai else "bam"
datatype = get_sample_datatype(wildcards.sample)
if datatype == "rna":
return "results/split/{{sample}}.{ext}".format(ext=ext)
# Post-processing of DNA samples
if is_activated("calc_consensus_reads"):
return "results/consensus/{{sample}}.{ext}".format(ext=ext)
elif is_activated("primers/trimming"):
return "results/trimmed/{{sample}}.trimmed.{ext}".format(ext=ext)
elif is_activated("remove_duplicates"):
return "results/dedup/{{sample}}.{ext}".format(ext=ext)
def get_aligner(wildcards):
if get_sample_datatype(wildcards.sample) == "rna":
return "star"
elif is_activated("ref/pangenome"):
return "vg"
else:
return "results/mapped/bwa/{{sample}}.{ext}".format(ext=ext)
return "bwa"


def get_cutadapt_input(wildcards):
Expand Down Expand Up @@ -428,31 +423,46 @@ def get_sample_datatype(sample):


def get_markduplicates_input(wildcards):
aligner = "star" if get_sample_datatype(wildcards.sample) == "rna" else "bwa"
aligner = get_aligner(wildcards)
if sample_has_umis(wildcards.sample):
# Special case for vg as umi annotation (if active) is done before finalizing bam output
# Could also directly go to else-branch if aligner != "vg"
return "results/mapped/{aligner}/{{sample}}.annotated.bam".format(
aligner=aligner
)
else:
return "results/mapped/{aligner}/{{sample}}.bam".format(aligner=aligner)


def get_consensus_input(wildcards):
def get_recalibrate_quality_input(wildcards, bai=False):
ext = "bai" if bai else "bam"
datatype = get_sample_datatype(wildcards.sample)
if datatype == "rna":
return "results/split/{{sample}}.{ext}".format(ext=ext)
# Post-processing of DNA samples
if is_activated("calc_consensus_reads"):
return "results/consensus/{{sample}}.{ext}".format(ext=ext)
else:
return get_consensus_input(wildcards, bai)


def get_consensus_input(wildcards, bai=False):
ext = "bai" if bai else "bam"
if is_activated("primers/trimming"):
return "results/trimmed/{sample}.trimmed.bam"
elif is_activated("remove_duplicates"):
return "results/dedup/{sample}.bam"
return "results/trimmed/{{sample}}.trimmed.{ext}".format(ext=ext)
else:
aligner = "star" if get_sample_datatype(wildcards.sample) == "rna" else "bwa"
return "results/mapped/{aligner}/{{sample}}.bam".format(aligner=aligner)
return get_trimming_input(wildcards, bai)


def get_trimming_input(wildcards):
def get_trimming_input(wildcards, bai=False):
ext = "bai" if bai else "bam"
if is_activated("remove_duplicates"):
return "results/dedup/{sample}.bam"
return "results/dedup/{{sample}}.{ext}".format(ext=ext)
else:
aligner = "star" if get_sample_datatype(wildcards.sample) == "rna" else "bwa"
return "results/mapped/{aligner}/{{sample}}.bam".format(aligner=aligner)
aligner = get_aligner(wildcards)
return "results/mapped/{aligner}/{{sample}}.{ext}".format(
aligner=aligner, ext=ext
)


def get_primer_bed(wc):
Expand Down Expand Up @@ -623,6 +633,13 @@ def get_read_group(wildcards):
)


def get_vg_read_group(wildcards):
platform = extract_unique_sample_column_value(wildcards.sample, "platform")
return r"--RGLB lib1 --RGPL {platform} --RGPU {sample} --RGSM {sample} --RGID {sample}".format(
sample=wildcards.sample, platform=platform
)


def get_map_reads_sorting_params(wildcards, ordering=False):
match (sample_has_umis(wildcards.sample), ordering):
case (True, True):
Expand All @@ -635,6 +652,13 @@ def get_map_reads_sorting_params(wildcards, ordering=False):
return "samtools"


def get_add_readgroup_input(wildcards):
if sample_has_umis(wildcards.sample):
return "results/mapped/vg/{sample}.annotated.bam"
else:
return "results/mapped/vg/{sample}.mate_fixed.bam"


def get_mutational_burden_targets():
mutational_burden_targets = []
if is_activated("mutational_burden"):
Expand Down Expand Up @@ -685,15 +709,19 @@ def get_selected_annotations():
return selection


def get_annotated_bcf(wildcards):
def get_annotated_bcf(wildcards, index=False):
ext = ".csi" if index else ""
selection = (
get_selected_annotations() if wildcards.calling_type == "variants" else ""
)
return "results/calls/{group}.{calling_type}.{scatteritem}{selection}.bcf".format(
group=wildcards.group,
calling_type=wildcards.calling_type,
selection=selection,
scatteritem=wildcards.scatteritem,
return (
"results/calls/{group}.{calling_type}.{scatteritem}{selection}.bcf{ext}".format(
group=wildcards.group,
calling_type=wildcards.calling_type,
selection=selection,
scatteritem=wildcards.scatteritem,
ext=ext,
)
)


Expand Down
1 change: 1 addition & 0 deletions workflow/rules/filtering.smk
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ rule filter_candidates_by_annotation:
rule filter_by_annotation:
input:
bcf=get_annotated_bcf,
csi=partial(get_annotated_bcf, index=True),
aux=get_annotation_filter_aux_files,
output:
"results/calls/{group}.{event}.{calling_type}.{scatteritem}.filtered_ann.bcf",
Expand Down
4 changes: 2 additions & 2 deletions workflow/rules/maf.smk
Original file line number Diff line number Diff line change
Expand Up @@ -33,12 +33,12 @@ rule group_vcf_to_maf:
dpath="maf/primary_alias", within=config, default="tumor"
),
vcf_control_alias_option=(
f'--vcf-normal-id {lookup(dpath= "maf/control_alias", within= config, default= "")}'
f'--vcf-normal-id {lookup(dpath = "maf/control_alias", within = config , default = "")}'
if lookup(dpath="maf/control_alias", within=config, default=False)
else ""
),
normal_id=lambda wc: (
f'--normal-id {wc.group}_{lookup(dpath= "maf/control_alias", within= config, default= "")}'
f'--normal-id {wc.group}_{lookup(dpath = "maf/control_alias", within = config , default = "")}'
if lookup(dpath="maf/control_alias", within=config, default=False)
else ""
),
Expand Down
Loading