Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

varfish_export_mehari_bam_qc fails for WGS workflow due to undefined exon bed file #511

Closed
Nicolai-vKuegelgen opened this issue May 28, 2024 · 1 comment · Fixed by #516

Comments

@Nicolai-vKuegelgen
Copy link
Contributor

Describe the bug
The varfish_export_mehari_bam_qc step uses the mehari bam_qc wrapper script, which is simply a link to the varfish annotator bam_qc wrapper. This wrapper needs the samtools idx/bam/flagstats files as well as the alfreq_qc json files as input.

The function that gathers the mehari bam qc input files (see below) contains a check to see if the library name can be mapped to a library kit before it adds the alfreq_qc file to the input. This mapping fails if no bed file for coverage analysis is provided in the ngs_mapping/target_coverage_report/path_target_interval_list_mapping config section. For WGS data this section is usually empty, since coverage does not need to be restricted to specific sections. However this results in the alfred_qc not being used as input in the varfish_export_mehari_bam_qc step and thus causing a crash.

To Reproduce
Steps to reproduce the behavior:

  1. Run latest snappy version with no matching kit defined in step_config/ngs_mapping/target_coverage_report/path_target_interval_list_mapping
  2. Run varfish_export step
  3. See error

Expected behavior
Preferably the mehari bam_qc input file function needs to be adapted to always include the alfred_qc file independently of kit matches.
Alternatively, an way to define a bed file for WGS data (that does not negatively impact coverage calculation) needs to implemented.

Additional context
Mehari bam qc input file function:

    @dictify
    def _get_input_files_bam_qc(self, wildcards):
        ngs_mapping = self.parent.sub_workflows["ngs_mapping"]
        # Get names of primary libraries of the selected pedigree.  The pedigree is selected
        # by the primary DNA NGS library of the index.
        pedigree = self.index_ngs_library_to_pedigree[wildcards.index_ngs_library]
        result = {"bamstats": [], "flagstats": [], "idxstats": [], "alfred_qc": []}
        for donor in pedigree.donors:
            if not donor.dna_ngs_library:
                continue
            tpl = (
                f"output/{wildcards.mapper}.{donor.dna_ngs_library.name}/report/bam_qc/"
                f"{wildcards.mapper}.{donor.dna_ngs_library.name}.bam.%s.txt"
            )
            for key in ("bamstats", "flagstats", "idxstats"):
                result[key].append(ngs_mapping(tpl % key))
            if donor.dna_ngs_library.name not in self.parent.ngs_library_to_kit:
                continue
            path = (
                f"output/{wildcards.mapper}.{donor.dna_ngs_library.name}/report/alfred_qc/"
                f"{wildcards.mapper}.{donor.dna_ngs_library.name}.alfred.json.gz"
            )
            result["alfred_qc"].append(ngs_mapping(path))
        return result
@Nicolai-vKuegelgen
Copy link
Contributor Author

@holtgrewe
Do you think the lines checking that a kit is defined for a given sample can safely be deleted here?
I assume with the switch to alfred_qc there should be no need, since both WES & WGS data should have the same coverage report output files (i.e. alfred_qc json), but I am not 100% sure.

Specifically removing these lines should fix the export step for WGS data:

            if donor.dna_ngs_library.name not in self.parent.ngs_library_to_kit:
                continue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant