Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support for somatic variant calling without normal #503

Merged
merged 13 commits into from
May 6, 2024

Conversation

ericblanc20
Copy link
Contributor

This is a first attempt to support tumor-only samples in the somatic snappy branch.

Many points are still open for discussion/criticism:

  1. The changes are limited to SNVs & indels, I haven't looked at changes required for CNVs (exome nor wgs).
  2. The (theoretical) possibility of joint calling multiple tumor samples from a single donor has been removed. I am not sure that it ever worked.
  3. The extraction type is now mandatory, and samples without one will not be processed. Samples sequenced with long reads may also be rejected.
  4. The tests don't exercise samples without normals, so coverage will decrease.

The proposed overhaul/replacement of biomedsheets should alter how the samples are connected to each others. So points 2 & 3 (and possibly 4 too) may be revisited when we have a plan for the new system.

The CNV steps need complete re-write anyway. First WES & WGS should be unified, then the problem with reference creation in the panel of normals step should be addressed (WGS, excluding hard-to-map regions, connection with exome panel naming, ...). The possibility of CNV calling without normals should be included in this wider discussion.

@ericblanc20 ericblanc20 linked an issue Apr 24, 2024 that may be closed by this pull request
@coveralls
Copy link

coveralls commented Apr 24, 2024

Coverage Status

coverage: 86.002% (+0.2%) from 85.764%
when pulling 3129c26 on 502-somatic-variant-calling-without-normal
into b971088 on main.

Copy link
Contributor

@tedil tedil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand, most of the changes are fine. I only have some comments regarding user feedback, configuration validation, wrapper parameters and some code suggestions.

With respect to the tests: it would be very nice if we had a test which covers the no-normal case, because that is the new functionality proposed in this PR.

Copy link
Contributor

@tedil tedil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor changes left, otherwise: looks good and thanks for even incorporating the wrapper-replace-config-with-params-access changes!

Comment on lines 674 to 675
self._write_step_config()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably be removed before merging (or only done on specific log levels?)

@@ -155,6 +155,7 @@
tools_somatic_variant_calling: null # Default: use those defined in somatic_variant_calling step
tools_somatic_variant_annotation: null # Default: use those defined in somatic_variant_annotation step
has_annotation: True
filtration_schema: "sets" # Either "sets" (old scheme- filter_sets) or "list" (new scheme- filter_list)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the "sets" variant is the deprecated one anyways, why not put the "list" one as default?
(Also, why is sets plural?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default is changed. sets is because of filter_sets, I didn't want to break the config unnecessarily. But I'll change it if you feel it's important.

Comment on lines 357 to 358
normal_library = self.tumor_to_normal_library.get(wildcards["tumor_library"], None)
if normal_library:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fyi, since python 3.8 this can be written as

if normal_library := self.tumor_to_normal_library.get(wildcards["tumor_library"], None):
    # […]

using the "walrus" operator :=

msg = "Only one include or exclude expression is allowed in {} filter {} (configuration: {})"
assert len(keywords) == 1, msg.format(self.filter_name, filter_nb, keywords)
keyword = list(keywords.keys())[0]
msg = 'Unknown keyword "{}" in {} filter {} (allowed values: include, exclude, configuration: {})'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reads as if "configuration" was also an allowed value. Perhaps write allowed values: "include", "exclude". Configuration: {} or something similar instead?

)
assert len(keywords) == 1, msg.format(self.filter_name, filter_nb, keywords)
keyword = list(keywords.keys())[0]
msg = 'Unknown keyword "{}" in {} filter {} (allowed values: include, exclude, configuration: {})'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above + missing "path_bed" as a value that's allowed

parameters = parent(wildcards)
filter_nb = int(wildcards["filter_nb"])
keywords = self.config["filter_list"][filter_nb - 1][self.filter_name]
msg = "Only one protected region is allowed in {} filter {} (configuration: {})"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it "one protected region" or "one protected region BED file"?

@@ -546,14 +637,21 @@ def _get_log_file(self, action):
name_pattern = "{mapper}.{var_caller}"
if self.config["has_annotation"]:
name_pattern += ".{annotator}"
name_pattern += ".dkfz_bias_filter.{tumor_library}"
name_pattern += ".dkfz_bias_filter"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps name_pattern += f".{self.name}" to avoid issues when someone (for whatever reason) decides to rename the filter name.

Comment on lines 1089 to 1093
name_pattern_1 = name_pattern + (
r".dkfz_bias_filter.eb_filter.{tumor_library}.{filter_set}.{exon_list}"
)
name_pattern_2 = name_pattern + (
r".dkfz_bias_filter.eb_filter.{tumor_library}.{filter_set}.{exon_list}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are these different?

… set to list, config output under verbose control
Copy link
Contributor

@tedil tedil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last thing, then we're good to go

@ericblanc20 ericblanc20 merged commit 66f555c into main May 6, 2024
7 checks passed
@ericblanc20 ericblanc20 deleted the 502-somatic-variant-calling-without-normal branch May 6, 2024 12:44
@tedil tedil mentioned this pull request Jun 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

somatic variant calling without normal
3 participants