feat: Somatic cnv checking #426

Merged
merged 9 commits into from
Aug 22, 2023

ericblanc20

At the moment, the somatic_wgs_cnv_calling & somatic_targeted_seq_cnv_calling steps are implemented using coverage depth only. From that coverage, the log ratio of tumor to normal sample depth is computed, and after segmentation, the expected number of alleles in the tumor is derived.
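As a rough illustration of the depth-based quantities mentioned above, here is a minimal Python sketch. The function names, the pseudocount, and the conversion to copy number (which assumes a diploid normal and a pure tumor sample) are illustrative assumptions, not the pipeline's actual implementation.

```python
import math


def log2_depth_ratio(tumor_depth, normal_depth, pseudocount=0.5):
    """Log2 ratio of tumor to normal depth for one bin or segment.

    The pseudocount (an assumption of this sketch) avoids division by
    zero and log(0) in regions without coverage.
    """
    return math.log2((tumor_depth + pseudocount) / (normal_depth + pseudocount))


def expected_copies(lfc, normal_ploidy=2):
    """Expected copy number for a log2 ratio.

    Assumes a pure tumor sample and a diploid normal; real callers also
    correct for purity and library size, which is omitted here.
    """
    return normal_ploidy * 2 ** lfc
```

With equal depths the log ratio is 0 and the expected copy number stays at 2; a tumor depth twice the normal depth gives a log ratio of about 1, that is, about 4 copies under these assumptions.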

To check the quality of the depth log fold change and of the allele number call, the somatic_cnv_checking step is organized as follows:

  1. Heterozygous SNPs with sufficient coverage are found in the normal samples.
  2. At these positions, the coverage (pileup) of the reference & alternative alleles is computed in the tumor sample.
  3. The B allele fraction is plotted against the coverage log fold change, colored by allele number call.
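The first two steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the defaults mirror the `min_cov` and `min_baf` values quoted in the review further down, and `min_baf` is read here as the minimum fraction of the less frequent allele (between 0 and 1/2).

```python
def b_allele_fraction(ref_count, alt_count):
    """Fraction of reads supporting the alternative allele, or None without reads."""
    total = ref_count + alt_count
    if total == 0:
        return None
    return alt_count / total


def is_heterozygous(ref_count, alt_count, min_cov=20, min_baf=0.4):
    """Decide whether a SNP in the normal sample looks heterozygous.

    Both alleles must reach min_cov reads, and the less frequent allele
    must make up at least min_baf of the reads (assumed interpretation).
    """
    if ref_count < min_cov or alt_count < min_cov:
        return False
    minor_fraction = min(ref_count, alt_count) / (ref_count + alt_count)
    return minor_fraction >= min_baf


# Hypothetical allele counts: (ref, alt) in the normal, then in the tumor.
normal = {"chr1:1000": (30, 30), "chr1:2000": (50, 25), "chr1:3000": (30, 10)}
tumor = {"chr1:1000": (60, 20), "chr1:2000": (40, 40), "chr1:3000": (25, 25)}

# Keep heterozygous positions from the normal, then compute the tumor BAF there.
tumor_baf = {
    pos: b_allele_fraction(*tumor[pos])
    for pos, (ref, alt) in normal.items()
    if is_heterozygous(ref, alt)
}
```

Only `chr1:1000` passes both thresholds in this toy data, so `tumor_baf` holds a single entry that would then be plotted against the log fold change of the segment covering that position.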

Remains to be done:

  1. Parallelize the computation of heterozygous SNVs in the normal (this task could be very resource-intensive for WGS data)
  2. The filtration of loci (to avoid SNPs found in repeats, for example) is poorly tested
  3. Some score should be computed to quantify the internal agreement between coverage and BAF results

@ericblanc20 ericblanc20 requested a review from mbenary July 28, 2023 13:04

coveralls commented Jul 28, 2023

Coverage Status

coverage: 85.546% (+0.2%) from 85.38% when pulling 4996d4d on somatic_cnv_checking into 740dda5 on main.


@mbenary mbenary left a comment

See comments

excluded_regions: "" # Bed file of regions to be excluded
max_depth: 10000 # Max depth for pileups
min_cov: 20 # Minimum depth for reference and alternative alleles to consider variant
min_baf: 0.4 # Maximum BAF to consider variant as heterozygous (between 0 & 1/2)

The variable name and its comment are not consistent. Please adjust.

y <- read.table(cnv, sep="\t", header=0)
colnames(y) <- c("CHROM", "start", "stop", "name", "LFC", "strand")
y <- y |>
dplyr::mutate(LFC=replace(LFC, strand == "-", LFC[strand=="-"])) |>

This does not change anything. Add "-" in front of the replacement.
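A small illustration of this point, written in Python rather than R and using a hypothetical helper name: replacing the minus-strand values with themselves leaves the column unchanged, whereas negating them flips the sign as requested.

```python
def replace_minus_strand(lfc, strand, negate):
    """Return a copy of lfc in which minus-strand entries are replaced.

    negate=False mimics the reviewed line (each value replaced by itself);
    negate=True applies the suggested fix (each value replaced by its negative).
    """
    return [
        (-value if negate else value) if s == "-" else value
        for value, s in zip(lfc, strand)
    ]


# Hypothetical log fold changes and strands.
lfc = [0.5, 1.25, -0.25]
strand = ["+", "-", "-"]

unchanged = replace_minus_strand(lfc, strand, negate=False)
flipped = replace_minus_strand(lfc, strand, negate=True)
```

Here `unchanged` equals the input list, while `flipped` has the sign of the two minus-strand entries reversed. In the R snippet, the corresponding change would presumably be to use `-LFC[strand=="-"]` as the replacement values.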

@mbenary mbenary left a comment

LGTM

@ericblanc20 ericblanc20 merged commit 18217bc into main Aug 22, 2023
@ericblanc20 ericblanc20 deleted the somatic_cnv_checking branch August 22, 2023 06:57
@tedil tedil mentioned this pull request Jun 28, 2024
This was referenced Dec 9, 2024