error with snvs being removed and not assigned a cluster #30

Open
ses101-24 opened this issue Sep 30, 2024 · 0 comments


ses101-24 commented Sep 30, 2024

I have run DPClust on pairs of whole-genome samples, comparing the clones between a primary tumour and a metastatic tumour for each patient. The clustering works for all patients, and I have clustering plots showing the clonal clusters. However, SNVs that are present in one sample but completely absent from the other are filtered out at the beginning and are then not assigned a cluster. Sometimes this affects thousands of variants, including variants in some of the driver genes that I am particularly interested in. The cause seems to be that no.chrs.bearing.mut is zero in the input file for one of the samples; because the if condition checks both files, the SNV is then removed completely for both samples.

(The message printed by the load.data.inner function is "Removed xxx with missing totalCopyNumber".)

I followed the instructions in the documentation with input vcf files and copy number segment data, so I have one allDirichletProcessInfo input file for each sample in the pair, with the same loci in each (so both have the same number of lines).

But I would think it is expected that some variants are present in only one sample, and that they should still be input to the clustering algorithm and assigned to a cluster?
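To show what I mean, here is a minimal toy sketch (made-up loci and values, not DPClust code; the data frames `primary` and `metastasis` are hypothetical stand-ins for the two allDirichletProcessInfo tables) that counts loci where no.chrs.bearing.mut is zero or missing in one sample but not the other. It is based only on my reading described above that a filter checking every sample drops such loci for both samples.

```r
# Toy stand-ins for the two per-sample input tables (same loci, same order).
# All values are invented for illustration.
primary = data.frame(
  chr = c("1", "1", "2"),
  end = c(100, 200, 300),
  no.chrs.bearing.mut = c(1, 0, 2)
)
metastasis = data.frame(
  chr = c("1", "1", "2"),
  end = c(100, 200, 300),
  no.chrs.bearing.mut = c(1, 1, 0)
)

# Loci x samples matrix of multiplicities.
mult = cbind(primary$no.chrs.bearing.mut, metastasis$no.chrs.bearing.mut)

# TRUE where a value is zero or missing.
is_empty = mult == 0 | is.na(mult)

# Loci that are empty in at least one sample but not in every sample,
# i.e. variants private to one sample of the pair.
private = rowSums(is_empty) > 0 & rowSums(is_empty) < ncol(mult)

print(sum(private))
```

Running something like this on the real input files would give the number of private variants that I would expect to see clustered rather than removed.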

There is also some code where it says that these removed snvs are added in afterwards, but I think there is a code error here.

In writeStandardFinalOutput, the removed SNVs are added back into the output by calling the function add_removed_snvs:

# Add the removed mutations back in
output = cbind(dataset$chromosome[,1], dataset$position[,1]-1, dataset$position[,1], clustering$best.node.assignments, clustering$best.assignment.likelihoods)
output = add_removed_snvs(dataset, output)

But this function appears to have an error when it sorts the variants by chromosome and position afterwards. The line with match actually removes the SNVs that were just added, rather than sorting them.

# Sort the output in the same order as the dataset
chrpos_input = paste(dataset$chromosome, dataset$position, sep="")
chrpos_output = paste(snv_assignment_table[,1], snv_assignment_table[,3], sep="")
snv_assignment_table = snv_assignment_table[match(chrpos_input, chrpos_output),]
return(snv_assignment_table)
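A minimal toy sketch of the behaviour I mean (hypothetical loci, cluster values and separator; this is not the DPClust implementation): match() returns one index for each element of its first argument, so if chrpos_input only holds the SNVs that survived filtering, indexing the table by the result keeps only those rows.

```r
# Toy illustration only: hypothetical loci, not the DPClust implementation.
# Keys of the SNVs that survived filtering (what the dataset still contains).
chrpos_input = c("1_100", "1_300")

# Output table after a removed SNV (1_200) has been added back with NA.
snv_assignment_table = data.frame(
  chr = c("1", "1", "1"),
  start = c(99, 199, 299),
  end = c(100, 200, 300),
  cluster = c(1, NA, 2)
)
chrpos_output = paste(snv_assignment_table$chr, snv_assignment_table$end, sep="_")

# match() yields the positions of chrpos_input within chrpos_output, so only
# rows whose key appears in chrpos_input are selected.
idx = match(chrpos_input, chrpos_output)
reordered = snv_assignment_table[idx, ]

print(idx)
print(nrow(reordered))
```

In this toy case the row for the re-added SNV is not in the result, which is the effect I think I am seeing in the final output.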

In the comment, it also says that it assigns these variants a cluster, but I couldn't find a place in the code that does that.

Please correct me if I have misunderstood any of the above.

Although not part of the above issue, I would be very grateful if you could explain the reasons for CCF values above 1, and how to determine whether any filtering is needed, either before or afterwards, to remove noise. (I tried both without any filtering of variants beforehand, and with removing beforehand any SNVs that did not reach a VAF of 5% in either sample.)

Thanks.
