
Cutoff for duplications #53

Open
robertzeibich opened this issue Sep 1, 2023 · 6 comments

Comments

robertzeibich commented Sep 1, 2023

The recommended cutoff for deletions is DHFFC < 0.7. What is the recommended DHBFC cutoff for duplications?

brentp (Owner) commented Sep 1, 2023

Hi, duplications are harder, but 1.3 is a reasonable start.
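A minimal sketch of applying these cutoffs with pysam, keeping deletions with DHFFC < 0.7 and duplications with DHBFC > 1.3 as discussed above. The file names, the single-sample indexing, and the choice of pysam are illustrative assumptions, not duphold's own tooling:

```python
import pysam

vcf_in = pysam.VariantFile("genotyped.duphold.vcf.gz")          # hypothetical input
vcf_out = pysam.VariantFile("dup_del_filtered.vcf", "w", header=vcf_in.header)

def fmt_value(record, key):
    """Return a per-sample duphold FORMAT annotation, or None if it is absent."""
    try:
        return record.samples[0][key]                            # single-sample VCF assumed
    except KeyError:
        return None

for rec in vcf_in:
    svtype = rec.info.get("SVTYPE")
    if svtype == "DEL":
        dhffc = fmt_value(rec, "DHFFC")
        keep = dhffc is not None and dhffc < 0.7    # flank-normalized depth clearly reduced
    elif svtype == "DUP":
        dhbfc = fmt_value(rec, "DHBFC")
        keep = dhbfc is not None and dhbfc > 1.3    # GC-binned depth clearly elevated
    else:
        keep = True                                 # pass other SV types through unchanged
    if keep:
        vcf_out.write(rec)

vcf_in.close()
vcf_out.close()
```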

Qijie0615 commented

Detecting duplications is indeed harder. I'm unsure whether I should use DHFFC > 1.3 or DHBFC > 1.3. After population genotyping and duphold, I found that some duplications (0/1, 1/1) have DHBFC < 1.3 but DHFFC > 1.3 in 30x WGS data, and the Samplot results confirm they are real. Could you give me some advice?

brentp (Owner) commented Jan 17, 2024

As you've found, it's hard to come up with a good cutoff for duplications.
A 1.2 cutoff might work in many cases, but it would miss a tandem duplication that adds a single copy to an already large cassette. You'll have to experiment with what works.

Qijie0615 commented

Thanks for the quick reply.

  1. Does 1.2 mean DHBFC > 1.2?
  2. I'm sorry, I don't understand this sentence: "it would miss a tandem duplication that adds a single copy to an already large cassette." What does "cassette" mean here?
  3. I would like to use DHBFC > 1.2 to further filter the population genotyping data and reduce the false positive rate. Do you think this is a good idea?

brentp (Owner) commented Jan 17, 2024

Thanks for the quick reply.

  1. Does 1.2 mean DHBFC > 1.2?

Yes, you could try this.

  2. I'm sorry, I don't understand this sentence: "it would miss a tandem duplication that adds a single copy to an already large cassette." What does "cassette" mean here?

I mean if you have a tandem duplication with 10 copies and then you add another single copy, you only expect a 10% increase in depth.
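To make that arithmetic concrete (my numbers, ploidy and zygosity ignored for simplicity): if a region already carries n tandem copies and the event adds one more, the expected depth fold change is (n + 1) / n, so a 10-copy cassette gaining one copy gives only 1.1, below a 1.2 DHBFC cutoff.

```python
# Illustrative only: expected depth fold change over a repeat region when one
# extra copy is added to n existing copies (ploidy/zygosity ignored for simplicity).
def expected_fold_change(n_existing_copies: int) -> float:
    return (n_existing_copies + 1) / n_existing_copies

for n in (1, 2, 5, 10):
    print(f"{n} existing copies -> expected fold change {expected_fold_change(n):.2f}")
# 10 existing copies -> expected fold change 1.10, i.e. below a 1.2 DHBFC cutoff
```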

  3. I would like to use DHBFC > 1.2 to further filter the population genotyping data and reduce the false positive rate. Do you think this is a good idea?

It's worth trying, but you'll have to evaluate for yourself how effective it is. If you have trios, you can look at Mendelian violations and transmissions. Otherwise, you can look at samplot images of the variants that get filtered out.
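If it helps, here is a rough sketch (not from duphold or this thread) of collecting the duplication calls that a DHBFC > 1.2 filter would drop, so they can be spot-checked manually with samplot or similar. File names and the single-sample assumption are illustrative:

```python
import pysam

# Write a simple region list of DUP calls failing the DHBFC > 1.2 cutoff for manual review.
vcf = pysam.VariantFile("genotyped.duphold.vcf.gz")              # hypothetical input
with open("dups_removed_by_filter.tsv", "w") as out:
    for rec in vcf:
        if rec.info.get("SVTYPE") != "DUP":
            continue
        try:
            dhbfc = rec.samples[0]["DHBFC"]                      # single-sample VCF assumed
        except KeyError:
            dhbfc = None
        if dhbfc is not None and dhbfc <= 1.2:
            # rec.stop uses INFO/END for symbolic SV records in pysam
            out.write(f"{rec.chrom}\t{rec.start}\t{rec.stop}\t{dhbfc:.2f}\n")
```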

Qijie0615 commented

Thank you for your quick reply.
