different p-values #37

Open
smhaider opened this issue Jul 15, 2022 · 0 comments

Hi, I hope you are doing well.

I have tested Raremetal in two scenarios. I have a multi-sample-called and QC'd VCF built from 2 datasets; the datasets were merged, then multi-sample called and QC'd together. I ran a gene burden test on this final VCF using EPACTS and Raremetal, and got similar results from both tools.
Then I split the two datasets, ran Raremetalworker on each dataset separately, and ran the aggregation test with Raremetal on the summary statistics of the two datasets. With this second approach I lost all the significant hits that I got from EPACTS and Raremetal in the first scenario.

I took the top hits from Combined.singlevar.score.txt and compared them with the single-variant score files of the separate datasets. The AC, call rate and MAF are all consistent, but somehow there is a huge difference in the p-values, and likewise in the final burden test. I am curious why that is. For reference, please see below.

result from combined.singlevar.score.txt

7 44579335 G A 60738 0.0057634 0.0057634 700 0.999835 0.144431 60032 692 4 529.206 67.0507 0.117711 2.95912e-15

result from study1.singlevar.score.txt

7 44579335 G A 17817 0.0131409 0.0131409 468 0.999439 1 17342 462 3 89.5217 46.0443 0.0422257 0.0518651

result from study2.singlevar.score.txt

7 44579335 G A 42921 0.00270264 0.00270264 232 1 0.26875 42690 230 1 62.0418 54.1347 0.0211706 0.251769

result from study1.study2.meta.score.txt

7 44579335 G A 60728 0.0057634 ++ 0.0300088 0.0140711 7.49E-05 0.0329528

As you can see, for this particular variant the allele count, N_ref, N_Het and N_Alt of study1 and study2 add up to the values in combined.singlevar.score.txt, and the allele frequency in the meta file is the same as in the combined file, but there is a huge difference in the p-values. What could be the reason for that? I have noticed similar behavior with all the top variants from the combined analysis, and ended up losing all the significant hits. Sorry for the long post, I am just curious what is going on here.
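The arithmetic behind the rows above can be checked with a short sketch. It assumes, which the post does not state, that the 14th and 15th fields of each singlevar row are the score statistic U and the square root of its variance V, that each p-value is a two-sided normal test on U / sqrt(V), and that the meta row is obtained by summing U and V across studies. All names in the code are hypothetical and chosen only for illustration.

```python
import math

# Values copied from the rows above. Reading the 14th and 15th fields as
# U and sqrt(V) is an assumption, not something the post states.
rows = {
    "combined": {"n": 60738, "ac": 700, "n_ref": 60032, "n_het": 692,
                 "n_alt": 4, "u": 529.206, "sqrt_v": 67.0507},
    "study1": {"n": 17817, "ac": 468, "n_ref": 17342, "n_het": 462,
               "n_alt": 3, "u": 89.5217, "sqrt_v": 46.0443},
    "study2": {"n": 42921, "ac": 232, "n_ref": 42690, "n_het": 230,
               "n_alt": 1, "u": 62.0418, "sqrt_v": 54.1347},
}


def score_test_p(u, sqrt_v):
    """Two-sided normal p-value for the score statistic z = U / sqrt(V)."""
    z = u / sqrt_v
    return math.erfc(abs(z) / math.sqrt(2))


# 1) Do the per-study counts add up to the combined counts?
count_keys = ("n", "ac", "n_ref", "n_het", "n_alt")
summed = {k: rows["study1"][k] + rows["study2"][k] for k in count_keys}
counts_match = all(summed[k] == rows["combined"][k] for k in count_keys)

# 2) Per-file p-values recomputed from U and sqrt(V).
p_values = {name: score_test_p(r["u"], r["sqrt_v"]) for name, r in rows.items()}

# 3) Assumed meta-analysis: sum U and V over the two studies.
meta_u = rows["study1"]["u"] + rows["study2"]["u"]
meta_v = rows["study1"]["sqrt_v"] ** 2 + rows["study2"]["sqrt_v"] ** 2
meta_p = score_test_p(meta_u, math.sqrt(meta_v))
meta_effect = meta_u / meta_v          # compare with 0.0300088 in the meta row
meta_se = 1.0 / math.sqrt(meta_v)      # compare with 0.0140711 in the meta row

print("counts match:", counts_match)
for name, p in p_values.items():
    print(name, "p =", p)
print("meta U =", meta_u, "effect =", meta_effect, "se =", meta_se, "p =", meta_p)
```

Under these assumptions the per-study, combined and meta p-values are all reproduced to within rounding. The summed score statistic is about 151.6, against 529.2 in the combined run, so the gap appears to sit in the score statistics themselves rather than in the counts.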

Looking forward to hearing from you, and thanks in advance.

Kind Regards,
Haider
