Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Baf for only one (the last in the list) chromosome is made. #13

Open
chatterjee89 opened this issue Jul 2, 2021 · 1 comment
Open

Comments

@chatterjee89
Copy link

Hi,

Thanks for making this wonderful tool! I have run into an issue:

After running the following command...
samtools view ./WT_96_2.bam | BAFExtract -generate_compressed_pileup_per_SAM stdin ./CaSpER/dm6.list ./CaSpER/baf/wt96_2 50 0; BAFExtract -get_SNVs_per_pileup ./CaSpER/dm6.list ./CaSpER/baf/wt96_2 ./CaSpER/dm6_fasta_pileup/ 20 4 0.1 ./CaSpER/baf/wt96_2/WT_96_2.baf

...I get the following std output:

Generating pileup from SAM file with min mapp qual 50, min base qual 0
Dumping the pileups per SAM file stdin
Allocating pileup memory.
Done.
Maximum # of alleles at each position: 65536
Phred+33 encoding @ 14. read (1):
CACTCAATTATATACTTTATATGGTCGGAAAAGCTTCCTTCTGCCTGTAA
D@@@@EHHEEHEHHEHHGGIH1G1D1FHHHGHGC@<C@HHHHECHHCE@H
Finished reading.00. read
27379256 0 signal, 4037446 4-bit, 636538 8-bit, 26073 12-bit, 18 14-bit, 0 16-bit positions.
Skipping loading check.
24265304 0 signal, 3313416 4-bit, 517137 8-bit, 14343 12-bit, 27 14-bit, 0 16-bit positions.
Skipping loading check.
21308450 0 signal, 3421544 4-bit, 532190 8-bit, 24291 12-bit, 461 14-bit, 0 16-bit positions.
Skipping loading check.
19142111 0 signal, 3866743 4-bit, 516616 8-bit, 16734 12-bit, 67 14-bit, 0 16-bit positions.
Skipping loading check.
20211830 0 signal, 2818271 4-bit, 463435 8-bit, 20107 12-bit, 69 14-bit, 0 16-bit positions.
Skipping loading check.
3663413 0 signal, 3939 4-bit, 0 8-bit, 0 12-bit, 0 14-bit, 0 16-bit positions.
Skipping loading check.
1060875 0 signal, 256237 4-bit, 28894 8-bit, 2125 12-bit, 0 14-bit, 0 16-bit positions.
Skipping loading check.
Calling SNVs with minimum 20 total and 4 alternate read coverage and 0.100 min alternate frequency.
Procesing 3R
Loading pileup of length 32079331
Loaded 32079331 long pileup.
Procesing 3L
Loading pileup of length 28110227
Loaded 28110227 long pileup.
Procesing 2R
Loading pileup of length 25286936
Loaded 25286936 long pileup.
Procesing X
Loading pileup of length 23542271
Loaded 23542271 long pileup.
Procesing 2L
Loading pileup of length 23513712
Loaded 23513712 long pileup.
Procesing Y
Loading pileup of length 3667352
Loaded 3667352 long pileup.
Procesing 4
Loading pileup of length 1348131
Loaded 1348131 long pileup.

Question 1: what does the 20 4 0.1 parameter for BAFExtract -get_SNVs_per_pileup mean?

It seems to have worked and I can see the .baf file along with the allele_counts.bin files for each chromosome in the output folder. This is the output:

$WT_96_2$WT_96_2.baf
      chr position alt ref coverage       baf        dev
21231   4    66150  15   8       23 0.6521739 0.15217391
21232   4   116697 142   0      142 1.0000000 0.50000000
21233   4   116733  87  53      140 0.6214286 0.12142857
21234   4   117428  55  24       79 0.6962025 0.19620253
21235   4   117647  27   0       27 1.0000000 0.50000000
21236   4   138445  16  13       29 0.5517241 0.05172414
21237   4   177684  42  14       56 0.7500000 0.25000000
21238   4   308618  26  12       38 0.6842105 0.18421053
21239   4   310183 111  86      197 0.5634518 0.06345178
21240   4   310405  23   2       25 0.9200000 0.42000000
21241   4   313589  21  18       39 0.5384615 0.03846154
21242   4   331005  19   3       22 0.8636364 0.36363636
21243   4   331333  18   8       26 0.6923077 0.19230769
21244   4   349108  16   9       25 0.6400000 0.14000000
21245   4   372148  20   9       29 0.6896552 0.18965517
21246   4   372423  42  15       57 0.7368421 0.23684211
21247   4   372620  13   8       21 0.6190476 0.11904762
21248   4   408746  56   0       56 1.0000000 0.50000000
21249   4   546394  14   8       22 0.6363636 0.13636364
21250   4   611349  22   0       22 1.0000000 0.50000000
21251   4   691913  33   0       33 1.0000000 0.50000000
21252   4   918729  58   0       58 1.0000000 0.50000000
21253   4   951893  23   0       23 1.0000000 0.50000000
21254   4   952037  33   2       35 0.9428571 0.44285714
21255   4   952409  31   0       31 1.0000000 0.50000000
21256   4   962628  23   0       23 1.0000000 0.50000000
21257   4  1016473  26   0       26 1.0000000 0.50000000
21258   4  1016485  24   0       24 1.0000000 0.50000000
21259   4  1028923   5  41       46 0.1086957 0.39130435
21260   4  1029043  26   0       26 1.0000000 0.50000000
21261   4  1115591  94   0       94 1.0000000 0.50000000
21262   4  1118484  33   0       33 1.0000000 0.50000000

HOWEVER, given that dm6 (fruit-flies) has 2L, 2R, 3L, 3R, 4, X, Y chromosomes, and that the generated baf file only ends up containing the values for chromosome 4, I don't quite understand what happened there.
So, Question 2: How can I fix this?

This is for every sample run for the overall experiment. Any help would be greatly appreciated!

Regards,
Deep

@chatterjee89
Copy link
Author

@akdess I think I may have figured out both the problems (first one is explained within the source code for main.cpp). But I need a solution for the second one.

I think BAFExtract is only pulling/writing for chromosome 4 because it is the only integer in the lot - fly chromosomes are written as 2L, 2R, 3L, 3R, X, Y and 4 and so there's an issue with detection or with writing somewhere. Am I wrong to assume so? Could you please help me fix this issue?

Regards,
Deep

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant