- Download the genome fa files.
For example, to download all of mm10 to the current directory from UCSC:
rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/mm10/chromosomes/ ./
- Merge all of the fa files into one large fa file.
Use `merge_dir_fa.pl` to merge the files into a single fa file with chromosomes sorted in a natural order (e.g., 1-19, M, X, and Y), with unmapped chromosomes placed after chrY.
./merge_dir_fa.pl -d ../mm10_2015-01-25/ -c '1-19,M,X,Y' -o mm10
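One way to confirm the chromosome order of the merged file is to list its FASTA header lines. The sketch below assumes the merged file is a plain-text FASTA in which every sequence starts with a `>` line; `fa_headers` is a hypothetical helper name and is not part of this repository.

```shell
#!/bin/sh
# fa_headers: print the sequence names of a FASTA file in file order.
# Hypothetical helper for illustration only.
fa_headers() {
    # Header lines start with '>'; strip that marker and keep the rest.
    grep '^>' "$1" | sed 's/^>//'
}
```

For example, `fa_headers mm10.fa` should print the names in the order given to `-c`, followed by any unmapped chromosomes.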
- Index the merged fa file.
Use `index_genome` either interactively or by redirecting input from a file.
For example, `index_genome < in.cmds`, where `in.cmds` contains:
d
1000
mm10.fa
mm10
n
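The answer file can also be generated from a script, which keeps the indexing step reproducible. This is a minimal sketch: the five answers are copied verbatim from the example above, and it assumes `index_genome` is on `PATH` and that the merged genome is `mm10.fa` in the current directory. If either is missing, it only writes `in.cmds`.

```shell
#!/bin/sh
# Write the five answers from the example above, one per line.
printf '%s\n' d 1000 mm10.fa mm10 n > in.cmds

# Run the indexer only when both the program and the merged genome are present
# (assumption: index_genome is on PATH; adjust to ./index_genome if needed).
if command -v index_genome >/dev/null 2>&1 && [ -f mm10.fa ]; then
    index_genome < in.cmds
else
    echo "index_genome or mm10.fa not found; wrote in.cmds only" >&2
fi
```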
- Single-ended or paired-ended mapping of fastq files is supported.
- Mapping a collection of files is also supported and made easier using `map_directory_array.pl`, which makes some assumptions about the naming of the files: paired-ended files are expected to have either `_1_` vs `_2_` or `_R1_` vs `_R2_` in their names.
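Before mapping a whole directory, it can help to check that every read-1 file has a mate that follows one of the naming patterns above. The sketch below is not how `map_directory_array.pl` works internally; `mate_name` is a hypothetical helper, and the `.fastq` extension in the loop is an assumption.

```shell
#!/bin/sh
# mate_name: print the expected read-2 filename for a read-1 filename.
# Hypothetical helper for illustration; returns non-zero when the name
# carries neither the _R1_ nor the _1_ tag.
mate_name() {
    case "$1" in
        *_R1_*) printf '%s\n' "$1" | sed 's/_R1_/_R2_/' ;;
        *_1_*)  printf '%s\n' "$1" | sed 's/_1_/_2_/' ;;
        *)      return 1 ;;
    esac
}

# Report read-1 files in the current directory whose mate is missing.
for r1 in ./*_R1_*.fastq ./*_1_*.fastq; do
    [ -e "$r1" ] || continue            # glob matched nothing
    r2=$(mate_name "$r1") || continue
    [ -e "$r2" ] || echo "missing mate for $r1 (expected $r2)" >&2
done
```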
- Place all pileup files into a single directory and use `pecaller` launched from that directory.
- The user-specified bed file can restrict calling to sites within the file. The chromosome order must be exactly the same as in the `sdx` file of the indexed genome.
- Note: if you experience a segmentation fault just after running the command, it is likely that you have not supplied the command correctly. It is easy to miss a parameter, and an `argtable3` or `getopt` interface is on our wish list.
`pecaller` will make a base and snp file for all sites in the user-supplied region (or for every site covered in the pileup files if no bed file is provided).
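A mismatch in chromosome order is easy to overlook, so a quick check before calling may save a failed run. The sketch below makes no assumption about the layout of the `sdx` file: it expects the reference order as a plain text file with one chromosome name per line, which you would need to prepare yourself. It accepts a bed file that covers only a subset of chromosomes as long as their relative order matches. `check_bed_order` is a hypothetical helper name.

```shell
#!/bin/sh
# check_bed_order BED ORDER_LIST
# ORDER_LIST: one chromosome name per line, in the order used by the index
# (assumed to be prepared by the user; the sdx format is not parsed here).
# Exits 0 when the bed chromosomes follow that order, 1 otherwise.
check_bed_order() {
    awk '
        NF == 0 { next }                          # ignore blank lines
        NR == FNR { rank[$1] = FNR; next }        # first file: reference order
        /^#/ || /^track/ || /^browser/ { next }   # skip bed header lines
        !($1 in rank) { print "unknown chromosome: " $1; bad = 1; exit }
        rank[$1] < last { print "out of order: " $1; bad = 1; exit }
        { last = rank[$1] }
        END { exit bad + 0 }
    ' "$2" "$1"
}
```

Usage would look like `check_bed_order targets.bed chrom_order.txt && echo "order OK"`, where both file names are placeholders.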
- To merge multiple base and snp files from different `pemapper` runs, place all of the called files (i.e., snp, base, and indel files) into the same directory (or symlink them) and run `make_snplist_formerge.pl`, which will examine the snp files and create a "good" and "bad" list of sites.
- Provide `pecall_merger` with the base files and sites that should be merged.
`pecaller` calls each base independently, unlike a haplotype caller. The multithreaded nature of `pecaller` also means that small deletions (or SNPs) may not be called contiguously. Also, insertions are stored in a separate `indel` file.
- To sort the variants and place all final indels into the snp file, use `merge_indel_snp.pl`.
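Collecting the called files from several runs can be done with symlinks rather than copies. The sketch below assumes the called files can be recognised by `.snp`, `.base`, or `.indel` in their names, which may not match your output naming, and it skips any file whose name already exists in the destination. `link_called_files` is a hypothetical helper name.

```shell
#!/bin/sh
# link_called_files DEST RUN_DIR...
# Symlink snp, base, and indel files from each run directory into DEST.
# Assumption: called files contain .snp, .base, or .indel in their names.
link_called_files() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    for run in "$@"; do
        for f in "$run"/*.snp* "$run"/*.base* "$run"/*.indel*; do
            [ -e "$f" ] || continue               # glob matched nothing
            target="$dest/$(basename "$f")"
            if [ -e "$target" ] || [ -L "$target" ]; then
                echo "skipping $f: $target already exists" >&2
                continue
            fi
            # Use an absolute path so the link resolves from inside DEST.
            ln -s "$(cd "$(dirname "$f")" && pwd)/$(basename "$f")" "$target"
        done
    done
}
```

Runs that produce identically named files would need to be renamed first, since the second file is skipped rather than overwritten.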
`snp_tran_counter.pl` and `snp_tran_silent_rep.pl` give transition to transversion counts for the variants observed. The latter provides counts specific to different classes of variants (e.g., replacement sites). `snp_tran_silent_rep.pl` expects the annotation to come from SeqAnt.
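To make the transition/transversion distinction concrete: transitions are A<->G and C<->T changes, and every other single-base change is a transversion. The sketch below only illustrates that classification and is not the implementation used by the scripts above. It reads whitespace-separated reference and alternate bases from standard input, which is an assumed input layout rather than the snp file format.

```shell
#!/bin/sh
# tstv_counts: count transitions and transversions from "REF ALT" lines on stdin.
# Hypothetical helper; the two-column input layout is an assumption.
tstv_counts() {
    awk '
        NF < 2 || length($1) != 1 || length($2) != 1 { next }  # not a single-base change
        { ref = toupper($1); alt = toupper($2); pair = ref alt }
        pair ~ /^(AG|GA|CT|TC)$/ { ts++; next }                 # purine<->purine, pyrimidine<->pyrimidine
        pair ~ /^[ACGT][ACGT]$/ && ref != alt { tv++ }          # any other base change
        END { printf "transitions=%d transversions=%d\n", ts, tv }
    '
}
```

For instance, `printf 'A G\nA C\n' | tstv_counts` reports one transition and one transversion.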
- Change EOL characters to unix, e.g., `:set ff=unix` in vim.
- Indent things in a consistent way: `indent -bli0 -l120`.
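For several files at once, the EOL conversion can be done outside an editor. This is a minimal sketch using `tr`; it removes every carriage return, including any that are not part of a CRLF pair, so it is only suitable for plain source files. `to_unix_eol` is a hypothetical helper name.

```shell
#!/bin/sh
# to_unix_eol FILE...
# Strip carriage returns so each file uses unix (LF) line endings.
to_unix_eol() {
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        # Write back with cat so the original file keeps its permissions.
        tr -d '\r' < "$f" > "$tmp" && cat "$tmp" > "$f"
        rm -f "$tmp"
    done
}
```

A call such as `to_unix_eol *.c *.h` could then be followed by the `indent` command above.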