This folder contains scripts and example data for running polyGembler.
These scripts contain environment settings and requirements that
might differ from your system. Please check each script and edit
it as needed before running it.
A list of files in this folder:
gembler_daemon.sh # a daemon script to run the whole polyGembler pipeline with Slurm
gembler_wrapper.sh # a wrapper script to run polyGembler subprograms
runGBSv2.sh # an example script to run the Tassel 5 GBS pipeline
simulateF1GBS.sh # an example script to run polyGembler for F1 GBS data simulation
SNPfiltering.sh # an example script to call SNPfiltering.R
SNPfiltering.R # SNP filtering by testing segregation ratios
# it takes as input a VCF file that contains allele depth information,
# calls genotypes using the R package 'updog' with the f1 model,
# and finally runs multinomial exact tests as implemented in
# the R package 'XNomial' to test segregation ratios
data/ref.fa.gz # a reference genome consisting of two chromosomes
data/ctg.fa.gz # contigs of the reference genome
# contig IDs are formatted as CTG[1-3][0-9]{5}
# CTG1* and CTG2* are from chromosomes 1 and 2, respectively
# the last two digits give the contig's order on the chromosome
# for example: CTG100006 lies between CTG100005 and CTG100007
# CTG3* are misassembled, each joining one contig from each chromosome
# the 3rd-4th digits of the six-digit number give the location on chromosome 1
# and the 5th-6th digits give the location on chromosome 2
# for example: CTG300435 contains the 4th contig of
# chromosome 1 and the 35th contig of chromosome 2
# the join position is not shown but can be inferred with
# a mapping tool such as MUMmer
data/out2.vcf.gz # VCF file generated by Tassel 5 GBS pipeline for a diploid
data/out2_filtered.vcf.gz # VCF file after running SNP filtering on out2.vcf.gz
data/out4.vcf.gz # VCF file generated by Tassel 5 GBS pipeline for a tetraploid
# the Tassel 5 GBS pipeline does not take ploidy into account,
# so the genotypes are shown as diploid
# polyGembler accepts it when using the AD field
data/out4_filtered.vcf.gz # VCF file after running SNP filtering on out4.vcf.gz
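
The contig ID scheme described above for data/ctg.fa.gz can be decoded
mechanically, which may help when checking an assembly against the known
layout. The following is a minimal Python sketch, not part of polyGembler;
the function name decode_contig_id and its return format are assumptions
made here for illustration only.

```python
import re

# ID pattern as given in this README: CTG[1-3][0-9]{5}
CONTIG_ID = re.compile(r"^CTG([1-3])([0-9]{5})$")


def decode_contig_id(contig_id):
    """Return {chromosome: order} for a contig ID from data/ctg.fa.gz.

    Hypothetical helper: it only restates the naming rules from this
    README and is not provided by polyGembler itself.
    """
    match = CONTIG_ID.match(contig_id)
    if match is None:
        raise ValueError("not a CTG[1-3][0-9]{5} contig ID: %r" % contig_id)

    kind = int(match.group(1))   # 1, 2, or 3 (3 = misassembled)
    digits = match.group(2)      # the five digits after the first one

    if kind in (1, 2):
        # CTG1* / CTG2*: the last two digits give the order on that chromosome
        return {kind: int(digits[-2:])}

    # CTG3*: digits 3-4 of the six-digit number locate the chromosome 1 part,
    # digits 5-6 locate the chromosome 2 part
    return {1: int(digits[1:3]), 2: int(digits[3:5])}


if __name__ == "__main__":
    for name in ("CTG100006", "CTG200012", "CTG300435"):
        print(name, decode_contig_id(name))
```

For CTG300435 this gives chromosome 1 order 4 and chromosome 2 order 35,
matching the example in the file list. The join position inside a CTG3*
contig is still not recoverable from the ID alone.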