-
Notifications
You must be signed in to change notification settings - Fork 51
/
analysis_of_genome.Rmd
141 lines (85 loc) · 3.59 KB
/
analysis_of_genome.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
---
title: "Analysis of genome data"
subtitle: '*BJ Knaus, JF Tabima, and NJ Grünwald*'
output:
html_document:
toc: true
toc_depth: 2
bibliography: bibtexlib.bib
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(fig.align = "center")
library("knitcitations")
library("knitr")
cite_options(citation_format = "pandoc", max.names = 3, style = "html", hyperlink = "to.doc")
bib <- read.bibtex("bibtexlib.bib")
source("style.R")
```
## Introduction
Analysis of genome data for populations can be seen as similar to the analyses of other marker systems discussed in previous chapters of this book, except that genome data analyses include larger quantities of data.
For example, VCF data (discussed in '[reading VCF data](reading_vcf.html)') can be read into R using *vcfR* [@knaus2017vcfr] to create a *vcfR* object.
This object can be converted into a genlight object [@jombart2008adegenet] and then a snpclone object [@kamvar2014poppr; @kamvar2015novel] if deemed necessary.
Analysis on these objects has been covered in previous sections.
Genome scale data provides additional analytical options as well.
For example, when assumptions about the neutrality of the majority of the genome are appropriate, this can be used as a null hypothesis and used to help identify markers that differentiate from this assumption.
Here we'll provide examples of how genomic data may be analyzed.
```{r child = 'data_genome.Rmd'}
```
## Opening and examining the dataset
We'll read our VCF data into R using the function `read.vcfR()`.
This is data from the pinfsc50 data set that we filtered for quality in the section [reading VCF data](reading_vcf.html).
Once the file is read in we can validate its contents using the `show` method which is implemented by executing the object's name at the prompt.
```{r, results='hide'}
library('vcfR')
vcf <- read.vcfR("pinfsc50_filtered.vcf.gz")
```
```{r}
vcf
```
The `show` method reports that we have `r ncol(vcf@gt) - 1` samples and `r format(nrow(vcf@fix), big.mark = ",")` variants.
If this matches our expectation then we can proceed.
## Converting VCF data to a genlight object
```{r child = 'vcf2genlight.Rmd'}
```
```{r child = 'genlight_dist.Rmd'}
```
```{r child = 'msn_genlight.Rmd', eval=F}
```
```{r child = 'tree_genlight.Rmd', eval=F}
```
```{r child = 'amova_analysis.Rmd', eval=F}
```
```{r child = 'genome_dapc.Rmd', eval=F}
```
## chromR objects
### Using chromR to locate unusual features in a genome
```{r child = 'chromR.Rmd'}
```
## Genetic differentiation
```{r child = 'genetic_differentiation.Rmd'}
```
## Exercises
**1)** You actually have everything you need to make a Manhattan plot.
Can you figure out how to plot $G'_{ST}$ (y-axis) by genomic position (POS)?
```{r hide_button = TRUE, fig.height=4}
plot(getPOS(vcf), myDiff$Gprimest, pch = 20, col = "#1E90FF44", xlab = "", ylab = "", ylim = c(0, 1), xaxt = "n")
axis(side = 1, at = seq(0, 1e5, by = 1e4), labels = seq(0, 100, by = 10))
title(xlab='Genomic position (Kbp)')
title(ylab = expression(italic("G'"["ST"])))
```
**2)** This Manhatttan plot should look a bit unusual.
Can you think of anything that may be wrong with this analysis?
```{r hide_button = TRUE}
table(pop)
# Very small sample size!
```
**3)** Can you figure out how to zoom in on a particular region of a chromosome in `chromoqc()`?
```{r hide_button = TRUE}
chromoqc(chrom, dp.alpha = 66, xlim = c(2e05, 4e05))
```
**4)** Can you use the function `queryMETA()` to look for other data in your file that may be of interest?
```{r hide_button = TRUE}
queryMETA(vcf)
```
## References