This project was a large insecticide resistance (IR) phenotypic and genomic surveillance study in Tanzania. An. funestus were collected from across Tanzania, phenotyped for IR, and genome-sequenced to identify selection at IR loci, estimate IR allele frequencies, and infer population structure and history. The IR phenotype surveillance data are published in Joel's paper here. These data showed that mosquitoes were resistant to DDT, which is strange given that DDT is an obsolete, banned pesticide.
Genome sequencing data collected and analysed as part of this project showed evidence of weak selection at the Vgsc locus in samples from a single region of Tanzania. Further examination of the Vgsc locus revealed that knockdown resistance (Kdr), in the form of the L976F mutation (equivalent to L995S/F in An. gambiae and L1014F/S in M. domestica), had appeared in An. funestus, where IR had hitherto emerged only as metabolic resistance mediated by detoxification enzymes and other mechanisms. In An. funestus, Kdr often appeared together with a second, linked mutation, P1842S, as part of a weak selective sweep in Morogoro region. We found that Kdr appeared to confer resistance to DDT, but not to other, more widely used vector control insecticides. Several plausible hypotheses could explain the emergence of DDT resistance in An. funestus; the one we found most plausible was extensive DDT pollution across Tanzania, along with unofficial use by farmers.
This repository contains the code used to analyse the data generated in this project. The R Markdown file contains:
- Map of sequenced samples.
- The statistical models associating DDT resistance phenotypes with Kdr.
- Plots of the previously published bioassay data.
- Allele frequencies by location and timepoint, showing the frequency of Vgsc mutations and a possible decline in frequency over time in Morogoro region.
- LD heatmap plots showing that 976F and 1842S occur together on linked haplotypes.
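The allele frequency and LD steps listed above are done in R in this repository. For readers who want to see the arithmetic, here is a minimal Python sketch (Python being the language of the companion notebook); it is not the code used in the paper. It assumes a hypothetical per-mosquito table with columns `location`, `year` and `alt_count` (0, 1 or 2 copies of the alternate allele, e.g. 976F) and diploid genotypes coded as alternate-allele counts. Allele frequency is the number of alternate alleles divided by 2n, and LD is estimated as the squared Pearson correlation of genotype dosages between sites, a common genotype-based r².

```python
import numpy as np
import pandas as pd


def allele_frequencies(genotypes: pd.DataFrame) -> pd.DataFrame:
    """Alternate-allele frequency per location and timepoint.

    Assumes hypothetical columns: 'location', 'year' and 'alt_count'
    (number of alternate alleles, 0-2, carried by each diploid mosquito).
    """
    summary = (
        genotypes.groupby(["location", "year"])
        .agg(alt_alleles=("alt_count", "sum"), n_individuals=("alt_count", "count"))
        .reset_index()
    )
    # Each diploid individual contributes two alleles.
    summary["n_alleles"] = 2 * summary["n_individuals"]
    summary["freq"] = summary["alt_alleles"] / summary["n_alleles"]
    return summary


def genotype_r2(g1: np.ndarray, g2: np.ndarray) -> float:
    """Squared Pearson correlation of alt-allele dosages at two sites.

    Returns NaN if either site is monomorphic in the sample.
    """
    r = np.corrcoef(g1, g2)[0, 1]
    return float(r ** 2)


def ld_matrix(dosages: np.ndarray) -> np.ndarray:
    """Pairwise r² between all sites; rows are sites, columns are individuals."""
    return np.corrcoef(dosages) ** 2
```

The matrix returned by `ld_matrix` is the kind of raw input a heatmap function expects; the sign of r is discarded, so perfectly repulsed sites also score 1.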
The Jupyter notebook contains:
- The H12 and G123 selection scans showing signatures of a weak selective sweep around the Vgsc locus in Morogoro region.
- The code generating the raw data for the heatmaps (which are plotted in R, as I prefer R for heatmaps).
- The haplotype clustering dendrogram showing that the linked L976F/P1842S haplotype is responsible for the weak sweep in Morogoro region.
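A minimal Python sketch of two ideas behind the notebook's analyses may help readers new to them; it is illustrative only and not the notebook's code. It assumes phased haplotypes as a 0/1 NumPy array (rows are haplotypes, columns are biallelic sites in one genomic window). With haplotype frequencies sorted so that p1 ≥ p2 ≥ p3 ≥ …, H12 = (p1 + p2)² + p3² + p4² + …, so a sweep carrying one or two haplotypes pushes it towards 1. G123, which works on multilocus genotypes, is not shown. The clustering uses SciPy complete-linkage clustering on Hamming distance; the linkage method and the `max_distance` cut-off are assumptions, not necessarily what the notebook uses.

```python
from collections import Counter

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def h12(haplotypes: np.ndarray) -> float:
    """H12 for one window of phased haplotypes (rows) x biallelic sites (columns)."""
    # Count identical haplotypes by treating each row as a hashable tuple.
    counts = Counter(tuple(row) for row in haplotypes.tolist())
    freqs = np.sort(np.array(list(counts.values()), dtype=float))[::-1]
    freqs /= len(haplotypes)
    # Pool the two most common haplotypes, then add the remaining homozygosity.
    return float(freqs[:2].sum() ** 2 + np.sum(freqs[2:] ** 2))


def cluster_haplotypes(haplotypes: np.ndarray, max_distance: float):
    """Complete-linkage clustering on the proportion of differing sites.

    Returns the SciPy linkage matrix (usable with
    scipy.cluster.hierarchy.dendrogram) and flat cluster labels obtained by
    cutting the tree at `max_distance`.
    """
    distances = pdist(haplotypes, metric="hamming")
    tree = linkage(distances, method="complete")
    labels = fcluster(tree, t=max_distance, criterion="distance")
    return tree, labels
```

Computing `h12` in sliding windows along a chromosome and plotting the values against position gives a basic H12 scan; a sweep shows up as a local peak.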
If you wish to replicate the analysis, clone this repo and run the Jupyter notebook first. Make sure you can access the data through the malariagen_data Python package. The notebook should run and download the data required for the remaining analyses, which are plotted in the R Markdown file.
Apologies for the two different plotting notebooks. I find myself increasingly using Python, but I am still a fan of R when it comes to statistics and heatmaps!
The preprint of the article, recently accepted in Molecular Ecology, is here.
Tristan Dennis, Aug 2024.