Guest Instructor: Haikuo Li, Ph.D., Yale University
February 2025
In this lab section, you will:
→ Get familiar with samtools
→ Learn basic Python coding skills
→ Analyze and visualize fragment insert sizes in snATAC-seq data
→ Understand quality control procedures in snATAC-seq data analysis
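Fragment insert size is one of the basic QC signals in ATAC-seq. Tn5 inserts preferentially into open, nucleosome-free DNA, so a good library shows many short fragments plus peaks at roughly one- and two-nucleosome lengths. The sketch below reduces a list of fragment lengths to the ratio of mono-nucleosomal to nucleosome-free fragments, a metric similar to the "nucleosome signal" reported by some scATAC-seq toolkits. The 147 bp and 294 bp cutoffs are assumed defaults and are not taken from this lab. The function name is hypothetical.

```python
def nucleosome_signal(fragment_lengths, nfr_max=147, mono_max=294):
    """Ratio of mono-nucleosomal to nucleosome-free fragments.

    Cutoffs (bp) are assumed defaults: lengths in (0, nfr_max) count as
    nucleosome-free, lengths in [nfr_max, mono_max) as mono-nucleosomal;
    all other lengths are ignored.
    """
    nucleosome_free = 0
    mono_nucleosome = 0
    for length in fragment_lengths:
        length = abs(length)  # TLEN is negative for the rightmost mate
        if 0 < length < nfr_max:
            nucleosome_free += 1
        elif nfr_max <= length < mono_max:
            mono_nucleosome += 1
    if nucleosome_free == 0:
        return float("nan")  # ratio is undefined without short fragments
    return mono_nucleosome / nucleosome_free
```

For example, nucleosome_signal([50, 80, -120, 200, 250, 600]) returns 2/3. It finds three short fragments and two in the mono-nucleosome range, and it ignores 600.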
This lab has one main task, which is primarily based on Python.
You may run Python on the Yale HPC (recommended) (https://beng469.ycrc.yale.edu/), on your laptop, or with other resources.
You may run Python in Jupyter Notebook (recommended) or in the Linux Shell interface.
• Have pysam, pandas and matplotlib installed in your Python environment.
o If you are using the Yale HPC, load the miniconda module and create a new conda environment containing python, pysam, pandas, matplotlib and jupyter. The Unix commands are provided below:
## You must be on a compute node before doing anything else, so request one with salloc
salloc
## Unload any modules that are currently loaded
module purge
## Now create a new conda environment
module load miniconda
conda create -n atac_class python jupyter jupyterlab pysam matplotlib pandas
# Enter y when prompted. This takes 3-5 minutes
conda activate atac_class
# You shouldn't see any errors from this command
ycrc_conda_env.sh update
# The new conda environment should now be available in Jupyter on the Yale HPC
• Whether or not you use the Yale HPC, test your setup by running these commands in Python. If everything is installed correctly, you should not see any error messages.
import pysam
import collections
import matplotlib.pyplot as plt
import pandas as pd
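The commands above stop at the first missing package. If you would rather see every missing package at once, a small sketch using the standard library's importlib.util.find_spec could look like this. The helper name missing_packages is our own and is not part of any library.

```python
import importlib.util

# Packages used in this lab; collections ships with Python itself.
REQUIRED = ["pysam", "pandas", "matplotlib", "collections"]


def missing_packages(names=REQUIRED):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```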
We will also use samtools (https://www.htslib.org/), a command-line tool that runs in the Linux shell. If you are using the Yale HPC, you can make samtools available with this Unix command:
module load SAMtools
• If you are not using the Yale HPC, make sure samtools is installed. To check that the installation succeeded, run:
samtools --version
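In Jupyter, you can also check from Python whether samtools is visible to the kernel. The sketch below uses shutil.which and subprocess from the standard library. samtools_version is a hypothetical helper name. A module loaded in a separate terminal session generally does not change the PATH of a kernel that is already running.

```python
import shutil
import subprocess


def samtools_version():
    """Return the first line of `samtools --version`, or None if not on PATH."""
    exe = shutil.which("samtools")
    if exe is None:
        return None  # e.g. the SAMtools module is not loaded for this process
    result = subprocess.run([exe, "--version"], capture_output=True,
                            text=True, check=True)
    return result.stdout.splitlines()[0]


print(samtools_version() or "samtools not found on PATH")
```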
• We will use BAM files provided by Cusanovich, Hill, et al. (database link: https://atlas.gs.washington.edu/mouse-atac/data/).
• First, we will analyze the cerebellum BAM data (Cerebellum_62216.bam). Because the original BAM file is large (2.1 GB), we generated a 10% downsampled subset for you, which you should be able to access on the Yale HPC.
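Once you have the downsampled cerebellum BAM, a minimal sketch of the insert-size analysis might look like the following. It assumes paired-end alignments. It counts one template length (TLEN, exposed by pysam as template_length) per properly paired read 1, and it skips unmapped, secondary, supplementary and duplicate-flagged records. These filters and all function names are our own choices, not necessarily the lab's.

```python
from collections import Counter

import matplotlib.pyplot as plt


def tally_lengths(template_lengths):
    """Count fragment sizes, ignoring sign and zero (unpaired) TLEN values."""
    return Counter(abs(t) for t in template_lengths if t != 0)


def bam_template_lengths(bam_path):
    """Yield TLEN for each properly paired, primary, non-duplicate read 1."""
    import pysam  # deferred import: only needed when reading a real BAM

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for read in bam:  # iterates all records; no index (.bai) required
            if (read.is_unmapped or not read.is_proper_pair
                    or not read.is_read1 or read.is_secondary
                    or read.is_supplementary or read.is_duplicate):
                continue
            yield read.template_length


def plot_insert_sizes(counts, max_size=1000):
    """Plot the number of fragments against insert size (bp) up to max_size."""
    sizes = sorted(s for s in counts if s <= max_size)
    fig, ax = plt.subplots(figsize=(6, 3))
    ax.plot(sizes, [counts[s] for s in sizes])
    ax.set_xlabel("Insert size (bp)")
    ax.set_ylabel("Number of fragments")
    return fig, ax
```

For example, counts = tally_lengths(bam_template_lengths(path)) followed by plot_insert_sizes(counts) should draw the distribution. Here path points to your copy of the downsampled file, whose exact name is not given above. Keeping the tally separate from pysam makes it easy to test on a plain list of numbers.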
• Search online to learn what a SAM/BAM file is.
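As a starting point, each alignment line in a SAM file has 11 mandatory tab-separated fields followed by optional tags. This is also what samtools view prints from a BAM. The sketch below splits one line into those fields. parse_sam_line and is_read1 are illustrative helpers; in practice pysam parses BAM records for you.

```python
# The 11 mandatory, tab-separated columns of a SAM alignment line.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def parse_sam_line(line):
    """Split one SAM alignment line into a dict of its mandatory fields."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < len(SAM_FIELDS):
        raise ValueError("a SAM alignment line needs at least 11 columns")
    record = {}
    for name, value in zip(SAM_FIELDS, columns):
        record[name] = int(value) if name in INT_FIELDS else value
    record["TAGS"] = columns[len(SAM_FIELDS):]  # optional TAG:TYPE:VALUE fields
    return record


def is_read1(flag):
    """Bit 0x40 of FLAG marks the first read (segment) of a pair."""
    return bool(flag & 0x40)
```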
• For Python practice, download a small demo dataset from my GitHub (https://github.com/HaikuoLi/Yale_BENG469_teaching/blob/main/ATAC_meta.csv) and upload it to your workspace. You may skip this step if you are already comfortable with Python.
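After uploading ATAC_meta.csv, a typical first look uses pandas.read_csv. The file's columns are not described here, so this sketch only reports generic properties of a DataFrame. summarize is a hypothetical helper.

```python
import pandas as pd


def summarize(df):
    """Return basic facts about a DataFrame: shape, column names, missing values."""
    return {
        "n_rows": df.shape[0],
        "n_cols": df.shape[1],
        "columns": list(df.columns),
        "n_missing": int(df.isna().sum().sum()),  # total NaN cells
    }
```

For example, assuming the file sits in your working directory, meta = pd.read_csv("ATAC_meta.csv") followed by meta.head() and summarize(meta) gives a quick overview.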
• For the post-lab assignment, choose any tissue other than the cerebellum that is available in this database. Download its BAM file (note: not the .bam.bai file, which is the index) and make sure the .bam file is available in your own Linux workspace (you may use wget).
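wget is the simplest way to fetch the file from the shell. If you would rather download from Python, a rough equivalent uses urllib.request.urlretrieve, with a guard that rejects .bai index URLs. bam_filename and download_bam are hypothetical helpers, and you should copy the actual file URL from the database page.

```python
import urllib.request
from pathlib import Path


def bam_filename(url):
    """Return the file name at the end of url, requiring a .bam extension."""
    filename = url.rstrip("/").rsplit("/", 1)[-1]
    if not filename.endswith(".bam"):
        # Rejects e.g. *.bam.bai index files, which are not what we need here.
        raise ValueError(f"expected a .bam file, got {filename!r}")
    return filename


def download_bam(url, dest_dir="."):
    """Download the BAM at url into dest_dir and return the local path."""
    dest = Path(dest_dir) / bam_filename(url)
    urllib.request.urlretrieve(url, dest)
    return dest
```

For multi-gigabyte files, wget is likely more robust because it can resume an interrupted download with -c.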