Guest Instructor: Haikuo Li, Ph.D., Yale University
February 2025
In this lab section, you will:
→ Get familiar with samtools
→ Learn basic Python coding skills
→ Analyze and visualize fragment inserts in snATAC-seq data
→ Understand quality control procedures in snATAC-seq data analysis
The main lab task is based primarily on Python.
You may run Python on the Yale HPC (recommended), or on your laptop or other resources.
You may run Python in Jupyter Notebook (recommended) or in the Linux Shell interface.
• Have pysam, pandas, and matplotlib installed in your Python environment.
o If you are using the Yale HPC, load the miniconda module and create a new conda environment containing python, pysam, pandas, matplotlib, and jupyter. The Unix commands are provided below:
## you must be on a compute node to do anything, so request one first
salloc
## this command makes sure no modules are currently loaded
module purge
## now let's create a new miniconda environment
module load miniconda
conda create -n atac_class python jupyter jupyterlab pysam matplotlib pandas
#enter y when the system asks you. This takes 3-5 minutes
conda activate atac_class
#you shouldn't see any errors with this command update
# now you can find this new miniconda environment on Yale HPC jupyter notebook
• Whether or not you use the Yale HPC, test your setup by running these 3 commands in Python. If everything is installed, you should not see any error messages.
import pysam
import collections
import matplotlib.pyplot as plt
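If one of the imports above fails, it can help to see at a glance which packages Python cannot locate. The following is a minimal sketch, not part of the original lab materials: the helper name `find_missing` and the `REQUIRED` list are illustrative assumptions. It only checks that each package can be found, so a broken installation could still raise an error on the actual import.

```python
import importlib.util

# Top-level package names used in this lab (collections ships with Python).
# REQUIRED and find_missing are hypothetical names chosen for this sketch.
REQUIRED = ["pysam", "collections", "matplotlib"]


def find_missing(names):
    """Return the names that Python cannot locate in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```

If anything is reported as missing, make sure the correct conda environment is active before starting Python or Jupyter.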
We will also use samtools, a package used in the Linux Shell interface. If you are using the Yale HPC, you can make samtools available with this Unix command:
module load SAMtools
• If you are not using the Yale HPC, make sure samtools is installed. To check successful installation, you may run:
samtools --version
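If you prefer to run the same check from inside Python or a Jupyter notebook, the sketch below looks for samtools on your PATH and prints the first line of its version output. The function name `samtools_version` is a hypothetical helper for illustration. It assumes samtools, when installed, is reachable under the name `samtools` in the environment where Python is running.

```python
import shutil
import subprocess


def samtools_version(executable="samtools"):
    """Return the first line of `<executable> --version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    # check=False: report whatever was printed rather than raising on a non-zero exit code
    result = subprocess.run([path, "--version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


version = samtools_version()
print(version if version else "samtools was not found on PATH")
```

On the Yale HPC, a "not found" message most likely means the SAMtools module was not loaded in the session that started Python.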
• We will download some BAM files provided by Cusanovich and Hill, et al. (database link:
• First, we will analyze the Cerebellum BAM data (Cerebellum_62216.bam). Since the original BAM file is big (2.1G), we generated a 10% downsampled subset for you. You should be able to access this file on Yale HPC.
• Search online to learn what a SAM/BAM file is.
• For Python training purposes, download a small demo dataset from my GitHub and upload it to your workspace. You may skip this if you are already comfortable with Python.
• For the post-lab assignment, choose any tissue you like, other than the cerebellum, that is available in this database. Download its BAM file (note: not the .bam.bai file, which is the index) and make sure the .bam file is available in your own Linux workspace (you may use wget