Code to reproduce Multi-omics analysis in primary T cells elucidates mechanisms behind disease associated genetic loci

ChenfuShi/PsA_cleaned_analysis

Multi-omics analysis in primary T cells elucidates mechanisms behind disease associated genetic loci

Introduction

This repository contains the scripts to reproduce the analysis presented in the manuscript. Many scripts use paths that are set up for the University of Manchester cluster. You will need to change these paths once you have downloaded the preprocessed data and set up the scripts in your own environment.

Please find all pre-processed data in http://bartzabel.ls.manchester.ac.uk/orozcolab/SNP2Mechanism/.
This data is also available in the following repositories:

  • Precomputed Hi-C correlation maps with gene expression, chromatin accessibility, and various variants are available on BioStudies (https://doi.org/10.6019/S-BSST1819).
  • Processed datasets are available from GEO accession numbers GSE282511 (RNA-seq), GSE282510 (Hi-C), and GSE282992 (ATAC-seq).
  • The raw data have been deposited with links to BioProject accession number PRJNA1185164 in the NCBI BioProject database (https://www.ncbi.nlm.nih.gov/bioproject/).

A static version of this repository is uploaded to Zenodo.

Additional data not uploaded to GitHub is located at http://bartzabel.ls.manchester.ac.uk/orozcolab/SNP2Mechanism/PsA_cleaned_analysis.

Two notebooks that might be useful:

  • Print all QTL and allelic imbalance information for a particular SNP: access on Colab
  • Identify the closest SNPs for which we have pregenerated a Hi-C correlation map: access on Colab
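
As an illustration of the second task, a nearest-position lookup can be sketched as below. This is not the notebook's code: it assumes you already hold a sorted list of base-pair positions, all on the same chromosome, for SNPs that have a pregenerated map. The function name `closest_position` and the example coordinates are hypothetical.

```python
from bisect import bisect_left


def closest_position(sorted_positions, query):
    """Return the entry of `sorted_positions` nearest to `query`.

    `sorted_positions` must be sorted in ascending order and hold
    positions from a single chromosome (an assumption of this sketch).
    On a tie, the lower position is returned.
    """
    if not sorted_positions:
        raise ValueError("no positions to search")
    # Index of the first position >= query.
    i = bisect_left(sorted_positions, query)
    # Only the neighbours on either side of that index can be closest.
    candidates = sorted_positions[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda p: abs(p - query))


# Hypothetical coordinates, for illustration only.
positions_with_maps = [100, 200, 400]
nearest = closest_position(positions_with_maps, 260)
```

The binary search keeps each lookup logarithmic in the number of positions, which matters if you query many SNPs against a long list.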

Citation

If you use our data or our code in your analysis, please cite the manuscript:

Multi-omics analysis in primary T cells elucidates mechanisms behind disease-associated genetic loci
Chenfu Shi, Danyun Zhao, Jake Butler, Antonios Frantzeskos, Stefano Rossi, James Ding, Carlo Ferrazzano, Charlotte Wynn, Ryan Malcolm Hum, Ellie Richards, Muskan Gupta, Khadijah Patel, Chuan Fu Yap, Darren Plant, Richard Grencis, Paul Martin, Antony Adamson, Stephen Eyre, John Bowes, Anne Barton, Pauline Ho, Magnus Rattray & Gisela Orozco.
Genome Biol 26, 26 (2025).
https://doi.org/10.1186/s13059-025-03492-y

Requirements

The analysis includes a mix of Python and R scripts. Python is run through a conda environment, which contains most of the other tools necessary for the analysis. The full environment definition is available in conda_env.yml. The scripts activate this environment as new_basic_software, so to use them you need to change the conda activation to the name of the environment you have installed.
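
If you would rather not edit each script by hand, the environment name can be swapped in bulk. The following is a minimal sketch, not part of the repository: it assumes the name new_basic_software appears literally in shell scripts ending in `.sh`, and the helper names `rename_conda_env` and `rewrite_scripts` and the target name `my_env` are hypothetical. Run it on a copy of the repository, since files are rewritten in place.

```python
from pathlib import Path

# Environment name used in the repository's scripts (stated in the README).
ORIGINAL_ENV = "new_basic_software"


def rename_conda_env(text, new_env, old_env=ORIGINAL_ENV):
    """Return `text` with every occurrence of the old environment name replaced."""
    return text.replace(old_env, new_env)


def rewrite_scripts(root, new_env, suffixes=(".sh",)):
    """Rewrite matching scripts under `root` in place.

    Only files whose suffix is in `suffixes` are touched (the `.sh`
    default is an assumption). Returns the list of files that changed.
    """
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        original = path.read_text()
        updated = rename_conda_env(original, new_env)
        if updated != original:
            path.write_text(updated)
            changed.append(path)
    return changed
```

For example, `rewrite_scripts("PsA_cleaned_analysis", "my_env")` would update every `.sh` file under that directory and return the paths it modified.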

Data Preprocessing

Data has been preprocessed according to the manuscript's methods. Preprocessing for Hi-C and ATAC-seq data is carried out using the pipelines available at https://github.com/ChenfuShi/hic_master_pipeline and https://github.com/ChenfuShi/ATAC_ChIP_pipeline, respectively.

Useful Files

The following files may be particularly useful for further analysis:
QTL tables:

eQTL nominal CD4+
eQTL nominal CD8+
eQTL permutation CD4+
eQTL permutation CD8+

caQTL nominal CD4+
caQTL nominal CD8+
caQTL permutation CD4+
caQTL permutation CD8+

insQTL nominal CD4+
insQTL nominal CD8+
insQTL permutation CD4+
insQTL permutation CD8+

loopQTL nominal CD4+
loopQTL nominal CD8+
loopQTL permutation CD4+
loopQTL permutation CD8+

Allelic imbalance tables:

loop allelic imbalance CD4+
loop allelic imbalance CD8+
loop allelic imbalance merged

ATAC allelic imbalance CD4+
ATAC allelic imbalance CD8+
ATAC allelic imbalance merged

Pregenerated Hi-C correlation maps for all genes, chromatin accessibility, and highly significant loop and insulation QTLs:
webpage

Sample metadata tables

RNA-seq counts for all samples
ATAC-seq counts for all samples

Merged Hi-C Juicebox files for visualization at up to 1 kb resolution
AllValidPairs files for each sample

Directory Listing

  • ATAC_allelic_imbalance: scripts to call chromatin accessibility allelic imbalance, annotate the resulting tables and annotate GWAS studies with the results. In this folder you can also find the precomputed allelic imbalance (ATAC_allelic_imbalance/combined_p_vals_files/all_SNPs_all.csv) for all the SNPs that overlap chromatin accessibility regions in T cells.
  • ATAC_seq_analysis: DiffBind preprocessing of the data.
  • data_functions: two helper functions for other scripts.
  • HiC_allelic_imbalance: scripts to call loop allelic imbalance, including genotype calling and phasing from reads. In this folder you can also find the precomputed allelic imbalance loops (HiC_allelic_imbalance/output_dataframe_CD8.csv and HiC_allelic_imbalance/output_dataframe_CD4.csv).
  • HiC_analysis: scripts to call Mustache loops, extract loop counts, and run differential loop calling, insulation score, differential insulation score, and outlier analysis from cell lines.
  • integration_analysis: scripts that combine the different omics: visualizations of changes in chromatin conformation with genotype, ATAC peaks and gene expression; correlation of loop strength with chromatin accessibility and gene expression; and correlation of insulation score with gene expression. Also includes scripts for plotting the data for specific regions (integration_analysis/plotter.ipynb) and for printing all the results for a specific set of SNPs (integration_analysis/everything_printer.ipynb).
  • metadata: metadata files and GWAS SNPs with LD.
  • QTL_analysis: all QTL analysis for the different omics, including files to annotate GWAS studies and compute overlaps between different QTLs and allelic imbalance.
  • RNA_seq_analysis: DESeq2 preprocessing of the data.
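
The precomputed tables listed above are plain CSV files, so they can be queried without rerunning any of the scripts. The following is a minimal sketch using only the Python standard library. The column layout of these files is not described here, so the sketch makes no assumption about column names and simply keeps any row in which some field equals the identifier you search for. The function name `rows_for_snp` and the identifier `rs0000000` are hypothetical placeholders, and whether the tables are keyed by rsID is itself an assumption.

```python
import csv


def rows_for_snp(handle, snp_id):
    """Return all rows (as dicts) in which any field equals `snp_id`.

    `handle` is an open text file or file-like object containing CSV
    data with a header row. No column names are assumed.
    """
    reader = csv.DictReader(handle)
    return [row for row in reader if snp_id in row.values()]


def lookup(csv_path, snp_id):
    """Open `csv_path` and return the rows matching `snp_id`."""
    with open(csv_path, newline="") as handle:
        return rows_for_snp(handle, snp_id)
```

For example, `lookup("ATAC_allelic_imbalance/combined_p_vals_files/all_SNPs_all.csv", "rs0000000")` would scan the precomputed ATAC allelic imbalance table. Once you have inspected the header, matching on the specific identifier column is more precise than matching on any field.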
