Co-evolution-TRNP1-and-GI

This repository contains data files and scripts to reproduce the analyses and results presented in our paper.

Protein

All relevant scripts listed below can be found within the protein/scripts folder.

Protein-coding sequence collection

collect_coding_seqs2/run_ccs2_function.sh - BLAST wrapper to extract orthologous protein-coding sequences from genomes. To run the BLAST search with our scripts, you need to download the genomes listed in the table protein/data/TRNP1_source_genomes.csv
processingL_final.R - process our own primate sequence assemblies from targeted re-sequencing
collect_coding_seqs2/Ferret_transcr_assembly_steps.sh and collect_coding_seqs2/Ferret_cutContig_makeFa.R - process the re-sequenced ferret sequence
collect_coding_seqs.R - gather the orthologous TRNP1 protein-coding sequences from all included sources. Intersect with the available trait data. Save sequences and traits for the downstream analyses
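The intersection step described for collect_coding_seqs.R can be pictured with a short sketch. It is written in Python for illustration only (the repository's scripts are in R), and the function name, species names and trait values are made up for the example; it is not the repository's code.

```python
def intersect_with_traits(sequences, traits):
    """Keep only species that have both a coding sequence and trait data.

    sequences: dict mapping species name -> coding sequence (str)
    traits:    dict mapping species name -> trait record (any value)
    Returns two dicts restricted to the shared species, in sorted order.
    """
    shared = sorted(set(sequences) & set(traits))
    kept_seqs = {sp: sequences[sp] for sp in shared}
    kept_traits = {sp: traits[sp] for sp in shared}
    return kept_seqs, kept_traits


# Toy input: invented species labels and values, for illustration only.
toy_seqs = {"species_a": "ATGGCC", "species_b": "ATGGCA", "species_c": "ATGGCT"}
toy_traits = {"species_b": {"GI": 1.5}, "species_c": {"GI": 2.0}, "species_d": {"GI": 1.1}}

seqs, traits = intersect_with_traits(toy_seqs, toy_traits)
print(list(seqs))  # species present in both inputs
```

Only species found in both inputs are carried forward, so the downstream analyses see matching sequence and trait sets.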

Evolutionary analysis of TRNP1 coding-sequence

Multiple Alignments with PRANK (v150803)

align_with_prank.sh - protein-coding sequence alignment

PAML (v4.8)

First, run the PAML site models as described in the README in the PAML folder.
select_sign_sites_PAML_M8.R - pull out the sites identified as being under positive selection

COEVOL (v1.4)

run_coevol.sh - wrapper to run Coevol
finish_coevol.sh - wrapper to stop Coevol and generate summaries
summarize_coevol_output_TRNP1.R - access the estimated omega, correlations and posterior probabilities

Scripts for the evolutionary analysis of control proteins are in the separate folder other_protein_alignments, which has its own README.

Analysis of NPC proliferation assay

proliferation_analysis.R - gather proliferation assay data, estimate proliferation rates using logistic regression, infer association with brain size and GI using PGLS
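For readers unfamiliar with PGLS, the core of the method is a generalized least squares fit in which the error covariance comes from the phylogeny. The sketch below is a minimal Python illustration, assuming a Brownian-motion covariance matrix (entries are shared branch lengths) and a single predictor. The tree, the values and the function names are hypothetical, and it does not reproduce the model settings used in proliferation_analysis.R.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def dot(u, w):
    return sum(p * q for p, q in zip(u, w))


def gls_fit(x, y, v):
    """Return [intercept, slope] for y ~ x under error covariance matrix v.

    Implements beta = (X' V^-1 X)^-1 X' V^-1 y with X = [1, x].
    v must be symmetric and positive definite.
    """
    ones = [1.0] * len(x)
    vi_one = solve(v, ones)  # V^-1 1
    vi_x = solve(v, x)       # V^-1 x
    xtvx = [[dot(ones, vi_one), dot(ones, vi_x)],
            [dot(x, vi_one), dot(x, vi_x)]]
    xtvy = [dot(vi_one, y), dot(vi_x, y)]
    return solve(xtvx, xtvy)


# Hypothetical three-species tree ((A,B),C) of height 1, where A and B
# share 0.6 units of branch length and C shares none with either.
V = [[1.0, 0.6, 0.0],
     [0.6, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
trait_x = [1.0, 2.0, 4.0]  # invented predictor values
trait_y = [3.0, 5.0, 9.0]  # invented response values
print(gls_fit(trait_x, trait_y, V))
```

With an identity matrix for V the fit reduces to ordinary least squares; the off-diagonal entries down-weight closely related species, which is the point of the phylogenetic correction.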

Regulation

All relevant scripts listed below can be found within the regulation/scripts folder.

MPRA assay

MPRA/MPRA_sequences.R - identify and collect orthologous TRNP1 CRE sequences across mammals from our sequenced data as well as published genomes
MPRA/MPRA_oligolib_construction.R - MPRA design - using a sliding window, construct enhancer tiles based on the orthologous CRE sequences from the previous script to test within the MPRA assay
MPRA/preprocessing - MPRA count pre-processing. Extract reporter gene expression counts for each included enhancer tile. This folder contains a README with further details
MPRA/collect_MPRA_fastas.R - separate and save the relevant sequences from each of the 7 CRE regions, align using MAFFT (v7.407)
MPRA/MPRA_analysis.R - filter and summarize CRE activities. Plug into PGLS and compare to brain mass and gyrification
MPRA/combine_dnds_intron.R - combine TRNP1 protein evolution rates inferred using Coevol with the intron activity across catarrhines within the same model
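The sliding-window tiling mentioned for MPRA/MPRA_oligolib_construction.R can be sketched as follows. This is an illustrative Python version, not the repository's R code: the tile length and step size are hypothetical parameters (the values used for the actual oligo library are not stated here), and adding a final tile flush with the sequence end is an assumption of this sketch.

```python
def tile_sequence(seq, tile_len, step):
    """Cut seq into overlapping tiles with a sliding window.

    Returns a list of (start, tile) pairs, with 0-based start positions.
    If the last regular window does not reach the end of the sequence,
    one extra tile is added that ends exactly at the sequence end.
    A sequence shorter than tile_len yields a single, shorter tile.
    """
    if tile_len <= 0 or step <= 0:
        raise ValueError("tile_len and step must be positive")
    last_start = max(len(seq) - tile_len, 0)
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # final tile flush with the end
    return [(s, seq[s:s + tile_len]) for s in starts]


# Toy CRE sequence and made-up window settings, for illustration only.
for start, tile in tile_sequence("ACGTACGTAC", tile_len=4, step=3):
    print(start, tile)
```

A step smaller than the tile length makes neighbouring tiles overlap, so each position of the CRE is covered by more than one tested fragment.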

Transcription factor analysis

TFs/download_motifs_JASPAR2020.R - download PWMs and motif clustering from JASPAR 2020, transform PWMs for Cluster-Buster
TFs/MPRNAseq_NPC.yaml - zUMIs (v2.5.4) yaml file for mapping RNA-seq reads from NPCs. Input raw data for this processing can be accessed under E-MTAB-9951
TFs/TF_expression_analysis.R - find the expressed transcription factors in our NPCs (from bulk RNA-seq data). Run Cluster-Buster (Jun 13 2019) on the intron sequences including only the PWMs of the expressed TFs to identify overrepresented motifs
TFs/PGLS_motifs.R - investigate the association of binding scores with intron CRE activity and GI among the 22 most abundant motifs on the intron sequence using PGLS
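As a rough picture of what transforming matrices for Cluster-Buster can involve, the sketch below transposes a count matrix stored with one row per base into a layout with one row per motif position, preceded by a ">" header line. It is a Python illustration under the assumption that Cluster-Buster reads matrices with columns in A, C, G, T order; the function name and the toy counts are invented, and TFs/download_motifs_JASPAR2020.R may apply further steps not shown here.

```python
def pfm_to_cluster_buster(name, counts):
    """Format a position frequency matrix as position-per-row text.

    name:   motif identifier, written after '>' on the header line
    counts: dict with keys 'A', 'C', 'G', 'T', each a list holding the
            count of that base at every motif position
    Returns a string with one line per position, columns in A C G T order.
    """
    length = len(counts["A"])
    if any(len(counts[base]) != length for base in "ACGT"):
        raise ValueError("all four bases need the same number of positions")
    rows = [" ".join(str(counts[base][i]) for base in "ACGT")
            for i in range(length)]
    return "\n".join([">" + name] + rows) + "\n"


# Toy two-position motif with invented counts, for illustration only.
toy = {"A": [8, 0], "C": [1, 0], "G": [0, 9], "T": [1, 1]}
print(pfm_to_cluster_buster("toy_motif", toy), end="")
```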

Tree construction: regulation/scripts/MPRA/tree_construction.R

Throughout the workflow, we use the Slurm job scheduling system (v0.4.3).

Additional Tables

Primer sequences for the resequencing of putative Trnp1 cis-regulatory elements as well as for the MPRA can be found in oligo_sequences/. For more information on the individual tables, see oligo_sequences/README.

