SHEPHARD-colab

This repository contains data used in the Google Colab notebooks for SHEPHARD, along with links to SHEPHARD tutorial notebooks implemented in Google Colab.

SHEPHARD Code & Documentation

The SHEPHARD code can be found here.

The SHEPHARD documentation can be found here: https://shephard.readthedocs.io/en/latest/

Supporting Data & Manuscript Analysis Notebooks

Ginell, Flynn & Holehouse 2022

All data and code used for the analysis in the paper can be found here.

Google-Colab Human Proteome Notebook

We provide a ready-to-analyze notebook containing the annotated human proteome, which can be used as a starting point for new proteome-wide analyses.

The human proteome is annotated with the following data:

Post-translational modifications
Intrinsically disordered regions
Prion-like domains
Per-residue secondary structure annotation
Per-residue solvation scores
Protein copy number

Once the first three cells have been run, the user can either run the demo cells or begin a new analysis.

Google-Colab: human_proteome_analysis

Google-Colab Tutorial Notebooks

Below are links to run each of the SHEPHARD examples in Google Colab.

To start learning how to use SHEPHARD, click one of the Google Colab links below!

NOTE: The analyses in the example notebooks are purely demonstrations of what is possible with SHEPHARD.

Google-Colab Notebook Descriptions:
read_fasta_map_domains, get_overlaping_domains, get_sequence_around_site, find_sites_near_PTMs, find_lxvp_sites, uniprot_id_to_gene_name, add_callable_attributes, build_track_from_sliding_window

Working with sequences:

Domain Examples:

Google-Colab: read_fasta_map_domains

Functionally, the example script identifies the N- and C-terminal domains across the proteins, calculates the serine and glycine content of those terminal domains, and returns the proteins whose N- or C-terminal domains are composed of poly-GS. This example demonstrates how to:

Read in a FASTA file using the shephard.api.fasta module
Add domains to proteins
Analyze domains
Assign domain attributes
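The terminal poly-GS check can be sketched in plain Python, independent of SHEPHARD. This is not the notebook's code: the fixed terminal length (`terminal_length`) and the GS-fraction cutoff (`threshold`) are hypothetical parameters chosen for illustration, and the notebook may define its terminal domains differently.

```python
def gs_fraction(region: str) -> float:
    """Fraction of residues in `region` that are serine (S) or glycine (G)."""
    if not region:
        return 0.0
    return sum(aa in "SG" for aa in region) / len(region)


def poly_gs_termini(proteins: dict, terminal_length: int = 10,
                    threshold: float = 0.8) -> dict:
    """Return {protein name: list of GS-rich termini ("N" and/or "C")}.

    Hypothetical parameters: each terminus is taken as the first/last
    `terminal_length` residues, and it is flagged when its S+G fraction
    is at least `threshold`.
    """
    hits = {}
    for name, seq in proteins.items():
        termini = {"N": seq[:terminal_length], "C": seq[-terminal_length:]}
        flagged = [end for end, region in termini.items()
                   if gs_fraction(region) >= threshold]
        if flagged:
            hits[name] = flagged
    return hits


proteins = {
    "p1": "GSGSGSGSGS" + "MKLVEERA" + "AAAAAAAAAA",  # GS-rich N-terminus
    "p2": "MKLVEERAQL" + "SGSGSGGSGS",             # GS-rich C-terminus
    "p3": "MKTAYIAKQR",                             # no S or G at all
}
print(poly_gs_termini(proteins))  # {'p1': ['N'], 'p2': ['C']}
```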

Google-Colab: get_overlaping_domains

This notebook shows how to evaluate the overlap of domains in proteins and how to compute the fraction of overlap between any two domains. This example demonstrates how to:

Initialize an empty proteome and add proteins
Add domains to proteins
Use built-in domain manipulation functions and functions in shephard.tools.domain_tools
Calculate the fractional overlap between domains
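A minimal sketch of the fractional-overlap calculation, assuming domains are given as 1-based, inclusive `(start, end)` pairs. It is plain Python rather than the functions in shephard.tools.domain_tools, and the helper names are hypothetical.

```python
def overlap_length(a: tuple, b: tuple) -> int:
    """Number of residues shared by two 1-based, inclusive (start, end) domains."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


def overlap_fraction(a: tuple, b: tuple) -> float:
    """Fraction of domain `a` that is covered by domain `b`."""
    return overlap_length(a, b) / (a[1] - a[0] + 1)


domain_a = (10, 29)  # 20 residues
domain_b = (20, 50)  # 31 residues
print(overlap_length(domain_a, domain_b))    # 10
print(overlap_fraction(domain_a, domain_b))  # 0.5
```

The fraction is asymmetric: it is normalized by the length of the first domain, so swapping the arguments generally gives a different value (here 10/31 instead of 0.5).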

Track Examples:

Google-Colab: build_track_from_sliding_window

This notebook shows how to evaluate a sequence using a sliding window, and how to get the region of a track that falls within a domain. This example demonstrates how to:

Add Tracks based on a custom function
Calculate the fraction of residues using a sliding window
Extract the portion of a track that aligns to a specific domain
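The sliding-window track and the domain extraction step can be sketched as follows. This is an illustrative plain-Python version, not SHEPHARD's track API; it assumes an odd window size, windows truncated (not padded) at the sequence ends, and 1-based, inclusive domain boundaries.

```python
def sliding_window_fraction(seq: str, residues: str, window: int = 5) -> list:
    """Per-residue track: fraction of `residues` in a window centred on each position.

    Assumes an odd `window`; windows are truncated at the sequence ends.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    track = []
    for i in range(len(seq)):
        segment = seq[max(0, i - half): i + half + 1]
        track.append(sum(aa in residues for aa in segment) / len(segment))
    return track


def track_in_domain(track: list, start: int, end: int) -> list:
    """Slice of a per-residue track for a 1-based, inclusive domain."""
    return track[start - 1:end]


track = sliding_window_fraction("AAAEEEAAA", residues="E", window=3)
print(track_in_domain(track, 4, 6))  # values for the E-rich stretch, positions 4-6
```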

Site Examples:

Google-Colab: get_sequence_around_site

This example provides code that takes an input sequence, (1) defines all arginine (R) residues as sites, and then (2) gets the local sequence context around those sites. This example demonstrates how to:

Initialize an empty proteome and add a protein
Find specific positions of residues
Add sites to proteins (adding a numerical value to the site)
Perform site-specific analysis
Get the local sequence context around a site
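The two steps in this notebook (finding residue positions, then pulling out the flanking sequence) reduce to a few lines of plain Python. This sketch assumes 1-based positions and clips the context window at the sequence ends; `find_positions` and `local_context` are hypothetical helpers, not SHEPHARD functions.

```python
def find_positions(seq: str, residue: str) -> list:
    """1-based positions of every occurrence of `residue` in `seq`."""
    return [i + 1 for i, aa in enumerate(seq) if aa == residue]


def local_context(seq: str, position: int, offset: int) -> str:
    """Residues from position-offset to position+offset (1-based), clipped at the ends."""
    start = max(1, position - offset)
    end = min(len(seq), position + offset)
    return seq[start - 1:end]


seq = "MKRLLRGS"
for pos in find_positions(seq, "R"):
    print(pos, local_context(seq, pos, offset=2))
# 3 MKRLL
# 6 LLRGS
```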

Google-Colab: find_sites_near_PTMs

This notebook reads in all the proteins from the human proteome and annotates them with PTMs. It then calculates the frequency of PTMs near sites of dimethyl-arginine in the human proteome. This example demonstrates how to:

Initialize an empty proteome and add proteins from a SHEPHARD protein file
Add sites from a sites file
Filter a proteome for sites of a specific type
Get sites near other sites
Add a proteome attribute for quick reference to the performed analysis
Calculate frequency of PTM sites proximal to a site type
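The proximity-frequency calculation can be sketched without SHEPHARD by representing each protein's sites as `(position, type)` tuples. The window size, the normalization (each site type's count divided by the total number of nearby sites), and the site-type labels such as `"dimethyl-arginine"` are assumptions for illustration; the notebook itself reads its sites from a SHEPHARD sites file.

```python
from collections import Counter


def ptm_frequencies_near(proteome_sites: dict, site_type: str, window: int = 5) -> dict:
    """Relative frequency of each site type within `window` residues of `site_type` sites.

    `proteome_sites` maps a protein ID to a list of (position, site_type) tuples.
    The anchor site itself is excluded; other sites at the same position are kept.
    """
    counts = Counter()
    for sites in proteome_sites.values():
        for i, (anchor_pos, anchor_type) in enumerate(sites):
            if anchor_type != site_type:
                continue
            for j, (pos, kind) in enumerate(sites):
                if j != i and abs(pos - anchor_pos) <= window:
                    counts[kind] += 1
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()} if total else {}


proteome_sites = {  # toy data, not real annotations
    "P1": [(10, "dimethyl-arginine"), (12, "phosphorylation"),
           (14, "acetylation"), (40, "phosphorylation")],
    "P2": [(5, "dimethyl-arginine"), (7, "phosphorylation")],
}
print(ptm_frequencies_near(proteome_sites, "dimethyl-arginine"))
# phosphorylation ~0.67, acetylation ~0.33
```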

Google-Colab: find_lxvp_sites

This notebook reads in all the proteins and intrinsically disordered regions in the human proteome and annotates all examples of 'LXVP' motifs as Sites in the proteome. This example demonstrates how to:

Read in a UniProt FASTA using the shephard.api.uniprot module
Add domains from a SHEPHARD domains file
Iterate over the domains in a proteome
Find amino acid patterns in domain sequences
Add and count sites based on the identified pattern locations
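Finding LXVP motifs inside disordered regions amounts to a regular-expression search over each domain's subsequence, with match offsets shifted back into protein coordinates. This plain-Python sketch assumes 1-based, inclusive domain boundaries and treats X as any residue; it is not the notebook's exact code, and `lxvp_sites` is a hypothetical helper.

```python
import re

LXVP = re.compile(r"L.VP")  # X matches any single residue


def lxvp_sites(seq: str, domains: list) -> list:
    """1-based protein positions (of the L) of LXVP motifs inside the given domains.

    `domains` is a list of 1-based, inclusive (start, end) pairs.
    """
    hits = []
    for start, end in domains:
        region = seq[start - 1:end]
        for match in LXVP.finditer(region):
            hits.append(start + match.start())
    return hits


seq = "MAALPVPGGLKVPSS"
idrs = [(1, 8), (9, 15)]
hits = lxvp_sites(seq, idrs)
print(hits, len(hits))  # [4, 10] 2
```

Motifs that cross a domain boundary are not reported, because only the subsequence inside each domain is searched.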

Tools for streamlined analysis:

Working with attributes:

Google-Colab: uniprot_id_to_gene_name

This notebook shows how to parse the complex protein headers of UniProt FASTA files and extract each protein's associated gene name. This example demonstrates how to:

Read in a UniProt FASTA using the shephard.api.uniprot module
Parse the UniProt FASTA header stored as the protein name
Add a protein attribute
Write protein attributes using SHEPHARD interfaces
Read in protein attributes using SHEPHARD interfaces
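Standard UniProtKB FASTA headers carry the accession between the first two `|` characters and the gene name in a `GN=` field. A minimal parsing sketch, assuming a header of that form; entries without a `GN=` field return `None`, and both helpers are hypothetical rather than part of SHEPHARD.

```python
import re
from typing import Optional

GENE_NAME = re.compile(r"\bGN=(\S+)")


def accession_from_header(header: str) -> Optional[str]:
    """Accession from a '>db|ACCESSION|ENTRY_NAME ...' header, or None."""
    parts = header.lstrip(">").split("|")
    return parts[1] if len(parts) >= 3 else None


def gene_name_from_header(header: str) -> Optional[str]:
    """Value of the GN= field in a UniProt FASTA header, or None if absent."""
    match = GENE_NAME.search(header)
    return match.group(1) if match else None


header = ">sp|P04637|P53_HUMAN Cellular tumor antigen p53 OS=Homo sapiens OX=9606 GN=TP53 PE=1 SV=4"
print(accession_from_header(header), gene_name_from_header(header))  # P04637 TP53
```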

Google-Colab: add_callable_attributes

This notebook shows how to use attributes to save functions and call them later in the analysis.

In this example, a lambda function is saved as a proteome attribute. Calling that attribute with a protein length returns the percentile that length falls in within the proteome. This example demonstrates how to:

Read in a UniProt FASTA using the shephard.api.uniprot module
Generate an array of the protein lengths in the proteome
Save a lambda function as a proteome attribute
Get the value of a parameter at a specific percentile relative to the proteome
Call a proteome attribute and pass it an input
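The percentile lookup can be sketched with the standard library. Here a plain dictionary stands in for the proteome's attributes, and `make_percentile_function` and `value_at_percentile` are hypothetical helpers. The percentile rank is defined as the percentage of lengths less than or equal to the query, and `value_at_percentile` uses the nearest-rank method; the notebook may use a different percentile definition (for example via NumPy).

```python
import math
from bisect import bisect_right


def make_percentile_function(lengths):
    """Return a lambda mapping a length to the % of proteome lengths <= that length."""
    ordered = sorted(lengths)
    if not ordered:
        raise ValueError("need at least one length")
    return lambda length: 100.0 * bisect_right(ordered, length) / len(ordered)


def value_at_percentile(lengths, percentile):
    """Nearest-rank value at `percentile` (0-100) of `lengths`."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    ordered = sorted(lengths)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


lengths = [100, 200, 300, 400]  # toy protein lengths
proteome_attributes = {}        # hypothetical stand-in for proteome attributes
proteome_attributes["length_percentile"] = make_percentile_function(lengths)

print(proteome_attributes["length_percentile"](250))  # 50.0
print(value_at_percentile(lengths, 50))               # 200
```

Storing the callable (rather than a precomputed table) keeps the sorted lengths captured inside the lambda, so any later query length can be ranked without recomputing the distribution.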

Google-Colab: read_in_all_data

This notebook provides the base SHEPHARD session for exploratory analysis of the human proteome.
