
R package to handle PTMs in proteomics analysis pipelines

License

Licenses found: MIT (LICENSE.md) and one of unrecognized type (LICENSE).

HijaziHassan/histonePTM


HistonePTM

09 December 2024

Overview

The goal of histonePTM is to make histone PTM analysis less tedious by offering either a complete analysis workflow or functions that help build a workflow around whatever software you are using.

Beyond that, other functions allow retrieving data from the internet, manipulating MGF files, and visualizing results, in addition to some quality-control assessments.

Some functions rely heavily on other functions from well-established packages.

Installation

You can install the development version of histonePTM from GitHub with:

# install.packages("remotes")
remotes::install_github("HijaziHassan/histonePTM")
# Installing is needed only once; afterwards, load the package with `library(histonePTM)`.

Contributing

Any contribution is very welcome. The first version is better adapted to Proline software output, but each function was written to be as generic and flexible as possible so that it can be applied to outputs from other software.

Getting help

If you encounter a bug, a problem, or weird behavior, or have a feature request, please open an issue.

If you would like to discuss questions related to histone analysis using mass spectrometry, please open a discussion on the repository's Discussions page.

Workflows

Analysis of DDA results using analyzeHistone()

If you are using Proline software to validate identifications (IDs) from search engines such as Mascot, the function analyzeHistone() can:

  • Isolate histone peptides based on user-defined histone protein(s).
  • Normalize intensities to the total area or intensity within peptide families or across all filtered peptides.
  • Abbreviate histone peptides.
  • Rename PTM strings to ProForma, Brno nomenclature, or a simpler representation.
  • Calculate the mean, standard deviation, and coefficient of variation for each ID in each condition.
  • Remove and store duplicates.
  • Mark with (*) and store IDs where the software assigns the same peak apex (i.e. the same intensities) to nearly co-eluting isobaric positional isomers (e.g. K18acK23un and K18unK23ac).
  • Filter out unwanted IDs and/or IDs that are, more often than not, false positives (e.g. H3 K37mod).
  • Filter out IDs that are not quantified in a user-defined number of samples.
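The normalization and summary steps in the list above can be pictured with a minimal base-R sketch. It is not analyzeHistone()'s implementation: the data frame, its columns (family, ptm, S1, S2), and the intensities are made up, and it assumes that peptide-family normalization divides each ID by the summed intensity of its family in the same sample, while total normalization divides by the sum over all filtered peptides.

```r
# Hypothetical intensities for two peptide families in two samples of one condition.
ids <- data.frame(
  family = c("H3_9_17", "H3_9_17", "H3_9_17", "H3_18_26", "H3_18_26"),
  ptm    = c("K9me1", "K9me2", "K9me3", "K18ac", "K18un"),
  S1     = c(200, 300, 500, 50, 950),
  S2     = c(100, 500, 400, 100, 900)
)
samples <- c("S1", "S2")

# Peptide-family normalization: divide by the family total in the same sample,
# so relative abundances within each family sum to 1.
fam <- ids
fam[samples] <- lapply(samples, function(s) ids[[s]] / ave(ids[[s]], ids$family, FUN = sum))

# Total normalization: divide by the sum over all (filtered) peptides in the sample.
tot <- ids
tot[samples] <- lapply(samples, function(s) ids[[s]] / sum(ids[[s]]))

# Per-ID mean, standard deviation and coefficient of variation (%) across the
# samples of the condition, computed here on the family-normalized values.
vals <- as.matrix(fam[samples])
fam$mean <- rowMeans(vals)
fam$sd   <- apply(vals, 1, sd)
fam$cv   <- 100 * fam$sd / fam$mean
fam
```

With real data, the CV would be computed separately for the samples of each condition.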

This results in three kinds of Excel output:

  • File 1: Contains the raw data in several sheets. Sheet 1 holds the raw data of the isolated histone peptides without any transformation; the remaining sheets are subsets filtered from the data in the first sheet (e.g. only N-terminally labeled peptides).

  • File(s) 2: Peptide-centric. One Excel file per histone protein, with each sheet containing the IDs of the same peptide.

  • File 3: PTM-centric. An Excel file summarizing PTMs, with each sheet containing the IDs that carry a specific PTM.

All this with flexibility to:

  • analyze (and output results for) only user-defined histone proteins (e.g. only H3).
  • filter IDs with a cut-off threshold for missing values.
  • output File(s) 2 after removing all unlabeled me1, K37mod (for the H3K27R40 peptide), or both.
  • group File(s) 2 into one file or save each protein's results in a separate file.
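The missing-value cut-off can be sketched as keeping only IDs quantified in at least a minimum number of samples. This is an illustration under assumptions: min_quantified is a hypothetical stand-in for analyzeHistone()'s NA_threshold (whether that argument is a count or a proportion should be checked in ?analyzeHistone), and both NA and 0 are treated as "not quantified".

```r
# Hypothetical intensities; NA or 0 means the ID was not quantified in that sample.
quant <- data.frame(
  id = c("K9me1", "K9me2", "K27me3", "K36me2"),
  S1 = c(1.0e6, NA, 3.0e5, 0),
  S2 = c(2.0e6, NA, NA, 0),
  S3 = c(1.5e6, 4.0e5, 2.0e5, 5.0e4)
)
samples <- c("S1", "S2", "S3")
min_quantified <- 2   # hypothetical cut-off: quantified in at least 2 samples

m <- as.matrix(quant[samples])
is_quantified <- !is.na(m) & m > 0   # NA & anything -> FALSE here
n_quantified  <- rowSums(is_quantified)

kept    <- quant[n_quantified >= min_quantified, ]
dropped <- quant[n_quantified <  min_quantified, ]
kept$id   # "K9me1" "K27me3"
```

Keeping the dropped rows in a separate object mirrors the README's idea of storing removed IDs rather than discarding them silently.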

Prerequisites

  • A Proline Excel output file containing the sheets:
    • Best PSM from protein sets: includes the IDs and their intensities in each sample. This assumes that IDs with multiple charge states have already been summed using the post-processing functionality inside Proline.
    • Search settings and infos: includes the names of the RAW files and of their corresponding search result files.

  • An Excel file containing at least three columns:
    • SampleName: custom sample names.
    • file: names of the RAW files.
    • Condition: e.g. concentration, or WT vs disease. Other recognized optional columns are BioReplicate and/or TechReplicate, depending on the experimental design.
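A metadata table with the required columns could look like the base-R sketch below. The sample names, RAW file names, and conditions are hypothetical, and whether the file column should include the .raw extension is an assumption to verify against the package documentation. Saving it as Excel needs an extra package, for example writexl::write_xlsx().

```r
# Hypothetical design: two conditions, two biological replicates each.
meta <- data.frame(
  SampleName   = c("WT_1", "WT_2", "KO_1", "KO_2"),
  file         = c("run01.raw", "run02.raw", "run03.raw", "run04.raw"),
  Condition    = c("WT", "WT", "KO", "KO"),
  BioReplicate = c(1, 2, 1, 2)   # optional column
)

# Basic sanity checks before running the workflow.
required <- c("SampleName", "file", "Condition")
missing_cols <- setdiff(required, names(meta))
if (length(missing_cols) > 0) {
  stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
}
if (anyDuplicated(meta$file) > 0) stop("Each RAW file should appear only once.")
table(meta$Condition)   # number of samples per condition

# Optional: save for analyzeHistone(); requires the writexl package.
# writexl::write_xlsx(meta, "metadata.xlsx")
```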

For more details on this function and others, type ? followed by the function name, without parentheses, in the R console to open its full documentation (e.g. ?analyzeHistone).

library(histonePTM)

# analyzeHistone(analysisfile,                                  # Proline export file name
#                metafile,                                      # metadata file name
#                hist_prot = c('All', 'H3', 'H4', 'H2A', 'H2B'), # choose one of these options
#                labeling = c("PA", "TMA", "PIC_PA", "none"),   # allows reversing the labeling when renaming PTMs
#                NA_threshold,                                  # numeric, optional
#                norm_method = c('peptide_family', 'peptide_total'),
#                extra_filter = c("none", 'no_me1', "K37un", "no_me1_K37un"), # optional
#                output_result = c('single', 'multiple'))       # optional

Some of the functions used to build this workflow, among others, are shown below:

1. PTMs

Rename PTM strings from Proline or Skyline into a shorthand representation.

1.1 ptm_beautify()

Proline

#PTM from Proline export, from 'modifications' column of sheet 'Best PSM from protein sets'.
PTM_Proline <- 'Propionyl (Any N-term); Propionyl (K1); Butyryl (K10); Butyryl (K11)'

ptm_beautify(PTM_Proline, lookup = histptm_lookup, software = 'Proline', residue = 'keep')
#> [1] "prNt-K1pr-K10bu-K11bu"

 
ptm_beautify(PTM_Proline, lookup = histptm_lookup, software = 'Proline', residue = 'remove')
#> [1] "prNt-pr-bu-bu"
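Conceptually, the Proline case splits the modifications string on ";", separates each entry into a modification name and a site, and maps the name to a short code. The base-R sketch below reproduces the two outputs above under that assumption; short_codes is a tiny hypothetical stand-in for the package's histptm_lookup table, and beautify_sketch() is not a histonePTM function.

```r
# Hypothetical name -> short-code table (stand-in for histptm_lookup).
short_codes <- c(Propionyl = "pr", Butyryl = "bu", Acetyl = "ac")

beautify_sketch <- function(mods, keep_residue = TRUE) {
  entries <- trimws(strsplit(mods, ";", fixed = TRUE)[[1]])
  # Each entry looks like "Name (Site)", e.g. "Butyryl (K10)".
  name <- sub("^(.*) \\((.*)\\)$", "\\1", entries)
  site <- sub("^(.*) \\((.*)\\)$", "\\2", entries)
  code <- unname(short_codes[name])
  if (anyNA(code)) stop("No short code for: ", paste(name[is.na(code)], collapse = ", "))
  # N-terminal modifications become e.g. "prNt"; others keep or drop the residue.
  out <- ifelse(site == "Any N-term", paste0(code, "Nt"),
                if (keep_residue) paste0(site, code) else code)
  paste(out, collapse = "-")
}

mods <- "Propionyl (Any N-term); Propionyl (K1); Butyryl (K10); Butyryl (K11)"
beautify_sketch(mods)                        # "prNt-K1pr-K10bu-K11bu"
beautify_sketch(mods, keep_residue = FALSE)  # "prNt-pr-bu-bu"
```

Failing loudly on an unknown modification name, rather than emitting NA, makes gaps in the lookup table easy to spot.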

Skyline

Skyline PTMs are enclosed in square brackets (e.g. [+28.0313]) and are sometimes rounded (e.g. [+28]). Rounded masses are not supported because some PTMs, such as [Ac] and [3Me], both round to +42. Use the 'Peptide Modified Sequence Monoisotopic Masses' column instead. Modified peptides in Skyline's 'isolation list' output file ('Comment' column) also always contain the monoisotopic masses of PTMs.

PTM_Skyline <- "K[+124.05243]SVPSTGGVK[+56.026215]K[+56.026215]PHR"
 

ptm_beautify(PTM_Skyline, lookup = shorthistptm_mass, software = 'Skyline', residue = 'keep')
#> [1] "prNt-KcrSVPSTGGVKprKprPHR"

ptm_beautify(PTM_Skyline, lookup = shorthistptm_mass, software = 'Skyline', residue = 'remove')
#> [1] "prNt-cr-pr-pr"
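To see why rounded Skyline masses are rejected, compare monoisotopic mass shifts directly. The sketch below is not the package's matching code: ptm_mass holds Unimod monoisotopic values for a few PTMs, match_mass() is a hypothetical helper, and the 0.005 Da tolerance is an arbitrary choice for illustration. It also shows that the +124.05243 on the first lysine of the example equals propionyl + crotonyl, consistent with the prNt-Kcr output above.

```r
# Monoisotopic mass shifts in Da (Unimod values).
ptm_mass <- c(ac = 42.010565, me3 = 42.046950, me2 = 28.031300,
              pr = 56.026215, cr = 68.026215)

# Hypothetical helper: names of PTMs whose mass lies within `tol` Da of `delta`.
match_mass <- function(delta, tol = 0.005) {
  hits <- names(ptm_mass)[abs(ptm_mass - delta) <= tol]
  if (length(hits) == 0) NA_character_ else hits
}

round(ptm_mass[c("ac", "me3")])   # both 42: indistinguishable once rounded
match_mass(42.010565)             # "ac"
match_mass(42.04695)              # "me3"
match_mass(42, tol = 0.5)         # "ac" "me3": ambiguous
sum(ptm_mass[c("pr", "cr")])      # 124.0524, the shift on the first K above
```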

1.2 misc_clearLabeling()

Remove chemical labeling, such as propionylation (PA) or TMA, which is not biologically relevant.

misc_clearLabeling("prNt-cr-pr-pr", labeling = "PA")
#> [1] "cr-un-un"
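The effect on the example above can be sketched in base R: the N-terminal label is dropped, and a residue carrying only the chemical label is reported as unmodified ("un"), since the label was attached to a free amine during sample preparation. clear_labeling_sketch() and its label_code argument are hypothetical and handle only a single label code such as "pr"; misc_clearLabeling() itself takes labeling = "PA" and may cover more cases.

```r
clear_labeling_sketch <- function(ptm, label_code = "pr") {
  parts <- strsplit(ptm, "-", fixed = TRUE)[[1]]
  # Drop the N-terminal label (e.g. "prNt"); it carries no biological information.
  parts <- parts[parts != paste0(label_code, "Nt")]
  # A token ending in the label code (e.g. "pr" or "K1pr") becomes unmodified.
  parts <- sub(paste0(label_code, "$"), "un", parts)
  paste(parts, collapse = "-")
}

clear_labeling_sketch("prNt-cr-pr-pr")           # "cr-un-un"
clear_labeling_sketch("prNt-K1pr-K10bu-K11bu")   # "K1un-K10bu-K11bu"
```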

1.3 ptm_toProForma()

Convert a PTM string to ProForma (Proteoform and Peptidoform Notation).

histonePTM::ptm_toProForma(seq = "KSAPATGGVKKPHR",
                mod = "Propionyl (Any N-term); Lactyl (K1); Dimethyl (K10); Propionyl (K11)")
#> [1] "[UNIMOD:58]-K[UNIMOD:2114]SAPATGGVK[UNIMOD:36]K[UNIMOD:58]PHR"

ptm_toProForma(seq = "KSAPATGGVKKPHR",
               mod = "TMAyl_correct (Any N-term); Butyryl (K1); Trimethyl (K10); Propionyl (K11)")
#> [1] "[TMAyl_correct]-K[UNIMOD:1289]SAPATGGVK[UNIMOD:37]K[UNIMOD:58]PHR"

ptm_toProForma(seq = "KQLATKVAR",
               mod = "Propionyl (Any N-term); Propionyl (K1); Propionyl (K6)")
#> [1] "[UNIMOD:58]-K[UNIMOD:58]QLATK[UNIMOD:58]VAR"
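The conversion above can be sketched as: parse each "Name (Site)" entry, look up its Unimod accession, and attach [UNIMOD:n] after the modified residue, or as a "[UNIMOD:n]-" prefix for N-terminal modifications. The accession table only covers modifications that appear in this section's outputs (plus Acetyl, Unimod 1); to_proforma_sketch() is hypothetical and, unlike ptm_toProForma(), does not handle custom names such as TMAyl_correct.

```r
# Unimod accessions, as seen in the ptm_toProForma() outputs above (Acetyl = 1 added).
unimod_ids <- c(Acetyl = 1, Propionyl = 58, Dimethyl = 36, Trimethyl = 37,
                Butyryl = 1289, Lactyl = 2114)

to_proforma_sketch <- function(sequence, mod) {
  entries <- trimws(strsplit(mod, ";", fixed = TRUE)[[1]])
  name <- sub("^(.*) \\((.*)\\)$", "\\1", entries)
  site <- sub("^(.*) \\((.*)\\)$", "\\2", entries)
  ids  <- unimod_ids[name]
  if (anyNA(ids)) stop("No Unimod accession for: ", paste(name[is.na(ids)], collapse = ", "))
  tag <- paste0("[UNIMOD:", ids, "]")

  residues <- strsplit(sequence, "")[[1]]
  nterm <- ""
  for (i in seq_along(entries)) {
    if (site[i] == "Any N-term") {
      nterm <- paste0(tag[i], "-")                   # N-terminal mod: "[...]-" prefix
    } else {
      pos <- as.integer(sub("^[A-Z]", "", site[i]))  # "K10" -> 10
      residues[pos] <- paste0(residues[pos], tag[i])
    }
  }
  paste0(nterm, paste(residues, collapse = ""))
}

to_proforma_sketch("KQLATKVAR", "Propionyl (Any N-term); Propionyl (K1); Propionyl (K6)")
# "[UNIMOD:58]-K[UNIMOD:58]QLATK[UNIMOD:58]VAR"
```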

1.4 ptm_labelingAssessment()

Lysine derivatization can go rogue and label other residues such as S, T, and Y. When propionic anhydride is used, this is called 'overpropionylation'. Hydroxylamine is used to remove this adventitious labeling, so-called 'reverse propionylation'. This function helps with a quick visual review of whether overpropionylation is limited or extensive.

This, of course, assumes that the database search was run with Propionyl (STY), or another labeling modification, as a variable modification.
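A rough numeric version of this check can be sketched in base R by counting PSMs whose modifications include propionylation on S, T, or Y. The modification strings below are made up and follow the Proline format shown earlier; that Proline writes such sites as e.g. "Propionyl (S2)" is an assumption to check against your export. ptm_labelingAssessment() itself produces a visual summary, which this sketch does not reproduce.

```r
# Hypothetical 'modifications' column from a search with Propionyl (STY) as a variable mod.
mods <- c(
  "Propionyl (Any N-term); Propionyl (K1); Propionyl (K6)",
  "Propionyl (Any N-term); Propionyl (S2); Acetyl (K10)",
  "Propionyl (Any N-term); Propionyl (T6); Propionyl (Y8)",
  "Propionyl (Any N-term); Trimethyl (K10)"
)

sty_pattern <- "Propionyl \\([STY][0-9]+\\)"

# Share of PSMs with at least one overpropionylated S, T or Y.
over <- grepl(sty_pattern, mods)
100 * mean(over)   # 50 (% of PSMs)

# Number of overpropionylated sites per residue type.
hits <- unlist(regmatches(mods, gregexpr(sty_pattern, mods)))
residue <- substr(sub("Propionyl \\(", "", hits), 1, 1)
site_counts <- table(factor(residue, levels = c("S", "T", "Y")))
site_counts
# barplot(site_counts) would give a quick visual overview.
```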
