09 December 2024
The goal of histonePTM is to make histone PTM analysis less tedious by offering either a complete analysis workflow or functions that help you build a workflow around whatever software you are using. Other functions retrieve data from the internet, manipulate mgf files, visualize results, and perform some quality control assessments.
Some functions rely heavily on other functions from well-established packages.
You can install the development version of histonePTM from GitHub with:
# install.packages("remotes")
remotes::install_github("HijaziHassan/histonePTM")
# You only need to run this once to install the package; afterwards, load it with `library(histonePTM)`.
Any contribution is very welcome. The first version is mostly adapted to Proline software output, but each function was written to be generic and flexible enough to apply to other software outputs.
If you encounter a bug, a problem, or weird behavior, or have a feature request, please open an issue.
If you would like to discuss questions related to histone analysis using mass spectrometry, please open a discussion.
If you are using Proline software to validate identifications (IDs) resulting from search engines such as Mascot, the function analyzeHistone() can:
- Isolate histone peptides based on user-defined histone protein(s).
- Normalize intensities to the total area or intensity within each peptide family or across all filtered peptides.
- Abbreviate histone peptides.
- Rename PTM strings into ProForma, Brno nomenclature, or a simpler shorthand representation.
- Calculate mean, standard deviation, and coefficient of variation for each ID in each condition.
- Remove duplicates and store them.
- Mark with (*) and store IDs where the software assigns the same peak apex (i.e. the same intensities) to nearly co-eluting isobaric positional isomers (e.g. K18acK23un and K18unK23ac).
- Filter out unwanted IDs and/or IDs that are, more often than not, false positives (e.g. H3 K37mod).
- Filter out IDs that are not quantified in a user-defined number of samples.
This results in three Excel files:
- File 1: Raw data in several sheets. Sheet 1 contains the raw data of the isolated histone peptides without any transformation; the remaining sheets are subsets filtered from the data in sheet 1 (e.g. only N-terminally labeled peptides).
- File(s) 2: Peptide-centric. One Excel file per histone protein, with each sheet containing the IDs from the same peptide.
- File 3: PTM-centric. One Excel file summarizing PTMs, with each sheet containing the IDs that carry a specific PTM.
All this with the flexibility to:
- analyze (and output results for) only user-defined histone proteins (e.g. only H3).
- filter IDs with a cut-off threshold for missing values.
- output File(s) 2 after removing all unlabeled me1, K37mod (for the H3K27R40 peptide), or both.
- group File(s) 2 into one file or save each protein's results in a separate file.
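Several of the steps listed above can be illustrated with a short base-R sketch: the missing-value filter, peptide-family normalization, per-ID mean/SD/CV, and grouping IDs by protein and peptide for the peptide-centric output. This is not the implementation of analyzeHistone(). The column names (protein, peptide, ptm, s1, s2, s3), the intensities, the hypothetical `max_missing` cut-off, and the order of the steps are all assumptions.

```r
# Hypothetical input: raw intensities of histone IDs in three samples
# (s1, s2, s3) of one condition; NA = not quantified in that sample.
ids <- data.frame(
  protein = c(rep("H3", 6), rep("H4", 2)),
  peptide = c(rep("KSAPATGGVKKPHR", 4), rep("KQLATKAAR", 2),
              rep("GKGGKGLGKGGAKR", 2)),
  ptm     = c("K27un-K36un", "K27me3-K36un", "K27me2-K36me2", "K27ac-K36un",
              "K18un-K23un", "K18ac-K23un",
              "K5un-K8un-K12un-K16un", "K5un-K8un-K12un-K16ac"),
  s1 = c(600, 300, 100, NA, 80, 20, 900, 100),
  s2 = c(550, 330, 120, NA, 90, 10, 850, 150),
  s3 = c(580, 320, 100,  5, NA, NA, 950,  50),
  stringsAsFactors = FALSE
)
samples <- c("s1", "s2", "s3")

# 1. Missing-value filter (hypothetical cut-off): keep IDs with at most
#    `max_missing` non-quantified samples.
max_missing <- 1
ids <- ids[rowSums(is.na(ids[, samples])) <= max_missing, ]

# 2. Peptide-family normalization: divide each intensity by the summed
#    intensity of all IDs of the same peptide in that sample.
for (s in samples) {
  family_total <- ave(ids[[s]], ids$peptide,
                      FUN = function(x) sum(x, na.rm = TRUE))
  ids[[s]] <- ids[[s]] / family_total
}

# 3. Per-ID mean, standard deviation and coefficient of variation (%).
vals     <- as.matrix(ids[, samples])
ids$mean <- rowMeans(vals, na.rm = TRUE)
ids$sd   <- apply(vals, 1, sd, na.rm = TRUE)
ids$cv   <- 100 * ids$sd / ids$mean

# 4. Peptide-centric grouping: one list element per protein, each holding
#    one data frame per peptide (e.g. one sheet per peptide).
by_protein <- lapply(split(ids, ids$protein),
                     function(d) split(d, d$peptide))
# writexl::write_xlsx(by_protein$H3, "H3_peptides.xlsx")  # not run

print(ids)
```

In step 2, you could group by a single constant instead of `ids$peptide`. That would divide by the total of all retained peptides, which is presumably closer to the 'peptide_total' option. See ?analyzeHistone for the exact definitions.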
Pre-requisites
- A Proline Excel output file containing the sheets:
  - Best PSM from protein sets, which includes IDs and their intensities in each sample. This assumes that IDs with multiple charge states have already been summed using the post-processing functionality inside Proline.
  - Search settings and infos, which includes information about the RAW file names and their corresponding search result file names.
- An Excel file containing at least three columns:
  - SampleName: custom sample names.
  - file: names of the RAW files.
  - Condition: concentration, WT vs disease, …
  Other recognized optional columns are BioReplicate and/or TechReplicate, depending on the experimental design.
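For illustration, the metadata table could be prepared in R and checked for the three required columns before it is saved to Excel. The sample names, RAW file names, and conditions below are placeholders. Whether the file column should include the .raw extension is not specified here. Writing the file with the writexl package is only one option and is left commented out.

```r
# Hypothetical metadata: two conditions with two biological replicates each.
meta <- data.frame(
  SampleName   = c("WT_1", "WT_2", "KO_1", "KO_2"),
  file         = c("run01", "run02", "run03", "run04"),  # placeholder RAW file names
  Condition    = c("WT", "WT", "KO", "KO"),
  BioReplicate = c(1, 2, 1, 2),                         # optional column
  stringsAsFactors = FALSE
)

# Check that the three required columns are present before saving.
required     <- c("SampleName", "file", "Condition")
missing_cols <- setdiff(required, names(meta))
if (length(missing_cols) > 0) {
  stop("Metadata is missing column(s): ", paste(missing_cols, collapse = ", "))
}

# One way to save it as an Excel file (requires the writexl package; not run):
# writexl::write_xlsx(meta, "metadata.xlsx")
```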
For further details on this function and others, type ? followed by the function name (without parentheses) in the R console to get the full documentation (e.g. ?analyzeHistone).
library(histonePTM)
# analyzeHistone(analysisfile,  # analysis file name (Proline output)
#                metafile,      # metadata file name
#                hist_prot = c('All', 'H3', 'H4', 'H2A', 'H2B'),  # choose one of these options
#                labeling = c("PA", "TMA", "PIC_PA", "none"),     # allows reversing the labeling when renaming PTMs
#                NA_threshold,  # numeric, optional
#                norm_method = c('peptide_family', 'peptide_total'),
#                extra_filter = c("none", 'no_me1', "K37un", "no_me1_K37un"),  # optional
#                output_result = c('single', 'multiple'))  # optional
Some of the functions used to build this workflow are shown below:
Rename PTM strings from Proline or Skyline into a shorthand representation.
#PTM from Proline export, from 'modifications' column of sheet 'Best PSM from protein sets'.
PTM_Proline <- 'Propionyl (Any N-term); Propionyl (K1); Butyryl (K10); Butyryl (K11)'
ptm_beautify(PTM_Proline, lookup = histptm_lookup, software = 'Proline', residue = 'keep')
#> [1] "prNt-K1pr-K10bu-K11bu"
ptm_beautify(PTM_Proline, lookup = histptm_lookup, software = 'Proline', residue = 'remove')
#> [1] "prNt-pr-bu-bu"
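To show roughly what such a conversion involves, here is a minimal base-R sketch of parsing a Proline modification string. `shorten_proline()` and its small `ptm_codes` lookup are hypothetical and are not how ptm_beautify() or histptm_lookup are implemented. The sketch only reproduces the two outputs above, and it stops on any modification name missing from the lookup.

```r
# Hypothetical, minimal lookup (not histptm_lookup): Proline name -> shorthand.
ptm_codes <- c(Propionyl = "pr", Butyryl = "bu", Acetyl = "ac",
               Crotonyl = "cr", Dimethyl = "me2", Trimethyl = "me3")

shorten_proline <- function(mods, keep_residue = TRUE) {
  # Proline separates modifications with "; ", each written "Name (Site)".
  parts <- strsplit(mods, "; ", fixed = TRUE)[[1]]
  name  <- sub("^(.+) \\((.+)\\)$", "\\1", parts)
  site  <- sub("^(.+) \\((.+)\\)$", "\\2", parts)
  code  <- unname(ptm_codes[name])
  if (anyNA(code)) {
    stop("Unknown modification(s): ", paste(name[is.na(code)], collapse = ", "))
  }
  # N-terminal labels become e.g. "prNt"; residue PTMs keep or drop the site.
  out <- ifelse(site == "Any N-term", paste0(code, "Nt"),
                if (keep_residue) paste0(site, code) else code)
  paste(out, collapse = "-")
}

x <- "Propionyl (Any N-term); Propionyl (K1); Butyryl (K10); Butyryl (K11)"
shorten_proline(x)                        # "prNt-K1pr-K10bu-K11bu"
shorten_proline(x, keep_residue = FALSE)  # "prNt-pr-bu-bu"
```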
Skyline PTMs are enclosed in square brackets (e.g. [+28.0313]) and are sometimes rounded (e.g. [+28]). Rounded masses are not supported because some PTMs, such as [Ac] and [3Me], both round to the same value: +42. Use the ‘Peptide Modified Sequence Monoisotopic Masses’ column instead. Modified peptides in Skyline's ‘isolation list’ output file (‘Comment’ column) also always contain the monoisotopic masses of the PTMs.
PTM_Skyline <- "K[+124.05243]SVPSTGGVK[+56.026215]K[+56.026215]PHR"
ptm_beautify(PTM_Skyline, lookup = shorthistptm_mass, software = 'Skyline', residue = 'keep')
#> [1] "prNt-KcrSVPSTGGVKprKprPHR"
ptm_beautify(PTM_Skyline, lookup = shorthistptm_mass, software = 'Skyline', residue = 'remove')
#> [1] "prNt-cr-pr-pr"
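The sketch below illustrates two points: why monoisotopic masses are needed, and how a bracketed mass might be matched to a PTM within a tolerance. `extract_masses()`, `match_mass()`, the five-entry `mass_lookup`, and the 0.005 Da tolerance are hypothetical choices. shorthistptm_mass and ptm_beautify() may work differently. The first residue is handled as in the output above, where the N-terminal propionyl is included in the K1 mass.

```r
# Hypothetical monoisotopic mass shifts (Da) for a few PTMs.
mass_lookup <- c(ac = 42.010565, me3 = 42.04695, pr = 56.026215,
                 cr = 68.026215, me2 = 28.0313)

# Pull every bracketed mass shift out of a Skyline modified sequence.
extract_masses <- function(seq) {
  hits <- regmatches(seq, gregexpr("\\[[+-][0-9.]+\\]", seq))[[1]]
  as.numeric(gsub("\\[|\\]|\\+", "", hits))
}

# Return the PTM whose mass is closest to `m`, or NA if none is within `tol`.
match_mass <- function(m, tol = 0.005) {
  d <- abs(unname(mass_lookup) - m)
  if (min(d) > tol) return(NA_character_)
  names(mass_lookup)[which.min(d)]
}

masses <- extract_masses("K[+124.05243]SVPSTGGVK[+56.026215]K[+56.026215]PHR")
vapply(masses[2:3], match_mass, character(1))    # "pr" "pr"
# On the first residue the N-terminal label adds to the residue's own PTM,
# so subtract it before matching: 124.05243 - 56.026215 = 68.026215.
match_mass(masses[1] - mass_lookup[["pr"]])      # "cr"

# Why rounded masses fail: acetyl and trimethyl both round to +42.
round(mass_lookup[c("ac", "me3")])
```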
Remove chemical labels such as propionyl (PA) or TMA, which are not biologically relevant.
misc_clearLabeling("prNt-cr-pr-pr", labeling = "PA")
#> [1] "cr-un-un"
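For propionyl labeling, the example above comes down to two string operations: dropping the N-terminal label and renaming each remaining "pr" to "un". The hypothetical `clear_pr()` below sketches only that case and is not misc_clearLabeling(). It treats every "pr" as a chemical label on an unmodified lysine, so it cannot distinguish endogenous propionylation.

```r
# Hypothetical helper for propionyl (PA) labeling only: drop the N-terminal
# label and rename every remaining "pr" (a label on an unmodified lysine) to "un".
clear_pr <- function(ptm) {
  ptm   <- sub("^prNt-", "", ptm)
  parts <- strsplit(ptm, "-", fixed = TRUE)[[1]]
  parts <- sub("pr$", "un", parts)   # "pr" -> "un", "K10pr" -> "K10un"
  paste(parts, collapse = "-")
}

clear_pr("prNt-cr-pr-pr")            # "cr-un-un"
clear_pr("prNt-K1cr-K10pr-K11pr")    # "K1cr-K10un-K11un"
```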
Convert a PTM string to ProForma (Proteoform and Peptidoform Notation).
histonePTM::ptm_toProForma(seq = "KSAPATGGVKKPHR",
mod = "Propionyl (Any N-term); Lactyl (K1); Dimethyl (K10); Propionyl (K11)")
#> [1] "[UNIMOD:58]-K[UNIMOD:2114]SAPATGGVK[UNIMOD:36]K[UNIMOD:58]PHR"
ptm_toProForma(seq = "KSAPATGGVKKPHR",
mod = "TMAyl_correct (Any N-term); Butyryl (K1); Trimethyl (K10); Propionyl (K11)")
#> [1] "[TMAyl_correct]-K[UNIMOD:1289]SAPATGGVK[UNIMOD:37]K[UNIMOD:58]PHR"
ptm_toProForma( seq = "KQLATKVAR",
mod = "Propionyl (Any N-term); Propionyl (K1); Propionyl (K6)")
#> [1] "[UNIMOD:58]-K[UNIMOD:58]QLATK[UNIMOD:58]VAR"
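The three outputs above follow a simple rule. Each modification name is mapped to its UNIMOD accession and written in brackets after the modified residue. An N-terminal modification is placed before the sequence, followed by "-". Names without a known accession are kept as plain text. The hypothetical `to_proforma()` sketch below reproduces these examples under that assumption. Its small `unimod_ids` lookup only covers the modifications shown here and is not the table used by ptm_toProForma().

```r
# Hypothetical, partial lookup: modification name -> UNIMOD accession.
unimod_ids <- c(Acetyl = 1L, Dimethyl = 36L, Trimethyl = 37L,
                Propionyl = 58L, Butyryl = 1289L, Lactyl = 2114L)

to_proforma <- function(seq, mods) {
  parts <- strsplit(mods, "; ", fixed = TRUE)[[1]]
  name  <- sub("^(.+) \\((.+)\\)$", "\\1", parts)
  site  <- sub("^(.+) \\((.+)\\)$", "\\2", parts)
  # Known names become [UNIMOD:n]; unknown ones stay as text, e.g. [TMAyl_correct].
  tag <- ifelse(name %in% names(unimod_ids),
                paste0("[UNIMOD:", unimod_ids[name], "]"),
                paste0("[", name, "]"))
  residues <- strsplit(seq, "")[[1]]
  nterm <- ""
  for (i in seq_along(parts)) {
    if (site[i] == "Any N-term") {
      nterm <- paste0(tag[i], "-")
    } else {
      pos <- as.integer(sub("^[A-Z]", "", site[i]))   # "K10" -> 10
      residues[pos] <- paste0(residues[pos], tag[i])
    }
  }
  paste0(nterm, paste(residues, collapse = ""))
}

to_proforma("KQLATKVAR",
            "Propionyl (Any N-term); Propionyl (K1); Propionyl (K6)")
# "[UNIMOD:58]-K[UNIMOD:58]QLATK[UNIMOD:58]VAR"
```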
Lysine derivatization can go rogue and label other residues such as S, T, and Y. When propionic anhydride is used, this is called ‘overpropionylation’. Hydroxylamine is used to remove this adventitious labeling, a step called ‘reverse propionylation’. This function helps with a quick visual review of whether overpropionylation is limited or extensive.
This assumes, of course, that the database search was run with Propionyl (STY), or another labeling modification, set as a variable modification.
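A quick numeric counterpart to that visual check is the share of PSMs carrying a propionyl group on S, T, or Y. The sketch below makes an assumption about the format: such sites appear in Proline's modifications column in the same "Name (ResiduePosition)" form as the lysine examples above (e.g. "Propionyl (S2)"). The PSM strings are made up.

```r
# Hypothetical 'modifications' column from a PSM table.
mods <- c(
  "Propionyl (Any N-term); Propionyl (K1); Propionyl (K6)",
  "Propionyl (Any N-term); Propionyl (S2); Propionyl (K6)",
  "Propionyl (Any N-term); Acetyl (K1); Propionyl (T5)",
  "Propionyl (Any N-term); Trimethyl (K10); Propionyl (K11)"
)

# TRUE when at least one S, T or Y carries a propionyl group.
over_pr <- grepl("Propionyl \\([STY][0-9]+\\)", mods)
sum(over_pr)          # number of overpropionylated PSMs
100 * mean(over_pr)   # percentage of PSMs

# Which residues were hit: take the first letter after "Propionyl (".
sites   <- regmatches(mods, gregexpr("Propionyl \\([STY][0-9]+\\)", mods))
residue <- substr(sub("Propionyl \\(", "", unlist(sites)), 1, 1)
table(residue)
```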