-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.Rmd
194 lines (116 loc) · 8.16 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
---
title: 'HistonePTM <img src="man/figures/logo.png" align="right" width="240" height="277"/>'
#author: Hassan Hijazi
date: "`r format(Sys.time(), '%d %B %Y')`"
output:
github_document:
toc: true
toc_depth: 2
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE, include_graphics = TRUE }
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/logo.png",
out.width = "100%"
)
```
<!-- badges: start -->
<!-- badges: end -->
## Overview
The goal of `histonePTM` is to make histone PTM analysis less tedious by offering a whole workflow analysis or functions that help build a workflow based on whatever software you are using.
Not only this, other functions allow retreiving data from the internet, manipulate `mgf` files, visualize results in addition to some quality control assessments.
Some functions rely heavily on other functions from well-established packages.
## Installation
You can install the development version of histonePTM from [GitHub](https://github.com/) with:
``` r
# install.packages("remotes")
remotes::install_github("HijaziHassan/histonePTM") # you only have to run the code once to install it on
#your hard disk. After that use `library(histonePTM)`.
```
## Contributing
Any contribution is very welcomed. The first version is more adapted to `Proline` software output. But it tried to generalize each function to be generic and very flexible to be applicable for other software outputs.
## Getting help
If you encouter any bug, a problem, a weired behavior, or have a feature request, please open an
[issue](https://github.com/HijaziHassan/histonePTM/issues).
If you would like to discuss questions related to histone analysis using mass spectrometry, please open a discussion here [discussion](https://github.com/HijaziHassan/histonePTM/discussions).
## Workflows
### Analysis of DDA results using `analyzeHistone()`
If you are using [`Proline`](https://www.profiproteomics.fr/proline/)
software to validate identifications (IDs) resulted from search engines such as `Mascot`, the function `analyzeHistone()` can:
* **Isolate** histone peptides based on user-defined histone protein(s).
* **Normalize** intensities to the total area or intensity within peptide families or total filttered peptides.
* **Abbreviate** histone peptides.
* **Rename** PTMs strings into [Proforma](https://pubs.acs.org/doi/10.1021/acs.jproteome.1c00771), [Brno nomenclature](https://www.nature.com/articles/nsmb0205-110), or other more simplistic representation.
* **Calculate** mean, standard deviation, and coefficient of variation for each ID in each condition.
* **Remove** and **store** duplications.
* **Mark** with (**\***) and **store** IDs where the software assigns the same peak apex (i.e. same intensities) to isobaric positional isomers (i.e. K18acK23un and K18unK23ac) which nearly co-elute.
* **Filter** unwanted and/or IDs that are -more often than not- false positives (e.g. H3 K37mod).
* **Filter** some IDs if they are not quantified in a user-defined number of samples.
This results in 3 Excel file:
* **File 1**: Containing the raw data with several sheets. Sheet 1 contains the raw data of isolated histone peptides without any transformation. The rest of the sheets are filtered data from the original data in the first sheet . E.g. Only N-terminally labeled peptides.
* **File(s) 2**: Peptide-centric. An excel file per histone protein with each sheet containing IDs from the same peptide.
* **File 3**: PTM-centric. An excel file summarizing PTMs with each sheet containing IDs with specific PTM.
All this with flexibility to:
- choose only to analyze (and output results) of user-defined histone protein (e.g. only `H3`).
- filter IDs with cut-off threshold of missing values.
- output **File(s) 2** with either removing all unlabeled `me1`, `K37mod` (for H3K27R40 peptide) or both.
- group **File(s) 2** into one file or save each protein results in a separate file.
**Pre-requisites**
* `Proline` excel output file containing the sheets:
+ `Best PSM from protein sets` which includes IDs and their intensities in each sample. This assumes that IDs with multiple charge states are already summed using post-processing functionality inside `Proline`.
+ `Search settings and infos` which includes information about `RAW` files' names and their corresponding search result files' names.
* An excel file containing at least three columns:
+ `SampleName`: custom samples names
+ `file`: names of `RAW` files.
+ `Condition`: concentration, WT vs disease ...
other recognized optional columns: `BioReplicate`, and/or `TechReplicate` depending on the experimental design.
For further detailed of this fucntion and other use `?` behind the function name without paraenthesis in R console to get the full documentation (i.e. `?analyzeHistone`).
```{r analyze DDA, warning=FALSE, message=FALSE}
library(histonePTM)
# analyzeHistone(analysisfile, # file name
# metafile, #metafile name
# hist_prot= c('All','H3', 'H4', 'H2A', 'H2B'), #choose one these options
# labeling = c("PA", "TMA", "PIC_PA", "none") # allow reversing labeling when renaming PTMs
# NA_threshold, #numeric #optional
# norm_method = c('peptide_family', 'peptide_total'),
# extra_filter = c("none", 'no_me1', "K37un", "no_me1_K37un"), #optional
# output_result= c('single', 'multiple'), #optional
```
Some functions used to build-up this workflow among others are shown below:
# 1. PTMs
Rename PTM strings from `Proline` or `Skyline` to have a shorthanded representation.
### 1.1 `ptm_beautify()`
#### Proline
```{r beautify Proline PTM}
#PTM from Proline export, from 'modifications' column of sheet 'Best PSM from protein sets'.
PTM_Proline <- 'Propionyl (Any N-term); Propionyl (K1); Butyryl (K10); Butyryl (K11)'
ptm_beautify(PTM_Proline, lookup = histptm_lookup, software = 'Proline', residue = 'keep')
ptm_beautify(PTM_Proline, lookup = histptm_lookup, software = 'Proline', residue = 'remove')
```
### Skyline
Skyline PTMs are enclosed between square brackets (e.g. [+28.0313]) and sometimes they are rounded (e.g [+28]). We don't support rounded numbers since some PTMs like [Ac] and [3Me] are rounded to the same number: +42. Use instead: 'Peptide Modified Sequence Monoisotopic Masses' column. Modified peptides in the 'isolation list' output file ('Comment' column) from Skyline always contains monoisotopic masses of PTMs as well.
```{r beautify Skyline PTM}
PTM_Skyline <- "K[+124.05243]SVPSTGGVK[+56.026215]K[+56.026215]PHR"
ptm_beautify(PTM_Skyline, lookup = shorthistptm_mass, software = 'Skyline', residue = 'keep')
ptm_beautify(PTM_Skyline, lookup = shorthistptm_mass, software = 'Skyline', residue = 'remove')
```
### 1.2 `misc_clearLabeling`
Remove the chemical labeling like `propionyl` (PA) or `TMA` which are not biologically relevant.
```{r Clear chemical labeling}
misc_clearLabeling("prNt-cr-pr-pr", labeling = "PA")
```
### 1.3 `ptm_toProForma()`
Convert PTM string to <a href="https://www.psidev.info/proforma">ProForma</a> ProForma (Proteoform and Peptidoform Notation)
```{r Convert PTMed peptide to ProForma}
histonePTM::ptm_toProForma(seq = "KSAPATGGVKKPHR",
mod = "Propionyl (Any N-term); Lactyl (K1); Dimethyl (K10); Propionyl (K11)")
ptm_toProForma(seq = "KSAPATGGVKKPHR",
mod = "TMAyl_correct (Any N-term); Butyryl (K1); Trimethyl (K10); Propionyl (K11)")
ptm_toProForma( seq = "KQLATKVAR",
mod = "Propionyl (Any N-term); Propionyl (K1); Propionyl (K6)")
```
### 1.4 `ptm_labelingAssessment()`
Lysine derivatization can go rogue and can label other residues such as S, T, and Y. When using propionic anhydride, this is called ' Overpropionylation'. Hydroxylamine is used to remove this adventitous labeling, so-called "reverse propionylation'. This function help for a quick visual review to see if overpropionylation is limited or enormous.
This for sure assumes that the database search results was run with `Propionyl (STY)` or any other labeling modification as varaible modification.