diff --git a/DESCRIPTION b/DESCRIPTION index 0faf4c66..9b857d43 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: protti Title: Bottom-Up Proteomics and LiP-MS Quality Control and Data Analysis Tools -Version: 0.9.0 +Version: 0.9.1 Authors@R: c(person(given = "Jan-Philipp", family = "Quast", @@ -43,7 +43,7 @@ Imports: methods, R.utils, stats -RoxygenNote: 7.3.1 +RoxygenNote: 7.3.2 Suggests: testthat, covr, @@ -67,7 +67,9 @@ Suggests: iq, scales, farver, - ggforce + ggforce, + xml2, + jsonlite Depends: R (>= 4.0) URL: https://github.com/jpquast/protti, https://jpquast.github.io/protti/ diff --git a/NAMESPACE b/NAMESPACE index 09754811..e4e57730 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -134,6 +134,7 @@ importFrom(purrr,pluck) importFrom(purrr,pmap) importFrom(purrr,reduce) importFrom(purrr,set_names) +importFrom(readr,read_csv) importFrom(readr,read_tsv) importFrom(readr,write_csv) importFrom(readr,write_tsv) diff --git a/NEWS.md b/NEWS.md index f947cda2..f4e23d7e 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,3 +1,8 @@ +# protti 0.9.1 + +## Bug fixes +* `try_query()` now correctly handles errors that don't return a response object. We also handle gzip decompression problems better since some databases compressed responses were not handled correctly. + # protti 0.9.0 ## New features diff --git a/R/calculate_protein_abundance.R b/R/calculate_protein_abundance.R index 5fb71220..af63a840 100644 --- a/R/calculate_protein_abundance.R +++ b/R/calculate_protein_abundance.R @@ -18,12 +18,11 @@ #' for a protein to be included in the analysis. The default value is 3, which means #' proteins with fewer than three unique peptides will be excluded from the analysis. #' @param method a character value specifying with which method protein quantities should be -#' calculated. Possible options include \code{"sum"}, which takes the sum of all precursor -#' intensities as the protein abundance. Another option is \code{"iq"}, which performs protein +#' calculated. 
Possible options include `"sum"`, which takes the sum of all precursor +#' intensities as the protein abundance. Another option is `"iq"`, which performs protein #' quantification based on a maximal peptide ratio extraction algorithm that is adapted from the #' MaxLFQ algorithm of the MaxQuant software. Functions from the -#' \href{https://academic.oup.com/bioinformatics/article/36/8/2611/5697917}{\code{iq}} package are -#' used. Default is \code{"iq"}. +#' `iq` package (\doi{10.1093/bioinformatics/btz961}) are used. Default is `"iq"`. #' @param for_plot a logical value indicating whether the result should be only protein intensities #' or protein intensities together with precursor intensities that can be used for plotting using #' \code{peptide_profile_plot()}. Default is \code{FALSE}. diff --git a/R/data.R b/R/data.R index ae4e9c0d..14280d35 100644 --- a/R/data.R +++ b/R/data.R @@ -33,7 +33,7 @@ #' @format A data frame containing peptide level data from a Spectronaut report. #' @source Piazza, I., Beaton, N., Bruderer, R. et al. A machine learning-based chemoproteomic #' approach to identify drug targets and binding sites in complex proteomes. Nat Commun 11, 4200 -#' (2020). https://doi.org/10.1038/s41467-020-18071-x +#' (2020). \doi{10.1038/s41467-020-18071-x} "rapamycin_10uM" #' Rapamycin dose response example data @@ -47,13 +47,13 @@ #' @format A data frame containing peptide level data from a Spectronaut report. #' @source Piazza, I., Beaton, N., Bruderer, R. et al. A machine learning-based chemoproteomic #' approach to identify drug targets and binding sites in complex proteomes. Nat Commun 11, 4200 -#' (2020). https://doi.org/10.1038/s41467-020-18071-x +#' (2020). \doi{10.1038/s41467-020-18071-x} "rapamycin_dose_response" #' Structural analysis example data #' #' Example data used for the vignette about structural analysis. 
The data was obtained from -#' \href{https://www.sciencedirect.com/science/article/pii/S0092867420316913}{Cappelletti 2021} +#' Cappelletti et al. 2021 (\doi{10.1016/j.cell.2020.12.021}) #' and corresponds to two separate experiments. Both experiments were limited proteolyis coupled to #' mass spectrometry (LiP-MS) experiments conducted on purified proteins. The first protein is #' phosphoglycerate kinase 1 (pgk) and it was treated with 25mM 3-phosphoglyceric acid (3PG). @@ -69,7 +69,7 @@ #' @source Cappelletti V, Hauser T, Piazza I, Pepelnjak M, Malinovska L, Fuhrer T, Li Y, Dörig C, #' Boersema P, Gillet L, Grossbach J, Dugourd A, Saez-Rodriguez J, Beyer A, Zamboni N, Caflisch A, #' de Souza N, Picotti P. Dynamic 3D proteomes reveal protein functional alterations at high -#' resolution in situ. Cell. 2021 Jan 21;184(2):545-559.e22. doi: 10.1016/j.cell.2020.12.021. +#' resolution in situ. Cell. 2021 Jan 21;184(2):545-559.e22. \doi{10.1016/j.cell.2020.12.021}. #' Epub 2020 Dec 23. PMID: 33357446; PMCID: PMC7836100. "ptsi_pgk" diff --git a/R/fetch_eco.R b/R/fetch_eco.R index a2e6f994..e5da4ada 100644 --- a/R/fetch_eco.R +++ b/R/fetch_eco.R @@ -18,8 +18,7 @@ #' essential to navigating the ever-growing (in size and complexity) corpus of scientific #' information." #' -#' More information can be found in their -#' \href{https://academic.oup.com/nar/article/47/D1/D1186/5165344?login=true}{publication}. +#' More information can be found in their publication (\doi{10.1093/nar/gky1036}). #' #' @param return_relation a logical value that indicates if relational information should be returned instead #' the main descriptive information. This data can be used to check the relations of ECO terms to each other. diff --git a/R/fetch_mobidb.R b/R/fetch_mobidb.R index 7d6f1014..4e51de4e 100644 --- a/R/fetch_mobidb.R +++ b/R/fetch_mobidb.R @@ -17,7 +17,7 @@ #' @return A data frame that contains start and end positions for disordered and flexible protein #' regions. 
The \code{feature} column contains information on the source of this #' annotation. More information on the source can be found -#' \href{https://mobidb.bio.unipd.it/about/mobidb}{here}. +#' \href{https://mobidb.org/about/mobidb}{here}. #' @import progress #' @importFrom rlang .data #' @importFrom purrr map_dfr keep diff --git a/R/try_query.R b/R/try_query.R index 8016a00b..2199c32b 100644 --- a/R/try_query.R +++ b/R/try_query.R @@ -13,7 +13,6 @@ #' @param type a character value that specifies the type of data at the target URL. Options are #' all options that can be supplied to httr::content, these include e.g. #' "text/tab-separated-values", "application/json" and "txt/csv". Default is "text/tab-separated-values". -#' Default is "tab-separated-values". #' @param timeout a numeric value that specifies the maximum request time. Default is 60 seconds. #' @param accept a character value that specifies the type of data that should be sent by the API if #' it uses content negotiation. The default is NULL and it should only be set for APIs that use @@ -22,6 +21,7 @@ #' #' @importFrom curl has_internet #' @importFrom httr GET timeout http_error message_for_status http_status content accept +#' @importFrom readr read_tsv read_csv #' #' @return A data frame that contains the table from the url. 
 try_query <-
@@ -77,18 +77,56 @@ try_query <-
       return(invisible("No internet connection"))
     }
 
-    if (httr::http_error(query_result)) {
+    # If response was an error return that error message
+    if (inherits(query_result, "response") && httr::http_error(query_result)) {
       if (!silent) httr::message_for_status(query_result)
       return(invisible(httr::http_status(query_result)$message))
     }
 
+    # Handle other types of errors separately from query errors
+    if (inherits(query_result, "character")) {
+      if (!silent) message(query_result)
+      return(invisible(query_result))
+    }
+
     # Record readr progress variable to set back later
     readr_show_progress <- getOption("readr.show_progress")
     on.exit(options(readr.show_progress = readr_show_progress))
     # Change variable to not show progress if readr is used
     options(readr.show_progress = FALSE)
 
-    result <- suppressMessages(httr::content(query_result, type = type, encoding = "UTF-8", ...))
+    # Retrieve the content as raw bytes using httr::content
+    raw_content <- httr::content(query_result, type = "raw")
+    # Check for gzip magic number (1f 8b) before decompression
+    compressed <- length(raw_content) >= 2 && raw_content[1] == as.raw(0x1f) && raw_content[2] == as.raw(0x8b)
+
+    # Check if the content is gzip compressed
+    if (!is.null(query_result$headers[["content-encoding"]]) && query_result$headers[["content-encoding"]] == "gzip" && compressed) {
+      # Decompress the raw content using base R's `memDecompress`
+      decompressed_content <- memDecompress(raw_content, type = "gzip")
+
+      # Convert the raw bytes to a character string
+      text_content <- rawToChar(decompressed_content)
+
+      # Read the decompressed content based on the specified type
+      if (type == "text/tab-separated-values") {
+        result <- readr::read_tsv(text_content, ...)
+      } else if (type == "text/html") {
+        result <- xml2::read_html(text_content, ...)
+      } else if (type == "text/xml") {
+        result <- xml2::read_xml(text_content, ...)
+      } else if (type == "text/csv" || type == "txt/csv") {
+        result <- readr::read_csv(text_content, ...)
+      } else if (type == "application/json") {
+        result <- jsonlite::fromJSON(text_content, ...) # Using jsonlite for JSON parsing
+      } else if (type == "text") {
+        result <- text_content # Return raw text as-is
+      } else {
+        stop("Unsupported content type: ", type)
+      }
+    } else {
+      result <- suppressMessages(httr::content(query_result, type = type, encoding = "UTF-8", ...))
+    }
 
     return(result)
   }
diff --git a/README.Rmd b/README.Rmd
index 47dab585..3ac8c90a 100644
--- a/README.Rmd
+++ b/README.Rmd
@@ -26,7 +26,7 @@ knitr::opts_chunk$set(
 
 The goal of **protti** is to provide flexible functions and workflows for proteomics quality control and data analysis, within a single, user-friendly package. It can be used for label-free DDA, DIA and SRM data generated with search tools and software such as Spectronaut, MaxQuant, Proteome Discoverer and Skyline. Both limited proteolysis mass spectrometry (LiP-MS) and regular bottom-up proteomics experiments can be analysed.
 
-**protti** is developed and maintained by members of the lab of Paola Picotti at ETH Zurich. Our lab is focused on protein structural changes that occur in response to perturbations such as metabolite, drug and protein binding-events, as well as protein aggregation and enzyme activation ([Piazza 2018](https://www.sciencedirect.com/science/article/pii/S0092867417314484), [Piazza 2020](https://www.nature.com/articles/s41467-020-18071-x#additional-information), [Cappelletti, Hauser & Piazza 2021](https://www.sciencedirect.com/science/article/pii/S0092867420316913)). We have devoloped mass spectrometry-based structural and chemical proteomic methods aimed at monitoring protein conformational changes in the complex cellular milieu ([Feng 2014](https://www.nature.com/articles/nbt.2999)).
+**protti** is developed and maintained by members of the lab of Paola Picotti at ETH Zurich.
Our lab is focused on protein structural changes that occur in response to perturbations such as metabolite, drug and protein binding-events, as well as protein aggregation and enzyme activation ([Piazza 2018](https://doi.org/10.1016/j.cell.2017.12.006), [Piazza 2020](https://doi.org/10.1038/s41467-020-18071-x), [Cappelletti, Hauser & Piazza 2021](https://doi.org/10.1016/j.cell.2020.12.021)). We have devoloped mass spectrometry-based structural and chemical proteomic methods aimed at monitoring protein conformational changes in the complex cellular milieu ([Feng 2014](https://doi.org/10.1038/nbt.2999)). There is a wide range of functions **protti** provides to the user. The main areas of application are: diff --git a/README.md b/README.md index fb9cbde5..668df92e 100644 --- a/README.md +++ b/README.md @@ -27,16 +27,12 @@ be analysed. Picotti at ETH Zurich. Our lab is focused on protein structural changes that occur in response to perturbations such as metabolite, drug and protein binding-events, as well as protein aggregation and enzyme -activation ([Piazza -2018](https://www.sciencedirect.com/science/article/pii/S0092867417314484), -[Piazza -2020](https://www.nature.com/articles/s41467-020-18071-x#additional-information), -[Cappelletti, Hauser & Piazza -2021](https://www.sciencedirect.com/science/article/pii/S0092867420316913)). -We have devoloped mass spectrometry-based structural and chemical -proteomic methods aimed at monitoring protein conformational changes in -the complex cellular milieu ([Feng -2014](https://www.nature.com/articles/nbt.2999)). +activation ([Piazza 2018](https://doi.org/10.1016/j.cell.2017.12.006), +[Piazza 2020](https://doi.org/10.1038/s41467-020-18071-x), [Cappelletti, +Hauser & Piazza 2021](https://doi.org/10.1016/j.cell.2020.12.021)). 
We +have devoloped mass spectrometry-based structural and chemical proteomic +methods aimed at monitoring protein conformational changes in the +complex cellular milieu ([Feng 2014](https://doi.org/10.1038/nbt.2999)). There is a wide range of functions **protti** provides to the user. The main areas of application are: @@ -201,15 +197,17 @@ protein intensities. set.seed(42) # Makes example reproducible # Create synthetic data -data <- create_synthetic_data(n_proteins = 100, - frac_change = 0.05, - n_replicates = 4, - n_conditions = 2, - method = "effect_random", - additional_metadata = FALSE) - -# The method "effect_random" as opposed to "dose-response" just randomly samples -# the extend of the change of significantly changing peptides for each condition. +data <- create_synthetic_data( + n_proteins = 100, + frac_change = 0.05, + n_replicates = 4, + n_conditions = 2, + method = "effect_random", + additional_metadata = FALSE +) + +# The method "effect_random" as opposed to "dose-response" just randomly samples +# the extend of the change of significantly changing peptides for each condition. # They do not follow any trend and can go in any direction. ``` @@ -252,10 +250,12 @@ contains the normalised intensities. normalise it another time.* ``` r -normalised_data <- data %>% - normalise(sample = sample, - intensity_log2 = peptide_intensity_missing, - method = "median") +normalised_data <- data %>% + normalise( + sample = sample, + intensity_log2 = peptide_intensity_missing, + method = "median" + ) ``` #### Assign Missingness @@ -284,16 +284,18 @@ thresholds if you want to be more or less conservative with how many data points to retain. 
``` r -data_missing <- normalised_data %>% - assign_missingness(sample = sample, - condition = condition, - grouping = peptide, - intensity = normalised_intensity_log2, - ref_condition = "condition_1", - retain_columns = c(protein, change_peptide)) - -# Next to the columns it generates, assign_missingness only contains the columns -# you provide as input in its output. If you want to retain additional columns you +data_missing <- normalised_data %>% + assign_missingness( + sample = sample, + condition = condition, + grouping = peptide, + intensity = normalised_intensity_log2, + ref_condition = "condition_1", + retain_columns = c(protein, change_peptide) + ) + +# Next to the columns it generates, assign_missingness only contains the columns +# you provide as input in its output. If you want to retain additional columns you # can provide them in the retain_columns argument. ``` @@ -317,16 +319,18 @@ missingness cutoffs also in order to define which comparisons are too incomplete to be trustworthy even if significant. ``` r -result <- data_missing %>% - calculate_diff_abundance(sample = sample, - condition = condition, - grouping = peptide, - intensity_log2 = normalised_intensity_log2, - missingness = missingness, - comparison = comparison, - filter_NA_missingness = TRUE, - method = "moderated_t-test", - retain_columns = c(protein, change_peptide)) +result <- data_missing %>% + calculate_diff_abundance( + sample = sample, + condition = condition, + grouping = peptide, + intensity_log2 = normalised_intensity_log2, + missingness = missingness, + comparison = comparison, + filter_NA_missingness = TRUE, + method = "moderated_t-test", + retain_columns = c(protein, change_peptide) + ) ``` Next we can use a Volcano plot to visualize significantly changing @@ -335,15 +339,17 @@ interactive plot with the `interactive` argument. Please note that this is not recommended for large datasets. 
``` r -result %>% - volcano_plot(grouping = peptide, - log2FC = diff, - significance = pval, - method = "target", - target_column = change_peptide, - target = TRUE, - legend_label = "Ground Truth", - significance_cutoff = c(0.05, "adj_pval")) +result %>% + volcano_plot( + grouping = peptide, + log2FC = diff, + significance = pval, + method = "target", + target_column = change_peptide, + target = TRUE, + legend_label = "Ground Truth", + significance_cutoff = c(0.05, "adj_pval") + ) ``` diff --git a/cran-comments.md b/cran-comments.md index 9356edf4..d337fcd4 100644 --- a/cran-comments.md +++ b/cran-comments.md @@ -1,9 +1,7 @@ ## Submission * We specifically addressed and fixed the issue raised by Prof. Brian Ripley: - * The `analyse_functional_network()` function did not fail gracefully. - * We implemented a `try_catch()` that specifically rescues the cases in which the `STRINGdb` package does not fail gracefully. This fixes the issue. -* Additionally we added new features and fixed bugs. + * We updated `try_query()` to also handle request unrelated errors successfully. ## Test environments * macOS-latest (on GitHub actions), R 4.4.1 diff --git a/man/calculate_protein_abundance.Rd b/man/calculate_protein_abundance.Rd index 7fac88ed..cecfc79d 100644 --- a/man/calculate_protein_abundance.Rd +++ b/man/calculate_protein_abundance.Rd @@ -43,8 +43,7 @@ calculated. Possible options include \code{"sum"}, which takes the sum of all pr intensities as the protein abundance. Another option is \code{"iq"}, which performs protein quantification based on a maximal peptide ratio extraction algorithm that is adapted from the MaxLFQ algorithm of the MaxQuant software. Functions from the -\href{https://academic.oup.com/bioinformatics/article/36/8/2611/5697917}{\code{iq}} package are -used. Default is \code{"iq"}.} +\code{iq} package (\doi{10.1093/bioinformatics/btz961}) are used. 
Default is \code{"iq"}.} \item{for_plot}{a logical value indicating whether the result should be only protein intensities or protein intensities together with precursor intensities that can be used for plotting using diff --git a/man/fetch_eco.Rd b/man/fetch_eco.Rd index 7e24dc50..342073a4 100644 --- a/man/fetch_eco.Rd +++ b/man/fetch_eco.Rd @@ -47,8 +47,7 @@ retreive, share, and compare data associated with that evidence using computers, essential to navigating the ever-growing (in size and complexity) corpus of scientific information." -More information can be found in their -\href{https://academic.oup.com/nar/article/47/D1/D1186/5165344?login=true}{publication}. +More information can be found in their publication (\doi{10.1093/nar/gky1036}). } \examples{ \donttest{ diff --git a/man/fetch_mobidb.Rd b/man/fetch_mobidb.Rd index eaa0ff10..accf3e0f 100644 --- a/man/fetch_mobidb.Rd +++ b/man/fetch_mobidb.Rd @@ -33,7 +33,7 @@ the data in case an error occurs. The default is 2.} A data frame that contains start and end positions for disordered and flexible protein regions. The \code{feature} column contains information on the source of this annotation. More information on the source can be found -\href{https://mobidb.bio.unipd.it/about/mobidb}{here}. +\href{https://mobidb.org/about/mobidb}{here}. } \description{ Fetches information about disordered and flexible protein regions from MobiDB. diff --git a/man/figures/README-volcano-1.png b/man/figures/README-volcano-1.png index a751d919..78db269c 100644 Binary files a/man/figures/README-volcano-1.png and b/man/figures/README-volcano-1.png differ diff --git a/man/ptsi_pgk.Rd b/man/ptsi_pgk.Rd index f241df1d..696b3c2c 100644 --- a/man/ptsi_pgk.Rd +++ b/man/ptsi_pgk.Rd @@ -12,7 +12,7 @@ peptides/precursors of two proteins. Cappelletti V, Hauser T, Piazza I, Pepelnjak M, Malinovska L, Fuhrer T, Li Y, Dörig C, Boersema P, Gillet L, Grossbach J, Dugourd A, Saez-Rodriguez J, Beyer A, Zamboni N, Caflisch A, de Souza N, Picotti P. 
Dynamic 3D proteomes reveal protein functional alterations at high -resolution in situ. Cell. 2021 Jan 21;184(2):545-559.e22. doi: 10.1016/j.cell.2020.12.021. +resolution in situ. Cell. 2021 Jan 21;184(2):545-559.e22. \doi{10.1016/j.cell.2020.12.021}. Epub 2020 Dec 23. PMID: 33357446; PMCID: PMC7836100. } \usage{ @@ -20,7 +20,7 @@ ptsi_pgk } \description{ Example data used for the vignette about structural analysis. The data was obtained from -\href{https://www.sciencedirect.com/science/article/pii/S0092867420316913}{Cappelletti 2021} +Cappelletti et al. 2021 (\doi{10.1016/j.cell.2020.12.021}) and corresponds to two separate experiments. Both experiments were limited proteolyis coupled to mass spectrometry (LiP-MS) experiments conducted on purified proteins. The first protein is phosphoglycerate kinase 1 (pgk) and it was treated with 25mM 3-phosphoglyceric acid (3PG). diff --git a/man/rapamycin_10uM.Rd b/man/rapamycin_10uM.Rd index 0f90e87c..545d936a 100644 --- a/man/rapamycin_10uM.Rd +++ b/man/rapamycin_10uM.Rd @@ -10,7 +10,7 @@ A data frame containing peptide level data from a Spectronaut report. \source{ Piazza, I., Beaton, N., Bruderer, R. et al. A machine learning-based chemoproteomic approach to identify drug targets and binding sites in complex proteomes. Nat Commun 11, 4200 -(2020). https://doi.org/10.1038/s41467-020-18071-x +(2020). \doi{10.1038/s41467-020-18071-x} } \usage{ rapamycin_10uM diff --git a/man/rapamycin_dose_response.Rd b/man/rapamycin_dose_response.Rd index 9ea683a4..f777d8cf 100644 --- a/man/rapamycin_dose_response.Rd +++ b/man/rapamycin_dose_response.Rd @@ -10,7 +10,7 @@ A data frame containing peptide level data from a Spectronaut report. \source{ Piazza, I., Beaton, N., Bruderer, R. et al. A machine learning-based chemoproteomic approach to identify drug targets and binding sites in complex proteomes. Nat Commun 11, 4200 -(2020). https://doi.org/10.1038/s41467-020-18071-x +(2020). 
\doi{10.1038/s41467-020-18071-x} } \usage{ rapamycin_dose_response diff --git a/man/try_query.Rd b/man/try_query.Rd index 90d61467..cb9a64aa 100644 --- a/man/try_query.Rd +++ b/man/try_query.Rd @@ -26,8 +26,7 @@ that failed.} \item{type}{a character value that specifies the type of data at the target URL. Options are all options that can be supplied to httr::content, these include e.g. -"text/tab-separated-values", "application/json" and "txt/csv". Default is "text/tab-separated-values". -Default is "tab-separated-values".} +"text/tab-separated-values", "application/json" and "txt/csv". Default is "text/tab-separated-values".} \item{timeout}{a numeric value that specifies the maximum request time. Default is 60 seconds.} diff --git a/tests/testthat/test-fetch_extract_and_enrichment_functions.R b/tests/testthat/test-fetch_extract_and_enrichment_functions.R index 6f420b21..d24ea4f3 100644 --- a/tests/testthat/test-fetch_extract_and_enrichment_functions.R +++ b/tests/testthat/test-fetch_extract_and_enrichment_functions.R @@ -21,7 +21,7 @@ if (Sys.getenv("TEST_PROTTI") == "true") { unis <- c("iRT", "P25437", "P30870", "P0A6P9") expect_warning(mobidb <- fetch_mobidb(unis)) expect_is(mobidb, "data.frame") - expect_equal(nrow(mobidb), 259) + expect_equal(nrow(mobidb), 221) expect_equal(ncol(mobidb), 6) }) diff --git a/vignettes/data_analysis_dose_response_workflow.Rmd b/vignettes/data_analysis_dose_response_workflow.Rmd index d8481d62..484cbffe 100644 --- a/vignettes/data_analysis_dose_response_workflow.Rmd +++ b/vignettes/data_analysis_dose_response_workflow.Rmd @@ -40,9 +40,9 @@ For help with data input please click [here](https://jpquast.github.io/protti/ar A typical dose-response experiment contains multiple samples that were treated with different amounts of e.g. a drug. Replicates of samples treated with the same dose make up a condition. Commonly, the first concentration is 0 (i.e. the control, in which treatment is with the solvent of the drug). 
Dose-response treatments require a minimal number of treatments to fit curves with sufficient quality. For analysis with **protti** at least 5 different conditions should be present. Another consideration for your experiment is the range of concentrations. They should not be too close together since effects are usually best identified over a larger concentration range. But you should make sure not to space them out too far. It is generally advised to space them evenly on a logarithmic scale of the base 10 or $e$ (Euler's number). You can also include steps in between e.g. 100, 500, 1000 or 100, 200, 1000 for a log10 scale. It is advisable to use a rather broad concentration range for an experiment in which you do not know what to expect. -**protti** fits four-parameter log-logistic dose-response models to your data. It utilizes the `drm()` and `LL.4()` functions from the [`drc`](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0146021) package for this. You can also select a non logarithmic model using the `L.4()` function in case your data does not follow a log-logistic but a logistic regression. +**protti** fits four-parameter log-logistic dose-response models to your data. It utilizes the `drm()` and `LL.4()` functions from the [`drc`](https://doi.org/10.1371/journal.pone.0146021) package for this. You can also select a non logarithmic model using the `L.4()` function in case your data does not follow a log-logistic but a logistic regression. -For limited proteolysis-coupled to mass spectrometry ([LiP-MS](https://www.nature.com/articles/nbt.2999)) data, dose-response curves have been used previously for the identification of drug binding sites in complex proteomes ([Piazza 2020](https://www.nature.com/articles/s41467-020-18071-x)). Since LiP-MS data is analysed on the peptide or precursor* level, using additional information about peptide behaviour from multiple conditions reduces false discovery rate. 
+For limited proteolysis-coupled to mass spectrometry ([LiP-MS](https://doi.org/10.1038/nbt.2999)) data, dose-response curves have been used previously for the identification of drug binding sites in complex proteomes ([Piazza 2020](https://doi.org/10.1038/s41467-020-18071-x)). Since LiP-MS data is analysed on the peptide or precursor* level, using additional information about peptide behaviour from multiple conditions reduces false discovery rate. _A peptide precursor is the actual molecular unit that was detected on the mass spectrometer. This is a peptide with one specific charge state and its modification(s)._ @@ -60,7 +60,7 @@ library(ggplot2) ## Loading data -For this vignette we use a subset of proteins from an experiment of HeLa cell lysates treated with 9 doses of rapamycin followed by LiP-MS. Rapamycin forms a complex with the FK506-binding protein (FKBP12) that binds and allosterically inhibits mTORC1 ([Sabatini 1994](https://www.sciencedirect.com/science/article/pii/0092867494905703?via%3Dihub)). Since rapamycin is known to be a highly specific drug, we expect to identify FKBP12 as one of the only interacting proteins. +For this vignette we use a subset of proteins from an experiment of HeLa cell lysates treated with 9 doses of rapamycin followed by LiP-MS. Rapamycin forms a complex with the FK506-binding protein (FKBP12) that binds and allosterically inhibits mTORC1 ([Sabatini 1994](https://doi.org/10.1016/0092-8674(94)90570-3)). Since rapamycin is known to be a highly specific drug, we expect to identify FKBP12 as one of the only interacting proteins. We included 39 random proteins and FKBP12 in this sample data set. The proteins were sampled using the seed 123. 
@@ -261,7 +261,7 @@ You can test your data set for gene ontology (GO) term enrichment (`calculate_go If you know which proteins bind or interact with your specific treatment, you can provide your own list of true positive hits and check if these are enriched in your significant hits by using **protti**'s `calculate_treatment_enrichment()` function. For our LiP-MS experiment using rapamycin we are probing direct interaction with proteins in contrast to functional effects. The only protein that rapamycin is known to bind to is FKBP12 and testing the significance of enrichment for a single or even a few proteins is not appropriate. However, testing for enrichment is especially useful if your treatment affects many proteins, since it can help you to reduce the complexity of your result. -The [STRING](https://academic.oup.com/nar/article/47/D1/D607/5198476) database provides a good resource for the analysis of protein interaction networks. It is often very useful to check for interactions within your significant hits. For LiP-MS data this sometimes explains why proteins that do not directly interact with your treatment are still significantly affected. With `analyse_functional_network()`, **protti** provides a useful wrapper around some [`STRINGdb`](https://www.bioconductor.org/packages/release/bioc/html/STRINGdb.html) package functions. +The [STRING](https://doi.org/10.1093/nar/gky1131) database provides a good resource for the analysis of protein interaction networks. It is often very useful to check for interactions within your significant hits. For LiP-MS data this sometimes explains why proteins that do not directly interact with your treatment are still significantly affected. With `analyse_functional_network()`, **protti** provides a useful wrapper around some [`STRINGdb`](https://www.bioconductor.org/packages/release/bioc/html/STRINGdb.html) package functions. 
## Annotation of data diff --git a/vignettes/data_analysis_single_dose_treatment_workflow.Rmd b/vignettes/data_analysis_single_dose_treatment_workflow.Rmd index ae7813c0..8cb8a6d4 100644 --- a/vignettes/data_analysis_single_dose_treatment_workflow.Rmd +++ b/vignettes/data_analysis_single_dose_treatment_workflow.Rmd @@ -231,7 +231,7 @@ Before the statistical hypothesis test we have to define the types of missing va After assigning the types of data missingness we use the function `calculate_diff_abundance()`. By selecting `method = t-test` the function will perform a Welch's t-test. There are also options included to perform a moderated t-test based on the R package `limma` or to detect differential abundances based on the algorithm implemented in the R package `proDA`. The algorithm used for `proDA` is based on a probabilistic dropout model which facilitates hypothesis testing (using a moderated t-test) while eliminating the need for imputation. -It has been shown that generally moderated t-tests perform much better also in proteomics data, as compared to t-tests ([Kammers et al. 2015](https://www.sciencedirect.com/science/article/pii/S2212968515000069)). Therefore, we will use a moderated t-test in this example. +It has been shown that generally moderated t-tests perform much better also in proteomics data, as compared to t-tests ([Kammers et al. 2015](https://doi.org/10.1016/j.euprot.2015.02.002)). Therefore, we will use a moderated t-test in this example. Please note that in this example we are not imputing missing values. You can, however, do so by using the function `impute()`. This function uses the output of `assign_missingness()` as its input. 
You can use two different imputation methods: diff --git a/vignettes/input_preparation_workflow.Rmd b/vignettes/input_preparation_workflow.Rmd index 3884abbb..744047b2 100644 --- a/vignettes/input_preparation_workflow.Rmd +++ b/vignettes/input_preparation_workflow.Rmd @@ -41,7 +41,7 @@ Data should always be organised in a format called [tidy data](https://r4ds.had. ## Protein-centric analysis -Many search engines provide the user with protein intensities. However, it is also possible to calculate protein intensities directly from precursor intensities with the **protti** function `calculate_protein_abundance()`. **Protti** implements the `"iq"` method, previously implemented in the R package [`iq`](https://academic.oup.com/bioinformatics/article/36/8/2611/5697917) which performs protein quantification based on the maximal peptide ratio extraction algorithm adapted from the MaxLFQ algorithm ([Cox, J. 2013](https://www.sciencedirect.com/science/article/pii/S1535947620333107?via%3Dihub)). +Many search engines provide the user with protein intensities. However, it is also possible to calculate protein intensities directly from precursor intensities with the **protti** function `calculate_protein_abundance()`. **Protti** implements the `"iq"` method, previously implemented in the R package [`iq`](https://doi.org/10.1093/bioinformatics/btz961) which performs protein quantification based on the maximal peptide ratio extraction algorithm adapted from the MaxLFQ algorithm ([Cox, J. 2013](https://doi.org/10.1074/mcp.M113.031591)). One advantage of calculating the protein abundance with **protti** is the possibility to median normalise run intensities on the precursor level. This is closer to the actually acquired intensities and thus sample concentrations than if normalisation is performed on the protein level. Some search engines provide the option for automatic median normalisation but not all. 
Furthermore, some search engines calculate protein intensities by summation of precursor intensities irrespective of missingness of peptides in certain samples. In these cases the maximal peptide ratio implemented in extraction algorithm provides a more robust calculation of protein intensities.
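
Reviewer note on the `R/try_query.R` hunk: the gzip check this patch adds to `try_query()` can be looked at in isolation. What follows is a minimal base R sketch under stated assumptions, not the package's code. `is_gzip()` and `decode_body()` are hypothetical helper names introduced only for illustration, and the sketch assumes an R version in which `memDecompress(type = "gzip")` accepts gzip-framed input, which the patch itself also relies on.

```r
# Hypothetical helper (not part of protti): TRUE if a raw vector starts with
# the two-byte gzip magic number 1f 8b, mirroring the check in the patch.
is_gzip <- function(raw_content) {
  length(raw_content) >= 2 &&
    raw_content[1] == as.raw(0x1f) &&
    raw_content[2] == as.raw(0x8b)
}

# Hypothetical helper (not part of protti): turn a raw body into text and
# decompress it first only when the magic number is present.
decode_body <- function(raw_content) {
  if (is_gzip(raw_content)) {
    # Assumption: this R version's memDecompress() handles gzip framing.
    raw_content <- memDecompress(raw_content, type = "gzip")
  }
  rawToChar(raw_content)
}

# Only the first two bytes are inspected, so a header fragment is enough
# to exercise the detection.
gzip_header <- as.raw(c(0x1f, 0x8b, 0x08))
plain_body <- charToRaw("id\tvalue\nA\t1\n")

is_gzip(gzip_header)
is_gzip(plain_body)
is_gzip(raw(0))

# An uncompressed body passes through unchanged.
decode_body(plain_body)
```

Unlike this sketch, the patch requires both a `content-encoding: gzip` header and the magic number before decompressing, and otherwise falls back to `httr::content()`. The likely reason is that the header alone is not reliable when the body has already been decompressed in transit.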