ArthurDeclercq committed Jul 17, 2024
2 parents d6e76cf + df3ceda commit 5295ce6
Showing 27 changed files with 855 additions and 358 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/publish.yml
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ jobs:
- name: Test built package
run: |
pip install dist/ms2rescore-*.whl
pip install --only-binary :all: dist/ms2rescore-*.whl
# pytest
ms2rescore --help
Expand All @@ -54,7 +54,7 @@ jobs:
- name: Install package and dependencies
run: |
python -m pip install --upgrade pip
pip install . pyinstaller
pip install --only-binary :all: . pyinstaller
- name: Install Inno Setup
uses: crazy-max/ghaction-chocolatey@v3
Expand Down
9 changes: 5 additions & 4 deletions .github/workflows/test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -35,11 +35,12 @@ jobs:
- name: Build and install ms2rescore package
run: |
pip install .[dev]
pip install --only-binary :all: .[dev]
- name: Test with pytest
run: |
pytest
# - name: Test with pytest
# run: |
# pytest
- name: Test installation
run: |
ms2rescore --help
Expand Down
9 changes: 5 additions & 4 deletions Dockerfile
Original file line number Diff line number Diff line change
@@ -1,4 +1,6 @@
FROM ubuntu:focal
FROM python:3.11

# ARG DEBIAN_FRONTEND=noninteractive

LABEL name="ms2rescore"

Expand All @@ -11,8 +13,7 @@ ADD MANIFEST.in /ms2rescore/MANIFEST.in
ADD ms2rescore /ms2rescore/ms2rescore

RUN apt-get update \
&& apt-get install --no-install-recommends -y python3-pip procps libglib2.0-0 libsm6 libxrender1 libxext6 \
&& rm -rf /var/lib/apt/lists/* \
&& pip3 install ms2rescore/
&& apt install -y procps \
&& pip install /ms2rescore --only-binary :all:

ENTRYPOINT [""]
4 changes: 4 additions & 0 deletions docs/source/config_schema.md
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,8 @@
- *string*
- *null*
- **`lower_score_is_better`** *(boolean)*: Bool indicating if lower score is better. Default: `false`.
- **`max_psm_rank_input`** *(number)*: Maximum rank of PSMs to use as input for rescoring. Minimum: `1`. Default: `10`.
- **`max_psm_rank_output`** *(number)*: Maximum rank of PSMs to return after rescoring, before final FDR calculation. Minimum: `1`. Default: `1`.
- **`modification_mapping`** *(object)*: Mapping of modification labels to each replacement label. Default: `{}`.
- **`fixed_modifications`** *(object)*: Mapping of amino acids with fixed modifications to the modification name. Can contain additional properties. Default: `{}`.
- **`processes`** *(number)*: Number of parallel processes to use; -1 for all available. Minimum: `-1`. Default: `-1`.
Expand All @@ -57,6 +59,7 @@
- *string*
- *null*
- **`write_report`** *(boolean)*: Write an HTML report with various QC metrics and charts. Default: `false`.
- **`profile`** *(boolean)*: Write a txt report using cProfile for profiling. Default: `false`.
## Definitions

- <a id="definitions/feature_generator"></a>**`feature_generator`** *(object)*: Feature generator configuration. Can contain additional properties.
Expand All @@ -76,6 +79,7 @@
- **`reference_dataset`** *(string)*: Path to Ionmob reference dataset file. Default: `"Meier_unimod.parquet"`.
- **`tokenizer`** *(string)*: Path to tokenizer json file. Default: `"tokenizer.json"`.
- <a id="definitions/mokapot"></a>**`mokapot`** *(object)*: Mokapot rescoring engine configuration. Additional properties are passed to the Mokapot brew function. Can contain additional properties. Refer to *[#/definitions/rescoring_engine](#definitions/rescoring_engine)*.
- **`train_fdr`** *(number)*: FDR threshold for training Mokapot. Minimum: `0`. Maximum: `1`. Default: `0.01`.
- **`write_weights`** *(boolean)*: Write Mokapot weights to a text file. Default: `false`.
- **`write_txt`** *(boolean)*: Write Mokapot results to a text file. Default: `false`.
- **`write_flashlfq`** *(boolean)*: Write Mokapot results to a FlashLFQ-compatible file. Default: `false`.
Expand Down
85 changes: 83 additions & 2 deletions docs/source/userguide/configuration.rst
Original file line number Diff line number Diff line change
Expand Up @@ -123,15 +123,15 @@ be configured separately. For instance:
.. code-block:: json
"fixed_modifications": {
"C": "U:Carbamidomethyl"
"U:Carbamidomethyl": ["C"]
}
.. tab:: TOML

.. code-block:: toml
[ms2rescore.fixed_modifications]
"Carbamidomethyl" = ["C"]
"U:Carbamidomethyl" = ["C"]
.. tab:: GUI

Expand All @@ -140,6 +140,28 @@ be configured separately. For instance:
:alt: fixed modifications configuration in GUI


Fixed terminal modifications can be added by using the special labels ``N-term`` and ``C-term``.
For example, to additionally add TMT6plex to the N-terminus and lysine residues, the following
configuration can be used:

.. tab:: JSON

.. code-block:: json
"fixed_modifications": {
"U:Carbamidomethyl": ["C"],
"U:TMT6plex": ["N-term", "K"]
}
.. tab:: TOML

.. code-block:: toml
[ms2rescore.fixed_modifications]
"U:Carbamidomethyl" = ["C"]
"U:TMT6plex" = ["N-term", "K"]
.. caution::
Most search engines DO return fixed modifications as part of the modified peptide sequences.
In these cases, they must NOT be added to the ``fixed_modifications`` configuration.
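
To make the meaning of the ``fixed_modifications`` mapping concrete, the sketch below applies such a mapping to a plain peptide sequence and writes the result in a ProForma-like notation. It is an illustration only, not the code MS²Rescore runs: the function name ``apply_fixed_modifications`` and the example peptide are hypothetical, and the output notation is an assumption.

```python
def apply_fixed_modifications(sequence, fixed_modifications):
    """Annotate a plain peptide sequence with fixed modifications (illustrative only)."""
    # Invert the configuration: target (residue or terminus) -> modification label
    by_target = {
        target: label
        for label, targets in fixed_modifications.items()
        for target in targets
    }
    # Append the modification label in brackets to every matching residue
    residues = "".join(
        f"{aa}[{by_target[aa]}]" if aa in by_target else aa for aa in sequence
    )
    # Terminal modifications use the special labels "N-term" and "C-term"
    n_term = f"[{by_target['N-term']}]-" if "N-term" in by_target else ""
    c_term = f"-[{by_target['C-term']}]" if "C-term" in by_target else ""
    return f"{n_term}{residues}{c_term}"


# Same mapping as in the configuration example above
fixed_modifications = {"U:Carbamidomethyl": ["C"], "U:TMT6plex": ["N-term", "K"]}
annotated = apply_fixed_modifications("ACDK", fixed_modifications)
print(annotated)
```

If the search engine already reports these modifications in its peptide sequences, applying the mapping again would annotate the same site twice, which is why the caution above applies.
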
Expand Down Expand Up @@ -218,6 +240,65 @@ expression pattern that extracts the decoy status from the protein name:
decoy_pattern = "DECOY_"
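
As a rough illustration of how a pattern such as ``DECOY_`` separates decoys from targets, the following sketch tests protein names with Python's ``re`` module. The protein names and the helper ``is_decoy`` are hypothetical, and the use of ``re.search`` (a match anywhere in the name) is an assumption about the matching behavior.

```python
import re

decoy_pattern = "DECOY_"  # same value as in the configuration above


def is_decoy(protein_name, pattern=decoy_pattern):
    """Return True if the pattern occurs anywhere in the protein name."""
    return re.search(pattern, protein_name) is not None


# Hypothetical protein names, for illustration only
proteins = ["sp|P12345|EXAMPLE_HUMAN", "DECOY_sp|P12345|EXAMPLE_HUMAN"]
decoy_flags = {name: is_decoy(name) for name in proteins}
print(decoy_flags)
```
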
Multi-rank rescoring
====================

Some search engines, such as MaxQuant, report multiple candidate PSMs for the same spectrum.
MS²Rescore can rescore multiple candidate PSMs per spectrum. This allows for lower-ranking
candidate PSMs to become the top-ranked PSM after rescoring. This behavior can be controlled with
the ``max_psm_rank_input`` option.

To ensure correct FDR control after rescoring, MS²Rescore filters out lower-ranking PSMs before
final FDR calculation and writing the output files. To allow for lower-ranking PSMs to be included
in the final output - for instance, to consider chimeric spectra - the ``max_psm_rank_output``
option can be used.

For example, to rescore the top 5 PSMs per spectrum and output the best PSM after rescoring,
the following configuration can be used:

.. tab:: JSON

.. code-block:: json
"max_psm_rank_input": 5,
"max_psm_rank_output": 1
.. tab:: TOML

.. code-block:: toml
max_psm_rank_input = 5
max_psm_rank_output = 1
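
The effect of a maximum rank can be sketched in a few lines of Python. This is a simplified illustration, not the MS²Rescore implementation: the PSMs are plain dictionaries with hypothetical keys (``spectrum_id``, ``peptide``, ``score``), and the helper ``keep_top_ranked`` is an assumed name. The scores stand for the values after rescoring.

```python
from collections import defaultdict


def keep_top_ranked(psms, max_rank, lower_score_is_better=False):
    """Rank PSMs per spectrum by score and keep those with rank <= max_rank."""
    # Group candidate PSMs by the spectrum they were matched to
    by_spectrum = defaultdict(list)
    for psm in psms:
        by_spectrum[psm["spectrum_id"]].append(psm)

    kept = []
    for candidates in by_spectrum.values():
        # Best score first; the direction depends on the score type
        ordered = sorted(
            candidates, key=lambda p: p["score"], reverse=not lower_score_is_better
        )
        for rank, psm in enumerate(ordered, start=1):
            if rank <= max_rank:
                kept.append({**psm, "rank": rank})
    return kept


# Hypothetical rescored PSMs: two candidates for scan=1, one for scan=2
psms = [
    {"spectrum_id": "scan=1", "peptide": "PEPTIDEK", "score": 0.2},
    {"spectrum_id": "scan=1", "peptide": "PEPTLDEK", "score": 0.9},
    {"spectrum_id": "scan=2", "peptide": "ACDEFGHK", "score": 0.5},
]
top_psms = keep_top_ranked(psms, max_rank=1)
print([p["peptide"] for p in top_psms])
```

With ``max_rank=1``, only the best-scoring candidate per spectrum survives, which corresponds to ``max_psm_rank_output = 1``. A larger value would keep additional candidates, for instance for chimeric spectra.
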
Configuring rescoring engines
=============================

MS²Rescore supports multiple rescoring engines, such as Mokapot and Percolator. The rescoring
engine can be selected and configured with the ``rescoring_engine`` option. For example, to use
Mokapot with a custom ``train_fdr`` of 0.1%, the following configuration can be used:

.. tab:: JSON

.. code-block:: json
"rescoring_engine": {
"mokapot": {
"train_fdr": 0.001
}
}
.. tab:: TOML
.. code-block:: toml
[ms2rescore.rescoring_engine.mokapot]
train_fdr = 0.001
All options for the rescoring engines can be found in the :ref:`ms2rescore.rescoring_engines`
section.
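
For scripted workflows, the same settings can be assembled in Python and serialized to JSON. The sketch below is a minimal example under two assumptions: that the JSON file nests all options under a top-level ``ms2rescore`` key (mirroring the ``[ms2rescore. ...]`` TOML tables above), and that checking ``train_fdr`` against the documented range of 0 to 1 before writing is useful. It does not call any MS²Rescore API.

```python
import json

# Options combined from the examples above; the top-level "ms2rescore" key is an assumption
config = {
    "ms2rescore": {
        "max_psm_rank_input": 5,
        "max_psm_rank_output": 1,
        "rescoring_engine": {"mokapot": {"train_fdr": 0.001}},
    }
}

# The configuration schema documents train_fdr as a number between 0 and 1
train_fdr = config["ms2rescore"]["rescoring_engine"]["mokapot"]["train_fdr"]
if not 0 <= train_fdr <= 1:
    raise ValueError(f"train_fdr must be between 0 and 1, got {train_fdr}")

config_text = json.dumps(config, indent=2)
print(config_text)
```

The resulting text can be saved to a file and passed to MS²Rescore as a configuration file.
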
All configuration options
=========================
Expand Down
26 changes: 17 additions & 9 deletions docs/source/userguide/input-files.rst
Original file line number Diff line number Diff line change
Expand Up @@ -5,23 +5,31 @@ Input files
PSM file(s)
===========

The peptide-spectrum match (PSM) file is generally the output from a proteomics search engine.
This file serves as the main input to MS²Rescore. One or multiple PSM files can be provided at
once. Note that merging PSMs from different MS runs could have an impact on the correctness of
the FDR control.
The **peptide-spectrum match (PSM) file** is generally the output from a proteomics search engine.
This file serves as the main input to MS²Rescore.

Various PSM file types are supported. The type can be specified with the ``psm_file_type`` option.
Check the list of :py:mod:`psm_utils` tags in the
:external+psm_utils:ref:`supported file formats <supported file formats>` section. Depending on the
file extension, the file type can also be inferred from the file name. In that case,
the ``psm_file_type`` option can be set to ``infer``.
The PSM file should contain **all putative identifications** made by the search engine, including
both target and decoy PSMs. Ensure that the search engine was configured to include decoy entries
in the search database and was operated with **target-decoy competition** enabled (i.e.,
considering both target and decoy sequences simultaneously during the search).

.. attention::
As a general rule, MS²Rescore always needs access to **all target and decoy PSMs, without any
FDR-filtering**. For some search engines, this means that the FDR-filter should be disabled or
set to 100%.
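
The reason for this requirement can be shown with a simplified target-decoy FDR estimate, where the number of accepted decoys approximates the number of false target hits. The sketch below is a textbook approximation for illustration, not the estimator used by MS²Rescore, Mokapot, or Percolator. The PSM dictionaries and the function name ``fdr_at_threshold`` are hypothetical.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate FDR as (#decoys / #targets) among PSMs scoring at or above threshold."""
    accepted = [p for p in psms if p["score"] >= threshold]
    n_decoys = sum(p["is_decoy"] for p in accepted)
    n_targets = len(accepted) - n_decoys
    return n_decoys / n_targets if n_targets else 0.0


# Hypothetical, unfiltered search results containing both targets and decoys
psms = [
    {"score": 0.9, "is_decoy": False},
    {"score": 0.8, "is_decoy": False},
    {"score": 0.7, "is_decoy": True},
    {"score": 0.6, "is_decoy": False},
    {"score": 0.5, "is_decoy": False},
    {"score": 0.4, "is_decoy": True},
]
fdr_unfiltered = fdr_at_threshold(psms, threshold=0.4)  # 2 decoys / 4 targets
fdr_strict = fdr_at_threshold(psms, threshold=0.75)  # 0 decoys / 2 targets
print(fdr_unfiltered, fdr_strict)
```

If the search engine had already removed the decoys or the low-scoring PSMs, the decoy counts would no longer reflect the true error rate, and neither model training nor FDR estimation after rescoring could be trusted.
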


One or multiple PSM files can be provided at once. Note that merging PSMs from different MS runs
could have an impact on the correctness of the FDR control. Combining multiple PSM files should
generally only be done for LC-fractionated mass spectrometry runs.

Various PSM file types are supported. The type can be specified with the ``psm_file_type`` option.
Check the list of :py:mod:`psm_utils` tags in the
:external+psm_utils:ref:`supported file formats <supported file formats>` section. Depending on the
file extension, the file type can also be inferred from the file name. In that case,
the ``psm_file_type`` option can be set to ``infer``.


Spectrum file(s)
================

Expand Down
4 changes: 2 additions & 2 deletions docs/source/userguide/output-files.rst
Original file line number Diff line number Diff line change
Expand Up @@ -52,8 +52,8 @@ Rescoring engine files:
| ``<prefix>.<mokapot/percolator>.weights.txt`` | Feature weights, showing feature usage in the rescoring run |
+-------------------------------------------------------------+-------------------------------------------------------------+

If no rescoring engine is selected (or if Percolator was selected), the following files will also
be written:
If no rescoring engine is selected, if Percolator was selected, or in DEBUG mode, the following
files will also be written:

+-------------------------------------------------------------+-----------------------------------------------------------+
| File | Description |
Expand Down
61 changes: 61 additions & 0 deletions docs/source/userguide/tims2Rescore.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
.. _timsrescore:

TIMS²Rescore User Guide
=======================

Introduction
------------

The `TIMS²Rescore` tool is a DDA-PASEF-adapted version of `ms2rescore` for rescoring peptide-spectrum matches (PSMs) acquired on Bruker instruments. This guide provides an overview of how to use `timsrescore`, which is included in `ms2rescore`.

Installation
------------

Before using `timsrescore`, ensure that you have `ms2rescore` installed on your system. You can install `ms2rescore` using the following command:

.. code-block:: bash
pip install ms2rescore
Usage
-----

To use `timsrescore`, follow these steps:

1. Prepare your input files:
- Ensure that you have the necessary input files, including the PSM file and the spectrum files.
- Make sure that the PSM file is in the format of a supported search engine or in a standard format such as .mzid (see :external+psm_utils:ref:`supported file formats <supported file formats>`).
- Spectrum files can be provided directly as .d or miniTDF files from Bruker instruments, or first be converted to .mzML format.

2. Run `timsrescore`:
- Open a terminal or command prompt.
- Navigate to the directory where your input files are located.
- Execute the following command:

.. code-block:: bash
timsrescore -p <path_to_psm_file> -s <path_to_spectrum_file> -o <path_to_output_file>
Replace `<path_to_psm_file>`, `<path_to_spectrum_file>`, and `<path_to_output_file>` with the actual paths to your input and output files.
**Note:** By default, timsTOF-specific models are used for predictions. Optionally, you can configure further settings through a configuration file. For more information on configuring `timsrescore`, refer to the :doc:`configuration` page of the user guide.

3. Review the results:
- Once the `timsrescore` process completes, you will find the rescoring results in the specified output file or, if no output file was specified, in the same directory as the input files.
- If you want a detailed overview of the performance, you can set `write_report` to `True` in the configuration file, use the `--write_report` option on the command line, or run the following command:

.. code-block:: bash
ms2rescore-report <output_prefix>
Replace `<output_prefix>` with the actual output prefix of the result files.
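
When many runs need to be processed, the `timsrescore` call from step 2 can also be assembled in a script. The sketch below only builds and prints the command line with the `-p`, `-s`, and `-o` options shown above. The helper name `build_timsrescore_command` and the file names are hypothetical, and nothing is executed.

```python
import shlex


def build_timsrescore_command(psm_file, spectrum_path, output_path):
    """Build the argument list for a timsrescore call (does not run it)."""
    return ["timsrescore", "-p", psm_file, "-s", spectrum_path, "-o", output_path]


# Hypothetical input and output paths
command = build_timsrescore_command("results.mzid", "run01.d", "rescored/run01")
print(shlex.join(command))
```

To actually start the run, the list can be passed to `subprocess.run(command, check=True)`, provided that `ms2rescore` is installed and `timsrescore` is available on the `PATH`.
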

Additional Options
------------------

`ms2rescore` provides additional options to customize the `timsrescore` process. You can explore these options by running the following command:

.. code-block:: bash
timsrescore --help
2 changes: 1 addition & 1 deletion ms2rescore/__init__.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
"""MS²Rescore: Sensitive PSM rescoring with predicted MS² peak intensities and RTs."""

__version__ = "3.0.2"
__version__ = "3.1.0-dev9"

from warnings import filterwarnings

Expand Down