Merge branch 'release/v2.0'
dbolotin committed Sep 13, 2016
2 parents db5df31 + 6e4c7b0 commit 1b5ed62
Showing 172 changed files with 1,677 additions and 10,172 deletions.
3 changes: 0 additions & 3 deletions .gitmodules
@@ -1,3 +0,0 @@
[submodule "milib"]
path = milib
url = https://github.com/milaboratory/milib.git
17 changes: 17 additions & 0 deletions CHANGELOG
@@ -1,4 +1,21 @@

MiXCR 2.0 (13 Sep 2016)
========================

-- New JSON-based reference library format (see [RepSeq.IO](https://github.com/repseqio/repseqio))
-- Complete review of V/D/J/C gene library (see [repository](https://github.com/repseqio/library))
-- New simplified method to import IMGT library (see documentation)
-- All `--loci` options replaced with `--chains` (`-l` -> `-c`)
-- Removed option `--diff-loci` at `align` step
-- Added `-OallowChimeras=true` / `false` option at `align` step (better algorithm than was with
`--diff-loci`)
-- Removed: option `-u`/`--functional-only` in `align` action
-- Many small fixes
-- minor: Improved report content with absolute values for all rows and additional version info
-- minor: Now report with run statistics is additionally printed to stdout
-- minor: Execution time information added to report


MiXCR 1.8.3 ( 8 Sep 2016)
========================

2 changes: 1 addition & 1 deletion README.md
@@ -25,7 +25,7 @@ to upgrade already installed MiXCR to the newest version:
#### Requirements

* Any OS with Java support (Linux, Windows, Mac OS X, etc..)
* Java 1.7 or higher
* Java 1.8 or higher

## Usage

36 changes: 9 additions & 27 deletions doc/align.rst
@@ -40,11 +40,14 @@ The following table contains description of command line options for ``align``:
| ``-r {file}`` |br| | | Report file name. If this option is not |
| ``--report ...`` | | specified, no report file will be produced. |
+-------------------------------------+----------------------------+------------------------------------------------------------+
| ``-l {loci}`` |br| | ``ALL`` | Target immunological loci list separated by "``,``". |
| ``--loci ...`` | | Available values: ``IGH``, ``IGL``, ``IGK``, ``TRA``, |
| ``-c {chain}`` |br| | ``ALL`` | Target immunological chain list separated by "``,``". |
| ``--chains ...`` | | Available values: ``IGH``, ``IGL``, ``IGK``, ``TRA``, |
| | | ``TRB``, ``TRG``, ``TRD``, ``IG`` (for all immunoglobulin |
| | | loci), ``TCR`` (for all T-cell receptor loci), ``ALL`` |
| | | (for all loci) . |
| | | chains), ``TCR`` (for all T-cell receptor chains), ``ALL`` |
| | | (for all chains). It is highly recommended to use |
| | | the default value for this parameter in most cases |
| | | at the align step. Filtering is also possible at the |
| | | export step. |
+-------------------------------------+----------------------------+------------------------------------------------------------+
| ``-s {speciesName}`` |br| | ``HomoSapiens`` | Species (organism). Possible values: ``hsa`` (or |
| ``--species ...`` | | ``HomoSapiens``) and ``mmu`` (or ``MusMusculus``), or any |
@@ -62,7 +65,7 @@ The following table contains description of command line options for ``align``:
| ``-t {numberOfThreads}`` |br| | number of | Number of processing threads. |
| ``--threads ...`` | available CPU cores | |
+-------------------------------------+----------------------------+------------------------------------------------------------+
| ``-n {numberOfReads}`` |br| | | Limit number of sequences that will be analysed (only |
| ``-n {numberOfReads}`` |br| | | Limit number of sequences that will be analysed (only |
| ``--limit ...`` | | first ``-n`` sequences will be processed from input |
| | | file(s)). |
+-------------------------------------+----------------------------+------------------------------------------------------------+
@@ -290,25 +293,4 @@ These parameters can be overridden in the following way:



.. _ref-alignRNASeq:

Analysis of RNA-Seq data
------------------------

Analysis of RNA-Seq data performed with ``-p rna-seq`` option is almost equivalent to the following set of aligners parameters:

- (**most important**) turned off floating bounds of V and J alignments:

- ``-OvParameters.parameters.floatingLeftBound=false``
- ``-OjParameters.parameters.floatingRightBound=false``

- higher thresholds:

- ``-OvParameters.parameters.absoluteMinScore=80`` (was 40)
- ``-OjParameters.parameters.absoluteMinScore=70`` (was 40)
- ``-OminSumScore=200`` (was 120; see below)

- more strict scoring for all alignments (V, J, C):

- ``-OxParameters.parameters.scoring.gapPenalty=-21``
- ``-OxParameters.parameters.scoring.subsMatrix='simple(match=5,mismatch=-12)'``
.. _ref-alignRNASeq:
2 changes: 1 addition & 1 deletion doc/appendix.rst
@@ -52,7 +52,7 @@ where

- ``targetFrom`` - position of first aligned nucleotide in **target
sequence** (sequence of gene feature from reference V, D, J or C
allele used in alignment; e.g. ``VRegion`` in TRBV12-2); this
gene used in alignment; e.g. ``VRegion`` in TRBV12-2); this
boundary is inclusive
- ``targetTo`` - next position after last aligned nucleotide in **target
sequence**; this boundary is exclusive
8 changes: 4 additions & 4 deletions doc/export.rst
@@ -93,15 +93,15 @@ The list of command line parameters for both ``exportAlignments`` and
The following parameters apply only to ``exportClones``:

+--------------------------------------+-------------------------------------------------------------------+
| ``-l``, ``--filter-locus`` | Limit output to specific locus (e.g. TRA or IGH). Clone fractions |
| ``-c``, ``--chains`` | Limit output to specific locus (e.g. TRA or IGH). Clone fractions |
| | will be recalculated accordingly. |
+--------------------------------------+-------------------------------------------------------------------+
| ``-o``, ``--filter-out-of-frames`` | Exclude out of frames (fractions will be recalculated) |
+--------------------------------------+-------------------------------------------------------------------+
| ``-t``, ``--filter-stops`` | Exclude sequences containing stop codons (fractions will be |
| | recalculated) |
+--------------------------------------+-------------------------------------------------------------------+
| ``-c``, ``--minimal-clone-count`` | Filter clones by minimal read count. |
| ``-m``, ``--minimal-clone-count`` | Filter clones by minimal read count. |
+--------------------------------------+-------------------------------------------------------------------+
| ``-q``, ``--minimal-clone-fraction`` | Filter clones by minimal clone fraction. |
+--------------------------------------+-------------------------------------------------------------------+
@@ -686,7 +686,7 @@ One can also export all read IDs that were aggregated by each clone. For this one

::

mixcr exportClones -p min -readIds index_file clones.clns clones.txt
mixcr exportClones -c IGH -p min -readIds index_file clones.clns clones.txt

This will add a column with a full enumeration of all reads that were absorbed by a particular clone:

@@ -712,7 +712,7 @@ Finally, one can export reads aggregated by each clone into separate ``.fastq``

::

mixcr align -g -l IGH input.fastq alignments.vdjca.gz
mixcr align -g input.fastq alignments.vdjca.gz

With this option MiXCR will store the original reads in the ``.vdjca`` file. One can then export the reads corresponding to a particular clone with the ``exportReadsForClones`` command. For example, export all reads that were assembled into the first clone (clone with cloneId = 0):

143 changes: 25 additions & 118 deletions doc/importSegments.rst
@@ -4,145 +4,52 @@

<br />

Importing gene segment sequences
Using external libraries for alignment
======================================

.. tip::

The ``mixcr importFromIMGT`` command is the simplest way to import reference segment sequences from IMGT. (*see documentation below*)
MiXCR uses libraries in .json format (see https://github.com/repseqio for details).

NOTICE. In some cases, when using an external library, mixcr will try to establish a connection with NCBI over the internet.

.. _ref-auto-imgt:

Automated import of reference sequences from IMGT
IMGT library
-------------------------------------------------
A compiled IMGT library file for MiXCR can be downloaded from https://github.com/repseqio/library-imgt/releases. To use the library, put the .json library file in ``~/.mixcr/libraries``, in the directory from which mixcr is started, or in the ``libraries/`` subfolder of the mixcr installation folder.

To simplify import of IMGT reference sequences we developed an interactive bash script that will automatically download and import all possible reference sequences for a selected species.

The script can be invoked using the ``mixcr importFromIMGT`` command, or can be found in the root folder of the MiXCR distribution zip file (``importFromIMGT.sh``).

The script has the following dependencies:

- wget
- pup (see installation instructions here_)

.. _here: https://github.com/EricChiang/pup#install

To use the script, just execute it from any folder to where you have a write access:

::
mixcr importFromIMGT

or execute it directly

::
/path/to/unzipped/mixcr/importIMGT.sh

It will ask you to accept the copyright rules of the IMGT website, to select a species and to provide its common names. After doing this, the script will automatically download all required files from the IMGT website and import them to a local loci library.

During execution the script will create a log file for each type of imported segment. See below for an example log file.

After import reference sequences can be used as follows:

::
mixcr align --library local -s macaca ....


Import of V, D and J gene sequences from a file
-----------------------------------------------

If you need to analyse data from species that are not covered by the MiXCR built-in reference V, D, J gene library, or you just want to use an alternative reference library, you can convert specially formatted fasta files to MiXCR loci-library format by using the ``importSegments`` action.

Here is an example command:

::

mixcr importSegments -p imgt -v human_TRBV.fasta -j human_TRBJ.fasta \
-d human_TRBD.fasta -l TRB -s 9606:hs -r report.txt

This command will take IMGT formatted fasta files (like those that can be downloaded on this_ page) and import them to a local loci library file (stored in ``~/.mixcr/local.ll``).

.. _this: http://www.imgt.org/vquest/refseqh.html

Command line parameters
^^^^^^^^^^^^^^^^^^^^^^^

Here is the list of command line parameters for ``importSegments`` action:

+------------------------------------+-------------------------------------------------------------------+
| Option | Description |
+====================================+===================================================================+
| ``-p {params}`` |br| | select the parameters of import. Parameters determine how to |
| ``--parameters {params}`` | parse fasta headers and how to extract information about anchor |
| | points (e.g. using specific positions in sequences with IMGT gaps |
| | or searching for specific patterns in gene sequence). |
| | |br| |br| currently, the only possible value is ``imgt`` |
+------------------------------------+-------------------------------------------------------------------+
| ``-v {file}`` | specify fasta-formatted file with sequences of V genes |
+------------------------------------+-------------------------------------------------------------------+
| ``-d {file}`` | specify fasta-formatted file with sequences of D genes |
+------------------------------------+-------------------------------------------------------------------+
| ``-j {file}`` | specify fasta-formatted file with sequences of J genes |
+------------------------------------+-------------------------------------------------------------------+
| ``-l {locus}`` |br| | determines which immunological locus data is being imported |
| ``--locus {locus}`` | |br| |br| |
| | possible values: ``TRA``, ``TRB``, ``TRG``, ``TRD``, |
| | ``IGH``, ``IGL``, ``IGK`` |
+------------------------------------+-------------------------------------------------------------------+
| ``-s {taxonID:commName1:..}`` |br| | specify NCBI Taxonomy ID (e.g. 9606 for human) and a list of |
| ``--species {...}`` | common species names for organism to be imported |br| |br| |
| | example: ``9606:hs:hsa:human:homsap`` |
+------------------------------------+-------------------------------------------------------------------+
| ``-r {reportFile}`` |br| | specify report file. |br| Report contains comprehensive error and |
| ``--report {reportFile}`` | warning log of importing procedure and amino-acid and nucleotide |
| | alignments of allelic variants imported from file, along with |
| | information on inferred positions of anchor points for all |
| | imported genes (see below) |
+------------------------------------+-------------------------------------------------------------------+
| ``-f`` | force overwrite already existing locus records in the output file |
+------------------------------------+-------------------------------------------------------------------+
.. tip::

Use ``mixcr -v`` to see which folders mixcr searches for library .json files.

Report file
^^^^^^^^^^^
.. code-block:: console
It is very important to manually check the results of importing, as this process involves several empirical steps, such as searching for anchor points using patterns in the sequence. MiXCR produces a comprehensive report file with errors and warnings raised during importing, along with well-formatted nucleotide and amino acid alignments of allelic variants of V, D and J genes marked up with anchor points, so any mistakes can be easily detected.
> mixcr -v
Here is the example report file record:
...
.. raw:: html
Library search path:
- built-in libraries
- /home/username/.
- /home/username/.mixcr/libraries
- /software/mixcr/libraries
<pre style="font-size: 10px">
TRBV4-1
=======
&lt;FR1 FR1&gt;&lt;C
TRBV4-1*01 [F] 0 GACACTGAAGTTACCCAGACACCAAAACACCTGGTCATGGGAATGACAAATAAGAAGTCTTTGAAATGTGAACAACATAT 79
TRBV4-1*02 [F] 0 .. 1
.. code-block:: console
DR1 CDR1&gt;&lt;FR2 FR2&gt;&lt;CDR2 CDR
TRBV4-1*01 [F] 80 GGGGCACAGGGCTATGTATTGGTACAAGCAGAAAGCTAAGAAGCCACCGGAGCTCATGTTTGTCTACAGCTATGAGAAAC 159
TRBV4-1*02 [F] 2 ............A................................................................... 81
> mixcr align --library imgt input_R1.fastq input_R2.fastq alignments.vdjca
2&gt;&lt;FR3
TRBV4-1*01 [F] 160 TCTCTATAAATGAAAGTGTGCCAAGTCGCTTCTCACCTGAATGCCCCAACAGCTCTCTCTTAAACCTTCACCTACACGCC 239
TRBV4-1*02 [F] 82 ................................................................................ 161
... Building alignments
FR3&gt;&lt;CDR3 V&gt;
TRBV4-1*01 [F] 240 CTGCAGCCAGAAGACTCAGCCCTGTATCTCTGCGCCAGCAGCCAAGA 286
TRBV4-1*02 [F] 162 ..............................................- 207
The ``--library`` option specifies the library to use for alignment. If the short name is given (e.g. ``--library imgt``), mixcr will look for the latest version in the folder. Otherwise, to use one of the older versions, give the full name including the version number (e.g. ``--library imgt.201631-4``).

.. code-block:: console
**********
> mixcr assemble alignments.vdjca clones.clns
&lt;FR1 FR1&gt;CDR1&gt;&lt;FR2 FR2&gt;&lt;CDR2&gt;&lt;FR3
TRBV4-1*01 [F] 0 DTEVTQTPKHLVMGMTNKKSLKCEQHMGHRAMYWYKQKAKKPPELMFVYSYEKLSINESVPSRFSPECPNSSLLNLHLHA 79
TRBV4-1*02 [F] 0 ...................................................... 53
... Assembling clones
FR3&gt;&lt;CDR3
TRBV4-1*01 [F] 80 LQPEDSALYLCASSQ_ 95
TRBV4-1*02 [F] 54 ................ 69
> mixcr exportClones --chains IGH clones.clns clones.txt
</pre>
... Exporting clones to tab-delimited file
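
The library placement described above can be sketched as a small shell helper. This is a minimal sketch under stated assumptions, not part of MiXCR: ``install_library`` is a hypothetical function name, and the file passed to it is whatever .json library file you downloaded. It only copies that file into a destination directory such as ``~/.mixcr/libraries``, one of the locations the documentation lists.

```shell
# Hypothetical helper, not part of MiXCR: copy a downloaded library .json file
# into one of the directories MiXCR searches for libraries.
install_library() {
    lib_file="$1"   # path to the downloaded .json library file
    dest_dir="$2"   # destination, e.g. "$HOME/.mixcr/libraries"

    # Refuse to continue if the source file does not exist.
    if [ ! -f "$lib_file" ]; then
        echo "library file not found: $lib_file" >&2
        return 1
    fi

    # Create the destination directory if needed, then copy the file into it.
    mkdir -p "$dest_dir" || return 1
    cp "$lib_file" "$dest_dir/" || return 1
    echo "installed $(basename "$lib_file") into $dest_dir"
}
```

After defining the function, a call such as ``install_library path/to/library.json "$HOME/.mixcr/libraries"`` (the source path is a placeholder) puts the file in place; running ``mixcr -v`` afterwards should show whether that directory is on the library search path.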
2 changes: 1 addition & 1 deletion doc/install.rst
@@ -22,7 +22,7 @@ To install MiXCR using Homebrew just type the following commands:
Installation on Mac OS X / Linux / FreeBSD from zip distribution
----------------------------------------------------------------

- Check that you have Java **1.7+** installed on your system by typing ``java -version``. Here is the example output of this command:
- Check that you have Java **1.8+** installed on your system by typing ``java -version``. Here is the example output of this command:

.. code-block:: console