Amplicon schemes
Viridian workflow can use its built-in schemes, schemes that you provide yourself, or a mixture of the two.
When the pipeline runs, it automatically works out which amplicon scheme
the reads match best. You can force the choice if you like, using
the option --force_amp_scheme NAME, where NAME must exactly match
one of the scheme names.
Viridian has built-in SARS-CoV-2 amplicon schemes. At the time of writing, these are: ARTIC version 3, ARTIC version 4, and Midnight-1200.
You can find out the current built-in schemes by looking at the help
message from viridian_workflow run_one_sample --help. You should see
this amongst the output:
--built_in_amp_schemes scheme1,scheme2,...
Comma-separated list of built in amplicon schemes to use [COVID-ARTIC-V3,COVID-ARTIC-V4,COVID-MIDNIGHT-1200]
By default, all built-in schemes are used, and are listed as the default option
in square brackets. In this case, they are:
COVID-ARTIC-V3,COVID-ARTIC-V4,COVID-MIDNIGHT-1200.
You can restrict the run to just some of the built-in schemes, or even none of them (providing your own schemes with --amp_schemes_tsv on its own disables the built-in schemes).
For example, to use only ARTIC versions 3 and 4, add this
when running the workflow: --built_in_amp_schemes COVID-ARTIC-V3,COVID-ARTIC-V4.
Viridian needs the primer scheme in one of two file formats:
- Its own custom TAB-delimited file, described below
- The "PrimalScheme" BED format, see for example this SARS-CoV-2/400/v5.3.2 BED file.
If the filename ends with .bed then Viridian assumes it is in the second format.
Otherwise it assumes it is in the first format.
An amplicon scheme needs to be defined in a TAB-delimited file. That file has one primer per line, and must include the following column headings (any other columns are simply ignored):
- Amplicon_name: the name of the amplicon
- Primer_name: the name of the primer
- Left_or_right: must be "left" or "right", indicating if this is the left or right primer for the amplicon.
- Sequence: the nucleotide sequence of the primer. If it is a left primer, then it must be on the forward strand of the reference genome. If it is a right primer, then it must be on the reverse strand of the reference genome.
- Position: zero-based position of the start of the primer when aligned to the reference genome (i.e. what you would get in a SAM/BAM file). In other words, it should be min(start in ref, end in ref), however you consider the various orientations of the primer and whether or not it needs reverse complementing.
As an example, here are the first four lines of the ARTIC version 3 file that is built in to the pipeline:
Amplicon_name Primer_name Left_or_right Sequence Position
nCoV-2019_1_pool1 nCoV-2019_1_LEFT left ACCAACCAACTTTCGATCTCTTGT 30
nCoV-2019_1_pool1 nCoV-2019_1_RIGHT right CATCTTTAAGATGTTGACGTGCCTC 385
nCoV-2019_2_pool2 nCoV-2019_2_LEFT left CTGTTTTACAGGTTCGCGACGT 320
nCoV-2019_2_pool2 nCoV-2019_2_RIGHT right TAAGGATCAGTGCCAAGCTCGT 704
It is important to note that we assume that left primer sequences match the forward strand of the reference, and right primer sequences match the reverse strand of the genome. Here is a diagram of the first primer:
ref genome:   ...ACCAACCAACTTTCGATCTCTTGT...
                 |
ref position:    30
and the second primer:
                 [ rev comp of primer seq]
ref genome:   ...GAGGCACGTCAACATCTTAAAGATG...
                 |
ref position:    385
That is, the right-hand primer sequence in the TSV file must be reverse complemented before it matches the forward strand of the reference.
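The orientation rule can be checked with a few lines of Python. This is only an illustrative sketch, not part of Viridian: the helper names `reverse_complement` and `primer_matches` are made up here, and `toy_ref` is a short stand-in string rather than the real reference genome.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def primer_matches(ref: str, primer_seq: str, position: int, left_or_right: str) -> bool:
    """Check a primer against the forward strand of ref at a zero-based position.

    Left primers are compared as given; right primers are reverse
    complemented first, following the convention described above.
    """
    if left_or_right == "left":
        expected = primer_seq.upper()
    elif left_or_right == "right":
        expected = reverse_complement(primer_seq)
    else:
        raise ValueError("left_or_right must be 'left' or 'right'")
    return ref.upper()[position:position + len(expected)] == expected


# Toy reference (NOT the real genome): 5 bases of padding either side of
# the forward-strand sequence shown in the second diagram.
toy_ref = "N" * 5 + "GAGGCACGTCAACATCTTAAAGATG" + "N" * 5
right_primer = "CATCTTTAAGATGTTGACGTGCCTC"

print(reverse_complement(right_primer))
print(primer_matches(toy_ref, right_primer, 5, "right"))
```

With a real reference sequence loaded into a string, the same check could be run for each row of a scheme file, using the Position column as `position`.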
WARNING: currently, the TSV file is not checked for correctness (this will change in the future). It is up to you to make a sensible TSV file.
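Since the file is not checked for you, a small sanity check before running the pipeline may help. The sketch below is an assumption about what is worth checking, based only on the column descriptions above; `check_scheme_tsv` is a hypothetical helper, not a Viridian function, and it does not compare primers against the reference.

```python
import csv
import io

REQUIRED_COLUMNS = ["Amplicon_name", "Primer_name", "Left_or_right", "Sequence", "Position"]


def check_scheme_tsv(handle):
    """Return a list of problems found in a scheme TSV (empty if none found)."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        return ["missing column(s): " + ", ".join(missing)]
    problems = []
    for row in reader:
        line_number = reader.line_num  # line of the file this row came from
        side = (row["Left_or_right"] or "").strip()
        sequence = (row["Sequence"] or "").strip()
        position = (row["Position"] or "").strip()
        if side not in ("left", "right"):
            problems.append(
                f"line {line_number}: Left_or_right must be 'left' or 'right', got '{side}'"
            )
        if not sequence.isalpha():
            problems.append(f"line {line_number}: Sequence is empty or has non-letter characters")
        if not position.isdigit():
            problems.append(f"line {line_number}: Position must be a non-negative integer")
    return problems


# Two rows from the ARTIC version 3 example, with a deliberate error
# ("RIGHT" in upper case) in the second row.
example = (
    "Amplicon_name\tPrimer_name\tLeft_or_right\tSequence\tPosition\n"
    "nCoV-2019_1_pool1\tnCoV-2019_1_LEFT\tleft\tACCAACCAACTTTCGATCTCTTGT\t30\n"
    "nCoV-2019_1_pool1\tnCoV-2019_1_RIGHT\tRIGHT\tCATCTTTAAGATGTTGACGTGCCTC\t385\n"
)
problems = check_scheme_tsv(io.StringIO(example))
print(problems)
```

For a file on disk, pass an open file handle instead of the `io.StringIO` object.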
Custom schemes must be provided to the workflow using a TAB-delimited file
listing the name of the scheme and the absolute path to the
file of that scheme (in the TAB-delimited format described above).
It must have two columns: Name and File.
Even if you are only using one custom scheme, this file is required.
For example:
Name File
My_scheme /path/to/scheme.tsv
This file is then provided to the pipeline using the option
--amp_schemes_tsv. See the examples below.
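One way to generate this file without worrying about relative paths is sketched below. The function name `write_schemes_tsv` and the file names are placeholders chosen for illustration; the only parts taken from the description above are the Name and File columns and the requirement for absolute paths.

```python
import csv
import os
import tempfile


def write_schemes_tsv(schemes, out_path):
    """Write a two-column Name/File TSV.

    schemes maps each scheme name to the path of its scheme file.
    Paths are converted to absolute paths before being written.
    """
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["Name", "File"])
        for name, path in schemes.items():
            writer.writerow([name, os.path.abspath(path)])


# Demonstration in a temporary directory; "scheme.tsv" is a placeholder
# and does not need to exist for the file to be written.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "schemes.tsv")
    write_schemes_tsv({"My_scheme": "scheme.tsv"}, out_path)
    with open(out_path) as handle:
        lines = handle.read().splitlines()

for line in lines:
    print(line)
```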
There are various cases - we will give an example of each one. They are
controlled by the three options --built_in_amp_schemes, --amp_schemes_tsv
and --force_amp_scheme.
To use all of the built-in schemes and let the pipeline choose: simply use no extra options. The built-in schemes will be used, and the one that best agrees with the reads will be chosen.
To force the choice of one built-in scheme: use the option --force_amp_scheme.
The value given must exactly match one of the built-in names. For example:
--force_amp_scheme COVID-ARTIC-V3.
To use only some of the built-in schemes: use the option --built_in_amp_schemes
to list only the ones you want to use. For example, for ARTIC versions 3 and 4 only:
--built_in_amp_schemes COVID-ARTIC-V3,COVID-ARTIC-V4. The value
must be a comma-separated list of the scheme names.
To use only your own schemes: use the option --amp_schemes_tsv schemes.tsv.
This will use only the schemes listed in schemes.tsv. Using this option on
its own disables the use of built-in schemes.
To use only your own schemes and force the choice of one of them: use the
option --amp_schemes_tsv schemes.tsv, and the option
--force_amp_scheme scheme1. Note that whatever value you use
to force the scheme choice (in this case scheme1) must be
the name of a scheme in schemes.tsv.
To mix your own schemes with built-in schemes: use the option
--amp_schemes_tsv schemes.tsv to provide your own scheme(s). Additionally,
list all of the built-in schemes you would also like to use, like this:
--built_in_amp_schemes COVID-ARTIC-V3,COVID-ARTIC-V4.
You can still force the choice of scheme with
the option --force_amp_scheme NAME, as long as NAME is
the name of one of your schemes or of the built-in schemes that you listed.
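If you launch the workflow from a script, the cases above can be collected into one small helper. This is a sketch only: `scheme_options` is a hypothetical function, and it builds just the three scheme-related options described on this page. The remaining arguments that viridian_workflow run_one_sample needs are not covered here and would have to be added separately.

```python
import shlex


def scheme_options(built_in=None, schemes_tsv=None, force=None):
    """Return the scheme-related command-line options as a list of strings.

    built_in:    list of built-in scheme names to use, or None
    schemes_tsv: path to a Name/File TSV of custom schemes, or None
    force:       name of the scheme to force, or None
    """
    options = []
    if built_in:
        options += ["--built_in_amp_schemes", ",".join(built_in)]
    if schemes_tsv:
        options += ["--amp_schemes_tsv", schemes_tsv]
    if force:
        options += ["--force_amp_scheme", force]
    return options


# All built-in schemes, pipeline chooses: no extra options.
print(scheme_options())

# Only custom schemes, forcing one of them.
print(shlex.join(scheme_options(schemes_tsv="schemes.tsv", force="scheme1")))

# Custom schemes mixed with two of the built-in schemes.
print(shlex.join(scheme_options(
    built_in=["COVID-ARTIC-V3", "COVID-ARTIC-V4"],
    schemes_tsv="schemes.tsv",
)))
```

Returning a list rather than a single string keeps the options safe to pass on to `subprocess.run` without shell quoting issues.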