2. Software Usage
We recommend using KaMRaT within an apptainer (previously singularity) container:
apptainer exec -B /bind_src:/bind_des kamrat <CMD> [options] /path/from/{bind_des}/to/input/kmer/table
# <CMD> can be one of index, filter, mask, merge, score, query
# replace "apptainer" with "singularity" when KaMRaT is built with singularity
The -B option binds disk partitions to the apptainer image; please check the apptainer helper for details:
apptainer exec -h
If built from source, KaMRaT can be run with:
/path/to/KaMRaT/kamrat/bin/in/app/directory <CMD> [options] /path/to/input/kmer/table
# <CMD> can be one of index, filter, mask, merge, score, query
The following sections assume that KaMRaT is run within apptainer.
For the two alternative situations:
- to run KaMRaT within a singularity container, simply replace the keyword apptainer by singularity;
- to run KaMRaT after building from source, replace the leading apptainer exec -B /bind_src:/bind_des by the path to the KaMRaT binary file (in the app/ folder).
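The three ways of launching KaMRaT differ only in the command prefix, which the following minimal sketch makes explicit. The function name kamrat_argv and the binary path /path/to/kamrat are hypothetical and used for illustration only; the from-source form follows the <binary> <CMD> [options] pattern shown earlier.

```python
import shlex

MODULES = ("index", "filter", "mask", "merge", "score", "query")

def kamrat_argv(cmd, options, runtime="apptainer",
                bind="/bind_src:/bind_des", binary=None):
    """Return the argument list for one KaMRaT call.

    runtime is "apptainer", "singularity", or "source" (built from source).
    `binary` is only used for "source" and is a placeholder path here.
    """
    if cmd not in MODULES:
        raise ValueError(f"unknown KaMRaT module: {cmd}")
    if runtime in ("apptainer", "singularity"):
        # Container case: only the leading keyword changes.
        prefix = [runtime, "exec", "-B", bind, "kamrat"]
    elif runtime == "source":
        if binary is None:
            raise ValueError("a path to the KaMRaT binary is required")
        prefix = [binary]
    else:
        raise ValueError(f"unknown runtime: {runtime}")
    return prefix + [cmd] + list(options)

# Print the helper call of the filter module in each situation.
for runtime in ("apptainer", "singularity"):
    print(shlex.join(kamrat_argv("filter", ["-h"], runtime=runtime)))
print(shlex.join(kamrat_argv("filter", ["-h"], runtime="source",
                             binary="/path/to/kamrat")))
```

The returned list can be passed as-is to subprocess.run, which avoids shell quoting issues with paths that contain spaces.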
KaMRaT's top-level helper is accessible by typing one of these commands:
apptainer exec kamrat
apptainer exec kamrat -h
apptainer exec kamrat -help
The helper of each KaMRaT module is accessible via one of these commands:
apptainer exec kamrat <CMD>
apptainer exec kamrat <CMD> -h
apptainer exec kamrat <CMD> -help
# <CMD> can be one of index, filter, mask, merge, score, query
[USAGE] kamrat index -intab STR -outdir STR [-klen INT -unstrand -nfbase INT]
[OPTION] -h, -help Print the helper
-intab STR Input table for index, mandatory
-outdir STR Output index directory, mandatory
-klen INT k-mer length, mandatory if features are k-mers
if present, indexation will be switched to k-mer mode
-unstrand Unstranded mode, indexation with canonical k-mers
if present, indexation will be switched to k-mer mode
-nfbase INT Base for calculating normalization factor, not compatible with -nffile STR
normCount_ij <- INT * rawCount_ij / sum_i{rawCount_ij}
if not provided, input counts will not be normalized
-nffile STR File for loading normalization factor, not compatible with -nfbase INT
a tab-separated row of normalization factors, same order as table header
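As a worked illustration of the -nfbase formula above, the sketch below normalizes a small count table in plain Python. It assumes that i indexes features (rows) and j indexes samples (columns), so the sum runs over each sample column. The function name and the handling of an all-zero sample are assumptions, not KaMRaT's actual implementation.

```python
def normalize_counts(table, nf_base):
    """Apply normCount_ij = nf_base * rawCount_ij / sum_i(rawCount_ij).

    table: list of feature rows, each a list of raw counts (one per sample).
    """
    n_samples = len(table[0])
    # One total per sample column.
    col_sums = [sum(row[j] for row in table) for j in range(n_samples)]
    return [
        # Assumption: a sample whose total is 0 stays at 0 instead of dividing by 0.
        [nf_base * row[j] / col_sums[j] if col_sums[j] else 0.0
         for j in range(n_samples)]
        for row in table
    ]

raw = [
    [10, 0],   # feature 1
    [30, 50],  # feature 2
    [60, 50],  # feature 3
]
for row in normalize_counts(raw, nf_base=1000):
    print(row)
```

After normalization, every sample column sums to the chosen base (1000 here), which makes counts comparable across samples of different sequencing depth.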
[USAGE] kamrat filter -idxdir STR -design STR [-upmin INT1:INT2 -downmax INT1:INT2 -reverse -outpath STR -withcounts]
[OPTION] -h,-help Print the helper
-idxdir STR Indexing folder by KaMRaT index, mandatory
-design STR Path to filter design file, a table of two columns, mandatory
the first column indicates sample names
the second column should be either UP or DOWN (capital letters)
samples with UP will be considered as up-regulated samples
samples with DOWN will be considered as down-regulated samples
samples not given will be neutral (not considered for filter)
samples can also be all UP or all DOWN
-upmin INT1:INT2 Up feature lower bound, [1:1, meaning no filter]
output features counting >= INT1 in >= INT2 UP-samples
-downmax INT1:INT2 Down feature upper bound [inf:1, meaning no filter]
output features counting <= INT1 in >= INT2 DOWN-samples
-reverse Reverse filter, to remove eligible features [false]
-outpath STR Path to results after filter
if not provided, output to screen
-withcounts Output sample count vectors [false]
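The filter criteria above can be illustrated with a short sketch. It assumes that a feature is kept only when both the -upmin and the -downmax conditions hold, and that a condition with no samples in its group is skipped; both points are assumptions, as are the function names and the sample names s1 to s5.

```python
def parse_design(text):
    """Parse a two-column design: sample name, then UP or DOWN."""
    design = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        sample, label = line.split()[:2]
        if label not in ("UP", "DOWN"):
            raise ValueError(f"label must be UP or DOWN, got {label!r}")
        design[sample] = label
    return design

def passes_filter(counts, design, up_min=None, down_max=None):
    """counts: dict sample -> count for one feature.

    up_min=(INT1, INT2): count >= INT1 in at least INT2 UP samples.
    down_max=(INT1, INT2): count <= INT1 in at least INT2 DOWN samples.
    None means that the criterion is not applied.
    """
    up = [counts[s] for s, lab in design.items() if lab == "UP"]
    down = [counts[s] for s, lab in design.items() if lab == "DOWN"]
    ok = True
    if up_min is not None and up:
        ok = ok and sum(c >= up_min[0] for c in up) >= up_min[1]
    if down_max is not None and down:
        ok = ok and sum(c <= down_max[0] for c in down) >= down_max[1]
    return ok

design = parse_design("s1\tUP\ns2\tUP\ns3\tDOWN\ns4\tDOWN\n")
feature = {"s1": 12, "s2": 3, "s3": 0, "s4": 1, "s5": 100}  # s5 is neutral
keep = passes_filter(feature, design, up_min=(5, 1), down_max=(1, 2))
print(keep)      # feature selected by the filter
print(not keep)  # outcome with -reverse, which removes eligible features
```

Samples missing from the design, such as s5 here, never enter either count, which matches the "neutral" behaviour described in the helper.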
[USAGE] kamrat mask -idxdir STR -fasta STR [-reverse -outpath STR -withcounts]
[OPTION] -h,-help Print the helper
-idxdir STR Indexing folder by KaMRaT index, mandatory
-fasta STR Sequence fasta file as the mask, mandatory
-reverse Reverse mask, to select the k-mers in sequence fasta file [false]
-outpath STR Path to mask results
if not provided, output to screen
-withcounts Output sample count vectors [false]
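The masking logic can be sketched as a set operation on k-mers. This is a simplified illustration that ignores strandedness (an unstranded index would require canonical k-mers) and uses hypothetical function names; it is not KaMRaT's implementation.

```python
def kmers_of(seq, k):
    """Return the set of all k-mers contained in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def apply_mask(indexed_kmers, mask_sequences, k, reverse=False):
    """Remove indexed k-mers found in the mask sequences.

    With reverse=True, keep only the k-mers found in the mask sequences.
    """
    mask = set()
    for seq in mask_sequences:
        mask |= kmers_of(seq, k)
    return [km for km in indexed_kmers if (km in mask) == reverse]

indexed = ["ACGT", "CGTA", "TTTT"]
mask_sequences = ["ACGTA"]  # contains ACGT and CGTA
print(apply_mask(indexed, mask_sequences, k=4))
print(apply_mask(indexed, mask_sequences, k=4, reverse=True))
```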
[USAGE] kamrat merge -idxdir STR -overlap MAX-MIN [-with STR1[:STR2] -interv STR[:FLOAT] -min-nbkmer INT -outpath STR -withcounts STR]
[OPTION] -h,-help Print the helper
-idxdir STR Indexing folder by KaMRaT index, mandatory
-overlap MAX-MIN Overlap range for extension, by default: from (k-1) to ⌊k/2⌋
MIN and MAX are integers, MIN <= MAX < k-mer length
-with STR1[:STR2] File indicating k-mers to be extended (STR1) and rep-mode (STR2)
if not provided, all indexed k-mers are used for extension
in the file STR1, a supplementary column of rep-value can be provided
STR2 can be one of {min, minabs, max, maxabs} [min]
-interv STR[:FLOAT] Intervention method for extension [pearson:0.20]
can be one of {none, pearson, spearman, mac}
the threshold may follow a ':' symbol
-min-nbkmer INT Minimal length of extended contigs [0]
-outpath STR Path to extension results
if not provided, output to screen
-withcounts STR Output sample count vectors, STR can be one of [mean, median]
if not provided, output without count vector
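To make the -overlap range concrete, the sketch below searches for the longest suffix-prefix overlap between two k-mers, from MAX down to MIN, and joins them. It only illustrates the overlap search under the default range stated in the helper; the real extension additionally involves the intervention check and other rules that are omitted here. All function names are hypothetical.

```python
def default_overlap_range(k):
    """Default range from the helper: from (k-1) down to floor(k/2)."""
    return k - 1, k // 2

def find_overlap(left, right, max_ovlp, min_ovlp):
    """Longest n in [min_ovlp, max_ovlp] such that the last n bases of
    `left` equal the first n bases of `right`; 0 if none is found."""
    for n in range(max_ovlp, min_ovlp - 1, -1):
        if left[-n:] == right[:n]:
            return n
    return 0

def merge_pair(left, right, n):
    """Join two sequences that overlap by n bases."""
    return left + right[n:]

left, right = "ACGTA", "GTACC"
max_ovlp, min_ovlp = default_overlap_range(len(left))  # 4 and 2 for k = 5
n = find_overlap(left, right, max_ovlp, min_ovlp)
if n:
    print(n, merge_pair(left, right, n))
```

Searching from the largest overlap first means that the most specific junction is preferred whenever several overlap lengths are possible.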
Three intervention methods are available:
- pearson: Pearson distance, i.e., 0.5 * [1 - pearson.correlation(x, y)]
- spearman: Spearman distance, i.e., 0.5 * [1 - spearman.correlation(x, y)]
- mac: mean absolute contrast, as described in [Nguyen, H. T., et al., 2021]
The threshold controlling these distances must lie in [0, 1], where 0 is the strictest setting and 1 is the most permissive (equivalent to none).
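The two correlation-based distances can be computed as in the following sketch, written in plain Python for clarity. The decision rule in allow_merge (merge when the distance is at most the threshold) is an assumption chosen so that a threshold of 1 accepts everything; the mac method is not covered here. Function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length count vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(vx * vy)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            result[order[t]] = avg
        i = j + 1
    return result

def pearson_distance(x, y):
    return 0.5 * (1 - pearson(x, y))

def spearman_distance(x, y):
    # Spearman correlation is the Pearson correlation of the ranks.
    return 0.5 * (1 - pearson(ranks(x), ranks(y)))

def allow_merge(x, y, threshold=0.20, distance=pearson_distance):
    """Assumed rule: two k-mers may be merged if distance <= threshold."""
    return distance(x, y) <= threshold

a = [1, 2, 3, 4]
b = [2, 4, 6, 8]   # perfectly correlated with a
c = [8, 6, 4, 2]   # perfectly anti-correlated with a
print(pearson_distance(a, b), pearson_distance(a, c))
print(allow_merge(a, b), allow_merge(a, c))
```

Both distances range from 0 (correlation of 1) to 1 (correlation of -1), which is why the threshold is expressed on the [0, 1] scale.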
KaMRaT score: score features by classification performance, statistical significance, correlation, or variability
[USAGE] kamrat score -idxdir STR -count-mode STR -scoreby STR -design STR [-with STR1[:STR2] -seltop NUM -outpath STR -withcounts]
[OPTION] -h,-help Print the helper
-idxdir STR Indexing folder by KaMRaT index, mandatory
-scoreby STR Scoring method, mandatory, can be one of:
classification (binary sample labels given by design file)
ttest.padj adjusted p-value of t-test between conditions
ttest.pi π-value of t-test between conditions
snr signal-to-noise ratio between conditions
lr:nfold accuracy by logistic regression classifier
classification (binary or multiple sample labels given by design file)
dids DIDS score
bayes:nfold accuracy by naive Bayes classifier
correlation evaluation (continuous sample labels given by design file)
pearson Pearson correlation with the continuous sample condition
spearman Spearman correlation with the continuous sample condition
unsupervised evaluation (no design file required)
sd standard deviation
rsd1 standard deviation adjusted by mean
rsd2 standard deviation adjusted by min
rsd3 standard deviation adjusted by median
entropy entropy of sample counts + 1
-design STR Path to file indicating sample-condition design, mandatory unless using sd, rsd1, rsd2, rsd3, entropy
without header line, each row can be either:
sample name, sample condition
sample name, sample condition, sample batch (only for lrc, nbc, and svm)
-with STR1[:STR2] File indicating features to score (STR1) and counting mode (STR2)
if not provided, all indexed features are used for scoring
STR2 can be one of [rep, mean, median]
-seltop NUM Select top scored features
if NUM > 1, number of top features to select (should be integer)
if 0 < NUM <= 1, ratio of top features to select
if absent or NUM <= 0, output all features
-outpath STR Path to scoring result
if not provided, output to screen
-withcounts Output sample count vectors [false]
[NOTE] For scoring methods lrc, nbc, and svm, a univariate CV fold number (nfold) can be provided
if nfold = 0, leave-one-out cross-validation
if nfold = 1, without cross-validation, training and testing on the whole dataset
if nfold > 1, n-fold cross-validation
For t-test scoring methods, a transformation log2(x + 1) is applied to sample counts
For SVM scoring, sample counts standardization is applied feature by feature
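The -seltop and nfold conventions described above can be summarized with the following sketch. The rounding of a ratio to a feature number (floor) and the interleaved assignment of samples to folds are assumptions made for illustration; function names are hypothetical.

```python
import math

def n_selected(seltop, n_features):
    """Number of top features kept for a given -seltop value."""
    if seltop is None or seltop <= 0:
        return n_features                         # output all features
    if seltop <= 1:
        return math.floor(seltop * n_features)    # ratio (floor is an assumption)
    return min(int(seltop), n_features)           # absolute number

def cv_splits(n_samples, nfold):
    """Return (train, test) index lists according to the nfold convention."""
    if nfold < 0 or nfold > n_samples:
        raise ValueError("nfold must be between 0 and the number of samples")
    idx = list(range(n_samples))
    if nfold == 1:
        return [(idx, idx)]     # no cross-validation: train and test on all samples
    if nfold == 0:
        nfold = n_samples       # leave-one-out
    splits = []
    for fold in range(nfold):
        test = idx[fold::nfold]  # assumption: interleaved, unshuffled folds
        train = [i for i in idx if i not in test]
        splits.append((train, test))
    return splits

print(n_selected(0.5, 10), n_selected(3, 10), n_selected(0, 10))
for train, test in cv_splits(6, 3):
    print(train, test)
```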
For a detailed description of some scoring methods, please refer to the supplementary document of our article.
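As a rough illustration of the unsupervised scores sd, rsd1, rsd2, and rsd3, the sketch below reads "adjusted by" as "divided by" and uses the sample standard deviation. Both readings are assumptions, as is returning None when the denominator is 0; the supplementary document remains the reference for the exact definitions.

```python
import statistics

def unsupervised_scores(counts):
    """Variability scores for one feature across samples."""
    sd = statistics.stdev(counts)  # assumption: sample (n-1) standard deviation

    def ratio(denominator):
        # Assumption: the score is left undefined when the denominator is 0.
        return sd / denominator if denominator else None

    return {
        "sd": sd,
        "rsd1": ratio(statistics.mean(counts)),    # sd adjusted by mean
        "rsd2": ratio(min(counts)),                # sd adjusted by min
        "rsd3": ratio(statistics.median(counts)),  # sd adjusted by median
    }

for name, value in unsupervised_scores([2, 4, 4, 4, 5, 5, 7, 9]).items():
    print(name, value)
```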
KaMRaT score has an alias, KaMRaT rank, which shares the same usage as described above. Please prefer the name "score" over "rank": the alias exists only to ensure compatibility with previous projects and may be deprecated in a future release.
[USAGE] kamrat query -idxdir STR -fasta STR -toquery STR [-withabsent -outpath STR]
[OPTION] -h,-help Print the helper
-idxdir STR Indexing folder by KaMRaT index, mandatory
-fasta STR Sequence fasta file, mandatory
-toquery STR Query method, mandatory, can be one of:
mean mean count among all composite k-mers for each sample
median median count among all composite k-mers for each sample
-withabsent Output also absent queries (count vector all 0) [default: false]
-outpath STR Path to query results
if not provided, output to screen
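The two query methods can be illustrated with a small sketch that aggregates, sample by sample, the counts of all k-mers composing a query sequence. It assumes that k-mers missing from the index contribute zero counts and that an "absent" query is one whose resulting vector is all 0; the function names and the toy index are hypothetical.

```python
import statistics

def query_counts(seq, k, index, n_samples, method="mean"):
    """Aggregate the per-sample counts of all k-mers of `seq`.

    index: dict k-mer -> list of counts (one per sample).
    """
    aggregate = {"mean": statistics.mean, "median": statistics.median}[method]
    zero = [0] * n_samples
    # Assumption: a k-mer missing from the index counts as 0 in every sample.
    vectors = [index.get(seq[i:i + k], zero) for i in range(len(seq) - k + 1)]
    if not vectors:  # sequence shorter than k
        return list(zero)
    return [aggregate(column) for column in zip(*vectors)]

def run_queries(sequences, k, index, n_samples, method="mean", with_absent=False):
    """Query several named sequences; drop all-zero results unless with_absent."""
    results = {}
    for name, seq in sequences.items():
        vector = query_counts(seq, k, index, n_samples, method)
        if with_absent or any(vector):
            results[name] = vector
    return results

index = {"ACG": [2, 0], "CGT": [4, 6], "GTA": [9, 3]}
sequences = {"seq1": "ACGTA", "seq2": "TTTTT"}
print(run_queries(sequences, 3, index, 2, method="mean"))
print(run_queries(sequences, 3, index, 2, method="median", with_absent=True))
```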