Commit

v0.9.0
shenwei356 committed Sep 28, 2022
1 parent ca3fbb7 commit fe3d924
Showing 5 changed files with 52 additions and 28 deletions.
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -1,6 +1,6 @@
# Changelog

### v0.9.0 - 2022-09-00
### v0.9.0 - 2022-09-28

- `compute`:
- smaller output files and faster speed.
6 changes: 3 additions & 3 deletions docs/database.md
@@ -6,11 +6,11 @@ All prebuilt databases and the used reference genomes are available at:

- [OneDrive](https://1drv.ms/u/s!Ag89cZ8NYcqtjHwpe0ND3SUEhyrp?e=QDRbEC) for *global users*.
- [CowTransfer](https://shenwei356.cowtransfer.com/s/c7220dd5901c42) for *Chinese users and global users*.<br>
**Please click the "kmcp+105 more files" link to browse directories and files, and choose an individual file to download**.<br>
**Please click the "kmcp" link to browse directories and files, and choose an individual file to download**.<br>

<p style="color:Tomato;">Please check file integrity with `md5sum` after downloading the files:</p>

md5sum -c gtdb.kmcp.tar.gz.md5.txt
md5sum -c gtdb.kmcp.tar.gz.md5.txt genbank-viral.kmcp.tar.gz.md5.txt refseq-fungi.kmcp.tar.gz.md5.txt

**Hardware requirements**

@@ -43,7 +43,7 @@ Users can also [build custom databases](#building-custom-databases), it's simple
|**Bacteria and Archaea**|GTDB r202 |28073+ |47894 |k=21, chunks=10;<br>fpr=0.3, hashes=1 |[gtdb.kmcp.tar.gz](https://1drv.ms/u/s!Ag89cZ8NYcqtkBFGpARKkdzpfAxf?e=IPQN22) (50.34 GB, [md5](https://1drv.ms/t/s!Ag89cZ8NYcqtkA8IUG1zuh2wuYCh?e=jUkUXQ)),<br>[CowTransfer link](https://shenwei356.cowtransfer.com/s/3426e055bee74a) ([md5](https://shenwei356.cowtransfer.com/s/a8e60e9040eb4c)) |58.03 GB |
|**Bacteria and Archaea**|HumGut |1594+ |30691 |k=21, chunks=10;<br>fpr=0.3, hashes=1 |[humgut.kmcp.tar.gz](https://1drv.ms/u/s!Ag89cZ8NYcqtjUxZymOTLu1qJyDI?e=ZPWhDt) (18.77 GB, [md5](https://1drv.ms/t/s!Ag89cZ8NYcqtjUVZu1Y-Vtussvdc?e=wHlWdm)),<br>[CowTransfer link](https://shenwei356.cowtransfer.com/s/0b88a8ef2cff42) ([md5](https://shenwei356.cowtransfer.com/s/a04127a6bfb648)) |21.52 GB |
|**Fungi** |Refseq r208|398 |403 |k=21, chunks=10;<br>fpr=0.3, hashes=1 |[refseq-fungi.kmcp.tar.gz](https://1drv.ms/u/s!Ag89cZ8NYcqtkBCf0vPMatJbSvtF?e=2jE0HH) (3.68 GB, [md5](https://1drv.ms/t/s!Ag89cZ8NYcqtkA0ZuDblb_hNJAtP?e=brrpFn)),<br>[CowTransfer link](https://shenwei356.cowtransfer.com/s/62e1abfa795443) ([md5](https://shenwei356.cowtransfer.com/s/09a50702304343)) |4.18 GB |
|**Viruses** |GenBank 246|23632 |27936 |k=21, chunks=5;<br>fpr=0.05, hashes=1 |[genbank-viral.kmcp.tar.gz](https://1drv.ms/u/s!Ag89cZ8NYcqtkA7ofenEH6ve7va7?e=rgb5Vz) (1.25 GB, [md5](https://1drv.ms/t/s!Ag89cZ8NYcqtkAx0HPhHUSthZMxO?e=sUwaKM)),<br>[CowTransfer link](https://shenwei356.cowtransfer.com/s/351451ef4e6d41) ([md5](https://shenwei356.cowtransfer.com/s/e359c61253fb44))|4.72 GB |
|**Viruses** |GenBank 246|23632 |27936 |k=21, chunks=10;<br>fpr=0.05, hashes=1 |[genbank-viral.kmcp.tar.gz](https://1drv.ms/u/s!Ag89cZ8NYcqtkA7ofenEH6ve7va7?e=rgb5Vz) (1.25 GB, [md5](https://1drv.ms/t/s!Ag89cZ8NYcqtkAx0HPhHUSthZMxO?e=sUwaKM)),<br>[CowTransfer link](https://shenwei356.cowtransfer.com/s/351451ef4e6d41) ([md5](https://shenwei356.cowtransfer.com/s/e359c61253fb44))|4.72 GB |
|**Human** |CHM13 |1 |1 |k=21, chunks=1024;<br>fpr=0.3, hashes=1|[human-chm13.kmcp.tar.gz](https://1drv.ms/u/s!Ag89cZ8NYcqtjVQgKPCZ7jciZqEp?e=jAO76U) (818 MB, [md5](https://1drv.ms/t/s!Ag89cZ8NYcqtjU1nGeOJaFf70y_K?e=bzJPcE)),<br>[CowTransfer link](https://shenwei356.cowtransfer.com/s/07e614a36b1a4b) ([md5](https://shenwei356.cowtransfer.com/s/c91d4c98677645)) |946 MB |

*based on NCBI taxonomy data 2021-12-06. `+` is used because some species are unclassified xxx.
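
The updated `md5sum -c` line in docs/database.md passes several checksum files in one call. As a minimal sketch of how that check could gate extraction, the helper below verifies each checksum file and unpacks an archive only when its check passes. The function name `verify_and_extract` is hypothetical (it is not part of kmcp), and it assumes each checksum file is named `<archive>.md5.txt` and sits next to the archive it lists, as the file names in the diff suggest.

```shell
# Hypothetical helper, not part of kmcp: verify each checksum file,
# then unpack the matching archive only if the check succeeded.
# Assumes the naming scheme <archive>.md5.txt seen in the diff.
verify_and_extract() {
    for vae_md5file in "$@"; do
        # md5sum -c exits non-zero when any listed file fails its check.
        if md5sum -c "$vae_md5file"; then
            # gtdb.kmcp.tar.gz.md5.txt -> gtdb.kmcp.tar.gz
            vae_archive="${vae_md5file%.md5.txt}"
            tar -zxf "$vae_archive" || return 1
        else
            echo "checksum mismatch, not extracting: $vae_md5file" >&2
            return 1
        fi
    done
}
```

A call such as `verify_and_extract gtdb.kmcp.tar.gz.md5.txt refseq-fungi.kmcp.tar.gz.md5.txt` stops at the first archive that fails, so a corrupted download is never unpacked.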
49 changes: 36 additions & 13 deletions docs/download.md
@@ -15,27 +15,38 @@ in two packages for better searching performance.

## Current Version

### v0.8.3 - 2022-08-15 [![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/kmcp/v0.8.3/total.svg)](https://github.com/shenwei356/kmcp/releases/tag/v0.8.3)
### v0.9.0 - 2022-09-28 [![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/kmcp/v0.9.0/total.svg)](https://github.com/shenwei356/kmcp/releases/tag/v0.9.0)

- `kmcp`: fix compiling from source for ARM architectures.[#17](https://github.com/shenwei356/kmcp/issues/17)
- `compute`:
- smaller output files and faster speed.
- more even genome splitting.
- `index`:
- faster speed due to smaller input files.
- `search`:
- fix searching with paired-end reads where the read2 is shorter than the value of `--min-query-len`. [#10](https://github.com/shenwei356/kmcp/issues/10)
- fix the log. [#8](https://github.com/shenwei356/kmcp/issues/8)
- a new flag `-f/--max-fpr`: maximum false positive rate of a query (default 0.05). It reduces the unnecessary output when searching with a low minimum query coverage (`-t/--min-query-cov`).
- ***more accurate and smaller query FPR following Theorem 2 in SBT paper, instead of the Chernoff bound***.
- change the default value of `-f/--max-fpr` from 0.05 to 0.01.
- ***10-20% speedup***.
- `profile`:
- recommend using the flag `--no-amb-corr` to disable ambiguous reads correction when >= 1000 candidates are detected.
- fix logging when using `--level strain` and no taxonomy given.

- ***more accurate abundance estimation using EM algorithm***.
- change the default value of `-f/--max-fpr` from 0.05 to 0.01.
- mode 0: change the default value of `-H/--min-hic-ureads-qcov` from 0.55 to 0.7.
- increase float width of reference coverage in KMCP profile format from 2 to 6.
- `util query-fpr`:
- compute query FPR following Theorem 2 in SBT paper, instead of the Chernoff bound.
- new commands:
- `utils split-genomes` for splitting genomes into chunks.
- `utils ref-info` for printing information of reference (chunks), including the number of k-mers
and the actual false-positive rate.
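
The `search` and `util query-fpr` entries above say the query FPR now follows Theorem 2 in the SBT paper rather than the Chernoff bound. Assuming this refers to the binomial tail statement in the Sequence Bloom Tree paper, the quantity can be written as below. The symbols are assumptions introduced here for illustration: $\ell$ is the number of k-mers in the query, $\xi$ is the false-positive rate of a single k-mer lookup in the index, and $\theta$ is the minimum fraction of query k-mers that must match. How KMCP maps its own parameters onto these symbols is not stated in this changelog.

```latex
\mathrm{FPR}_{\text{query}}
  = \Pr\bigl[\text{more than } \lfloor \theta \ell \rfloor \text{ false-positive k-mers}\bigr]
  = 1 - \sum_{i=0}^{\lfloor \theta \ell \rfloor} \binom{\ell}{i}\, \xi^{i} (1 - \xi)^{\ell - i}
```

Under the independence assumption this is the exact binomial tail, whereas a Chernoff bound only bounds that tail from above. That is consistent with the changelog's description of the new value as "more accurate and smaller".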

### Links

OS |Arch |File, 中国镜像 |Download Count
:------|:---------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Linux |**64-bit**|[**kmcp_linux_amd64.tar.gz**](https://github.com/shenwei356/kmcp/releases/download/v0.8.3/kmcp_linux_amd64.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/kmcp/kmcp_linux_amd64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/kmcp/latest/kmcp_linux_amd64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/kmcp/releases/download/v0.8.3/kmcp_linux_amd64.tar.gz)
Linux |arm64 |[**kmcp_linux_arm64.tar.gz**](https://github.com/shenwei356/kmcp/releases/download/v0.8.3/kmcp_linux_arm64.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/kmcp/kmcp_linux_arm64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/kmcp/latest/kmcp_linux_arm64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/kmcp/releases/download/v0.8.3/kmcp_linux_arm64.tar.gz)
macOS |**64-bit**|[**kmcp_darwin_amd64.tar.gz**](https://github.com/shenwei356/kmcp/releases/download/v0.8.3/kmcp_darwin_amd64.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/kmcp/kmcp_darwin_amd64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/kmcp/latest/kmcp_darwin_amd64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/kmcp/releases/download/v0.8.3/kmcp_darwin_amd64.tar.gz)
macOS |arm64 |[**kmcp_darwin_arm64.tar.gz**](https://github.com/shenwei356/kmcp/releases/download/v0.8.3/kmcp_darwin_arm64.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/kmcp/kmcp_darwin_arm64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/kmcp/latest/kmcp_darwin_arm64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/kmcp/releases/download/v0.8.3/kmcp_darwin_arm64.tar.gz)
Windows|**64-bit**|[**kmcp_windows_amd64.exe.tar.gz**](https://github.com/shenwei356/kmcp/releases/download/v0.8.3/kmcp_windows_amd64.exe.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/kmcp/kmcp_windows_amd64.exe.tar.gz)|[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/kmcp/latest/kmcp_windows_amd64.exe.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/kmcp/releases/download/v0.8.3/kmcp_windows_amd64.exe.tar.gz)
Linux |**64-bit**|[**kmcp_linux_amd64.tar.gz**](https://github.com/shenwei356/kmcp/releases/download/v0.9.0/kmcp_linux_amd64.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/kmcp/kmcp_linux_amd64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/kmcp/latest/kmcp_linux_amd64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/kmcp/releases/download/v0.9.0/kmcp_linux_amd64.tar.gz)
Linux |arm64 |[**kmcp_linux_arm64.tar.gz**](https://github.com/shenwei356/kmcp/releases/download/v0.9.0/kmcp_linux_arm64.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/kmcp/kmcp_linux_arm64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/kmcp/latest/kmcp_linux_arm64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/kmcp/releases/download/v0.9.0/kmcp_linux_arm64.tar.gz)
macOS |**64-bit**|[**kmcp_darwin_amd64.tar.gz**](https://github.com/shenwei356/kmcp/releases/download/v0.9.0/kmcp_darwin_amd64.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/kmcp/kmcp_darwin_amd64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/kmcp/latest/kmcp_darwin_amd64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/kmcp/releases/download/v0.9.0/kmcp_darwin_amd64.tar.gz)
macOS |arm64 |[**kmcp_darwin_arm64.tar.gz**](https://github.com/shenwei356/kmcp/releases/download/v0.9.0/kmcp_darwin_arm64.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/kmcp/kmcp_darwin_arm64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/kmcp/latest/kmcp_darwin_arm64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/kmcp/releases/download/v0.9.0/kmcp_darwin_arm64.tar.gz)
Windows|**64-bit**|[**kmcp_windows_amd64.exe.tar.gz**](https://github.com/shenwei356/kmcp/releases/download/v0.9.0/kmcp_windows_amd64.exe.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/kmcp/kmcp_windows_amd64.exe.tar.gz)|[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/kmcp/latest/kmcp_windows_amd64.exe.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/kmcp/releases/download/v0.9.0/kmcp_windows_amd64.exe.tar.gz)

*Notes:*

@@ -136,6 +147,18 @@ fish:

## Release History


### v0.8.3 - 2022-08-15 [![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/kmcp/v0.8.3/total.svg)](https://github.com/shenwei356/kmcp/releases/tag/v0.8.3)

- `kmcp`: fix compiling from source for ARM architectures.[#17](https://github.com/shenwei356/kmcp/issues/17)
- `search`:
- fix searching with paired-end reads where the read2 is shorter than the value of `--min-query-len`. [#10](https://github.com/shenwei356/kmcp/issues/10)
- fix the log. [#8](https://github.com/shenwei356/kmcp/issues/8)
- a new flag `-f/--max-fpr`: maximum false positive rate of a query (default 0.05). It reduces the unnecessary output when searching with a low minimum query coverage (`-t/--min-query-cov`).
- `profile`:
- recommend using the flag `--no-amb-corr` to disable ambiguous reads correction when >= 1000 candidates are detected.
- fix logging when using `--level strain` and no taxonomy given.

### [v0.8.2](https://github.com/shenwei356/kmcp/releases/tag/v0.8.2) - 2022-03-26 [![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/kmcp/v0.8.2/total.svg)](https://github.com/shenwei356/kmcp/releases/tag/v0.8.2)

- `search`:
5 changes: 3 additions & 2 deletions docs/tutorial/profiling/index.md
@@ -129,15 +129,16 @@ can use stricter criteria in `kmcp profile`.
--min-kmers 10 \
--min-query-len 30 \
--min-query-cov 0.55 \
$read1 $read2 \
$read1 \
$read2 \
--out-file $sample.kmcp@$dbname.tsv.gz \
--log $sample.kmcp@$dbname.tsv.gz.log
done

# 2. Merging search results against multiple databases
kmcp merge $sample.kmcp@*.tsv.gz --out-file $sample.kmcp.tsv.gz

Pair-end reads:
Paired-end reads:

# ---------------------------------------------------
# paired-end
18 changes: 9 additions & 9 deletions docs/tutorial/searching/index.md
@@ -206,7 +206,7 @@ The searching process is simple and [very fast](https://bioinf.shenwei.me/kmcp/b

kmcp search --query-whole-file -d gtdb.minhash.kmcp/ \
--query-whole-file --sort-by jacc --min-query-cov 0.2 \
--query-id genomme1 contigs.fasta -o result.tsv
--query-id genome1 contigs.fasta -o result.tsv

The output is in tab-delimited format:

@@ -232,34 +232,34 @@ A full search result:

|#query |qLen |qKmers|FPR |hits|target |chunkIdx|chunks|tLen |kSize|mKmers|qCov |tCov |jacc |queryIdx|
|:-------|:------|:-----|:----------|:---|:--------------|:-------|:-----|:------|:----|:-----|:-----|:-----|:-----|:-------|
|genomme1|9488952|18737 |0.0000e+00 |2 |GCF_000742135.1|0 |1 |5545784|31 |8037 |0.4289|0.7365|0.3719|0 |
|genomme1|9488952|18737 |3.1964e-183|2 |GCF_000392875.1|0 |1 |2881400|31 |3985 |0.2127|0.7062|0.1954|0 |
|genome1 |9488952|18737 |0.0000e+00 |2 |GCF_000742135.1|0 |1 |5545784|31 |8037 |0.4289|0.7365|0.3719|0 |
|genome1 |9488952|18737 |3.1964e-183|2 |GCF_000392875.1|0 |1 |2881400|31 |3985 |0.2127|0.7062|0.1954|0 |

Reference IDs can be optionally mapped to their names, let's print the main columns only:

kmcp search --query-whole-file -d gtdb.minhash.kmcp/ \
--name-map name.map \
--query-whole-file --sort-by jacc --min-query-cov 0.2 \
--query-id genomme1 contigs.fasta \
--query-id genome1 contigs.fasta \
| csvtk rename -t -C $ -f 1 -n query \
| csvtk cut -t -f query,jacc,target \
> result.tsv

|query |jacc |target |
|:-------|:-----|:-----------------------------------------------------------------------------------------------|
|genomme1|0.3719|NZ_KN046818.1 Klebsiella pneumoniae strain ATCC 13883 scaffold1, whole genome shotgun sequence |
|genomme1|0.1954|NZ_KB944588.1 Enterococcus faecalis ATCC 19433 acAqW-supercont1.1, whole genome shotgun sequence|
|genome1 |0.3719|NZ_KN046818.1 Klebsiella pneumoniae strain ATCC 13883 scaffold1, whole genome shotgun sequence |
|genome1 |0.1954|NZ_KB944588.1 Enterococcus faecalis ATCC 19433 acAqW-supercont1.1, whole genome shotgun sequence|

Using closed syncmer:

kmcp search --query-whole-file -d gtdb.syncmer.kmcp/ \
--name-map name.map \
--query-whole-file --sort-by jacc --min-query-cov 0.2 \
--query-id genomme1 contigs.fasta \
--query-id genome1 contigs.fasta \
| csvtk rename -t -C $ -f 1 -n query \
| csvtk cut -t -f query,jacc,target

|query |jacc |target |
|:-------|:-----|:-----------------------------------------------------------------------------------------------|
|genomme1|0.3712|NZ_KN046818.1 Klebsiella pneumoniae strain ATCC 13883 scaffold1, whole genome shotgun sequence |
|genomme1|0.1974|NZ_KB944588.1 Enterococcus faecalis ATCC 19433 acAqW-supercont1.1, whole genome shotgun sequence|
|genome1 |0.3712|NZ_KN046818.1 Klebsiella pneumoniae strain ATCC 13883 scaffold1, whole genome shotgun sequence |
|genome1 |0.1974|NZ_KB944588.1 Enterococcus faecalis ATCC 19433 acAqW-supercont1.1, whole genome shotgun sequence|
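
The tables above list several targets per query, sorted by `jacc`. As a rough sketch of post-processing such output, the helper below keeps only the highest-`jacc` row for each query. It assumes raw tab-delimited input with a header row containing `query` (or `#query`) and `jacc` columns, as shown in the tables. The function name `best_hit_by_jacc` is hypothetical and not part of kmcp or csvtk.

```shell
# Hypothetical helper, not part of kmcp: print the header and, for each
# query, the row with the highest jacc value. Input is tab-delimited with
# a header row; columns are located by name, so column order is free.
best_hit_by_jacc() {
    awk -F'\t' '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                name = $i
                sub(/^#/, "", name)          # raw output names column 1 "#query"
                if (name == "query") qc = i
                if (name == "jacc")  jc = i
            }
            print
            next
        }
        {
            q = $qc
            j = $jc + 0                      # force numeric comparison
            if (!(q in best)) order[++n] = q # remember first-seen order
            if (!(q in best) || j > best[q]) {
                best[q] = j
                row[q] = $0
            }
        }
        END { for (k = 1; k <= n; k++) print row[order[k]] }
    ' "$@"
}
```

It reads from a file argument or from standard input, for example `best_hit_by_jacc result.tsv`.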
