Provenance2 into paper #56

Merged on Nov 5, 2024 (168 commits)

Commits
8f196ab
GitHub actions (#13)
lskatz Jun 17, 2021
cd9704e
Add files via upload
SVN-PhD Jul 1, 2021
0388be2
adding taxonomy_v3.5.1
SVN-PhD Jul 1, 2021
1a2044e
More formats (#17)
lskatz Jul 22, 2021
c1b607a
Listeria unit testing (#18)
lskatz Jul 23, 2021
78b8807
Merge branch 'master' of https://github.com/SVN-PhD/Kalamari
lskatz Jul 23, 2021
8ef7191
new parent id
lskatz Jul 23, 2021
b94f912
a get taxonomy script for a reduced set of dmp files
lskatz Jul 23, 2021
93b6c51
reduced taxonomy
lskatz Jul 23, 2021
9a8febe
testing v3.9.2
lskatz Jul 23, 2021
f487aef
added parentid to plasmids
lskatz Jul 28, 2021
d320139
Updating some Yersinia taxid (#16)
SVN-PhD Jul 28, 2021
37ed1f1
Merge branch 'master' of https://github.com/lskatz/Kalamari
lskatz Jul 28, 2021
993f7c6
adding v3.9.3 taxonomy
lskatz Jul 28, 2021
cc1cbd6
m
lskatz Jul 28, 2021
ecd900d
Merge branch 'custom-taxdump'
lskatz Jul 28, 2021
31dca6e
adding in Scott's Yersinia genomes
lskatz Jul 28, 2021
e9bef09
cleanup
lskatz Jul 28, 2021
d0d9b34
updated to correct src tax dir
lskatz Jul 28, 2021
c1de0bb
Update unit-testing.yml
lskatz Jul 28, 2021
33a41aa
Create CITATION.cff (#20)
lskatz Jul 29, 2021
fa462f7
Kraken1 unit test (#21)
lskatz Jul 29, 2021
6e25f2f
Database doc update (#22)
lskatz Jul 29, 2021
9917015
mash database
lskatz Jul 29, 2021
703eaa8
Define contributions (#23)
lskatz Jul 30, 2021
5e1404d
mmseqs2 just for fun
lskatz Jul 31, 2021
927137a
m
lskatz Jul 31, 2021
30b7b10
Sepia
lskatz Aug 17, 2021
e68611f
fixed bacillus genus back to bacteria in the plasmids (#24)
lskatz Aug 20, 2021
cfda1c1
Build sepia (#25)
lskatz Aug 20, 2021
32fe463
fixed a bug where the same fasta file would be downloaded twice and g…
lskatz Aug 20, 2021
715c538
Merge branch 'master' of https://github.com/lskatz/Kalamari
lskatz Aug 20, 2021
72ff3ad
Merge branch 'master' of https://github.com/lskatz/Kalamari
lskatz Feb 11, 2022
ffd0378
validate a kraken database better
lskatz Feb 22, 2022
2256705
MIDAS
lskatz Mar 8, 2022
edd79b2
m
lskatz Mar 8, 2022
c7dfe43
m
lskatz Mar 8, 2022
a192c7c
Update README.md with reqs and recs (#29)
kapsakcj May 4, 2022
bda35f6
Update chromosomes.tsv
lskatz May 18, 2022
6963541
using GITHUB_PATH to solve CI problems
lskatz May 18, 2022
2f46113
m
lskatz May 18, 2022
3d0b9d1
m
lskatz May 18, 2022
fa905d6
limit tests to target branches
lskatz May 18, 2022
7ae281d
jellyfish now in path
lskatz May 18, 2022
17f1b68
m
lskatz May 18, 2022
6814e6f
remove -x statement
lskatz May 18, 2022
45834dd
allow this workflow to work on master
lskatz May 18, 2022
42403da
trying out taxonomy validator workflow
lskatz May 18, 2022
1f47227
remove kraken1 from testing on this branch
lskatz May 18, 2022
6826d39
fix path to taxonomy
lskatz May 18, 2022
22ab700
Fix ci (#31)
lskatz May 18, 2022
9e182a4
Merge branch 'master' of github.com:lskatz/Kalamari
lskatz May 18, 2022
2abecab
check file sizes after pulling down accessions
lskatz May 18, 2022
7ac9e6a
more debugging in the ci just in case
lskatz May 18, 2022
36e9128
change cryptosporidium parent taxids to cryptosporidium the genus
lskatz Dec 14, 2022
f03b97c
marged new kalamari download script
lskatz Jan 13, 2023
5ecf07e
upped the version
lskatz Jan 13, 2023
5d56ed0
getExactTaxonomy.pl: better error messages
lskatz Jun 14, 2023
c24cff5
downloadKalamari.pl: add in retmax 1
lskatz Apr 10, 2024
53af4ee
only accept one sequence per insdc accession
lskatz Apr 11, 2024
197c711
script to download kalamari from source
lskatz Apr 29, 2024
5ec301f
numcpus option added; new bash script to download and format
lskatz Apr 29, 2024
16a92f3
bash downloadKalamari.sh
lskatz Apr 29, 2024
166f412
update to ubuntu 20
lskatz Apr 29, 2024
dbae8b1
2 cpus in test
lskatz Apr 29, 2024
c8814b5
add spreadsheet as a strategy variable
lskatz Apr 29, 2024
0eb21ec
m
lskatz Apr 29, 2024
6d2721e
m
lskatz Apr 29, 2024
4aff16c
split jobs between runners
lskatz Apr 29, 2024
8dd8c2d
fix math
lskatz Apr 29, 2024
88d5f65
adding more retries
lskatz Apr 29, 2024
47ea970
switch to 1 cpu for testing
lskatz Apr 29, 2024
ccd4ca2
bump tag to v5.3.0
lskatz Apr 29, 2024
5ca8b82
std output for downloadKalamari.sh
lskatz Apr 30, 2024
54f70b9
removed bioperl
lskatz Apr 30, 2024
29da55c
bump version; add more standard conda db location
lskatz Apr 30, 2024
78fef19
trying to speed up downloads
lskatz May 2, 2024
4f31101
vast speed increase with batch downloads; cleaned up chromosomes.tsv
lskatz May 3, 2024
088c971
moved version information to the script from Makefile.PL; removed --a…
lskatz May 3, 2024
ce73614
m
lskatz May 3, 2024
7d95383
remove edirect setup unit test
lskatz May 3, 2024
ed46b3a
update unit tests
lskatz May 3, 2024
585ed0f
just two chunks of tests
lskatz May 3, 2024
dd2abc2
batch more
lskatz May 3, 2024
9bd58e9
fix file sizes check
lskatz May 3, 2024
6698e50
just make the damn thing work
lskatz May 3, 2024
51fea40
bash file uses local repo files instead of curl; default buffer size 100
lskatz May 5, 2024
8bdf873
More proper build (#42)
lskatz May 9, 2024
43182f3
fix a downloading bug where sed stalls
lskatz May 10, 2024
d71690e
update for compressed kalamari library and more efficient kraken builds
lskatz May 14, 2024
d61b28b
update download script
lskatz May 15, 2024
e926b3d
Validate taxonomy (#43)
lskatz May 17, 2024
1f91677
Add genomes (#45) (#46)
lskatz May 17, 2024
f77f31d
init paper
lskatz May 18, 2024
c80df02
some revisions; taxonomy; downloading
lskatz May 18, 2024
4ef37ef
swap example
lskatz May 18, 2024
10e2aab
references
lskatz May 18, 2024
fb8ffb0
stole Joe's draft-pdf.yml
lskatz May 18, 2024
f45b4ed
update to version 4 of artifacts
lskatz May 18, 2024
c253755
plasmids description
lskatz May 18, 2024
2e53322
ignore rendered manuscripts
lskatz May 20, 2024
62e29d8
some minor fixes; author affiliations; code examples
lskatz May 20, 2024
5b6def8
added Shatavia; updated example
lskatz May 20, 2024
7a3fcab
m
lskatz May 20, 2024
d06aad2
revisions from Jess
lskatz May 20, 2024
ac3f1e4
refs
lskatz May 20, 2024
5da3f14
fix list that became italics
lskatz May 20, 2024
8cd8814
updated Andrew's affiliation
lskatz May 21, 2024
5fcc0a2
plasmid defined species
lskatz May 21, 2024
591e157
gave a name to the JOSS rendering
lskatz May 21, 2024
d0af6b7
try experimental docx file creation
lskatz May 21, 2024
7b23ebe
try 2 with container
lskatz May 21, 2024
ba0fa79
correct artifact Action
lskatz May 21, 2024
4362700
m
lskatz May 21, 2024
e7c9910
upload artifact v4
lskatz May 21, 2024
6a24c1c
branch agnostic
lskatz May 21, 2024
10c466f
try multiple formats; multiple uploads
lskatz May 22, 2024
6e62d80
fix some citations
lskatz May 22, 2024
3253b8e
fixed Dr. Lauer's info
lskatz May 22, 2024
05c5a74
remove format arg
lskatz May 22, 2024
28d1203
shatavia's orcid
lskatz May 22, 2024
64422a8
added Rebecca's and Jess's orcids
lskatz May 22, 2024
946d383
updated DOIs
lskatz May 22, 2024
29f9d28
fixed comment line
lskatz May 22, 2024
25cee50
added Entrez Edirect URL
lskatz May 22, 2024
c0efade
more Entrez citation with help from CoPilot
lskatz May 22, 2024
208b7d5
Andrew's orcid
lskatz May 22, 2024
8b308e4
misc
lskatz May 26, 2024
d91d388
remove random single quotes
lskatz May 26, 2024
c6a4acc
bump version
lskatz May 27, 2024
b46e920
helpful log messages
lskatz May 31, 2024
6019157
v5.6.3
lskatz May 31, 2024
f64d66c
updated revisions from coauthors
lskatz May 31, 2024
c072b70
entered Taylor's revisiosn
lskatz Jun 4, 2024
d7d1dc2
move Katie to acknowledgements due to her request
lskatz Jun 4, 2024
72f6dba
update genome list; stable efetching (#49)
lskatz Jun 5, 2024
e0455fe
changes from cdc clearance process
lskatz Jul 10, 2024
4e3036b
disable buggy docx creation
lskatz Jul 10, 2024
417c345
fix blast+ formatting typo
lskatz Jul 10, 2024
516668e
Change to MIT license
lskatz Jul 15, 2024
9acb5d9
Update README.md: remove CC license sticker
lskatz Jul 15, 2024
d36caf5
update entrez ref
lskatz Jul 15, 2024
00fc364
MRA
lskatz Aug 9, 2024
d874251
MRA
lskatz Aug 9, 2024
d47bf91
misc
lskatz Aug 30, 2024
adb97c6
500 words or less
lskatz Sep 3, 2024
f33caf5
nix example
lskatz Sep 3, 2024
d782d4f
abstract
lskatz Sep 3, 2024
7f3196e
abbreviate genera
lskatz Sep 3, 2024
3a3ed86
another paper revision
lskatz Sep 10, 2024
c131c7d
added asm pandoc template
lskatz Sep 10, 2024
3919dd6
provenance
lskatz Oct 22, 2024
c0ab73c
Leptospira interrogans => CP020414
lskatz Oct 22, 2024
c415a74
some progress
lskatz Oct 23, 2024
95f52c9
downloadKalamari.sh: nuccleotideAcc bug fuxed
lskatz Nov 1, 2024
5d5841b
v5.7.2
lskatz Nov 1, 2024
06455f2
another round of provenance
lskatz Nov 4, 2024
ae92d34
cleared out the unknowns list
lskatz Nov 4, 2024
ac6b933
fixed chromosomes with sources
lskatz Nov 4, 2024
daec382
chromosomes
lskatz Nov 4, 2024
eadee5c
try to run CI
lskatz Nov 4, 2024
65ff822
fix wildcard
lskatz Nov 4, 2024
f885d53
Merge branch 'master' into provenance2
lskatz Nov 4, 2024
8276102
better named sources for each assembly
lskatz Nov 4, 2024
10fd301
Merge branch 'provenance2' of github.com:lskatz/Kalamari into provena…
lskatz Nov 4, 2024
cd7a193
polish this directory
lskatz Nov 5, 2024
b4cb7f0
assembly-complete.gz
lskatz Nov 5, 2024
0b1cf72
Merge branch 'paper' into provenance2
lskatz Nov 5, 2024
15 changes: 4 additions & 11 deletions .github/workflows/unit-testing.Listeria.Kraken1.yml
@@ -2,6 +2,7 @@
 on:
 push:
 branches: [master, dev, validate-taxonomy]
+pull_request:
 name: Listeria-with-Kraken1

 env:
@@ -41,18 +42,10 @@ jobs:
 tree $(realpath .)
 - name: install-edirect
 run: |
-sudo apt-get install ncbi-entrez-direct
-echo "installed edirect the apt way"
-exit
-cd $HOME
-perl -MNet::FTP -e '$ftp = new Net::FTP("ftp.ncbi.nlm.nih.gov", Passive => 1); $ftp->login; $ftp->binary; $ftp->get("/entrez/entrezdirect/edirect.tar.gz");'
-gunzip -cv edirect.tar.gz | tar xf -
-rm -v edirect.tar.gz
-echo $GITHUB_WORKSPACE/edirect >> $GITHUB_PATH
+sh -c "$(curl -fsSL https://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/install-edirect.sh)"
+echo $HOME/edirect >> $GITHUB_PATH
 echo $GITHUB_WORKSPACE/Kalamari/bin >> $GITHUB_PATH
-#export PATH=${PATH}:$HOME/edirect >& /dev/null || setenv PATH "${PATH}:$HOME/edirect"
-yes Y | ./edirect/setup.sh
-tree edirect
+tree $HOME/edirect
 - name: check-env
 run: echo "$PATH"
 - name: select for only Listeria
Expand Down
7 changes: 7 additions & 0 deletions .github/workflows/unit-testing.Yersinia.Kraken2.yml
@@ -2,6 +2,7 @@
 on:
 push:
 branches: [master, dev, validate-taxonomy]
+pull_request:
 name: Genera-with-Kraken2

 env:
@@ -34,6 +35,12 @@ jobs:
 - name: env check
 run: |
 echo $PATH | tr ':' '\n' | sort
+- name: install-edirect
+run: |
+sh -c "$(curl -fsSL https://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/install-edirect.sh)"
+echo $HOME/edirect >> $GITHUB_PATH
+echo $GITHUB_WORKSPACE/Kalamari/bin >> $GITHUB_PATH
+tree $HOME/edirect
 - name: apt-get install
 run: sudo apt-get install ca-certificates tree jellyfish ncbi-entrez-direct
 - name: select for only for this genus
Expand Down
15 changes: 5 additions & 10 deletions .github/workflows/unit-testing.yml
@@ -1,6 +1,7 @@
 on:
 push:
 branches: [master, dev, validate-taxonomy]
+pull_request:
 name: Pull-down-all-accessions

 jobs:
@@ -27,16 +28,10 @@ jobs:
 run: sudo apt-get install ca-certificates tree
 - name: install-edirect
 run: |
-sudo apt-get install ncbi-entrez-direct
-echo "installed edirect the apt way"
-exit
-cd $HOME
-perl -MNet::FTP -e '$ftp = new Net::FTP("ftp.ncbi.nlm.nih.gov", Passive => 1); $ftp->login; $ftp->binary; $ftp->get("/entrez/entrezdirect/edirect.tar.gz");'
-gunzip -cv edirect.tar.gz | tar xf -
-rm -v edirect.tar.gz
-export PATH=${PATH}:$HOME/edirect >& /dev/null || setenv PATH "${PATH}:$HOME/edirect"
-yes Y | ./edirect/setup.sh
-tree edirect
+sh -c "$(curl -fsSL https://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/install-edirect.sh)"
+echo $HOME/edirect >> $GITHUB_PATH
+echo $GITHUB_WORKSPACE/Kalamari/bin >> $GITHUB_PATH
+tree $HOME/edirect
 - name: check-env
 run: echo "$PATH"
 - name: download
Expand Down
14 changes: 11 additions & 3 deletions .github/workflows/validateTaxonomy.yml
@@ -1,6 +1,7 @@
 on:
 push:
-branches: [master, dev, validate-taxonomy]
+branches: [master, dev, esearch-input]
+pull_request:
 name: Validate taxonomy

 jobs:
@@ -27,11 +28,18 @@ jobs:
 echo $PATH
 echo ""
 cat $GITHUB_PATH
+- name: install taxonkit
+run: |
+wget https://github.com/shenwei356/taxonkit/releases/download/v0.16.0/taxonkit_linux_amd64.tar.gz
+tar -xvf taxonkit_linux_amd64.tar.gz
+rm -v taxonkit_linux_amd64.tar.gz
+chmod +x taxonkit
+echo $(realpath .) >> $GITHUB_PATH
 - name: build taxonomy
 run: |
 echo $PATH
-bash Kalamari/bin/buildTaxonomy.sh
-bash Kalamari/bin/filterTaxonomy.sh
+bash -x Kalamari/bin/buildTaxonomy.sh
+bash -x Kalamari/bin/filterTaxonomy.sh
 ls -lhR Kalamari/share/kalamari-*/taxonomy
 - name: validate taxonomy
 run: |
Expand Down
3 changes: 3 additions & 0 deletions .gitignore
@@ -2,3 +2,6 @@ edirect
 share
 paper/paper.html
 paper/paper.doc
+# pixi environments
+.pixi
+*.egg-info
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 Lee Katz

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
3 changes: 1 addition & 2 deletions README.md
@@ -1,7 +1,6 @@
 # Kalamari
-A database of completed assemblies for metagenomics-related tasks

-[![Creative Commons License v4](https://licensebuttons.net/l/by-sa/4.0/88x31.png)](LICENSE.md)
+A database of completed assemblies for metagenomics-related tasks

 ## Synopsis

Expand Down
8 changes: 8 additions & 0 deletions bin/buildKraken1.sh
@@ -22,11 +22,16 @@ cp -rv $TAXDIR $DB/taxonomy

 # Make --add-to-library more efficient with
 # concatenated fasta files
+export nl=$'\n'
 find $SRC -name '*.fasta.gz' | \
 xargs -n 100 -P 1 bash -c '
 for i in "$@"; do
 gzip -cd $i
 done > $tmpfile
+echo -ne "ADDING to library:\n "
+zgrep "^>" $tmpfile | sed "s/^>//" | tr "$nl" " "
+echo
+echo "^^ contents of $tmpfile ^^"
 kraken-build --db $DB --add-to-library $tmpfile
 '

@@ -35,3 +40,6 @@ kraken-build --db $DB --build --threads 1
 # Reduce the size of the database
 kraken-build --db $DB --clean

+if [ ! -e "$sharedir/kalamari-kraken1" ]; then
+ln -sv kalamari-kraken "$sharedir/kalamari-kraken1"
+fi
26 changes: 12 additions & 14 deletions bin/downloadKalamari.pl
@@ -11,7 +11,7 @@
 use IO::Compress::Gzip;
 use version 0.77;

-our $VERSION = version->parse("5.6.0");
+our $VERSION = version->parse("5.7.2");

 use threads;

@@ -167,27 +167,25 @@ sub downloadEntries{
 my $numEntries = scalar(@$entries);
 my @acc = map{$$_{nuccoreAcc}} @$entries;
 logmsg "Downloading ".scalar(@acc)." accessions";
-my $queryArg = join("[accession] OR ", sort(@acc))."[accession]";
 my $dir = tempdir("download.XXXXXX", DIR=>$$settings{tempdir});

+# Make the input file for efetch
+my $inputAcc = "$dir/input.acc";
+open(my $fh, ">", $inputAcc) or die "ERROR: could not write to $inputAcc: $!";
+print $fh join("\n", @acc)."\n";
+close $fh;
+
 # Accessions that had errors
 my @err;

-# Get the esearch xml in place for at least one downstream query
-my $esearchXml = "$dir/esearch.xml";
-my $esearchCmd = "esearch -db nuccore -query '$queryArg' > $esearchXml";
-command($esearchCmd);
+# Get started on the comprehensive assembly file
+my $outfile = "$dir/all.fasta";
+logmsg "Downloading all accessions to $outfile using input accessions in $inputAcc";
+command("efetch -db nuccore -input $inputAcc -format fasta > $dir/all.fasta");
 if($?){
-die "ERROR running: $esearchCmd: $!";
+die "ERROR: could not download all accessions";
 }

-# Get started on the assembly file
-my $outfile = "$dir/all.fasta";
-
-# Main query: efetch
-my $efetchCmd = "cat $esearchXml | efetch -format fasta > $outfile";
-system($efetchCmd);
-
 my $seqsWithVersion = readSeqs($outfile);
 my $seqs = {};
 while(my($acc, $seq) = each(%$seqsWithVersion)){
Expand Down
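The hunk above replaces the `esearch` query piped into `efetch` with a single `efetch` call that reads accessions from a file. A rough shell equivalent of that flow is sketched below. `write_accession_list` is a hypothetical helper and is not part of the repository; the `efetch` options are the ones that appear in the diff, and the download step is skipped when EDirect is not installed.

```shell
#!/bin/sh
# Sketch only: write_accession_list is a hypothetical helper, not Kalamari code.
# $1 is the output file; every further argument is written as one accession per line.
write_accession_list() {
  out=$1
  shift
  printf '%s\n' "$@" > "$out"
}

workdir=$(mktemp -d)
# CP020414 is the accession named in the commit list above
write_accession_list "$workdir/input.acc" CP020414

# Same efetch options as in the diff; skipped when EDirect is not on PATH
if command -v efetch >/dev/null 2>&1; then
  efetch -db nuccore -input "$workdir/input.acc" -format fasta > "$workdir/all.fasta" \
    || echo "efetch failed" >&2
else
  echo "efetch not found; skipping download" >&2
fi
```

Writing the accessions to a file keeps the command line short no matter how many accessions are batched, which is one plausible reason for moving away from a long `[accession] OR ...` query string.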
4 changes: 2 additions & 2 deletions bin/downloadKalamari.sh
@@ -23,8 +23,8 @@ echo "TEMPDIR is $tempdir" >&2
 echo "OUTDIR is $outdir_prefix" >&2

 TSV="$tempdir/in.tsv"
-cat $thisdir/../src/chromosomes.tsv > $TSV
-cat $thisdir/../src/plasmids.tsv >> $TSV
+cat $thisdir/../src/chromosomes.tsv > $TSV
+tail -n +2 $thisdir/../src/plasmids.tsv >> $TSV

 cp -rv $thisdir/../src/taxonomy $tempdir/taxonomy

Expand Down
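The `tail -n +2` change above appends `plasmids.tsv` without its first line, presumably so that the combined `in.tsv` carries only one header row. A minimal sketch of that pattern follows. The `merge_tsv` helper, the two-column layout, and the sample rows are hypothetical and not taken from the repository; only the column name `nuccoreAcc` appears in the source.

```shell
#!/bin/sh
# Sketch only: merge_tsv is a hypothetical helper, not part of Kalamari.
# Prints the first file whole, then every later file minus its header line.
merge_tsv() {
  cat "$1"
  shift
  for f in "$@"; do
    tail -n +2 "$f"
  done
}

# Made-up sample data; accessions and column layout are placeholders
workdir=$(mktemp -d)
printf 'nuccoreAcc\ttaxid\nACC0001\t1639\n' > "$workdir/chromosomes.tsv"
printf 'nuccoreAcc\ttaxid\nACC0002\t543\n' > "$workdir/plasmids.tsv"

merge_tsv "$workdir/chromosomes.tsv" "$workdir/plasmids.tsv" > "$workdir/in.tsv"
cat "$workdir/in.tsv"
```

The merged file has three lines: one header and two data rows. Without the `tail -n +2`, the second header would be read as a data row by anything that parses the combined file.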
5 changes: 5 additions & 0 deletions bin/filterTaxonomy.sh
@@ -2,6 +2,11 @@

 set -eu

+# Check for dependencies
+echo "Check for dependencies"
+which taxonkit
+echo
+
 thisdir=$(dirname $0)
 thisfile=$(basename $0)
 KALAMARI_VER=$(downloadKalamari.pl --version)
Expand Down
6 changes: 4 additions & 2 deletions paper/mra.md
@@ -47,7 +47,9 @@ Kalamari also contains a custom taxonomy and software for downloading and format

 ## Announcement

-Public Health laboratories sequence microbial pathogens daily for genomic epidemiology, i.e., to track pathogen spread [@armstrong2019pathogen].
+Public Health laboratories sequence microbial pathogens daily for many applications including genomic epidemiology [@armstrong2019pathogen],
+species identification [@lindsey2023rapid],
+and metagenomic analysis [@huang2017metagenomics].
 Relevant databases exist such as RefSeq [@o2016reference] or The Genome Taxonomy Database (GTDB) [@parks2022gtdb].
 However, due to their so comprehensive nature,
 they are disadvantageous for our specific purposes:
@@ -64,7 +66,7 @@ All chromosomes and plasmids are complete, i.e., no contig breaks,
 and obtained from trusted sources, e.g., FDA-ARGOS [@sichtig2019fda] or the NCTC 3000 collection [@dicks2023nctc3000], or provided and reviewed by a CDC subject matter expert.

 We obtained the list of plasmids from the Mob-Suite project [@robertsonMobsuite]
-and clustered them at 97% average nucleotide identity (ANI) [@lindsey2023rapid].
+and clustered them at 97% average nucleotide identity using edlb_ani_mummer v1 with default options [@lindsey2023rapid].
 For each cluster, the taxonomy identifier was raised to the lowest common tier of taxonomy.
 For example, if a cluster of plasmids were identified in both _Escherichia coli_ and _Salmonella enterica_, then taxonomy identifiers for all the plasmids in the cluster were changed to their common family, _Enterobacteriaceae_.
 As a result, any taxonomic signature from these plasmids
Expand Down
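The rule described in the `paper/mra.md` text above, raising each plasmid cluster to the lowest taxon shared by all of its members, can be sketched as a longest-common-prefix search over lineages. The snippet is an illustration only: `lowest_common_taxon` is a hypothetical helper, the lineages are abbreviated name strings written by hand, and Kalamari itself works with taxonomy identifiers rather than names.

```shell
#!/bin/sh
# Sketch only: lowest_common_taxon is a hypothetical helper, not Kalamari code.
# Each argument is one lineage, root first, with ranks separated by ';'.
# Prints the deepest rank shared by every lineage; exits non-zero if none.
lowest_common_taxon() {
  printf '%s\n' "$@" | awk -F';' '
    NR == 1 { n = NF; for (i = 1; i <= NF; i++) common[i] = $i; next }
    {
      # Shrink the shared prefix to the ranks that also match this lineage
      for (i = 1; i <= n && i <= NF; i++) if (common[i] != $i) break
      n = i - 1
    }
    END { if (n > 0) print common[n]; else exit 1 }
  '
}

# Hand-written, abbreviated lineages for the two species named in the text
ecoli='Bacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia coli'
senterica='Bacteria;Enterobacterales;Enterobacteriaceae;Salmonella;Salmonella enterica'

lowest_common_taxon "$ecoli" "$senterica"   # prints Enterobacteriaceae
```

With more than two members in a cluster, the same call takes one lineage per argument and the shared prefix can only get shorter as lineages are added.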