
Commit

#11
shenwei356 committed Jul 9, 2018
1 parent d4fac1c commit 9b540b5
Showing 9 changed files with 94 additions and 62 deletions.
2 changes: 2 additions & 0 deletions dev-version.md
@@ -0,0 +1,2 @@
- `taxonkit name2taxid`: support synonym names. [#9](https://github.com/shenwei356/taxonkit/commit/d4fac1c1138a571957f52eb431ff0d85c03852a8)
- add global flag `--line-buffered` to disable output buffering. [#11](https://github.com/shenwei356/taxonkit/issues/11)
2 changes: 1 addition & 1 deletion doc/docs/tutorial.md
@@ -54,7 +54,7 @@ Taking virus for example.

**Another way is directly retrieving from [nr FASTA sequences](ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nr.gz) using [SeqKit](http://bioinf.shenwei.me/seqkit/download):**

-seqkit grep -f virus.taxid.acc.txt nr.gz | gzip -c > nr.virus.fa.gz
+seqkit grep -f virus.taxid.acc.txt nr.gz -o nr.virus.fa.gz

<div id="disqus_thread"></div>
<script>
94 changes: 47 additions & 47 deletions doc/docs/usage.md
@@ -357,53 +357,53 @@ Examples:
(**recommended**, very useful for formatting input data for
[LEfSe](https://bitbucket.org/biobakery/biobakery/wiki/lefse))

$ cat lineage.txt | taxonkit reformat -t -F > lineage.txt.reformat.fill
$ cat lineage.txt.reformat.fill \
| perl -pe 's/^/Taxid : /; \
s/\t/\nLineage : /; \
s/\t/\nReformat: /; \
s/\t/\nTaxids : /; \
print "\n";'

Taxid : 9606
Lineage : cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Simiiformes;Catarrhini;Hominoidea;Hominidae;Homininae;Homo;Homo sapiens
Reformat: Eukaryota;Chordata;Mammalia;Primates;Hominidae;Homo;Homo sapiens
Taxids : 2759;7711;40674;9443;9604;9605;9606

Taxid : 9913
Lineage : cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Cetartiodactyla;Ruminantia;Pecora;Bovidae;Bovinae;Bos;Bos taurus
Reformat: Eukaryota;Chordata;Mammalia;unclassified Mammalia order;Bovidae;Bos;Bos taurus
Taxids : 2759;7711;40674;;9895;9903;9913

Taxid : 376619
Lineage : cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Thiotrichales;Francisellaceae;Francisella;Francisella tularensis;Francisella tularensis subsp. holarctica;Francisella tularensis subsp. holarctica LVS
Reformat: Bacteria;Proteobacteria;Gammaproteobacteria;Thiotrichales;Francisellaceae;Francisella;Francisella tularensis
Taxids : 2;1224;1236;72273;34064;262;263

Taxid : 349741
Lineage : cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila;Akkermansia muciniphila ATCC BAA-835
Reformat: Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila
Taxids : 2;74201;203494;48461;1647988;239934;239935

Taxid : 239935
Lineage : cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila
Reformat: Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila
Taxids : 2;74201;203494;48461;1647988;239934;239935

Taxid : 314101
Lineage : cellular organisms;Bacteria;environmental samples;uncultured murine large bowel bacterium BAC 54B
Reformat: Bacteria;unclassified Bacteria phylum;unclassified Bacteria class;unclassified Bacteria order;unclassified Bacteria family;unclassified Bacteria genus;uncultured murine large bowel bacterium BAC 54B
Taxids : 2;;;;;;314101

Taxid : 11932
Lineage : Viruses;Retro-transcribing viruses;Retroviridae;unclassified Retroviridae;Intracisternal A-particles;Mouse Intracisternal A-particle
Reformat: Viruses;unclassified Viruses phylum;unclassified Viruses class;unclassified Viruses order;Retroviridae;Intracisternal A-particles;Mouse Intracisternal A-particle
Taxids : 10239;;;;11632;11749;11932

Taxid : 1327037
Lineage : Viruses;dsDNA viruses, no RNA stage;Caudovirales;Siphoviridae;unclassified Siphoviridae;Croceibacter phage P2559Y
Reformat: Viruses;unclassified Viruses phylum;unclassified Viruses class;Caudovirales;Siphoviridae;unclassified Siphoviridae genus;Croceibacter phage P2559Y
Taxids : 10239;;;28883;10699;;1327037
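The same relabeling can be sketched in Go for readers who prefer not to depend on perl. This assumes the input has the four tab-separated columns produced by `taxonkit reformat -t` (taxid, full lineage, reformatted lineage, lineage taxids). `labelRecord` is a hypothetical helper. Only the first three tabs act as separators, matching the three `s/\t/.../` substitutions, and, as in the perl version, a blank line is printed before each record.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// Labels as they appear in the example output above.
var labels = []string{"Taxid : ", "Lineage : ", "Reformat: ", "Taxids : "}

// labelRecord rewrites one tab-separated line into one labeled line per
// column. SplitN keeps any tabs after the third one inside the last field,
// just as the perl one-liner only replaces the first three tabs.
func labelRecord(line string) string {
	fields := strings.SplitN(line, "\t", len(labels))
	var b strings.Builder
	for i, f := range fields {
		b.WriteString(labels[i])
		b.WriteString(f)
		b.WriteString("\n")
	}
	return b.String()
}

func main() {
	sc := bufio.NewScanner(os.Stdin)
	// Full lineages can be long; allow lines of up to 1 MiB.
	sc.Buffer(make([]byte, 64*1024), 1024*1024)
	w := bufio.NewWriter(os.Stdout)
	defer w.Flush()
	for sc.Scan() {
		w.WriteString("\n") // blank line before each record, like print "\n"
		w.WriteString(labelRecord(sc.Text()))
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```

Saved as a file such as `label.go` (a hypothetical name), it would slot into the pipeline as `cat lineage.txt | taxonkit reformat -t -F | go run label.go`.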

1. Support tab in format string

24 changes: 13 additions & 11 deletions taxonkit/cmd/helper.go
@@ -31,24 +31,26 @@ import (
)

// VERSION of csvtk
-const VERSION = "0.2.4"
+const VERSION = "0.2.5-dev"

// Config is the struct containing all global flags
type Config struct {
-	Threads   int
-	OutFile   string
-	NodesFile string
-	NamesFile string
-	Verbose   bool
+	Threads      int
+	OutFile      string
+	NodesFile    string
+	NamesFile    string
+	Verbose      bool
+	LineBuffered bool
}

func getConfigs(cmd *cobra.Command) Config {
	return Config{
-		Threads:   getFlagPositiveInt(cmd, "threads"),
-		OutFile:   getFlagString(cmd, "out-file"),
-		NodesFile: getFlagString(cmd, "nodes-file"),
-		NamesFile: getFlagString(cmd, "names-file"),
-		Verbose:   getFlagBool(cmd, "verbose"),
+		Threads:      getFlagPositiveInt(cmd, "threads"),
+		OutFile:      getFlagString(cmd, "out-file"),
+		NodesFile:    getFlagString(cmd, "nodes-file"),
+		NamesFile:    getFlagString(cmd, "names-file"),
+		Verbose:      getFlagBool(cmd, "verbose"),
+		LineBuffered: getFlagBool(cmd, "line-buffered"),
	}
}
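`getConfigs` gathers the persistent cobra flags into one `Config` value that every subcommand reads, which is how `LineBuffered` reaches the output loops. Below is a rough equivalent that uses the standard library's `flag` package instead of cobra. `parseConfig`, the help strings, and the default values are placeholders, not taxonkit's actual ones.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Config mirrors the fields of the struct in helper.go after this commit.
type Config struct {
	Threads      int
	OutFile      string
	NodesFile    string
	NamesFile    string
	Verbose      bool
	LineBuffered bool
}

// parseConfig is a hypothetical stand-in for getConfigs built on the
// standard library's flag package rather than cobra. Defaults are placeholders.
func parseConfig(args []string) (Config, error) {
	var c Config
	fs := flag.NewFlagSet("taxonkit", flag.ContinueOnError)
	fs.IntVar(&c.Threads, "threads", 1, "number of CPUs")
	fs.StringVar(&c.OutFile, "out-file", "-", "output file")
	fs.StringVar(&c.NodesFile, "nodes-file", "", "nodes.dmp file")
	fs.StringVar(&c.NamesFile, "names-file", "", "names.dmp file")
	fs.BoolVar(&c.Verbose, "verbose", false, "print verbose information")
	fs.BoolVar(&c.LineBuffered, "line-buffered", false, "flush output after every line")
	if err := fs.Parse(args); err != nil {
		return c, err
	}
	// Reject non-positive thread counts, in the spirit of getFlagPositiveInt.
	if c.Threads <= 0 {
		return c, fmt.Errorf("flag --threads should be positive, got %d", c.Threads)
	}
	return c, nil
}

func main() {
	c, err := parseConfig(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", c)
}
```

Collecting flags once into a plain struct keeps the subcommands independent of the flag library: they only test `config.LineBuffered`.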

3 changes: 3 additions & 0 deletions taxonkit/cmd/lineage.go
@@ -166,6 +166,9 @@ var lineageCmd = &cobra.Command{
} else {
outfh.WriteString(fmt.Sprintf("%s\t%s\n", t2l.line, t2l.lineage))
}
+if config.LineBuffered {
+outfh.Flush()
+}
}
}
}
21 changes: 18 additions & 3 deletions taxonkit/cmd/list.go
@@ -141,9 +141,12 @@ var listCmd = &cobra.Command{
level = 1
}
outfh.WriteString("\n")
+if config.LineBuffered {
+outfh.Flush()
+}

traverseTree(tree, int32(id), outfh, indent, level+1, names,
-printName, ranks, printRank, jsonFormat)
+printName, ranks, printRank, jsonFormat, config)

if jsonFormat {
outfh.WriteString(fmt.Sprintf("%s}", strings.Repeat(indent, level)))
@@ -152,10 +155,16 @@ var listCmd = &cobra.Command{
outfh.WriteString(",")
}
outfh.WriteString("\n")
+if config.LineBuffered {
+outfh.Flush()
+}
}

if jsonFormat {
outfh.WriteString("}\n")
+if config.LineBuffered {
+outfh.Flush()
+}
}

defer outfh.Close()
@@ -176,7 +185,7 @@ func traverseTree(tree map[int32]map[int32]bool, parent int32,
outfh *xopen.Writer, indent string, level int,
names map[int32]string, printName bool,
ranks map[int32]string, printRank bool,
-jsonFormat bool) {
+jsonFormat bool, config Config) {
if _, ok := tree[parent]; !ok {
return
}
@@ -223,18 +232,24 @@ func traverseTree(tree map[int32]map[int32]bool, parent int32,
}
}
outfh.WriteString("\n")
+if config.LineBuffered {
+outfh.Flush()
+}

tree[parent][child] = true

traverseTree(tree, child, outfh, indent, level+1, names, printName,
-ranks, printRank, jsonFormat)
+ranks, printRank, jsonFormat, config)

if jsonFormat && ok {
outfh.WriteString(fmt.Sprintf("%s}", strings.Repeat(indent, level)))
if level > 2 && i < len(children)-1 {
outfh.WriteString(",")
}
outfh.WriteString("\n")
+if config.LineBuffered {
+outfh.Flush()
+}
}
}
}
6 changes: 6 additions & 0 deletions taxonkit/cmd/name2taxid.go
@@ -124,13 +124,19 @@ var name2taxidCmd = &cobra.Command{
} else {
outfh.WriteString(fmt.Sprintf("%s\t%s\n", l2t.line, ""))
}
+if config.LineBuffered {
+outfh.Flush()
+}
} else {
for _, taxid = range l2t.taxids {
if printRank {
outfh.WriteString(fmt.Sprintf("%s\t%d\t%s\n", l2t.line, taxid, ranks[taxid]))
} else {
outfh.WriteString(fmt.Sprintf("%s\t%d\n", l2t.line, taxid))
}
+if config.LineBuffered {
+outfh.Flush()
+}
}
}
}
3 changes: 3 additions & 0 deletions taxonkit/cmd/reformat.go
@@ -294,6 +294,9 @@ column by flag "-t/--show-lineage-taxids".
} else {
outfh.WriteString(fmt.Sprintf("%s\t%s\n", l2s.line, l2s.flineage))
}
+if config.LineBuffered {
+outfh.Flush()
+}
}
}
}
1 change: 1 addition & 0 deletions taxonkit/cmd/root.go
@@ -95,6 +95,7 @@ Dataset:
RootCmd.PersistentFlags().StringP("nodes-file", "", NodesFile, "nodes.dmp file")
RootCmd.PersistentFlags().StringP("names-file", "", NamesFile, "names.dmp file")
RootCmd.PersistentFlags().BoolP("verbose", "", false, "print verbose information")
+RootCmd.PersistentFlags().BoolP("line-buffered", "", false, "use line buffering on output, i.e., immediately writing to stdout/file for every line of output")

var existed bool
existed, err = pathutil.DirExists(DataDir)
