Is it expected to have different output from reformat depending on whether the input is only tax IDs or a lineage table? #62

Closed
Midnighter opened this issue Aug 9, 2022 · 2 comments


@Midnighter

Prerequisites

  • taxonkit v0.12.0
  • read the usage

Describe your issue

I want to create a simplified taxonomy of all fungi. It seems that with the latest version of taxonkit, I have two options:

  1. Create a lineage file from all taxonomy IDs under Fungi (TaxId 4751) and then reformat it.

    taxonkit list --ids 4751 --indent "" | \
        taxonkit lineage | \
        taxonkit reformat \
        --taxid-field 1 \
        --fill-miss-rank \
        --output-ambiguous-result \
        --add-prefix \
        --show-lineage-taxids \
        --format "{k};{p};{c};{o};{f};{g};{s}"

    First line of the output:

    4751	cellular organisms;Eukaryota;Opisthokonta;Fungi	k__Eukaryota;p__unclassified Eukaryota phylum;c__unclassified Eukaryota class;o__unclassified Eukaryota order;f__unclassified Eukaryota family;g__unclassified Eukaryota genus;s__unclassified Eukaryota species	2759;;;;;;
  2. Alternatively, I can directly use the identifiers as input to reformat.

    taxonkit list --ids 4751 --indent "" | \
        taxonkit reformat \
        --taxid-field 1 \
        --fill-miss-rank \
        --output-ambiguous-result \
        --add-prefix \
        --show-lineage-taxids \
        --format "{k};{p};{c};{o};{f};{g};{s}"

    First line of the output:

    4751	k__Eukaryota;p__unclassified Eukaryota phylum;c__unclassified Eukaryota class;o__unclassified Eukaryota order;f__unclassified Eukaryota family;g__unclassified Eukaryota genus;s__unclassified Eukaryota species	2759;;;;;;
@shenwei356
Owner

Yes, you found it. Originally, reformat only accepted a full lineage as input. Internally, it used a node and its parent node to look up the TaxId, but this produced errors for some taxa. So, since v0.8.0, it also accepts TaxIds as input via the flag -I/--taxid-field.

@Midnighter
Author

Sorry, my question is the one in the title: is it expected that the output differs between the two approaches? Probably yes, since the extra column added by taxonkit lineage (the full lineage) is preserved in the output of option 1.
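Since the only extra column in option 1 appears to be the full lineage that taxonkit lineage appends, the two outputs should line up once that column is removed. A minimal sketch, assuming (as in option 1) that the input to taxonkit lineage is a single TaxId column: the script below takes the first output lines quoted above as literals (rather than calling taxonkit) and drops column 2 with the standard cut utility. The variable names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# First output line of option 1 (taxonkit list | taxonkit lineage | taxonkit reformat).
# Columns: TaxId, full lineage (added by `taxonkit lineage`), reformatted lineage, lineage TaxIds.
from_lineage=$'4751\tcellular organisms;Eukaryota;Opisthokonta;Fungi\tk__Eukaryota;p__unclassified Eukaryota phylum;c__unclassified Eukaryota class;o__unclassified Eukaryota order;f__unclassified Eukaryota family;g__unclassified Eukaryota genus;s__unclassified Eukaryota species\t2759;;;;;;'

# First output line of option 2 (taxonkit list | taxonkit reformat --taxid-field 1).
# Columns: TaxId, reformatted lineage, lineage TaxIds.
from_taxids=$'4751\tk__Eukaryota;p__unclassified Eukaryota phylum;c__unclassified Eukaryota class;o__unclassified Eukaryota order;f__unclassified Eukaryota family;g__unclassified Eukaryota genus;s__unclassified Eukaryota species\t2759;;;;;;'

# Drop column 2 (the full lineage); cut splits on tabs by default.
trimmed=$(printf '%s\n' "$from_lineage" | cut -f 1,3-)

if [ "$trimmed" = "$from_taxids" ]; then
    echo "outputs match once the lineage column is dropped"
else
    echo "outputs still differ" >&2
    exit 1
fi
```

In the real pipeline, this would amount to appending | cut -f 1,3- after the taxonkit reformat step of option 1.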
