Error in yacht convert #114

Closed
OliverBryan opened this issue Feb 21, 2024 · 6 comments

Comments

@OliverBryan
Collaborator

Working with data from https://frl.publisso.de/data/frl:6425521/marine/short_read/marmgCAMI2_sample_0_reads.tar.gz and a trained version of the GTDB database, I hit an error when using yacht convert to convert the YACHT output into the CAMI format. I ran yacht convert --yacht_output 'result.xlsx' --sheet_name 'min_coverage1.0' --genome_to_taxid 'genome_to_taxid.tsv' --mode 'cami' --sample_name 'MySample' --outfile_prefix 'cami_result' --outdir ./ and got the following error:

(yacht_env) oliverbryan@DESKTOP-7KPRH50:~/YACHT/testing$ yacht convert --yacht_output 'result.xlsx' --sheet_name 'min_coverage1.0' --genome_to_taxid 'genome_to_taxid.tsv' --mode 'cami' --sample_name 'MySample' --outfile_prefix 'cami_result' --outdir ./
Traceback (most recent call last):
  File "/home/oliverbryan/miniconda3/envs/yacht_env/lib/python3.12/site-packages/pandas/core/computation/scope.py", line 231, in resolve
    return self.resolvers[key]
           ~~~~~~~~~~~~~~^^^^^
  File "/home/oliverbryan/miniconda3/envs/yacht_env/lib/python3.12/collections/__init__.py", line 1014, in __getitem__
    return self.__missing__(key)            # support subclasses that define __missing__
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/oliverbryan/miniconda3/envs/yacht_env/lib/python3.12/collections/__init__.py", line 1006, in __missing__
    raise KeyError(key)
KeyError: 'RANK'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/oliverbryan/miniconda3/envs/yacht_env/lib/python3.12/site-packages/pandas/core/computation/scope.py", line 242, in resolve
    return self.temps[key]
           ~~~~~~~~~~^^^^^
KeyError: 'RANK'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/oliverbryan/miniconda3/envs/yacht_env/bin/yacht", line 33, in <module>
    sys.exit(load_entry_point('yacht', 'console_scripts', 'yacht')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oliverbryan/YACHT/yacht/__init__.py", line 89, in main
    args.func(args)
  File "/home/oliverbryan/YACHT/yacht/standardize_yacht_output.py", line 135, in main
    standardize_yacht_output.run(
  File "/home/oliverbryan/YACHT/yacht/standardize_yacht_output.py", line 483, in run
    result = self.__to_cami(sample_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oliverbryan/YACHT/yacht/standardize_yacht_output.py", line 307, in __to_cami
    res_df = [summary_df.query(f'RANK == "{rank}"') for rank in self.allowable_rank]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oliverbryan/miniconda3/envs/yacht_env/lib/python3.12/site-packages/pandas/core/frame.py", line 4811, in query
    res = self.eval(expr, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oliverbryan/miniconda3/envs/yacht_env/lib/python3.12/site-packages/pandas/core/frame.py", line 4937, in eval
    return _eval(expr, inplace=inplace, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oliverbryan/miniconda3/envs/yacht_env/lib/python3.12/site-packages/pandas/core/computation/eval.py", line 336, in eval
    parsed_expr = Expr(expr, engine=engine, parser=parser, env=env)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oliverbryan/miniconda3/envs/yacht_env/lib/python3.12/site-packages/pandas/core/computation/expr.py", line 809, in __init__
    self.terms = self.parse()
                 ^^^^^^^^^^^^
  File "/home/oliverbryan/miniconda3/envs/yacht_env/lib/python3.12/site-packages/pandas/core/computation/expr.py", line 828, in parse
    return self._visitor.visit(self.expr)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oliverbryan/miniconda3/envs/yacht_env/lib/python3.12/site-packages/pandas/core/computation/expr.py", line 412, in visit
    return visitor(node, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oliverbryan/miniconda3/envs/yacht_env/lib/python3.12/site-packages/pandas/core/computation/expr.py", line 418, in visit_Module
    return self.visit(expr, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oliverbryan/miniconda3/envs/yacht_env/lib/python3.12/site-packages/pandas/core/computation/expr.py", line 412, in visit
    return visitor(node, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oliverbryan/miniconda3/envs/yacht_env/lib/python3.12/site-packages/pandas/core/computation/expr.py", line 421, in visit_Expr
    return self.visit(node.value, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oliverbryan/miniconda3/envs/yacht_env/lib/python3.12/site-packages/pandas/core/computation/expr.py", line 412, in visit
    return visitor(node, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oliverbryan/miniconda3/envs/yacht_env/lib/python3.12/site-packages/pandas/core/computation/expr.py", line 719, in visit_Compare
    return self.visit(binop)
           ^^^^^^^^^^^^^^^^^
  File "/home/oliverbryan/miniconda3/envs/yacht_env/lib/python3.12/site-packages/pandas/core/computation/expr.py", line 412, in visit
    return visitor(node, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oliverbryan/miniconda3/envs/yacht_env/lib/python3.12/site-packages/pandas/core/computation/expr.py", line 532, in visit_BinOp
    op, op_class, left, right = self._maybe_transform_eq_ne(node)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oliverbryan/miniconda3/envs/yacht_env/lib/python3.12/site-packages/pandas/core/computation/expr.py", line 452, in _maybe_transform_eq_ne
    left = self.visit(node.left, side="left")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oliverbryan/miniconda3/envs/yacht_env/lib/python3.12/site-packages/pandas/core/computation/expr.py", line 412, in visit
    return visitor(node, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oliverbryan/miniconda3/envs/yacht_env/lib/python3.12/site-packages/pandas/core/computation/expr.py", line 545, in visit_Name
    return self.term_type(node.id, self.env, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oliverbryan/miniconda3/envs/yacht_env/lib/python3.12/site-packages/pandas/core/computation/ops.py", line 91, in __init__
    self._value = self._resolve_name()
                  ^^^^^^^^^^^^^^^^^^^^
  File "/home/oliverbryan/miniconda3/envs/yacht_env/lib/python3.12/site-packages/pandas/core/computation/ops.py", line 115, in _resolve_name
    res = self.env.resolve(local_name, is_local=is_local)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oliverbryan/miniconda3/envs/yacht_env/lib/python3.12/site-packages/pandas/core/computation/scope.py", line 244, in resolve
    raise UndefinedVariableError(key, is_local) from err
pandas.errors.UndefinedVariableError: name 'RANK' is not defined

I have attached my result.xlsx file and my genome_to_taxid.tsv file for reference. The .tsv file was renamed to .txt because GitHub does not support .tsv attachments; nothing else about it was changed.
result.xlsx
genome_to_taxid.txt
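
A note for anyone hitting the same traceback: pandas raises UndefinedVariableError from DataFrame.query when the expression names a column that the frame does not have. The following is a minimal sketch of that behaviour, not YACHT's code. The frame contents and the TAXID column are hypothetical placeholders, and it assumes summary_df reached the query without a RANK column.

```python
import pandas as pd

# Hypothetical stand-in for summary_df: it has a column, but no RANK column.
# The TAXID value is a placeholder, not real data.
summary_df = pd.DataFrame({"TAXID": ["12345"]})

try:
    summary_df.query('RANK == "species"')
    error_message = None
except NameError as err:
    # pandas.errors.UndefinedVariableError is a subclass of NameError.
    error_message = str(err)

print(error_message)

# Checking for the column first avoids the long, opaque traceback.
has_rank = "RANK" in summary_df.columns
print(has_rank)
```

Checking the column list before calling query makes it easier to report a clear message about the missing column instead of the long expression-parser traceback.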

@chunyuma
Member

Hi @OliverBryan, after checking the files you attached, I found the issue.

The organism_name column in the result.xlsx file doesn't match the genome_id column in the genome_to_taxid.tsv file. Here is an example:

GCF_000364605.1_genomic is in the genome_to_taxid.tsv file, but the corresponding genome name in result.xlsx is GCF_000364605.1 Nocardioides sp. Iso805N strain=Iso805N, ASM36460v1. You should use one of the two forms consistently in both files.
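
A mismatch like this can be checked before running yacht convert. The sketch below assumes the column names mentioned above (organism_name and genome_id) and uses the single example pair from this comment as inline data. With real files one would load the tables with pandas.read_excel and pandas.read_csv(sep="\t") instead.

```python
import pandas as pd

# Inline data taken from the example above; real runs would read the files.
yacht_df = pd.DataFrame({
    "organism_name": [
        "GCF_000364605.1 Nocardioides sp. Iso805N strain=Iso805N, ASM36460v1",
    ]
})
taxid_df = pd.DataFrame({"genome_id": ["GCF_000364605.1_genomic"]})

# Rows of the YACHT result whose name has no exact match in genome_id.
is_matched = yacht_df["organism_name"].isin(taxid_df["genome_id"])
unmatched = yacht_df.loc[~is_matched, "organism_name"].tolist()

print(f"{len(unmatched)} of {len(yacht_df)} organism names have no genome_id match")
```

If every name is reported as unmatched, the two files are most likely using different identifier formats.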

@OliverBryan
Collaborator Author

I have updated the genome_to_taxid.tsv file to fix this, but I am still getting the same error, so perhaps I did it incorrectly. I have attached my updated genome_to_taxid.tsv file. I am also now running yacht convert --yacht_output 'result.xlsx' --sheet_name 'raw_result' --genome_to_taxid 'genome_to_taxid.tsv' --mode 'cami' --sample_name 'MySample' --outfile_prefix 'cami_result' --outdir ./ with the raw_result sheet instead of the min_coverage1.0 sheet, but both commands give the same error.

@chunyuma
Member

Hi @OliverBryan, thanks for trying out my suggestion. I can't find the attached files in your last message.

@OliverBryan
Collaborator Author

My apologies @chunyuma, I forgot to attach it. I have attached it to this message.
genome_to_taxid.txt

chunyuma added a commit that referenced this issue Feb 23, 2024
@chunyuma chunyuma mentioned this issue Feb 23, 2024
@chunyuma
Member

Hi @OliverBryan, sorry for the late response.

I figured out the issue. When I wrote the script, for some reason I didn't expect the genome_id column in the genome_to_taxid.tsv file to contain spaces, as in GCF_000364605.1 Nocardioides sp. Iso805N strain=Iso805N, ASM36460v1. I removed the genome name from each entry in the genome_id column, leaving just GCF_000364605.1, and it works now. I have attached the updated genome_to_taxid.txt file (see below) so you can give it a try.

genome_to_taxid_test.txt

Thanks for making me aware of it. This should be considered a bug, and I have fixed it in the code in this PR.
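
The manual cleanup described above can be sketched in pandas as follows. This is an illustration only, not the code from the PR. It assumes every identifier begins with an NCBI assembly accession of the form GCA_ or GCF_ followed by digits and a version number, and the variable names are hypothetical.

```python
import pandas as pd

# The two identifier forms seen in this thread.
genome_ids = pd.Series([
    "GCF_000364605.1 Nocardioides sp. Iso805N strain=Iso805N, ASM36460v1",
    "GCF_000364605.1_genomic",
])

# Keep only the leading accession (assumed format: GCA_/GCF_ + digits + .version).
cleaned = genome_ids.str.extract(r"^(GC[AF]_\d+\.\d+)", expand=False)

print(cleaned.tolist())
```

Applying the same normalization to both the organism_name and genome_id columns before matching would make either identifier form resolve to the same key. Entries that do not start with an accession come back as missing values, so they are easy to spot.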

@OliverBryan
Collaborator Author

Hi @chunyuma,

Everything is working on my end with the updated genome_to_taxid_test.tsv file, thank you for your help.
