ValueError: data type must provide an itemsize #719

Open
alimanfoo opened this issue Jan 31, 2025 · 0 comments

alimanfoo (Member) commented Jan 31, 2025

The following code, run on Colab:

```python
sample_sets = ["AG1000G-MW"]  # assumes ag3 = malariagen_data.Ag3() was created in an earlier cell
sample_query = "taxon = 'arabiensis'"
cyp6aap_region = "2R:28,480,000-28,510,000"
df_cyp6aap_cnv = ag3.gene_cnv_frequencies(
    region=cyp6aap_region,
    cohorts="admin2_year",
    sample_sets=sample_sets,
    sample_query=sample_query,
)
df_cyp6aap_cnv
```

...generates an error:

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-6f50beb47483> in <cell line: 0>()
      2 sample_query = "taxon = 'arabiensis'"
      3 cyp6aap_region = "2R:28,480,000-28,510,000"
----> 4 df_cyp6aap_cnv = ag3.gene_cnv_frequencies(
      5     region=cyp6aap_region,
      6     cohorts="admin2_year",

/usr/local/lib/python3.11/dist-packages/malariagen_data/util.py in check_types_wrapper(*args, **kwargs)
   1162                     error = TypeError(message)
   1163                     raise error from None
-> 1164         return f(*args, **kwargs)
   1165 
   1166     return check_types_wrapper

/usr/local/lib/python3.11/dist-packages/malariagen_data/anoph/cnv_frq.py in gene_cnv_frequencies(***failed resolving arguments***)
    217         debug("access and concatenate data from regions")
    218         df = pd.concat(
--> 219             [
    220                 self._gene_cnv_frequencies(
    221                     region=r,

/usr/local/lib/python3.11/dist-packages/malariagen_data/anoph/cnv_frq.py in <listcomp>(.0)
    218         df = pd.concat(
    219             [
--> 220                 self._gene_cnv_frequencies(
    221                     region=r,
    222                     cohorts=cohorts,

/usr/local/lib/python3.11/dist-packages/malariagen_data/anoph/cnv_frq.py in _gene_cnv_frequencies(self, region, cohorts, sample_query, sample_query_options, min_cohort_size, sample_sets, drop_invariant, max_coverage_variance, include_counts, chunks, inline_array)
    263 
    264         debug("get gene copy number data")
--> 265         ds_cnv = self.gene_cnv(
    266             region=region,
    267             sample_sets=sample_sets,

/usr/local/lib/python3.11/dist-packages/malariagen_data/util.py in check_types_wrapper(*args, **kwargs)
   1162                     error = TypeError(message)
   1163                     raise error from None
-> 1164         return f(*args, **kwargs)
   1165 
   1166     return check_types_wrapper

/usr/local/lib/python3.11/dist-packages/malariagen_data/anoph/cnv_frq.py in gene_cnv(***failed resolving arguments***)
     61 
     62         ds = simple_xarray_concat(
---> 63             [
     64                 self._gene_cnv(
     65                     region=r,

/usr/local/lib/python3.11/dist-packages/malariagen_data/anoph/cnv_frq.py in <listcomp>(.0)
     62         ds = simple_xarray_concat(
     63             [
---> 64                 self._gene_cnv(
     65                     region=r,
     66                     sample_sets=sample_sets,

/usr/local/lib/python3.11/dist-packages/malariagen_data/anoph/cnv_frq.py in _gene_cnv(self, region, sample_sets, sample_query, sample_query_options, max_coverage_variance, chunks, inline_array)
    105 
    106         # Access HMM data.
--> 107         ds_hmm = self.cnv_hmm(
    108             region=cnv_region,
    109             sample_sets=sample_sets,

/usr/local/lib/python3.11/dist-packages/malariagen_data/util.py in check_types_wrapper(*args, **kwargs)
   1162                     error = TypeError(message)
   1163                     raise error from None
-> 1164         return f(*args, **kwargs)
   1165 
   1166     return check_types_wrapper

/usr/local/lib/python3.11/dist-packages/malariagen_data/anoph/cnv_data.py in cnv_hmm(***failed resolving arguments***)
    256                 debug("apply the query")
    257                 sample_query_options = sample_query_options or {}
--> 258                 loc_query_samples = df_samples_cnv.eval(
    259                     sample_query, **sample_query_options
    260                 ).values

/usr/local/lib/python3.11/dist-packages/pandas/core/frame.py in eval(self, expr, inplace, **kwargs)
   4947         kwargs["resolvers"] = tuple(kwargs.get("resolvers", ())) + resolvers
   4948 
-> 4949         return _eval(expr, inplace=inplace, **kwargs)
   4950 
   4951     def select_dtypes(self, include=None, exclude=None) -> Self:

/usr/local/lib/python3.11/dist-packages/pandas/core/computation/eval.py in eval(expr, parser, engine, local_dict, global_dict, resolvers, level, target, inplace)
    355         eng = ENGINES[engine]
    356         eng_inst = eng(parsed_expr)
--> 357         ret = eng_inst.evaluate()
    358 
    359         if parsed_expr.assigner is None:

/usr/local/lib/python3.11/dist-packages/pandas/core/computation/engines.py in evaluate(self)
     79 
     80         # make sure no names in resolvers and locals/globals clash
---> 81         res = self._evaluate()
     82         return reconstruct_object(
     83             self.result_type, res, self.aligned_axes, self.expr.terms.return_type

/usr/local/lib/python3.11/dist-packages/pandas/core/computation/engines.py in _evaluate(self)
    119         scope = env.full_scope
    120         _check_ne_builtin_clash(self.expr)
--> 121         return ne.evaluate(s, local_dict=scope)
    122 
    123 

/usr/local/lib/python3.11/dist-packages/numexpr/necompiler.py in evaluate(ex, local_dict, global_dict, out, order, casting, sanitize, _frame_depth, **kwargs)
    971                  _frame_depth=_frame_depth, sanitize=sanitize, **kwargs)
    972     if e is None:
--> 973         return re_evaluate(local_dict=local_dict, global_dict=global_dict, _frame_depth=_frame_depth)
    974     else:
    975         raise e

/usr/local/lib/python3.11/dist-packages/numexpr/necompiler.py in re_evaluate(local_dict, global_dict, _frame_depth)
   1003     kwargs = _numexpr_last['kwargs']
   1004     with evaluate_lock:
-> 1005         return compiled_ex(*args, **kwargs)

ValueError: data type must provide an itemsize
```

The error comes from numexpr, which pandas uses by default (when it is installed) as the engine for evaluating query expressions, in this case the `sample_query` applied to the sample metadata. That default usually works fine, but it is now raising this error for reasons that are not yet clear.
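To illustrate the engine choice described above, here is a minimal pandas sketch on a toy stand-in for the sample metadata (the data and the helpers `query_mask` and `query_mask_with_fallback` are hypothetical, not part of malariagen_data). With `engine=None` pandas defaults to numexpr when it is installed, and `engine="python"` forces the pure-Python engine. The sketch uses the comparison operator `==`; note that in pandas expression syntax a single `=` is parsed as column assignment (as in `df.eval("c = a + b")`), not as a comparison.

```python
import pandas as pd

# Toy stand-in for the sample metadata (hypothetical values, not real data).
df_samples = pd.DataFrame(
    {
        "sample_id": ["S1", "S2", "S3", "S4"],
        "taxon": ["arabiensis", "gambiae", "arabiensis", "coluzzii"],
    }
)


def query_mask(df, query, engine=None):
    """Evaluate a boolean pandas expression and return a NumPy bool array.

    engine=None lets pandas choose (numexpr if installed, else python);
    engine="python" forces pure-Python evaluation.
    """
    return df.eval(query, engine=engine).to_numpy(dtype=bool)


def query_mask_with_fallback(df, query):
    """Try the default engine; on ValueError, retry with the python engine.

    If the python engine also fails, its exception propagates to the caller.
    """
    try:
        return query_mask(df, query)
    except ValueError:
        return query_mask(df, query, engine="python")


mask = query_mask(df_samples, "taxon == 'arabiensis'", engine="python")
print(mask)  # [ True False  True False]
print(df_samples.loc[mask, "sample_id"].tolist())  # ['S1', 'S3']
```

The fallback variant trades some speed for robustness; whether a library should silently switch engines, rather than surface the error, is a design judgment.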

Here is a partial workaround, which forces pandas to use its pure-Python engine instead:

```python
sample_sets = ["AG1000G-MW"]
sample_query = "taxon = 'arabiensis'"
cyp6aap_region = "2R:28,480,000-28,510,000"
df_cyp6aap_cnv = ag3.gene_cnv_frequencies(
    region=cyp6aap_region,
    cohorts="admin2_year",
    sample_sets=sample_sets,
    sample_query=sample_query,
    sample_query_options=dict(engine="python"),
)
df_cyp6aap_cnv
```

...although that then raises a different exception (which I'll report in a separate issue).
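In the traceback, the `cnv_hmm` frame shows `sample_query_options` being defaulted to `{}` and unpacked as keyword arguments into `DataFrame.eval`, which is why `dict(engine="python")` switches the engine. A minimal sketch of that pattern, using a hypothetical `select_samples` helper and toy data (this mirrors the frame shown, not the library's full implementation):

```python
import pandas as pd


def select_samples(df_samples, sample_query=None, sample_query_options=None):
    """Hypothetical helper mirroring the eval call in the cnv_hmm frame.

    sample_query_options is unpacked into DataFrame.eval, so a key such as
    "engine" is passed straight through to pandas.
    """
    if sample_query is None:
        return df_samples
    sample_query_options = sample_query_options or {}
    loc_query_samples = df_samples.eval(sample_query, **sample_query_options).values
    return df_samples.loc[loc_query_samples]


# Toy sample metadata (hypothetical values).
df_samples = pd.DataFrame(
    {
        "sample_id": ["S1", "S2", "S3"],
        "taxon": ["arabiensis", "gambiae", "arabiensis"],
        "year": [2014, 2014, 2015],
    }
)

# Analogous to sample_query_options=dict(engine="python") in the workaround.
selected = select_samples(
    df_samples,
    sample_query="taxon == 'arabiensis' and year > 2014",
    sample_query_options=dict(engine="python"),
)
print(selected["sample_id"].tolist())  # ['S3']
```

Omitting `sample_query_options` leaves the engine choice to pandas, which is the path that reached numexpr in the traceback.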

Versions: (provided as an image)
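Since the error surfaces inside numexpr, the versions of the packages on that code path are the most relevant ones to report. A small sketch using only the standard library's `importlib.metadata` to print them as text (the package list is an assumption based on the modules named in the traceback, plus numpy, which numexpr builds on):

```python
from importlib.metadata import PackageNotFoundError, version


def package_versions(names):
    """Return {distribution name: installed version, or "not installed"}."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = "not installed"
    return result


# Distribution names assumed from the modules appearing in the traceback.
for name, ver in package_versions(["malariagen_data", "pandas", "numexpr", "numpy"]).items():
    print(f"{name}: {ver}")
```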
