ValueError: node array from the pickle has an incompatible dtype #27

waltergallegog · 2023-09-01T13:31:07Z

Hello,

Using the latest version from bioconda, I'm getting the following error:

  File "/mnt/trcanmed/wgallego/pipesomatic/work/conda/env-8bb8117cce1d50927eeef0fd0720738e/lib/python3.10/site-packages/savana/classify.py", line 238, in classify_by_model
      loaded_model = pickle.load(open(args.model, "rb"))
    File "sklearn/tree/_tree.pyx", line 714, in sklearn.tree._tree.Tree.__setstate__
    File "sklearn/tree/_tree.pyx", line 1418, in sklearn.tree._tree._check_node_ndarray
  ValueError: node array from the pickle has an incompatible dtype:
  - expected: {'names': ['left_child', 'right_child', 'feature', 'threshold', 'impurity', 'n_node_samples', 'weighted_n_node_samples', 'missing_go_to_left'], 'formats': ['<i8', '<i8', '<i8', '<f8', '<f8', '<i8', '<f8', 'u1'], 'offsets': [0, 8, 16, 24, 32, 40, 48, 56], 'itemsize': 64}
  - got     : [('left_child', '<i8'), ('right_child', '<i8'), ('feature', '<i8'), ('threshold', '<f8'), ('impurity', '<f8'), ('n_node_samples', '<i8'), ('weighted_n_node_samples', '<f8')]
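To make the mismatch easier to read, here is a minimal sketch that compares the two field lists copied from the error message above. It uses plain Python only; the variable names are illustrative and are not part of savana or scikit-learn.

```python
# Field layouts copied verbatim from the "expected" and "got" lines of the error.
expected_fields = [
    ("left_child", "<i8"), ("right_child", "<i8"), ("feature", "<i8"),
    ("threshold", "<f8"), ("impurity", "<f8"), ("n_node_samples", "<i8"),
    ("weighted_n_node_samples", "<f8"), ("missing_go_to_left", "u1"),
]
got_fields = [
    ("left_child", "<i8"), ("right_child", "<i8"), ("feature", "<i8"),
    ("threshold", "<f8"), ("impurity", "<f8"), ("n_node_samples", "<i8"),
    ("weighted_n_node_samples", "<f8"),
]

# Fields the installed scikit-learn expects but the pickled tree does not carry.
got_names = {name for name, _ in got_fields}
missing = [name for name, _ in expected_fields if name not in got_names]

print(missing)  # ['missing_go_to_left']
```

The only difference is the extra `missing_go_to_left` field, which the pickled node array does not have.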

From https://discuss.streamlit.io/t/valueerror-node-array-from-the-pickle-has-an-incompatible-dtype/46682/4 it seems the error could be due to incompatible scikit-learn versions.

The following log seems to agree with this idea:

  /mnt/trcanmed/wgallego/pipesomatic/work/conda/env-8bb8117cce1d50927eeef0fd0720738e/lib/python3.10/site-packages/sklearn/base.py:347: InconsistentVersionWarning: Trying to unpickle estimator DecisionTreeClassifier from version 1.2.2 when using version 1.3.0. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
  https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
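One way to surface the version mismatch as a hard failure instead of a warning is to escalate that warning with the standard `warnings` module. This is only a sketch: the helper name `strict_unpickle_warnings` is hypothetical, the filter matches on the message text seen in the log above, and it does not make the model loadable.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def strict_unpickle_warnings():
    """Turn scikit-learn's 'Trying to unpickle estimator ...' warning into an exception.

    Hypothetical helper: the filter matches the start of the warning message
    as it appears in the log, and is undone when the block exits.
    """
    with warnings.catch_warnings():
        warnings.filterwarnings("error", message="Trying to unpickle estimator")
        yield
```

It would wrap the load call, for example `with strict_unpickle_warnings(): model = pickle.load(fh)`, so that a version mismatch stops the run wherever that warning is emitted.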

I installed savana using mamba and the bioconda channel:

savana                    1.0.3              pyhdfd78af_0    bioconda
scikit-learn              1.3.0           py310hf7d194e_0    conda-forg

The current requirement in the bioconda package recipe for savana is:

depends scikit-learn:    >=1.2.2

Perhaps the recipe needs to be updated, or the ONT model re-pickled with the new version of scikit-learn.
Thanks
Walter.

Here is the full log:

  Version 1.0.3 - beta
  Source: /mnt/trcanmed/wgallego/pipesomatic/work/conda/env-8bb8117cce1d50927eeef0fd0720738e/lib/python3.10/site-packages/savana/savana.py
  
  Running as sample ERR2752452.merged.sorted.aligned
  Using genome.fa.fai as reference fasta index
  Using multiprocessing with 20 threads

 Submitting 172 "get_potential_breakpoints" tasks to 20 worker threads
  Identified potential breakpoints        6092.765 seconds
  Clustered potential breakpoints         252.014 seconds
  Called consensus breakpoints            208.91 seconds
  Length after: 150775
  Total breakpoints: 150775 (27337 insertions)
  Using 685 as binsize, there are 429 redistributed intervals
  Max binsize 1177, min binsize 1
  Setting maxtasksperchild to 8
  /mnt/trcanmed/wgallego/pipesomatic/work/conda/env-8bb8117cce1d50927eeef0fd0720738e/lib/python3.10/site-packages/savana/train.py:37: FutureWarning: Returning a DataFrame from Series.apply when the supplied function returns a Series is deprecated and will be removed in a future version.
    data_matrix[['TUMOUR_DP_0', 'TUMOUR_DP_1']] = data_matrix['TUMOUR_DP'].apply(pd.Series)
  /mnt/trcanmed/wgallego/pipesomatic/work/conda/env-8bb8117cce1d50927eeef0fd0720738e/lib/python3.10/site-packages/savana/train.py:38: FutureWarning: Returning a DataFrame from Series.apply when the supplied function returns a Series is deprecated and will be removed in a future version.
    data_matrix[['NORMAL_DP_0', 'NORMAL_DP_1']] = data_matrix['NORMAL_DP'].apply(pd.Series)
  /mnt/trcanmed/wgallego/pipesomatic/work/conda/env-8bb8117cce1d50927eeef0fd0720738e/lib/python3.10/site-packages/sklearn/base.py:347: InconsistentVersionWarning: Trying to unpickle estimator DecisionTreeClassifier from version 1.2.2 when using version 1.3.0. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
  https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
    warnings.warn(
  Traceback (most recent call last):
    File "/mnt/trcanmed/wgallego/pipesomatic/work/conda/env-8bb8117cce1d50927eeef0fd0720738e/bin/savana", line 10, in <module>
  Added local depth to breakpoints        5023.865 seconds
  Output consensus breakpoints            118.051 seconds
  Total time to call raw variants         11762.333 seconds
  
  Using ONT somatic only model to classify variants
  First time using model - will untar /mnt/trcanmed/wgallego/pipesomatic/work/conda/env-8bb8117cce1d50927eeef0fd0720738e/lib/python3.10/site-packages/savana/models/ont-somatic.tar.gz
  Loaded raw breakpoints                  87.857 seconds
      sys.exit(main())
    File "/mnt/trcanmed/wgallego/pipesomatic/work/conda/env-8bb8117cce1d50927eeef0fd0720738e/lib/python3.10/site-packages/savana/savana.py", line 303, in main
      args.func(args)
    File "/mnt/trcanmed/wgallego/pipesomatic/work/conda/env-8bb8117cce1d50927eeef0fd0720738e/lib/python3.10/site-packages/savana/savana.py", line 174, in savana_main
      savana_classify(args)
    File "/mnt/trcanmed/wgallego/pipesomatic/work/conda/env-8bb8117cce1d50927eeef0fd0720738e/lib/python3.10/site-packages/savana/savana.py", line 117, in savana_classify
      classify.classify_by_model(args, checkpoints, time_str)
    File "/mnt/trcanmed/wgallego/pipesomatic/work/conda/env-8bb8117cce1d50927eeef0fd0720738e/lib/python3.10/site-packages/savana/classify.py", line 238, in classify_by_model
      loaded_model = pickle.load(open(args.model, "rb"))
    File "sklearn/tree/_tree.pyx", line 714, in sklearn.tree._tree.Tree.__setstate__
    File "sklearn/tree/_tree.pyx", line 1418, in sklearn.tree._tree._check_node_ndarray
  ValueError: node array from the pickle has an incompatible dtype:
  - expected: {'names': ['left_child', 'right_child', 'feature', 'threshold', 'impurity', 'n_node_samples', 'weighted_n_node_samples', 'missing_go_to_left'], 'formats': ['<i8', '<i8', '<i8', '<f8', '<f8', '<i8', '<f8', 'u1'], 'offsets': [0, 8, 16, 24, 32, 40, 48, 56], 'itemsize': 64}
  - got     : [('left_child', '<i8'), ('right_child', '<i8'), ('feature', '<i8'), ('threshold', '<f8'), ('impurity', '<f8'), ('n_node_samples', '<i8'), ('weighted_n_node_samples', '<f8')]


MartinezRuiz-Carlos · 2023-09-05T14:02:54Z

Hi all, I'm hitting the exact same issue.

waltergallegog · 2023-09-05T15:59:53Z

@MartinezRuiz-Carlos if it helps, I was able to run savana after downgrading the scikit-learn version to 1.2.2 in my conda env.

MartinezRuiz-Carlos · 2023-09-05T16:11:40Z

Ah nice. I'm trying a manual install instead, and it seems to be doing all right so far, but I'll give the downgrade a go if I run into the same issue. Thanks!

helrick · 2023-09-06T09:14:30Z

Hi there, thanks @waltergallegog for the detailed description of this issue. I've opened a pull request on the bioconda-recipes repo that should pin scikit-learn to 1.2.x and prevent conda/mamba from installing 1.3.0 or greater. I'll update here once it's been merged and I've tested that it works correctly.
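As a rough illustration of why the original `>=1.2.2` requirement let 1.3.0 through while a 1.2.x pin does not, here is a small sketch. The function names are hypothetical, the parsing only handles plain numeric versions, and this is not how conda itself resolves version specs.

```python
def parse_version(text):
    """Turn '1.2.2' into (1, 2, 2); only plain numeric versions are handled."""
    return tuple(int(part) for part in text.split("."))


def allowed_by_lower_bound(version, minimum="1.2.2"):
    """Mimic a '>=1.2.2' requirement: anything at or above the minimum passes."""
    return parse_version(version) >= parse_version(minimum)


def allowed_by_series_pin(version, series="1.2"):
    """Mimic a '1.2.x' pin: the leading components must equal the series."""
    prefix = parse_version(series)
    return parse_version(version)[:len(prefix)] == prefix


for candidate in ("1.2.2", "1.3.0"):
    print(candidate, allowed_by_lower_bound(candidate), allowed_by_series_pin(candidate))
# 1.2.2 True True
# 1.3.0 True False
```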

helrick mentioned this issue Sep 6, 2023

Update savana to 1.0.4 (Resolve errors by pinning scikit-learn to 1.2.x) bioconda/bioconda-recipes#42859

Merged

helrick added a commit that referenced this issue Sep 6, 2023

bumping version for bioconda linter (related to #27)

4ec05b1

helrick added a commit that referenced this issue Sep 6, 2023

bumping version for bioconda linter (related to #27) (#29)

9b65935
