
ValueError: node array from the pickle has an incompatible dtype #27

Open
waltergallegog opened this issue Sep 1, 2023 · 4 comments

@waltergallegog

Hello,

Using the latest version from bioconda, I'm getting the following error:

    File "/mnt/trcanmed/wgallego/pipesomatic/work/conda/env-8bb8117cce1d50927eeef0fd0720738e/lib/python3.10/site-packages/savana/classify.py", line 238, in classify_by_model
      loaded_model = pickle.load(open(args.model, "rb"))
    File "sklearn/tree/_tree.pyx", line 714, in sklearn.tree._tree.Tree.__setstate__
    File "sklearn/tree/_tree.pyx", line 1418, in sklearn.tree._tree._check_node_ndarray
  ValueError: node array from the pickle has an incompatible dtype:
  - expected: {'names': ['left_child', 'right_child', 'feature', 'threshold', 'impurity', 'n_node_samples', 'weighted_n_node_samples', 'missing_go_to_left'], 'formats': ['<i8', '<i8', '<i8', '<f8', '<f8', '<i8', '<f8', 'u1'], 'offsets': [0, 8, 16, 24, 32, 40, 48, 56], 'itemsize': 64}
  - got     : [('left_child', '<i8'), ('right_child', '<i8'), ('feature', '<i8'), ('threshold', '<f8'), ('impurity', '<f8'), ('n_node_samples', '<i8'), ('weighted_n_node_samples', '<f8')]
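For anyone comparing the two dtypes in the error: the only difference is one extra field, `missing_go_to_left` (`u1`, with the record padded to 64 bytes), which scikit-learn 1.3.0 expects and the pickle written by 1.2.2 lacks. The field name suggests it comes from the missing-value support added to trees in 1.3, but that is my reading, not something the error states. A minimal sketch that rebuilds both dtypes from the error text, assuming NumPy is available (it is a scikit-learn dependency):

```python
import numpy as np

# Node dtype found in the pickled model (the "got" line of the error).
old_node_dtype = np.dtype([
    ("left_child", "<i8"),
    ("right_child", "<i8"),
    ("feature", "<i8"),
    ("threshold", "<f8"),
    ("impurity", "<f8"),
    ("n_node_samples", "<i8"),
    ("weighted_n_node_samples", "<f8"),
])

# Node dtype scikit-learn 1.3.0 expects (the "expected" line of the error).
new_node_dtype = np.dtype({
    "names": ["left_child", "right_child", "feature", "threshold",
              "impurity", "n_node_samples", "weighted_n_node_samples",
              "missing_go_to_left"],
    "formats": ["<i8", "<i8", "<i8", "<f8", "<f8", "<i8", "<f8", "u1"],
    "offsets": [0, 8, 16, 24, 32, 40, 48, 56],
    "itemsize": 64,
})

# Fields the loader expects but the pickle does not contain.
missing = [name for name in new_node_dtype.names if name not in old_node_dtype.names]
print("fields missing from the pickle:", missing)
print("record size old/new:", old_node_dtype.itemsize, new_node_dtype.itemsize)
print("dtypes equal:", old_node_dtype == new_node_dtype)
```

Because the record layouts differ in both field names and size (56 vs 64 bytes), the 1.3.0 loader cannot reinterpret the stored node array, so it rejects it instead of guessing.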

According to https://discuss.streamlit.io/t/valueerror-node-array-from-the-pickle-has-an-incompatible-dtype/46682/4, this error is likely caused by a mismatch between the scikit-learn version used to pickle the model and the version used to load it.

The following warning from the log supports this:

  /mnt/trcanmed/wgallego/pipesomatic/work/conda/env-8bb8117cce1d50927eeef0fd0720738e/lib/python3.10/site-packages/sklearn/base.py:347: InconsistentVersionWarning: Trying to unpickle estimator DecisionTreeClassifier from version 1.2.2 when using version 1.3.0. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
  https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations

I installed savana using mamba and the bioconda channel:

savana                    1.0.3              pyhdfd78af_0    bioconda
scikit-learn              1.3.0           py310hf7d194e_0    conda-forg

The current requirement in the bioconda package recipe for savana is:

depends scikit-learn:    >=1.2.2

Perhaps the recipe needs to be updated, or the ONT model re-pickled with the newer version of scikit-learn.
Thanks,
Walter

Here is the full log:

  Version 1.0.3 - beta
  Source: /mnt/trcanmed/wgallego/pipesomatic/work/conda/env-8bb8117cce1d50927eeef0fd0720738e/lib/python3.10/site-packages/savana/savana.py
  
  Running as sample ERR2752452.merged.sorted.aligned
  Using genome.fa.fai as reference fasta index
  Using multiprocessing with 20 threads

  Submitting 172 "get_potential_breakpoints" tasks to 20 worker threads
  Identified potential breakpoints        6092.765 seconds
  Clustered potential breakpoints         252.014 seconds
  Called consensus breakpoints            208.91 seconds
  Length after: 150775
  Total breakpoints: 150775 (27337 insertions)
  Using 685 as binsize, there are 429 redistributed intervals
  Max binsize 1177, min binsize 1
  Setting maxtasksperchild to 8
  Added local depth to breakpoints        5023.865 seconds
  Output consensus breakpoints            118.051 seconds
  Total time to call raw variants         11762.333 seconds
  
  Using ONT somatic only model to classify variants
  First time using model - will untar /mnt/trcanmed/wgallego/pipesomatic/work/conda/env-8bb8117cce1d50927eeef0fd0720738e/lib/python3.10/site-packages/savana/models/ont-somatic.tar.gz
  Loaded raw breakpoints                  87.857 seconds
  /mnt/trcanmed/wgallego/pipesomatic/work/conda/env-8bb8117cce1d50927eeef0fd0720738e/lib/python3.10/site-packages/savana/train.py:37: FutureWarning: Returning a DataFrame from Series.apply when the supplied function returns a Series is deprecated and will be removed in a future version.
    data_matrix[['TUMOUR_DP_0', 'TUMOUR_DP_1']] = data_matrix['TUMOUR_DP'].apply(pd.Series)
  /mnt/trcanmed/wgallego/pipesomatic/work/conda/env-8bb8117cce1d50927eeef0fd0720738e/lib/python3.10/site-packages/savana/train.py:38: FutureWarning: Returning a DataFrame from Series.apply when the supplied function returns a Series is deprecated and will be removed in a future version.
    data_matrix[['NORMAL_DP_0', 'NORMAL_DP_1']] = data_matrix['NORMAL_DP'].apply(pd.Series)
  /mnt/trcanmed/wgallego/pipesomatic/work/conda/env-8bb8117cce1d50927eeef0fd0720738e/lib/python3.10/site-packages/sklearn/base.py:347: InconsistentVersionWarning: Trying to unpickle estimator DecisionTreeClassifier from version 1.2.2 when using version 1.3.0. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
  https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
    warnings.warn(
  Traceback (most recent call last):
    File "/mnt/trcanmed/wgallego/pipesomatic/work/conda/env-8bb8117cce1d50927eeef0fd0720738e/bin/savana", line 10, in <module>
      sys.exit(main())
    File "/mnt/trcanmed/wgallego/pipesomatic/work/conda/env-8bb8117cce1d50927eeef0fd0720738e/lib/python3.10/site-packages/savana/savana.py", line 303, in main
      args.func(args)
    File "/mnt/trcanmed/wgallego/pipesomatic/work/conda/env-8bb8117cce1d50927eeef0fd0720738e/lib/python3.10/site-packages/savana/savana.py", line 174, in savana_main
      savana_classify(args)
    File "/mnt/trcanmed/wgallego/pipesomatic/work/conda/env-8bb8117cce1d50927eeef0fd0720738e/lib/python3.10/site-packages/savana/savana.py", line 117, in savana_classify
      classify.classify_by_model(args, checkpoints, time_str)
    File "/mnt/trcanmed/wgallego/pipesomatic/work/conda/env-8bb8117cce1d50927eeef0fd0720738e/lib/python3.10/site-packages/savana/classify.py", line 238, in classify_by_model
      loaded_model = pickle.load(open(args.model, "rb"))
    File "sklearn/tree/_tree.pyx", line 714, in sklearn.tree._tree.Tree.__setstate__
    File "sklearn/tree/_tree.pyx", line 1418, in sklearn.tree._tree._check_node_ndarray
  ValueError: node array from the pickle has an incompatible dtype:
  - expected: {'names': ['left_child', 'right_child', 'feature', 'threshold', 'impurity', 'n_node_samples', 'weighted_n_node_samples', 'missing_go_to_left'], 'formats': ['<i8', '<i8', '<i8', '<f8', '<f8', '<i8', '<f8', 'u1'], 'offsets': [0, 8, 16, 24, 32, 40, 48, 56], 'itemsize': 64}
  - got     : [('left_child', '<i8'), ('right_child', '<i8'), ('feature', '<i8'), ('threshold', '<f8'), ('impurity', '<f8'), ('n_node_samples', '<i8'), ('weighted_n_node_samples', '<f8')]

@MartinezRuiz-Carlos

Hi all, I'm hitting the exact same issue here.

@waltergallegog
Author

@MartinezRuiz-Carlos if it helps, I was able to run savana after downgrading the scikit-learn version to 1.2.2 in my conda env.
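Since the model is only loaded after variant calling has finished (11762 seconds in the log above), a version check at startup would fail fast instead of after hours of work. A hypothetical guard, not part of savana; it assumes any 1.2.x release can read the bundled model, although only 1.2.2 was actually tested here:

```python
import importlib.metadata

# Hypothetical constant: the version the bundled model was pickled with,
# taken from the InconsistentVersionWarning in the log.
MODEL_SKLEARN_VERSION = "1.2.2"


def check_sklearn_version(installed=None, expected=MODEL_SKLEARN_VERSION):
    """Raise early if the installed scikit-learn release series differs from the model's.

    Hypothetical helper; `installed` can be passed explicitly for testing,
    otherwise it is read from the installed package metadata.
    """
    if installed is None:
        installed = importlib.metadata.version("scikit-learn")
    # Compare major.minor only, e.g. ["1", "2"] vs ["1", "3"].
    if installed.split(".")[:2] != expected.split(".")[:2]:
        raise RuntimeError(
            f"scikit-learn {installed} is installed, but the model was pickled "
            f"with {expected}; downgrade scikit-learn to {expected} and retry."
        )
    return installed
```

Calling this before any heavy work (for example, right after argument parsing) would turn the late `ValueError` into an immediate, explicit error message.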

@MartinezRuiz-Carlos

Ah nice. I'm trying it on a manual install for now and it seems to be doing all right so far, but I'll give that a go if I run into the same issue. Thanks!

@helrick
Member

helrick commented Sep 6, 2023

Hi there, thanks @waltergallegog for the detailed description of this issue. I've opened a pull request on the bioconda-recipes repo that should pin scikit-learn to 1.2.x and prevent conda/mamba from installing 1.3.0 or greater. I'll update here once it's been merged and I've tested that it works correctly.
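The difference between the recipe's current constraint (`>=1.2.2`) and a 1.2.x pin can be sketched as follows. This is a stdlib-only illustration with hypothetical helper names; the exact spec used in the bioconda-recipes PR is not quoted here, so `>=1.2.2,<1.3` is an assumption, and the parser only handles plain numeric versions such as `1.3.0` (no pre-release tags).

```python
def parse_version(text, length=3):
    """Turn '1.3.0' into (1, 3, 0), padding short versions like '1.2' with zeros."""
    parts = tuple(int(part) for part in text.split("."))
    return parts + (0,) * (length - len(parts))


def allowed_by_old_recipe(installed):
    # Constraint quoted in the issue: scikit-learn >=1.2.2 (no upper bound).
    return parse_version(installed) >= (1, 2, 2)


def allowed_by_1_2_x_pin(installed):
    # Assumed pin: >=1.2.2,<1.3, which excludes 1.3.0 and later.
    return (1, 2, 2) <= parse_version(installed) < (1, 3, 0)


for candidate in ["1.2.2", "1.3.0"]:
    print(candidate, allowed_by_old_recipe(candidate), allowed_by_1_2_x_pin(candidate))
# 1.2.2 True True
# 1.3.0 True False
```

The lower bound alone lets the solver pick the newest release (1.3.0 here); adding an upper bound is what keeps it on the series the model was pickled with.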


3 participants