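`ti.predict` fails when the `RandomForestClassifier` was trained on batched data with `warm_start=True` and the batches are unbalanced, i.e. an earlier chunk contains only one label while a later chunk contains both.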
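A quick way to check whether a forest is in this state is to look at how many classes each of its trees was fitted on. The sketch below rebuilds the two chunks from the reproduction as plain NumPy arrays and reads the `n_classes_` attribute of every fitted tree in `estimators_`. The helper names `tree_class_counts` and `has_inconsistent_trees` are hypothetical and not part of scikit-learn or treeinterpreter.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def tree_class_counts(forest):
    """Return how many classes each fitted tree in `forest` was trained on (hypothetical helper)."""
    # Each sub-estimator is a fitted DecisionTreeClassifier exposing n_classes_.
    return [int(tree.n_classes_) for tree in forest.estimators_]


def has_inconsistent_trees(forest):
    """True if some tree saw a different number of classes than the forest now reports."""
    return any(n != forest.n_classes_ for n in tree_class_counts(forest))


# Same data as the reproduction: chunk 1 has only label 0, chunk 2 has labels 0 and 1.
X1 = np.zeros((2, 2))
y1 = np.zeros(2, dtype=int)
y2 = np.array([0, 0, 1, 1, 0, 0, 1, 1])
X2 = np.column_stack([y2, y2])

rf = RandomForestClassifier(warm_start=True, n_estimators=1, random_state=0)
rf.fit(X1, y1)
rf.n_estimators += 1  # warm start: keep the first tree, add one more
rf.fit(X2, y2)

print(tree_class_counts(rf))       # [1, 2]
print(has_inconsistent_trees(rf))  # True
```

When this returns `True`, the per-tree arrays that treeinterpreter later averages do not all have the same number of columns.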
Error:
```
/Users/x/anaconda3/lib/python3.7/site-packages/numpy/core/_asarray.py:136: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  return array(a, dtype, copy=False, order=order, subok=True)
Traceback (most recent call last):
  ml-pipeline/src/treeint_simple_example.py", line 22, in <module>
    test_predict_prob, bias, contributions = ti.predict(rf, test_data.head(2))
  File "/Users/x/anaconda3/lib/python3.7/site-packages/treeinterpreter/treeinterpreter.py", line 212, in predict
    return _predict_forest(model, X, joint_contribution=joint_contribution)
  File "/Users/x/anaconda3/lib/python3.7/site-packages/treeinterpreter/treeinterpreter.py", line 166, in _predict_forest
    return (np.mean(predictions, axis=0), np.mean(biases, axis=0),
  File "<__array_function__ internals>", line 6, in mean
  File "/Users/x/anaconda3/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 3373, in mean
    out=out, **kwargs)
  File "/Users/x/anaconda3/lib/python3.7/site-packages/numpy/core/_methods.py", line 144, in _mean
    arr = asanyarray(a)
  File "/Users/x/anaconda3/lib/python3.7/site-packages/numpy/core/_asarray.py", line 136, in asanyarray
    return array(a, dtype, copy=False, order=order, subok=True)
ValueError: could not broadcast input array from shape (2,1) into shape (2)
```
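The final `ValueError` is raised by the `np.mean(predictions, axis=0)` call in `_predict_forest`: NumPy first has to stack the per-tree arrays into a single ndarray, which is impossible when the tree fitted on the single-label chunk contributes a `(2, 1)` array and, presumably, the later tree contributes a `(2, 2)` array. The NumPy-only sketch below reproduces that failure with made-up probability values; `forest_mean` is a hypothetical stand-in for the averaging step, and the exact error message may differ between NumPy versions.

```python
import numpy as np

# Made-up per-tree outputs for 2 samples: the tree trained on the
# single-label chunk has one probability column, the later tree has two.
old_tree_proba = np.array([[1.0], [1.0]])            # shape (2, 1)
new_tree_proba = np.array([[1.0, 0.0], [1.0, 0.0]])  # shape (2, 2)


def forest_mean(per_tree):
    """Average a list of per-tree arrays with np.mean(..., axis=0) (hypothetical stand-in)."""
    return np.mean(per_tree, axis=0)


# Arrays of equal shape stack into a (2, 2, 2) array and average fine.
print(forest_mean([new_tree_proba, new_tree_proba]))

# Arrays with different column counts cannot be stacked, so NumPy raises.
try:
    forest_mean([old_tree_proba, new_tree_proba])
except ValueError as exc:
    print("ValueError:", exc)
```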
Reproduction:
```python
from sklearn.ensemble import RandomForestClassifier
from treeinterpreter import treeinterpreter as ti
import pandas as pd

# Random forest that can train on chunks of data.
rf = RandomForestClassifier(warm_start=True, n_estimators=1)

# Data of chunk 1: a single label (0).
chunk1_data_vec = [0, 0]
chunk1_df = pd.DataFrame(data={'label': chunk1_data_vec, 'features1': chunk1_data_vec, 'features2': chunk1_data_vec})

# Data of chunk 2: two labels (0 and 1).
chunk2_data_vec = [0, 0, 1, 1, 0, 0, 1, 1]
chunk2_df = pd.DataFrame(data={'label': chunk2_data_vec, 'features1': chunk2_data_vec, 'features2': chunk2_data_vec})

# Fit the first chunk, which has a single label.
rf.fit(X=chunk1_df.drop(['label'], axis='columns'), y=chunk1_df['label'])

# Add one tree and fit the second chunk, which has 2 labels.
rf.n_estimators += 1
rf.fit(X=chunk2_df.drop(['label'], axis='columns'), y=chunk2_df['label'])

# Test data: the features of chunk 2.
test_data = chunk2_df.drop(['label'], axis='columns')

# Regular scikit-learn prediction.
rf.predict_proba(test_data)

# treeinterpreter prediction: this is the call that raises the ValueError above.
test_predict_prob, bias, contributions = ti.predict(rf, test_data.head(2))
```
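Until this is handled, one defensive option, assuming the complete label set is known before the first `fit`, is to reject (or merge) any chunk that does not contain every class, so that every tree in the warm-started forest is fitted on the same number of classes. The helper `check_chunk_has_all_classes` and the constant `ALL_CLASSES` below are hypothetical, not part of scikit-learn or treeinterpreter; the check would run on each chunk's labels before `rf.fit`.

```python
import numpy as np


def check_chunk_has_all_classes(y_chunk, all_classes):
    """Raise ValueError if a training chunk is missing any expected label.

    Hypothetical helper: assumes the full label set is known up front.
    """
    missing = np.setdiff1d(np.asarray(all_classes), np.unique(y_chunk))
    if missing.size:
        raise ValueError(f"chunk is missing classes {missing.tolist()}")


ALL_CLASSES = [0, 1]  # assumption: every label the model should ever see

# Chunk 2 from the reproduction contains both labels, so it passes.
check_chunk_has_all_classes([0, 0, 1, 1, 0, 0, 1, 1], ALL_CLASSES)

# Chunk 1 contains only label 0, so it is rejected before fitting.
try:
    check_chunk_has_all_classes([0, 0], ALL_CLASSES)
except ValueError as exc:
    print(exc)  # chunk is missing classes [1]
```

This only avoids the mismatch; whether rejecting or merging small chunks is acceptable depends on the training pipeline.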