Adding some more info here. This seems to be a sync problem like the one in #4026. The example above only gets the correct number of samples where Column_0 <= 0 (667) on the first iteration, i.e.:
import dask.array as da
import lightgbm as lgb
import numpy as np
from dask.distributed import Client
from sklearn.datasets import make_blobs

client = Client(n_workers=2, threads_per_worker=2)
X, y = make_blobs(n_samples=1_000, centers=[[-4, -4], [4, 4], [-4, 4]])
dX = da.from_array(X, chunks=(100, 2))
dy = da.from_array(y, chunks=100)
clf = lgb.DaskLGBMClassifier(n_estimators=5).fit(dX, dy)
trees_df = clf.booster_.trees_to_dataframe()
trees_df['threshold'] = trees_df['threshold'].astype(np.float64)
# find left children of the root node when it splits on x1 <= 0
relevant = trees_df.loc[
    lambda x: (x.node_depth == 1) & (x.split_feature == 'Column_0') & np.isclose(x.threshold, 0),
    ['tree_index', 'left_child'],
]
relevant = relevant.rename(columns={'left_child': 'node_index'})
print(trees_df.merge(relevant)[['tree_index', 'count']].to_markdown())
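For reference, the expected count of 667 can be worked out without training anything. The sketch below is an illustration only; it assumes that make_blobs splits n_samples as evenly as possible across the centers and gives the first n_samples % n_centers centers one extra sample each, and that the default cluster_std of 1.0 keeps essentially every sample on its own side of 0. The helper name samples_per_center is hypothetical.

```python
# Assumption: make_blobs assigns n_samples // n_centers samples to every
# center and one extra sample to each of the first n_samples % n_centers.
def samples_per_center(n_samples, n_centers):
    base, extra = divmod(n_samples, n_centers)
    return [base + (1 if i < extra else 0) for i in range(n_centers)]


centers = [[-4, -4], [4, 4], [-4, 4]]
sizes = samples_per_center(1_000, len(centers))

# Blobs centred at x = -4 sit four standard deviations left of the
# threshold at 0 (assuming cluster_std=1.0), so essentially all of their
# samples should satisfy Column_0 <= 0.
expected_left = sum(n for (x0, _), n in zip(centers, sizes) if x0 <= 0)

print(sizes, expected_left)  # [334, 333, 333] 667
```

Under these assumptions, any left-child count other than 667 for a root split on Column_0 at a threshold near 0 means some samples were not counted on the side they belong to.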
Description
When using lgb.DaskLGBMClassifier with multiclass classification, the same split produces different numbers of samples being sent to each child.
Reproducible example
Running the exact same thing using lgb.LGBMClassifier returns 667 every time (which is the number of samples with X[:, 0] <= 0).
Environment info
LightGBM version or commit hash: 1e95cb0
Additional Comments
I believe this may be why the multiclass classification tests sometimes fail. I've been struggling with a case where one sample seems to have gone the wrong way in a split, because it gets a relatively large probability of belonging to another class.