Commit
add warning to SparkBoostLGBM for low volume dataset under streaming execution mode
fonhorst committed Aug 2, 2023
1 parent 16cdab4 commit 8beea57
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions sparklightautoml/ml_algo/boost_lgbm.py
@@ -471,6 +471,12 @@ def fit_predict_single_fold(
        logger.info(f"Use single dataset mode: {lgbm.getUseSingleDatasetMode()}. NumThreads: {lgbm.getNumThreads()}")
        logger.info(f"All lgbm booster params: {run_params}")

        # warn early: small datasets are known to crash the streaming execution mode
        if (run_params["executionMode"] == "streaming") and (full_data.count() <= 25_000):
            warnings.warn(
                "Fitting LightGBM in streaming execution mode "
                "may fail with a SIGSEGV / SIGBUS error (probably due to a bug in SynapseML) "
                "if too little data is available per core. "
                "Consider switching to bulk execution mode if such crashes happen.",
                RuntimeWarning,
            )

        # fitting the model
        ml_model = lgbm.fit(self._assembler.transform(full_data))
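
The guard added in this hunk can be exercised without a Spark session if the row count is passed in as a plain integer. The following is a minimal sketch under that assumption, not the code from the commit: the helper name `warn_if_small_streaming_dataset` and the constant `STREAMING_MIN_ROWS` are hypothetical and do not exist in `boost_lgbm.py`; only the `"streaming"` mode name, the 25,000-row threshold, and the `RuntimeWarning` category come from the diff.

```python
import warnings

# Threshold taken from the diff above; the constant name is hypothetical.
STREAMING_MIN_ROWS = 25_000


def warn_if_small_streaming_dataset(run_params, row_count, min_rows=STREAMING_MIN_ROWS):
    """Hypothetical helper mirroring the check added in this commit.

    Emits a RuntimeWarning and returns True when the execution mode is
    "streaming" and the dataset has at most `min_rows` rows.
    """
    risky = run_params.get("executionMode") == "streaming" and row_count <= min_rows
    if risky:
        warnings.warn(
            "Fitting LightGBM in streaming execution mode may crash on small "
            "datasets; consider switching to bulk execution mode.",
            RuntimeWarning,
            stacklevel=2,  # attribute the warning to the caller, not this helper
        )
    return risky


if __name__ == "__main__":
    # The row count is passed as an int here; in the commit it comes from full_data.count().
    warn_if_small_streaming_dataset({"executionMode": "streaming"}, row_count=1_000)
```

Passing the count as an argument keeps the check testable and lets a caller reuse a count it has already computed, since `full_data.count()` triggers a Spark job each time it is called.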
