Training LightGBMRanker several times gives different NDCG on testing set #580

Description

@daureg

I noticed that when training on Databricks several times with the same parameters on the same data, the resulting models don't give the same predictions, as evidenced by different NDCG scores on a separate testing set.
Here is my training function; the training set has 400K examples across 5K query groups, with 60 features:

// Import path for older mmlspark releases; newer versions move the
// class to com.microsoft.ml.spark.lightgbm.LightGBMRanker.
import com.microsoft.ml.spark.LightGBMRanker

def train(): Unit = {
  val lgbm = new LightGBMRanker()
    .setCategoricalSlotIndexes(Array(0, 2, 3, 4, 6, 7, 8, 59))
    .setFeaturesCol("features")
    .setGroupCol("query_id")
    .setLabelCol("label")
    .setMaxPosition(10)
    .setParallelism("voting")
    .setNumIterations(15)
    .setMaxDepth(4)
    .setNumLeaves(12)
  val training = table("training")  // Databricks notebook helper resolving a registered table
  val model = lgbm.fit(training)
}
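
To confirm the models themselves differ (and not just my NDCG computation), a quick check along these lines diffs raw scores between two fits. This is a sketch reusing lgbm and training from above; "testing" and its unique "id" column are hypothetical names:

import org.apache.spark.sql.functions.{abs, col, max}

// Fit twice on identical input, score the same held-out rows, and
// look at the largest prediction difference. A nonzero max_delta
// means the two runs trained genuinely different models.
val testing = table("testing")  // hypothetical held-out table
val scoresA = lgbm.fit(training).transform(testing)
  .select(col("id"), col("prediction").as("predA"))
val scoresB = lgbm.fit(training).transform(testing)
  .select(col("id"), col("prediction").as("predB"))

scoresA.join(scoresB, "id")
  .agg(max(abs(col("predA") - col("predB"))).as("max_delta"))
  .show()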

Is that inherent to distributed training (on 5 executors), or should I change some parameters of my LightGBMRanker instance?
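
If it is a matter of seeding, would something along these lines pin the randomness down? This is an untested sketch: setSeed and setNumThreads are assumptions about what my mmlspark version exposes, and the repartitioning is there because Spark does not guarantee the same partition layout across runs.

import org.apache.spark.sql.functions.col

// Sketch, not verified: pin LightGBM's RNG and per-worker threading,
// and force a stable data layout across runs. setSeed/setNumThreads
// are assumptions about the installed mmlspark version.
val deterministicLgbm = new LightGBMRanker()
  .setFeaturesCol("features")
  .setGroupCol("query_id")
  .setLabelCol("label")
  .setNumIterations(15)
  .setSeed(42)       // fix the model's random seed
  .setNumThreads(1)  // rule out thread-count-dependent nondeterminism

// Repartition on the group column and sort within partitions so every
// run feeds LightGBM the same data layout.
val stableTraining = table("training")
  .repartition(5, col("query_id"))
  .sortWithinPartitions("query_id")

val model = deterministicLgbm.fit(stableTraining)

Single-threaded workers would obviously be slow; the point would only be to isolate which source of nondeterminism matters.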
