
IndexStatusBuilder outputs incorrect max weight on append the same revision #119

Closed
osopardo1 opened this issue Jul 21, 2022 · 0 comments
Labels
type: bug Something isn't working


osopardo1 commented Jul 21, 2022

What went wrong?

While developing compaction (#98), I noticed some odd behavior in the output of IndexStatus: the maxWeight reported for cubes containing more than one file was wrong.

The maxWeight of a cube should be the minimum maxWeight of all the files belonging to that cube. Instead, on some occasions, the output was the maximum, or some other value entirely.

Example:

Cube 1
File 1, maxWeight = 0.5
File 2, maxWeight = 0.7

Cube 1 maxWeight = 0.7 instead of Cube 1 maxWeight = 0.5 
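The intended aggregation can be sketched as follows. This is a minimal, hypothetical model (a `Weight` backed by a `Double` fraction rather than qbeast's actual representation), not the project's code:

```scala
object CubeWeightAggregation {

  // Simplified stand-in for qbeast's Weight: an ordered fraction in [0, 1]
  final case class Weight(fraction: Double) extends Ordered[Weight] {
    def compare(that: Weight): Int = fraction.compare(that.fraction)
  }

  // A cube's maxWeight must be the MINIMUM of its files' maxWeights:
  // an element with weight w is only guaranteed to be in the cube if
  // every file belonging to that cube holds elements up to at least w.
  def cubeMaxWeight(fileMaxWeights: Seq[Weight]): Weight =
    fileMaxWeights.min

  def main(args: Array[String]): Unit = {
    // Cube 1 from the example above: two files with maxWeights 0.5 and 0.7
    val files = Seq(Weight(0.5), Weight(0.7))
    println(cubeMaxWeight(files)) // Weight(0.5), not Weight(0.7)
  }
}
```

Taking the maximum (or any other reduction) overstates how full the cube is, which is exactly the symptom observed after the second append.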

How to reproduce?

1. Code that triggered the bug, or steps to reproduce:

You can reproduce it with the following test:

  val data = 0.to(100000).toDF("id")

  // First write
  data.write.format("qbeast")
    .option("columnsToIndex", "id").option("cubeSize", "10000").save(tmpDir)
  val deltaLog = DeltaLog.forTable(spark, tmpDir)
  val firstIndexStatus = DeltaQbeastSnapshot(deltaLog.snapshot).loadLatestIndexStatus

  // Append the same data again
  data.write.format("qbeast").mode("append")
    .option("columnsToIndex", "id").option("cubeSize", "10000").save(tmpDir)
  val secondIndexStatus = DeltaQbeastSnapshot(deltaLog.update()).loadLatestIndexStatus

  // Appending data can only lower a cube's maxWeight, never raise it
  secondIndexStatus.cubesStatuses.foreach { case (cube: CubeId, cubeStatus: CubeStatus) =>
    if (cubeStatus.maxWeight < Weight.MaxValue) {
      cubeStatus.maxWeight shouldBe <=(firstIndexStatus.cubesStatuses(cube).maxWeight)
    }
  }

2. Branch and commit id:

main commit 9a17df8

3. Spark version:

On the spark shell run spark.version.

3.1.2

4. Hadoop version:

On the spark shell run org.apache.hadoop.util.VersionInfo.getVersion().

3.2.0

5. How are you running Spark?

Are you running Spark inside a container? Are you launching the app on a remote K8s cluster? Or are you just running the tests in a local computer?

Local

6. Stack trace:

Trace of the log/error messages.

Weight(-1726009150) was not equal to Weight(-1936746399)
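The negative numbers in the trace come from Weight's integer backing. Assuming (hypothetically, for illustration) a linear mapping of `Int.MinValue..Int.MaxValue` onto the fraction range [0, 1], the two values can be compared as fractions:

```scala
object WeightFraction {

  // Hypothetical conversion from an Int-backed weight to a [0, 1] fraction,
  // assuming Int.MinValue maps to 0.0 and Int.MaxValue maps to 1.0.
  def fraction(value: Int): Double =
    (value.toDouble - Int.MinValue.toDouble) /
      (Int.MaxValue.toDouble - Int.MinValue.toDouble)

  def main(args: Array[String]): Unit = {
    println(fraction(-1726009150)) // observed maxWeight, ≈ 0.098
    println(fraction(-1936746399)) // expected maxWeight, ≈ 0.049
  }
}
```

Under that assumption, the observed maxWeight is roughly twice the expected one, consistent with the aggregation picking a larger value than the minimum.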
