
IndexStatusBuilder outputs incorrect max weight on append the same revision #119

Closed
osopardo1 opened this issue Jul 21, 2022 · 0 comments
Labels
type: bug Something isn't working


osopardo1 commented Jul 21, 2022

What went wrong?

While developing compaction (#98), I noticed some odd behavior in the output of IndexStatus: the maxWeight reported for cubes containing more than one file was wrong.

The maxWeight of a cube should be the minimum maxWeight of all the files belonging to that cube. Instead, on some occasions, the output was the maximum, or some other value entirely.

Example:

Cube 1
File 1, maxWeight = 0.5
File 2, maxWeight = 0.7

Cube 1 maxWeight = 0.7 instead of Cube 1 maxWeight = 0.5 
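The intended aggregation can be sketched as follows. This is a minimal, hypothetical model (a `Weight` backed by a `Double` fraction rather than qbeast's actual representation), not the project's code:

```scala
object CubeWeightAggregation {

  // Simplified stand-in for qbeast's Weight: an ordered fraction in [0, 1]
  final case class Weight(fraction: Double) extends Ordered[Weight] {
    def compare(that: Weight): Int = fraction.compare(that.fraction)
  }

  // A cube's maxWeight must be the MINIMUM of its files' maxWeights:
  // an element with weight w is only guaranteed to be in the cube if
  // every file belonging to that cube holds elements up to at least w.
  def cubeMaxWeight(fileMaxWeights: Seq[Weight]): Weight =
    fileMaxWeights.min

  def main(args: Array[String]): Unit = {
    // Cube 1 from the example above: two files with maxWeights 0.5 and 0.7
    val files = Seq(Weight(0.5), Weight(0.7))
    println(cubeMaxWeight(files)) // Weight(0.5), not Weight(0.7)
  }
}
```

Taking the maximum (or any other reduction) overstates how full the cube is, which is exactly the symptom observed after the second append.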

How to reproduce?

1. Code that triggered the bug, or steps to reproduce:

You can reproduce it with the following test:

  val data = 0.to(100000).toDF("id")

  // First write
  data.write.format("qbeast")
    .option("columnsToIndex", "id").option("cubeSize", "10000").save(tmpDir)
  val deltaLog = DeltaLog.forTable(spark, tmpDir)
  val firstIndexStatus = DeltaQbeastSnapshot(deltaLog.snapshot).loadLatestIndexStatus

  // Append the same data again
  data.write.format("qbeast").mode("append")
    .option("columnsToIndex", "id").option("cubeSize", "10000").save(tmpDir)
  val secondIndexStatus = DeltaQbeastSnapshot(deltaLog.update()).loadLatestIndexStatus

  // Appending data can only lower a cube's maxWeight, never raise it
  secondIndexStatus.cubesStatuses.foreach { case (cube: CubeId, cubeStatus: CubeStatus) =>
    if (cubeStatus.maxWeight < Weight.MaxValue) {
      cubeStatus.maxWeight shouldBe <=(firstIndexStatus.cubesStatuses(cube).maxWeight)
    }
  }

2. Branch and commit id:

main commit 9a17df8

3. Spark version:

On the spark shell run spark.version.

3.1.2

4. Hadoop version:

On the spark shell run org.apache.hadoop.util.VersionInfo.getVersion().

3.2.0

5. How are you running Spark?

Are you running Spark inside a container? Are you launching the app on a remote K8s cluster? Or are you just running the tests in a local computer?

Local

6. Stack trace:

Trace of the log/error messages.

Weight(-1726009150) was not equal to Weight(-1936746399)
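The negative numbers in the trace come from Weight's integer backing. Assuming (hypothetically, for illustration) a linear mapping of `Int.MinValue..Int.MaxValue` onto the fraction range [0, 1], the two values can be compared as fractions:

```scala
object WeightFraction {

  // Hypothetical conversion from an Int-backed weight to a [0, 1] fraction,
  // assuming Int.MinValue maps to 0.0 and Int.MaxValue maps to 1.0.
  def fraction(value: Int): Double =
    (value.toDouble - Int.MinValue.toDouble) /
      (Int.MaxValue.toDouble - Int.MinValue.toDouble)

  def main(args: Array[String]): Unit = {
    println(fraction(-1726009150)) // observed maxWeight, ≈ 0.098
    println(fraction(-1936746399)) // expected maxWeight, ≈ 0.049
  }
}
```

Under that assumption, the observed maxWeight is roughly twice the expected one, consistent with the aggregation picking a larger value than the minimum.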
