
Match Error on Filtering indexed String columns #58

Closed · osopardo1 opened this issue Dec 22, 2021 · 1 comment
Labels: type: bug Something isn't working
@osopardo1 (Member) commented:
What went wrong?
When querying a string-indexed column, the Spark internal type UTF8String is not recognized by the Transformation method, which throws a MatchError.

This is the result of filtering the e-commerce dataset, indexed with qbeast, by "brand == 'versace'":

scala.MatchError: versace (of class org.apache.spark.unsafe.types.UTF8String)
	at io.qbeast.core.transform.HashTransformation.transform(HashTransformation.scala:11)
	at io.qbeast.core.model.QuerySpaceFromTo$.$anonfun$apply$1(QuerySpace.scala:68)
	at scala.collection.immutable.List.map(List.scala:293)
	at io.qbeast.core.model.QuerySpaceFromTo$.apply(QuerySpace.scala:67)
	at io.qbeast.spark.index.query.QuerySpecBuilder.extractQuerySpace(QuerySpecBuilder.scala:107)
	at io.qbeast.spark.index.query.QuerySpecBuilder.build(QuerySpecBuilder.scala:144)
	at io.qbeast.spark.index.query.QueryExecutor.$anonfun$execute$1(QueryExecutor.scala:22)

The solution is to detect the Spark internal type before calling core functions and convert the value to its String representation.
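The proposed fix can be sketched as follows. This is a minimal sketch, not the actual patch: `SparkUTF8String` is a hypothetical stand-in for `org.apache.spark.unsafe.types.UTF8String` so the snippet runs without Spark on the classpath, and `transform` only mimics the shape of `HashTransformation.transform`, not its real hashing.

```scala
// Sketch of the proposed fix: normalize Spark's internal UTF8String into a
// plain java.lang.String before the value reaches core transformations.
// SparkUTF8String is a stand-in for org.apache.spark.unsafe.types.UTF8String.
final case class SparkUTF8String(bytes: Array[Byte]) {
  override def toString: String = new String(bytes, "UTF-8")
}

// Detect the Spark internal type and convert it to its String representation;
// all other values pass through unchanged.
def normalize(value: Any): Any = value match {
  case s: SparkUTF8String => s.toString // now a java.lang.String
  case other              => other
}

// A core transformation that matches on String (as HashTransformation does)
// no longer hits a MatchError once values are normalized first.
def transform(value: Any): Int = normalize(value) match {
  case s: String => s.hashCode
  case n: Number => n.intValue()
}
```

With this normalization in place, `transform(SparkUTF8String("versace".getBytes("UTF-8")))` takes the String branch instead of falling through to a MatchError.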

How to reproduce?

  1. Code that triggered the bug, or steps to reproduce:
    val tmpDir = "/tmp/qbeast"

    val data = spark.read
      .format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("src/test/resources/ecommerce100K_2019_Oct.csv")
      .distinct()
      .na.drop()

    data.write
      .mode("overwrite")
      .format("qbeast")
      .options(Map("columnsToIndex" -> "brand,product_id", "cubeSize" -> "10000"))
      .save(tmpDir)

    val indexed = spark.read.format("qbeast").load(tmpDir)
    indexed.filter("brand == 'versace'").show()
  2. Branch and commit id:

    main on c182980
  3. Spark version:
    On the spark shell run spark.version.

    3.1.2

  4. Hadoop version:
    On the spark shell run org.apache.hadoop.util.VersionInfo.getVersion().

    3.2.0

  5. Are you running Spark inside a container? Are you launching the app on a remote K8s cluster? Or are you just running the tests on a local computer?

    On a local computer

  6. Stack trace: included above.
@osopardo1 osopardo1 added the type: bug Something isn't working label Dec 22, 2021
@osopardo1 osopardo1 added the high label Dec 22, 2021
@osopardo1 osopardo1 self-assigned this Dec 22, 2021
@eavilaes (Contributor) commented:

Closing per #59
