
Error on sampling when using <columnName>:<type> in columnsToIndex #352

Closed · osopardo1 opened this issue on Jul 23, 2024 · 0 comments · Fixed by #355
Labels: type: bug (Something isn't working)

osopardo1 (Member) commented on Jul 23, 2024

What went wrong?

When running a TABLESAMPLE query over a table with a histogram-typed column, the following error appears:


None.get
java.util.NoSuchElementException: None.get
	at scala.None$.get(Option.scala:529)
	at scala.None$.get(Option.scala:527)
	at io.qbeast.spark.internal.rules.SampleRule.$anonfun$transformSampleToFilter$1(SampleRule.scala:77)

The error is due to incorrect parsing of the columnsToIndex property in QbeastRelation:

object QbeastRelation {

  def unapply(plan: LogicalPlan): Option[(LogicalRelation, IndexedColumns)] = plan match {
    case l @ LogicalRelation(
          q @ HadoopFsRelation(o: DefaultFileIndex, _, _, _, _, parameters),
          _,
          _,
          _) =>
      val columnsToIndex = parameters("columnsToIndex")
      // Splitting only on "," leaves the ":<type>" suffix attached to the column name
      Some((l, columnsToIndex.split(",")))
    case _ => None
  }

}
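With columnsToIndex set to "id,name:histogram", split(",") yields Array("id", "name:histogram"), so SampleRule later looks up a transformation for a column literally named "name:histogram" and fails with None.get. A minimal sketch of a parser that strips the optional :<type> suffix (an illustration only, not necessarily the change made in #355):

// Sketch: drop an optional ":<type>" suffix from each entry, so that
// "id,name:histogram" parses to Seq("id", "name").
def parseColumnsToIndex(spec: String): Seq[String] =
  spec.split(",").toSeq.map(_.split(":").head.trim)

// parseColumnsToIndex("id,name:histogram") == Seq("id", "name")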

How to reproduce?

Steps to reproduce the problem.

1. Code that triggered the bug, or steps to reproduce:

val data = 1.to(10).map(i => (i, s"$i")).toDF("id", "name")
data.write
  .format("qbeast")
  .option("columnsToIndex", "id,name:histogram")
  .option("cubeSize", "100")
  .option("columnStats", s"""{"id_min": 0, "id_max": 100}""")
  .saveAsTable("qbeast")

spark.sql("SELECT * FROM qbeast TABLESAMPLE(10 PERCENT)").show(false)

2. Branch and commit id:

main at 9b47ef5

3. Spark version:

On the Spark shell, run spark.version.

3.5.0

4. Hadoop version:

On the Spark shell, run org.apache.hadoop.util.VersionInfo.getVersion().

3.3.4

5. How are you running Spark?

Are you running Spark inside a container? Are you launching the app on a remote K8s cluster? Or are you just running the tests on a local computer?

Local

6. Stack trace:

Trace of the log/error messages: see the None.get stack trace above. A possible workaround sketch follows below.
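Until the :<type> suffix is handled, a possible workaround is to index the columns without type annotations, assuming the default transformer is acceptable for name. A minimal sketch (run on the Spark shell, where spark.implicits are in scope):

// Workaround sketch: omit the ":histogram" annotation so columnsToIndex
// parses cleanly; "name" falls back to the default transformer.
val data = 1.to(10).map(i => (i, s"$i")).toDF("id", "name")
data.write
  .format("qbeast")
  .option("columnsToIndex", "id,name")
  .option("cubeSize", "100")
  .saveAsTable("qbeast_no_hist")

// Sampling now resolves both indexed columns and should not hit None.get
spark.sql("SELECT * FROM qbeast_no_hist TABLESAMPLE(10 PERCENT)").show(false)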
