Describe the bug
I am using Spark (actually PySpark) with the SQLite JDBC driver for tests, and I ran into a case that causes a "java.sql.SQLException: column -1 out of bounds" exception. It only happens with sqlite-jdbc-3.39.4.0 or newer; older versions work just fine, which makes me think this might be an issue in the JDBC driver itself. Unfortunately, the issue is triggered indirectly through Spark, so I don't know exactly what it does with SQLite. Still, I tried to isolate a minimal example that reproduces the issue.
To Reproduce
Here is sample code for Python 3.10 (as mentioned, I reproduce the issue indirectly through PySpark; I don't actually use Java directly):
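(The original sample was not captured here; the following is a minimal sketch of the kind of setup described, not the reporter's actual code. The file path, table names t1 and t2, the NULL boolean row, and the join are all assumptions; only the column names another_bool and t1_id and the two-query shape come from the Additional context below.)

```python
# Minimal sketch, assuming a SQLite file with two tables whose columns match
# those mentioned in Additional context (t1_id, another_bool); the path,
# table names, and data are illustrative.
import sqlite3

from pyspark.sql import SparkSession

DB_PATH = "/tmp/repro.db"  # hypothetical path

# Prepare test data with the standard library; a NULL in the boolean column
# is an assumption, made because the failure occurs in ResultSet.wasNull.
con = sqlite3.connect(DB_PATH)
con.executescript(
    """
    DROP TABLE IF EXISTS t1;
    DROP TABLE IF EXISTS t2;
    CREATE TABLE t1 (t1_id INTEGER PRIMARY KEY);
    CREATE TABLE t2 (another_bool BOOLEAN, t1_id INTEGER);
    INSERT INTO t1 VALUES (1), (2);
    INSERT INTO t2 VALUES (NULL, 1), (1, 2);
    """
)
con.close()

spark = (
    SparkSession.builder.appName("sqlite-jdbc-repro")
    .config("spark.jars.packages", "org.xerial:sqlite-jdbc:3.47.1.0")
    .getOrCreate()
)

url = f"jdbc:sqlite:{DB_PATH}"

# First query.
df1 = (
    spark.read.format("jdbc")
    .option("url", url)
    .option("query", "SELECT t1_id FROM t1")
    .load()
)

# Second query; note the column order, which the Additional context below
# says is significant (another_bool before t1_id fails, swapped works).
df2 = (
    spark.read.format("jdbc")
    .option("url", url)
    .option("query", "SELECT another_bool, t1_id FROM t2")
    .load()
)

# A join forces the shuffle stage seen in the stack trace.
df1.join(df2, "t1_id").show()
```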
Expected behavior
I don't expect any exception.
Logs
Here is one such stack trace:
24/12/20 04:02:49 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
java.sql.SQLException: column -1 out of bounds [1,2]
at org.sqlite.core.CoreResultSet.checkCol(CoreResultSet.java:98)
at org.sqlite.core.CoreResultSet.markCol(CoreResultSet.java:112)
at org.sqlite.jdbc3.JDBC3ResultSet.wasNull(JDBC3ResultSet.java:150)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:359)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:340)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:141)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
Environment:
OS: Windows 10
CPU architecture: x86_64
sqlite-jdbc version: 3.47.1.0 (issue seems to be introduced in 3.39.4.0)
Additional context
the issue might not happen in 100% of runs; occasionally it just works, but most of the time it throws the exception
if we swap the order of another_bool and t1_id in the second query, the issue is gone (see the sketch below this list)
if we use sqlite-jdbc-3.39.3.0 or older, the issue is gone
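For illustration, in terms of the hypothetical second query sketched above, the column-order workaround would look like this:

```python
# Workaround sketch (hypothetical query from above): projecting t1_id before
# another_bool avoids the exception on 3.39.4.0 and newer.
df2 = (
    spark.read.format("jdbc")
    .option("url", url)
    .option("query", "SELECT t1_id, another_bool FROM t2")
    .load()
)
```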
It's difficult to help without knowing how Spark uses the driver. It stumbles on ResultSet.wasNull (see the stack trace above), and the documentation says:
Note that you must first call one of the getter methods on a column to try to read its value and then call the method wasNull to see if the value read was SQL NULL.
And:
Throws: SQLException - if a database access error occurs or this method is called on a closed result set
It may be some incorrect usage of the driver by Spark, since JDBC is quite complex and its implementation is often left to interpretation, or it could be an issue in the driver.
There's nothing obvious here that would make me think the driver is at fault, as we have unit tests covering those methods.
I suggest you raise an issue on the Spark repo/community; they may find something on their side, or inquire here if our driver seems to be at fault.