
spark-snowflake connector does not cancel running queries if the DataFrame command was cancelled in long-lived Spark applications #519

Open
sadikovi opened this issue Aug 3, 2023 · 3 comments


@sadikovi

sadikovi commented Aug 3, 2023

I have observed that the spark-snowflake connector never cancels queries when the corresponding DataFrame command is cancelled in long-lived applications such as the Spark shell or interactive environments such as notebooks.

Although a listener is implemented to cancel all queries at the end of the application, in a long-lived application that may not happen for many hours or days, and those queries continue to run in the meantime. SparkConnectorContext.removeRunningQuery only removes the query metadata from the global tracking map but does not actually cancel the query - one has to log in to Snowflake and cancel the queries manually. I have an example where a query ran for hours because of this limitation.
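
For illustration, a minimal repro sketch (run in a Spark shell; sfOptions and the long-running generator query are placeholders, not connector code):

val df = spark.read
  .format("net.snowflake.spark.snowflake")
  .options(sfOptions) // placeholder for the usual sfURL/sfUser/... options
  .option("query", "select seq4() from table(generator(timelimit => 36000))")
  .load()

// Cancel this command from the shell or notebook: the Spark job stops,
// but the query keeps running in Snowflake until it completes or times out.
df.count()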

I also noticed that thread interruption does not seem to work properly for the ResultSet handling in the connector, particularly in this block of code:

val objects = asyncRs
  .asInstanceOf[SnowflakeResultSet]
  .getResultSetSerializables(params.expectedPartitionSize)

This call does not respond to a thread interrupt signal, so it blocks query cancellation. It may need to be wrapped in an Await block or run on a separate thread to allow interrupts.
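
For example, a rough sketch of the separate-thread approach (a Future-based variant; asyncRs and params are the names from the connector snippet above, the rest is illustrative):

import java.util.concurrent.{Callable, Executors}

// Run the blocking driver call on a helper thread so that the calling
// task thread stays interruptible while it waits for the result.
val executor = Executors.newSingleThreadExecutor()
val future = executor.submit(
  new Callable[java.util.List[net.snowflake.client.jdbc.SnowflakeResultSetSerializable]] {
    override def call(): java.util.List[net.snowflake.client.jdbc.SnowflakeResultSetSerializable] =
      asyncRs
        .asInstanceOf[SnowflakeResultSet]
        .getResultSetSerializables(params.expectedPartitionSize)
  })
val objects =
  try {
    future.get() // throws InterruptedException if this thread is interrupted
  } catch {
    case e: InterruptedException =>
      future.cancel(true) // interrupt the helper thread
      // this is where a server-side query cancel could be issued
      throw e
  } finally {
    executor.shutdownNow()
  }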

@kleinux

kleinux commented Sep 18, 2023

I experience this issue too. Cancelling a query from Spark does not cancel the query in Snowflake. This can lead to very expensive operations continuing to execute when they are not expected to.

@sadikovi

Also, this is not specific to a particular version. I can reproduce the same issue in spark-snowflake 2.12.0+ and on the latest master. The code needs to be refactored to handle failures and query cancellation.

@sadikovi

sadikovi commented Jan 17, 2024

Current issues:

  1. Cancel queries interactively; we should not wait for the application to finish in order to cancel queries (this could take hours).
  2. Fix the JDBC driver to allow cancellation of async queries; currently this is hard to do.
  3. Spark problem: py4j does not track JVM threads, i.e. a socket close in the PVM does not mean a socket close in the JVM - I will fix it.

Code for 1:

private[snowflake] def cancelRunningQuery(sparkContext: SparkContext, queryID: String): Unit = {
  withSyncAndDoNotThrowException {
    val appId = sparkContext.applicationId
    // runningQueries maps applicationId -> tracked queries for that application
    val queries = runningQueries.get(appId)
    // Avoid Option.get: the application may have no tracked queries at all
    val candidates = queries.map(_.filter(_.queryID == queryID).toSeq).getOrElse(Seq.empty)
    logger.info(s"Running queries for $appId: $queries, " +
      s"trying to find $queryID, candidates: $candidates")
    if (candidates.nonEmpty) {
      candidates.foreach { rq =>
        try {
          if (!rq.conn.isClosed) {
            val statement = rq.conn.createStatement()
            val sessionID = rq.conn.getSessionID
            logger.warn(s"Cancelling query ${rq.queryID} for session: $sessionID")
            // Server-side cancel, so it works even while the client call is blocked
            statement.execute(s"select SYSTEM$$CANCEL_QUERY('${rq.queryID}')")
            statement.close()
          }
        } catch {
          case th: Throwable =>
            logger.warn("Failed to cancel running query: ", th)
        }
      }
      logger.warn(s"Finished cancelling all queries for $appId")
      // Note: this drops all tracked queries for the application
      runningQueries.remove(appId)
    } else {
      logger.error(s"No running query for: $appId and $queryID")
    }
  }
}

Code for 2:

// WrapperThread is assumed to be a Thread subclass exposing an abstract getQueryID
wrapperThread = new WrapperThread("Snowflake-Async-Query") {
  private var objects: java.util.List[net.snowflake.client.jdbc.SnowflakeResultSetSerializable] = null
  private var err: Throwable = null

  override def run(): Unit = {
    // The blocking driver call now happens on this thread, not the task thread
    objects = asyncRs
      .asInstanceOf[SnowflakeResultSet]
      .getResultSetSerializables(params.expectedPartitionSize)
  }

  override val getQueryID: String = queryID
  def getObjects: java.util.List[net.snowflake.client.jdbc.SnowflakeResultSetSerializable] = objects
  def setErr(e: Throwable): Unit = this.err = e
  def getErr(): Throwable = this.err
}

// Record any failure on the wrapper thread so the caller can rethrow it
wrapperThread.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
  override def uncaughtException(t: Thread, e: Throwable): Unit = {
    if (t.isInstanceOf[WrapperThread]) {
      val thread = t.asInstanceOf[WrapperThread]
      thread.setErr(e)
      thread.interrupt()
    }
  }
})

wrapperThread.start()
wrapperThread.join()

Then you would cancel like this:

if (wrapperThread != null) {
  // Cancel the query server-side first, then drop the tracking entry
  SparkConnectorContext.cancelRunningQuery(sqlContext.sparkContext, wrapperThread.getQueryID)
  SparkConnectorContext.removeRunningQuery(sqlContext.sparkContext, conn, wrapperThread.getQueryID)
  // wrapperThread.interrupt()
  wrapperThread.join()
  wrapperThread = null
}
