
Overwrite using staging table fails when table has dependencies #213

@khaledh

Description


When saving data to a table in overwrite mode, with usestagingtable left at its default value of true, the operation fails with the following error if the target table already has dependencies (e.g. a view that depends on it):

java.sql.SQLException: [Amazon](500310) Invalid operation: current transaction is aborted, commands ignored until end of transaction block;
    at com.amazon.redshift.client.messages.inbound.ErrorResponse.toErrorException(ErrorResponse.java:1830)
    at com.amazon.redshift.client.PGMessagingContext.handleErrorResponse(PGMessagingContext.java:804)
    at com.amazon.redshift.client.PGMessagingContext.handleMessage(PGMessagingContext.java:642)
    at com.amazon.jdbc.communications.InboundMessagesPipeline.getNextMessageOfClass(InboundMessagesPipeline.java:312)
    at com.amazon.redshift.client.PGMessagingContext.doMoveToNextClass(PGMessagingContext.java:1062)
    at com.amazon.redshift.client.PGMessagingContext.getParameterDescription(PGMessagingContext.java:978)
    at com.amazon.redshift.client.PGClient.prepareStatement(PGClient.java:1844)
    at com.amazon.redshift.dataengine.PGQueryExecutor.<init>(PGQueryExecutor.java:106)
    at com.amazon.redshift.dataengine.PGDataEngine.prepare(PGDataEngine.java:211)
    at com.amazon.jdbc.common.SPreparedStatement.<init>(Unknown Source)
    at com.amazon.jdbc.jdbc41.S41PreparedStatement.<init>(Unknown Source)
    at com.amazon.redshift.core.jdbc41.PGJDBC41PreparedStatement.<init>(PGJDBC41PreparedStatement.java:49)
    at com.amazon.redshift.core.jdbc41.PGJDBC41ObjectFactory.createPreparedStatement(PGJDBC41ObjectFactory.java:119)
    at com.amazon.jdbc.common.SConnection.prepareStatement(Unknown Source)
    at com.databricks.spark.redshift.RedshiftWriter.withStagingTable(RedshiftWriter.scala:137)
    at com.databricks.spark.redshift.RedshiftWriter.saveToRedshift(RedshiftWriter.scala:369)
    at com.databricks.spark.redshift.DefaultSource.createRelation(DefaultSource.scala:106)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:209)
Caused by: com.amazon.support.exceptions.ErrorException: [Amazon](500310) Invalid operation: current transaction is aborted, commands ignored until end of transaction block;
    ... 29 more
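
For context, the kind of write that triggers this looks roughly like the following. This is only a minimal sketch; the table name, JDBC URL, and tempdir are placeholders rather than the actual job:

    // Hypothetical repro: overwrite mode, usestagingtable left at its default of
    // true, and the target table has a dependent view.
    import org.apache.spark.sql.{DataFrame, SaveMode}

    def overwriteTable(df: DataFrame): Unit = {
      df.write
        .format("com.databricks.spark.redshift")
        .option("url", "jdbc:redshift://host:5439/db?user=...&password=...") // placeholder
        .option("dbtable", "myschema.mytable")    // a view depends on this table
        .option("tempdir", "s3n://my-bucket/tmp") // placeholder
        .mode(SaveMode.Overwrite)                 // usestagingtable defaults to true
        .save()
    }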

I tracked this error down to the following code in RedshiftWriter.scala:

    try {
      // write the data to the staging (temp) table first
      action(tempTable.toString)

      if (jdbcWrapper.tableExists(conn, table.toString)) {
        // swap: rename the existing table to a backup name, promote the temp
        // table, then drop the backup -- the DROP fails when other objects
        // (e.g. views) still depend on the table
        jdbcWrapper.executeInterruptibly(conn.prepareStatement(
          s"""
             | BEGIN;
             | ALTER TABLE $table RENAME TO ${backupTable.escapedTableName};
             | ALTER TABLE $tempTable RENAME TO ${table.escapedTableName};
             | DROP TABLE $backupTable;
             | END;
           """.stripMargin.trim))
      } else {
        jdbcWrapper.executeInterruptibly(conn.prepareStatement(
          s"ALTER TABLE $tempTable RENAME TO ${table.escapedTableName}"))
      }
    } finally {
      // the cleanup also fails here, because the transaction above has already
      // aborted -- this masks the original error (see below)
      jdbcWrapper.executeInterruptibly(conn.prepareStatement(s"DROP TABLE IF EXISTS $tempTable"))
    }

When trying this transaction manually in SQL Workbench, I get the following error:

[Amazon](500310) Invalid operation: cannot drop table myschema.mytable because other objects depend on it;
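
For reference, the dependency situation can be reproduced in SQL with something like this (hypothetical schema, table, and view names; the staging/backup names stand in for the generated ones):

    -- Hypothetical, simplified setup: a regular (not late-binding) view depends
    -- on the target table.
    CREATE TABLE myschema.mytable (id int);
    CREATE TABLE myschema.mytable_staging (id int);   -- stands in for the spark-redshift temp table
    CREATE VIEW myschema.myview AS SELECT * FROM myschema.mytable;

    -- The swap transaction then fails at the DROP, because the view is still
    -- bound to the renamed table.
    BEGIN;
    ALTER TABLE myschema.mytable RENAME TO mytable_backup;
    ALTER TABLE myschema.mytable_staging RENAME TO mytable;
    DROP TABLE myschema.mytable_backup;  -- cannot drop table ... because other objects depend on it
    END;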

I was hoping spark-redshift would let this error (the actual culprit) bubble up when it happens, but instead I get the error mentioned at the beginning. The original exception is masked by a second exception from the DROP TABLE IF EXISTS in the finally block, which fails because the transaction is already in an aborted state at that point, producing the message Invalid operation: current transaction is aborted, commands ignored until end of transaction block.
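
To illustrate the masking (this is generic JVM/Scala behavior, not the library's code): when the finally block itself throws, its exception replaces the one that was already propagating, so the caller only ever sees the second error.

    // Sketch only: stand-ins for the swap transaction and the cleanup statement.
    object MaskingDemo extends App {
      try {
        // stands in for the rename/drop transaction; this is the real error
        throw new java.sql.SQLException(
          "cannot drop table myschema.mytable because other objects depend on it")
      } finally {
        // stands in for DROP TABLE IF EXISTS on the now-aborted transaction;
        // this exception is the one the caller ultimately sees
        throw new java.sql.SQLException(
          "current transaction is aborted, commands ignored until end of transaction block")
      }
    }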

I'm not sure what the best solution is in this case. I'm open to suggestions.
