
Failed to update table metadata when writing to a partitioned table #383

Closed
bskim45 opened this issue Apr 30, 2021 · 1 comment
bskim45 commented Apr 30, 2021

Hi, I recently updated to version 0.20.0, but I am getting the following error when I try to write to a partitioned table.

Example code:

df
    .write
    .format("bigquery")
    .option("datePartition", "20210430")
    .option("partitionType", "DAY")
    .option("createDisposition", "CREATE_IF_NEEDED")
    .option("intermediateFormat", "orc")
    .mode("overwrite")
    .save("awesome_project.dataset_name.table_name")

Error:

Exception in thread "main" java.lang.RuntimeException: Failed to write to BigQuery
	at com.google.cloud.spark.bigquery.BigQueryWriteHelper.writeDataFrameToBigQuery(BigQueryWriteHelper.scala:94)
	at com.google.cloud.spark.bigquery.BigQueryInsertableRelation.insert(BigQueryInsertableRelation.scala:43)
	at com.google.cloud.spark.bigquery.BigQueryRelationProvider.createRelation(BigQueryRelationProvider.scala:113)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:124)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:123)
	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:963)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:104)
	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:227)
	at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:107)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:132)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:104)
	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:227)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:132)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:248)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:131)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:963)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:415)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:399)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:288)
    ...REDACT...
Caused by: com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.BigQueryException: Invalid table ID "tablename$20210430". Table IDs must be alphanumeric (plus underscores) and must be at most 1024 characters long. Also, Table decorators cannot be used.
	at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.translate(HttpBigQueryRpc.java:115)
	at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.patch(HttpBigQueryRpc.java:272)
	at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.BigQueryImpl$14.call(BigQueryImpl.java:629)
	at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.BigQueryImpl$14.call(BigQueryImpl.java:626)
	at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
	at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.RetryHelper.run(RetryHelper.java:76)
	at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
	at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.BigQueryImpl.update(BigQueryImpl.java:625)
	at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.connector.common.BigQueryClient.update(BigQueryClient.java:165)
	at com.google.cloud.spark.bigquery.BigQueryWriteHelper.updateMetadataIfNeeded(BigQueryWriteHelper.scala:216)
	at com.google.cloud.spark.bigquery.BigQueryWriteHelper.writeDataFrameToBigQuery(BigQueryWriteHelper.scala:92)
	... 52 more
Caused by: com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
POST https://www.googleapis.com/bigquery/v2/projects/awesome_project/datasets/dataset_name/tables/tablename$20210430?prettyPrint=false
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "Invalid table ID \"tablename$20210430\". Table IDs must be alphanumeric (plus underscores) and must be at most 1024 characters long. Also, Table decorators cannot be used.",
    "reason" : "invalid"
  } ],
  "message" : "Invalid table ID \"tablename$20210430\". Table IDs must be alphanumeric (plus underscores) and must be at most 1024 characters long. Also, Table decorators cannot be used.",
  "status" : "INVALID_ARGUMENT"
}
	at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:149)
	at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:112)
	at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:39)
	at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:443)
	at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1108)
	at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:541)
	at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:474)
	at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:591)
	at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.patch(HttpBigQueryRpc.java:270)
	... 61 more
bskim45 commented Apr 30, 2021

OK, after some digging, I found that it may not be related to the recent changes (though I'm not sure of this either).

def updateMetadataIfNeeded: Unit = {
  val fieldsToUpdate = data.schema
    .filter { field =>
      SupportedCustomDataType.of(field.dataType).isPresent ||
        getDescriptionOrCommentOfField(field).isPresent
    }
    .map(field => (field.name, field))
    .toMap

  if (!fieldsToUpdate.isEmpty) {
    logDebug(s"updating schema, found fields to update: ${fieldsToUpdate.keySet}")
    val originalTableInfo = bigQueryClient.getTable(options.getTableId)
    val originalTableDefinition = originalTableInfo.getDefinition[TableDefinition]
    val originalSchema = originalTableDefinition.getSchema
    val updatedSchema = Schema.of(originalSchema.getFields.asScala.map(field => {
      fieldsToUpdate.get(field.getName)
        .map(dataField => updatedField(field, dataField))
        .getOrElse(field)
    }).asJava)
    val updatedTableInfo = originalTableInfo.toBuilder.setDefinition(
      originalTableDefinition.toBuilder.setSchema(updatedSchema).build
    )
    bigQueryClient.update(updatedTableInfo.build)
  }
}

In updateMetadataIfNeeded, SparkBigQueryConfig.tableId is used to reference the table whose metadata is being updated, but when datePartition is specified that ID is already suffixed with the partition decorator ("$20210430" here), which the tables.patch API rejects.
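
Since schema/description changes apply to the whole table rather than to a single partition, one possible fix would be to strip the decorator before the lookup and update. A minimal sketch only, not the connector's actual fix; the stripPartitionDecorator helper is hypothetical, and it uses the plain google-cloud-bigquery TableId rather than the connector's repackaged one:

import com.google.cloud.bigquery.TableId

// Hypothetical helper: drop the "$yyyymmdd" partition decorator so that the
// tables.get / tables.patch calls receive a plain, undecorated table ID.
def stripPartitionDecorator(tableId: TableId): TableId = {
  val baseTable = tableId.getTable.takeWhile(_ != '$')
  if (tableId.getProject != null) {
    TableId.of(tableId.getProject, tableId.getDataset, baseTable)
  } else {
    TableId.of(tableId.getDataset, baseTable)
  }
}

updateMetadataIfNeeded could then call bigQueryClient.getTable(stripPartitionDecorator(options.getTableId)) so that the subsequent update patches the base table instead of the decorated one.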
