Releases: GoogleCloudDataproc/spark-bigquery-connector

0.36.0

25 Jan 23:34
  • PR #1155: allow lazy materialization of query on load
  • PR #1163: Added config to set the BigQuery Job timeout
  • PR #1166: Fix filters by adding surrounding parentheses. Thanks @tom-s-powell !
  • PR #1171: Fix read and write issues with Timestamp
  • Issue #1116: BigQuery write fails with MessageSize is too large
  • BigQuery API has been upgraded to version 2.36.0
  • GAX has been upgraded to version 2.40.0
  • gRPC has been upgraded to version 1.61.0
  • Netty has been upgraded to version 4.1.106.Final
  • Protocol Buffers has been upgraded to version 3.25.2
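
The filter fix in PR #1166 concerns operator precedence when several filter fragments are joined with AND. The sketch below illustrates the general problem only; it is not the connector's code. The helper `combine_filters`, the table `t`, and its columns are hypothetical, and SQLite stands in for BigQuery purely so the example can run locally.

```python
import sqlite3


def combine_filters(filters, parenthesize=True):
    """Join SQL filter fragments with AND (hypothetical helper, for illustration).

    With parenthesize=True each fragment is wrapped in parentheses so that an
    OR inside one fragment cannot be regrouped by the surrounding ANDs.
    """
    if parenthesize:
        filters = [f"({f})" for f in filters]
    return " AND ".join(filters)


filters = ["a = 1 OR b = 2", "c = 3"]
unsafe = combine_filters(filters, parenthesize=False)  # a = 1 OR b = 2 AND c = 3
safe = combine_filters(filters)                        # (a = 1 OR b = 2) AND (c = 3)

# Tiny in-memory table to show that the two strings select different rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 0, 0), (0, 2, 3), (0, 0, 3)],
)

query = "SELECT a, b, c FROM t WHERE {} ORDER BY a, b, c"
unsafe_rows = conn.execute(query.format(unsafe)).fetchall()
safe_rows = conn.execute(query.format(safe)).fetchall()

print(unsafe_rows)  # the row (1, 0, 0) is included even though c != 3
print(safe_rows)
```

Because AND binds more tightly than OR, the unparenthesized string is read as `a = 1 OR (b = 2 AND c = 3)`, which is why the two queries return different rows.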

0.35.1

29 Dec 10:34
  • PR #1153: allow writing spark string to BQ datetime

0.35.0

20 Dec 16:06
  • PR #1115: Added new connector, spark-3.5-bigquery aimed to be used in Spark 3.5. This connector implements new APIs and capabilities provided by the Spark Data Source V2 API.
  • PR #1117: Make read session caching duration configurable
  • PR #1118: Improve read session caching key
  • PR #1122: Set traceId on write
  • PR #1124: Added SparkListenerEvents for Query and Load jobs running on BigQuery
  • PR #1127: Fix job labeling for mixed case Dataproc job names
  • PR #1136: Consider projections for biglake stats
  • PR #1143: Enable async write for default stream
  • BigQuery API has been upgraded to version 2.35.0
  • BigQuery Storage API has been upgraded to version 2.47.0
  • GAX has been upgraded to version 2.38.0
  • gRPC has been upgraded to version 1.60.0
  • Netty has been upgraded to version 4.1.101.Final
  • Protocol Buffers has been upgraded to version 3.25.1

0.34.0

31 Oct 21:26
  • PR #1057: Enable async writes for greater throughput
  • PR #1094: CVE-2023-5072: Upgrading the org.json:json dependency
  • PR #1095: CVE-2023-4586: Upgrading the netty dependencies
  • PR #1104: Fixed nested field predicate pushdown
  • PR #1109: Enable read session caching by default for faster Spark planning
  • PR #1111: Enable retry of failed messages
  • Issue #103: Support for Dynamic partition overwrite for time and range partitioned table
  • Issue #1099: Fixing the usage of ExternalAccountCredentials
  • BigQuery API has been upgraded to version 2.33.2
  • BigQuery Storage API has been upgraded to version 2.44.0
  • GAX has been upgraded to version 2.35.0
  • gRPC has been upgraded to version 1.58.0
  • Protocol Buffers has been upgraded to version 3.24.4

0.33.0

17 Oct 23:24
  • Added new connector, spark-3.4-bigquery aimed to be used in Spark 3.4 and above. This connector implements new APIs and capabilities provided by the Spark Data Source V2 API.
  • PR #1008: Adding support to expose BigQuery metrics using Spark custom metrics API.
  • PR #1038: Logical plan now shows the BigQuery table of DirectBigQueryRelation. Thanks @idc101 !
  • PR #1058: View names will appear in query plan instead of the materialized table
  • PR #1061: Handle NPE case when reading BQ table with NUMERIC fields. Thanks @hayssams !
  • PR #1069: Support TimestampNTZ datatype in spark 3.4
  • Issue #453: fix comment handling in query
  • Issue #144: allow writing Spark String to BQ TIME type
  • Issue #867: Support writing with RangePartitioning
  • Issue #1046: Add a way to disable map type support
  • Issue #1062: Adding dataproc job ID and UUID labels to BigQuery jobs

0.32.2

07 Aug 18:35

0.32.1

04 Aug 02:16
  • PR #1025: Handle Java 8 types for dates and timestamps when compiling filters. Thanks @tom-s-powell !
  • Issue #1026: Fixing Numeric conversion
  • Issue #1028: Fixing PolicyTags removal on overwrite

0.32.0

18 Jul 16:28
  • Issue #748: _PARTITIONDATE pseudo column is provided only for ingestion time daily partitioned tables
  • Issue #990: Fix to support allowFieldAddition for columns with nested fields.
  • Issue #993: Spark ML vector read and write fails
  • PR #1007: Implement at-least-once option that utilizes default stream

0.31.1

06 Jun 20:31
  • Issue #988: Read statistics are logged at TRACE level. Update the log4j configuration accordingly in order to log them.

0.31.0

02 Jun 15:23
  • ⚠️ Breaking Change: BigNumeric conversion has changed, and it is now converted to Spark's
    Decimal data type. Note that BigNumeric can have a wider precision than Decimal, so additional
    settings may be needed. See here for additional details.
  • Issue #945: Fix inability to add a new column even with the allowFieldAddition option
  • PR #965: Fix to reuse the same BigQueryClient for the same BigQueryConfig, rather than creating a new one
  • PR #950: Added support for service account impersonation
  • PR #960: Added support for basic configuration of the gRPC channel pool size in the BigQueryReadClient.
  • PR #973: Added support for writing to CMEK managed tables.
  • PR #971: Fixing wrong results or schema error when Spark nested schema pruning is on for datasource v2
  • PR #974: Applying DPP to Hive partitioned BigLake tables (spark-3.2-bigquery and spark-3.3-bigquery only)
  • PR #986: CVE-2020-8908, CVE-2023-2976: Upgrading Guava to version 32.0-jre
  • BigQuery API has been upgraded to version 2.26.0
  • BigQuery Storage API has been upgraded to version 2.36.1
  • GAX has been upgraded to version 2.26.0
  • gRPC has been upgraded to version 1.55.1
  • Netty has been upgraded to version 4.1.92.Final
  • Protocol Buffers has been upgraded to version 3.23.0
  • PR #957: support direct write with subset field list.
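
The BigNumeric breaking change in 0.31.0 notes that BigNumeric can be wider than Spark's Decimal. As a rough illustration, Spark's `DecimalType` allows at most 38 digits of precision, while BigQuery's BIGNUMERIC allows roughly 76 digits with a scale of up to 38. The sketch below uses only Python's standard `decimal` module. The helpers `required_precision` and `fits_spark_decimal` are hypothetical names and are not part of the connector or of Spark.

```python
from decimal import Decimal

# Spark's DecimalType supports at most 38 digits of precision.
SPARK_MAX_PRECISION = 38


def required_precision(value: Decimal) -> int:
    """Digits of precision needed to store a finite Decimal exactly.

    Hypothetical helper for illustration: counts integer digits plus
    fractional digits, the way a SQL DECIMAL(precision, scale) would.
    """
    _, digits, exponent = value.as_tuple()
    if exponent >= 0:
        return len(digits) + exponent
    return max(len(digits), -exponent)


def fits_spark_decimal(value: Decimal) -> bool:
    """True if the value can be held exactly within 38 digits of precision."""
    return required_precision(value) <= SPARK_MAX_PRECISION


# 8 significant digits: comfortably within Spark's limit.
narrow = Decimal("12345.678")

# 20 integer digits plus 30 fractional digits: within BIGNUMERIC's range,
# but too wide to be stored exactly in a 38-digit Spark Decimal.
wide = Decimal("12345678901234567890.123456789012345678901234567890")

print(required_precision(narrow), fits_spark_decimal(narrow))
print(required_precision(wide), fits_spark_decimal(wide))
```

Values like `wide` are the case the release note warns about: they need either a reduced scale or some other handling before they fit Spark's Decimal.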