Upgrade to Spark 3.1.1 with testing #349

Merged: 24 commits, Jun 28, 2022
19 changes: 1 addition & 18 deletions .circleci/config.yml
@@ -33,29 +33,12 @@ jobs:
       DBT_INVOCATION_ENV: circle
     docker:
       - image: fishtownanalytics/test-container:10
-      - image: godatadriven/spark:2
+      - image: godatadriven/spark:3.1.1
         environment:
           WAIT_FOR: localhost:5432
-        command: >
-          --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
-          --name Thrift JDBC/ODBC Server
-          --conf spark.hadoop.javax.jdo.option.ConnectionURL=jdbc:postgresql://localhost/metastore
-          --conf spark.hadoop.javax.jdo.option.ConnectionUserName=dbt
-          --conf spark.hadoop.javax.jdo.option.ConnectionPassword=dbt
-          --conf spark.hadoop.javax.jdo.option.ConnectionDriverName=org.postgresql.Driver
-          --conf spark.serializer=org.apache.spark.serializer.KryoSerializer
-          --conf spark.jars.packages=org.apache.hudi:hudi-spark-bundle_2.11:0.9.0
-          --conf spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension
-          --conf spark.driver.userClassPathFirst=true
-          --conf spark.hadoop.datanucleus.autoCreateTables=true
-          --conf spark.hadoop.datanucleus.schema.autoCreateTables=true
-          --conf spark.hadoop.datanucleus.fixedDatastore=false
-          --conf spark.sql.hive.convertMetastoreParquet=false
-          --hiveconf hoodie.datasource.hive_sync.use_jdbc=false
-          --hiveconf hoodie.datasource.hive_sync.mode=hms
-          --hiveconf datanucleus.schema.autoCreateAll=true
-          --hiveconf hive.metastore.schema.verification=false

       - image: postgres:9.6.17-alpine
         environment:
           POSTGRES_USER: dbt
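For context, the `WAIT_FOR: localhost:5432` environment variable tells the Spark container to hold off starting the Thrift server until the Postgres metastore is accepting connections. A minimal Python sketch of that wait loop (illustrative only, not the image's actual entrypoint; the host and port come from the config above):

```python
import socket
import time


def wait_for(host: str, port: int, timeout: float = 120.0) -> None:
    """Poll host:port until a TCP connection succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=5):
                return  # the service is accepting connections
        except OSError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"{host}:{port} not reachable after {timeout}s")
            time.sleep(2)  # back off briefly before retrying


# The CI config waits on the Postgres metastore before launching the Thrift server.
wait_for("localhost", 5432)
```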
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,7 @@
 ### Features
 - Add session connection method ([#272](https://github.com/dbt-labs/dbt-spark/issues/272), [#279](https://github.com/dbt-labs/dbt-spark/pull/279))
 - rename file to match reference to dbt-core ([#344](https://github.com/dbt-labs/dbt-spark/pull/344))
+- Upgrade Spark version to 3.1.1 ([#348](https://github.com/dbt-labs/dbt-spark/issues/348), [#349](https://github.com/dbt-labs/dbt-spark/pull/349))

 ### Under the hood
 - Use dbt.tests.adapter.basic in test suite ([#298](https://github.com/dbt-labs/dbt-spark/issues/298), [#299](https://github.com/dbt-labs/dbt-spark/pull/299))
@@ -13,6 +14,7 @@
 ### Contributors
 - [@JCZuurmond](https://github.com/dbt-labs/dbt-spark/pull/279) ([#279](https://github.com/dbt-labs/dbt-spark/pull/279))
 - [@ueshin](https://github.com/ueshin) ([#320](https://github.com/dbt-labs/dbt-spark/pull/320))
+- [@nssalian](https://github.com/nssalian) ([#349](https://github.com/dbt-labs/dbt-spark/pull/349))

 ## dbt-spark 1.1.0b1 (March 23, 2022)

2 changes: 1 addition & 1 deletion README.md
@@ -26,7 +26,7 @@ more information, consult [the docs](https://docs.getdbt.com/docs/profile-spark)

 ## Running locally
 A `docker-compose` environment starts a Spark Thrift server and a Postgres database as a Hive Metastore backend.
-Note that this is spark 2 not spark 3 so some functionalities might not be available.
+Note: dbt-spark now supports Spark 3.1.1 (formerly on Spark 2.x).

 The following command would start two docker containers
 ```
4 changes: 2 additions & 2 deletions docker-compose.yml
@@ -1,8 +1,8 @@
 version: "3.7"
 services:

-  dbt-spark2-thrift:
-    image: godatadriven/spark:3.0
+  dbt-spark3-thrift:
+    image: godatadriven/spark:3.1.1
     ports:
       - "10000:10000"
       - "4040:4040"
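Port 10000 is the Thrift JDBC/ODBC endpoint and 4040 is the Spark UI. A minimal smoke test against the running container, assuming the `PyHive` package is installed (it is not part of this diff), could look like:

```python
from pyhive import hive  # assumed dependency; install with: pip install 'pyhive[hive]'

# Connect to the Thrift server exposed by the dbt-spark3-thrift container.
conn = hive.connect(host="localhost", port=10000, username="dbt")
cursor = conn.cursor()
cursor.execute("SHOW DATABASES")
print(cursor.fetchall())  # expect at least the 'default' database
cursor.close()
conn.close()
```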
4 changes: 3 additions & 1 deletion docker/spark-defaults.conf
@@ -1,7 +1,9 @@
+spark.driver.memory 2g
+spark.executor.memory 2g
 spark.hadoop.datanucleus.autoCreateTables true
 spark.hadoop.datanucleus.schema.autoCreateTables true
 spark.hadoop.datanucleus.fixedDatastore false
 spark.serializer org.apache.spark.serializer.KryoSerializer
-spark.jars.packages org.apache.hudi:hudi-spark3-bundle_2.12:0.9.0
+spark.jars.packages org.apache.hudi:hudi-spark3-bundle_2.12:0.10.0
 spark.sql.extensions org.apache.spark.sql.hudi.HoodieSparkSessionExtension
 spark.driver.userClassPathFirst true
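These defaults are read by the Thrift server at startup. For local experimentation outside Docker, the same settings can be applied when building a session; the sketch below assumes `pyspark` is installed and simply mirrors the conf file above (it is not part of the PR):

```python
from pyspark.sql import SparkSession

# Mirror docker/spark-defaults.conf in a locally built session (illustrative only).
spark = (
    SparkSession.builder
    .master("local[*]")
    .config("spark.driver.memory", "2g")
    .config("spark.executor.memory", "2g")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.jars.packages", "org.apache.hudi:hudi-spark3-bundle_2.12:0.10.0")
    .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
    .getOrCreate()
)
print(spark.version)  # should report 3.1.x when run against this setup
```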
6 changes: 4 additions & 2 deletions tests/functional/adapter/test_basic.py
@@ -64,7 +64,7 @@ def project_config_update(self):
         }


-#hese tests were not enabled in the dbtspec files, so skipping here.
+# These tests were not enabled in the dbtspec files, so skipping here.
 # Error encountered was: Error running query: java.lang.ClassNotFoundException: delta.DefaultSource
 @pytest.mark.skip_profile('apache_spark', 'spark_session')
 class TestSnapshotTimestampSpark(BaseSnapshotTimestamp):
@@ -79,5 +79,7 @@ def project_config_update(self):
             }
         }

+
+@pytest.mark.skip_profile('spark_session')
 class TestBaseAdapterMethod(BaseAdapterMethod):
-    pass
+    pass
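Note that `skip_profile` is a custom marker supplied by the test suite's conftest, not a built-in pytest marker. A simplified sketch of how such a marker can be wired up (the real implementation in dbt's adapter test framework may differ, and the `PROFILE` environment variable here is hypothetical):

```python
import os

import pytest


def pytest_collection_modifyitems(config, items):
    """Skip any test whose skip_profile marker names the active test profile."""
    profile = os.environ.get("PROFILE", "apache_spark")  # hypothetical env var
    for item in items:
        marker = item.get_closest_marker("skip_profile")
        if marker and profile in marker.args:
            item.add_marker(pytest.mark.skip(reason=f"skipped for profile {profile}"))
```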