Quick Start Guide
- You have a Spark 3.1 environment.
- Spark Oracle can be run on any type of cluster (embedded, spark-shell, thriftserver; standalone, yarn, kubernetes, mesos...)
- You have access to an Oracle Database. If not, see the section below on setting up an Oracle Free Tier ADW instance.
- Build the spark-oracle package.
- The Oracle user used for the connection must have `plan_table` access (a quick check is sketched after this list).
  - In some cases we use the `DBMS_PARALLEL_EXECUTE` procedure to plan the splitting of pushdown queries.
- For ADW instances:
  - Store the Oracle wallet for the instance on each Spark node.
  - To use external tables, ensure an OCI credential is set up in the database.
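
A quick way to confirm the connection user has `plan_table` access is to run an explain plan. This is a minimal sketch: the `tpcds` user name and the explicit grant are illustrative, and `DBMS_PARALLEL_EXECUTE` is often already executable by every user.

```sql
-- Run as the connection user in sqlplus: if this works, plan_table access is in place.
explain plan for select 1 from dual;
select * from table(dbms_xplan.display);

-- If execute on DBMS_PARALLEL_EXECUTE is not already granted (it usually is),
-- a DBA can grant it explicitly:
-- grant execute on sys.dbms_parallel_execute to tpcds;
```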
Setting up an Oracle Free Tier ADW instance

- Create an Oracle Free Tier Account, if you don't already have one.
- Instantiate an Oracle ADW instance.
- Download and set up the Oracle Instant Client and Instant Client Tools packages. You will use the `sqlplus` and `impdp` executables.
- Download and set up the Oracle wallet for the instance.
- Connect using the `sqlplus` Instant Client and create a user for running the demo. For example:
```sql
create user tpcds identified by <your_chosen_password_here>;
grant CONNECT, RESOURCE, unlimited tablespace to tpcds;

-- To run unit tests, add the additional grants:
grant drop any table to tpcds;
grant create any function to tpcds;
grant create any procedure to tpcds;
```
- Upload your cloud credentials to the instance using the `DBMS_CLOUD.CREATE_CREDENTIAL` procedure.
  - See the Oracle documentation for `DBMS_CLOUD.CREATE_CREDENTIAL` for details.
  - For example:
```sql
begin
  DBMS_CLOUD.CREATE_CREDENTIAL(
    credential_name => 'SPARK_ORA_CRED',
    username        => '<your_oci_username>',
    password        => '<your_oci_password>'
  );
end;
/
```
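
To confirm the credential was created, you can query the `USER_CREDENTIALS` view (available in Autonomous Database); the credential name matches the example above.

```sql
-- SPARK_ORA_CRED should show up in the list.
select credential_name, username from user_credentials;
```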
- Load the tpcds tables and data into your instance.
  - A dump of tpcds scale1 is available in this OCI location.
  - You can copy this to your own bucket.
  - Import the data by running:
```sh
# ensure TNS_ADMIN points to your wallet.
./impdp admin/<your_passwd>@<your_tns_address> directory=data_pump_dir credential=SPARK_ORA_CRED dumpfile=<your_dump_file_location>
```
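
A quick sanity check after the import finishes, run as the `tpcds` user; `store_sales` is one of the standard TPC-DS tables and is assumed to be in the dump.

```sql
-- All tpcds tables should be listed, and the fact tables should contain rows.
select table_name, num_rows from user_tables order by table_name;
select count(*) from store_sales;
```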
The Spark Oracle zip file contains the spark-oracle and Oracle JDBC jars. This zip file can be unzipped into any Spark deployment. To enable the Spark Oracle catalog integration and pushdown functionality, add the following to your Spark configuration:
```properties
# Oracle Catalog
# Enable Spark Oracle extensions
spark.sql.extensions=org.apache.spark.sql.oracle.SparkSessionExtensions
spark.kryo.registrator=org.apache.spark.sql.connector.catalog.oracle.OraKryoRegistrator

# Enable the Oracle Catalog integration
spark.sql.catalog.oracle=org.apache.spark.sql.connector.catalog.oracle.OracleCatalog

# Oracle SQL logging and JDBC fetch size
spark.sql.catalog.oracle.log_and_time_sql.enabled=true
spark.sql.catalog.oracle.log_and_time_sql.log_level=info
spark.sql.catalog.oracle.fetchSize=5000
spark.sql.oracle.enable.querysplitting=false
spark.sql.oracle.querysplit.target=1Mb

# Configure JDBC connection information
# Example for a non-wallet instance
#spark.sql.catalog.oracle.url=jdbc:oracle:thin:@<your_instance_details>
#spark.sql.catalog.oracle.user=tpcds
#spark.sql.catalog.oracle.password=tpcds

# Example config for an ADW instance
#spark.sql.catalog.oracle.authMethod=ORACLE_WALLET
#spark.sql.catalog.oracle.url=jdbc:oracle:thin:@<tns-address>
#spark.sql.catalog.oracle.user=<your_demo_user>
#spark.sql.catalog.oracle.password=<your_passwd>
#spark.sql.catalog.oracle.net.tns_admin=<wallet_location>
```
See the demo script for a session of catalog browsing and execution of tpcds queries.
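
For instance, here is a minimal sketch of browsing the catalog and running a pushdown query, executed via `spark-sql` or `spark.sql(...)` in `spark-shell`. The `tpcds` schema and the TPC-DS table and column names are the ones assumed by this guide and may differ in your database.

```sql
-- Browse the Oracle catalog registered above.
show namespaces in oracle;
show tables in oracle.tpcds;

-- A simple join that Spark Oracle can push down to the database.
select d.d_year, count(*) as cnt
from oracle.tpcds.store_sales ss
join oracle.tpcds.date_dim d
  on ss.ss_sold_date_sk = d.d_date_sk
group by d.d_year
order by d.d_year;
```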
To build the spark-oracle package:

- We use sbt.
- In the root folder of the project, run `sbt clean compile; sbt -DaddOraJars=true universal:packageBin`. This creates `spark-oracle-0.1.0-SNAPSHOT.zip` in the `packaging/spark_extend/target/universal` folder (see the sketch after this list for deploying it).
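
A sketch of deploying the packaged zip and starting a session with it, assuming the zip unpacks its jars under a `lib/` directory (the usual sbt-native-packager layout) and that your catalog settings live in a `spark-oracle.conf` properties file like the one shown above; the paths are placeholders.

```sh
# Unpack the build output next to an existing Spark installation.
unzip packaging/spark_extend/target/universal/spark-oracle-0.1.0-SNAPSHOT.zip -d /opt/spark-oracle

# Start spark-shell with the Spark Oracle configuration and jars on the classpath.
$SPARK_HOME/bin/spark-shell \
  --properties-file spark-oracle.conf \
  --jars $(echo /opt/spark-oracle/*/lib/*.jar | tr ' ' ',')
```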