Quick Start Guide

Prerequisites

  • You have a Spark 3.1 environment.
  • Spark Oracle can be run on any type of cluster (embedded, spark-shell, thriftserver; standalone, yarn, kubernetes, mesos, ...).
  • You have access to an Oracle Database. If not, see the [section below](#setting-up-an-oracle-free-tier-adw-instance).
  • Build the spark-oracle package (see Building Spark on Oracle below).
  • The Oracle user used for the connection must have the following (a quick sanity check is sketched after this list):
    • plan_table access.
    • access to the DBMS_PARALLEL_EXECUTE procedure, which we use in some cases to plan the splitting of pushdown queries.
  • For ADW instances, follow the setup steps in the next section.
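
The snippet below is a minimal, hypothetical sanity check (it is not part of spark-oracle) that connects as the demo user over plain JDBC and confirms plan_table access. The JDBC URL and credentials are placeholders, and it assumes the Oracle JDBC driver is on the classpath.

import java.sql.DriverManager

// Hypothetical sanity check: confirm the connection user can use PLAN_TABLE.
// The URL and credentials are placeholders; the Oracle JDBC driver (ojdbc)
// must be on the classpath.
val conn = DriverManager.getConnection(
  "jdbc:oracle:thin:@<your_instance_details>", "tpcds", "<your_password>")
try {
  val stmt = conn.createStatement()
  // EXPLAIN PLAN writes its rows into PLAN_TABLE
  stmt.execute("explain plan for select 1 from dual")
  val rs = stmt.executeQuery("select count(*) from plan_table")
  rs.next()
  println(s"plan_table rows: ${rs.getInt(1)}")
} finally {
  conn.close()
}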

Setting up an Oracle Free Tier ADW instance

  • Create an Oracle Free Tier Account, if you don't already have one.
  • Instantiate an Oracle ADW instance.
  • Download and set up Oracle Instant Client and its Tools package. You will use the sqlplus and impdp executables.
  • Download and set up the Oracle wallet for the instance.
    • Download and unzip the wallet.
    • Change the sqlnet.ora file to point to the download folder and try sqlplus.
    • For example: export TNS_ADMIN=.../wallet_oci_instance; ./sqlplus admin/<your_instance_admin_pass>@<address_name>
      • tnsnames.ora in the wallet folder has a list of address names.
  • Connect using the sqlplus Instant Client and create a user for running the demo. For example:
create user tpcds identified by <your_chosen_password_here>;
grant CONNECT, RESOURCE, unlimited tablespace to tpcds;

-- To run unit tests, add the additional grants
grant drop any table to tpcds;
grant create any function to tpcds;
grant create any procedure to tpcds;
  • Upload your cloud credentials to the instance using the dbms_cloud.create_credential procedure. For example:
begin
  DBMS_CLOUD.CREATE_CREDENTIAL(
    credential_name => 'SPARK_ORA_CRED',
    username => '<your_oci_username>',
    password => '<your_oci_password>'
  );
end;
/
  • Load the tpcds tables and data into your instance.
    • A dump of tpcds scale1 is available in this oci location.
    • You can copy this to your own bucket.
  • Import the data by running:
# ensure TNS_ADMIN points to your wallet.
./impdp admin/<your_passwd>@<your_tns_address> directory=data_pump_dir credential=SPARK_ORA_CRED dumpfile=<your_dump_file_location>

Deploying Spark Oracle

The Spark Oracle zip file (spark oracle zip) contains the spark-oracle and Oracle JDBC jars. This zip file can be unzipped into any Spark deployment.

Configuring Spark Oracle

To enable the Spark Oracle catalog integration and pushdown functionality, add the following to your Spark configuration:

# Oracle Catalog

# Enable Spark Oracle extensions
spark.sql.extensions=org.apache.spark.sql.oracle.SparkSessionExtensions
spark.kryo.registrator=org.apache.spark.sql.connector.catalog.oracle.OraKryoRegistrator

# enable the Oracle Catalog integration
spark.sql.catalog.oracle=org.apache.spark.sql.connector.catalog.oracle.OracleCatalog

# oracle sql logging, jdbc fetchSize, and query splitting
spark.sql.catalog.oracle.log_and_time_sql.enabled=true
spark.sql.catalog.oracle.log_and_time_sql.log_level=info
spark.sql.catalog.oracle.fetchSize=5000
spark.sql.oracle.enable.querysplitting=false
spark.sql.oracle.querysplit.target=1Mb

# Configure jdbc connection information

# Example for a non wallet instance
#spark.sql.catalog.oracle.url=jdbc:oracle:thin:@<your_instance_details>
#spark.sql.catalog.oracle.user=tpcds
#spark.sql.catalog.oracle.password=tpcds

# Example config for an ADW instance
#spark.sql.catalog.oracle.authMethod=ORACLE_WALLET
#spark.sql.catalog.oracle.url=jdbc:oracle:thin:@<tns-address>
#spark.sql.catalog.oracle.user=<your_demo_user>
#spark.sql.catalog.oracle.password=<your_passwd>
#spark.sql.catalog.oracle.net.tns_admin=<wallet_location>
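
The same settings can also be supplied programmatically. Below is a minimal sketch, not taken from the project docs, that builds a SparkSession with the catalog wired in; the application name is arbitrary and the connection values are the placeholder non-wallet settings from above.

import org.apache.spark.sql.SparkSession

// Minimal sketch: the configuration above supplied through the SparkSession
// builder instead of spark-defaults.conf. Connection details are placeholders.
val spark = SparkSession.builder()
  .appName("spark-oracle-quickstart")
  .config("spark.sql.extensions", "org.apache.spark.sql.oracle.SparkSessionExtensions")
  .config("spark.kryo.registrator", "org.apache.spark.sql.connector.catalog.oracle.OraKryoRegistrator")
  .config("spark.sql.catalog.oracle", "org.apache.spark.sql.connector.catalog.oracle.OracleCatalog")
  .config("spark.sql.catalog.oracle.url", "jdbc:oracle:thin:@<your_instance_details>")
  .config("spark.sql.catalog.oracle.user", "tpcds")
  .config("spark.sql.catalog.oracle.password", "tpcds")
  .getOrCreate()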

Demo

See the demo script for a session of catalog browsing and execution of tpcds queries.
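
The spark-shell sketch below illustrates the same kind of session: browsing the oracle catalog and running a query that should be pushed down. The namespace and table names (oracle.tpcds, store_sales) are assumptions based on the tpcds schema loaded earlier, not taken from the demo script.

// Hypothetical spark-shell session against the tpcds schema loaded earlier.
spark.sql("SHOW NAMESPACES IN oracle").show(truncate = false)
spark.sql("SHOW TABLES IN oracle.tpcds").show(truncate = false)

val q = spark.sql(
  """SELECT ss_store_sk, sum(ss_net_paid) AS net_paid
    |FROM oracle.tpcds.store_sales
    |GROUP BY ss_store_sk""".stripMargin)
q.explain()   // inspect the physical plan for the Oracle pushdown scan
q.show(10)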

Building Spark on Oracle

  • We use sbt.
  • In the root folder of the project run sbt clean compile; sbt -DaddOraJars=true universal:packageBin. This creates spark-oracle-0.1.0-SNAPSHOT.zip in the packaging/spark_extend/target/universal folder.