diff --git a/docs/how-to-use-the-playground.md b/docs/how-to-use-the-playground.md
index d1bc6198eff..018b83abf7d 100644
--- a/docs/how-to-use-the-playground.md
+++ b/docs/how-to-use-the-playground.md
@@ -20,12 +20,13 @@ You first need to install git and docker-compose.
 
 The playground runs a number of services. The TCP ports used may clash with existing services you run, such as MySQL or Postgres.
 
-| Docker container | Ports used |
-| playground-gravitino | 8090 9001 |
-| playground-hive | 3307 9000 9083 |
-| playground-mysql | 3306 |
-| playground-postgresql | 5342 |
-| playground-trino | 8080 |
+| Docker container      | Ports used     |
+|-----------------------|----------------|
+| playground-gravitino  | 8090 9001      |
+| playground-hive       | 3307 9000 9083 |
+| playground-mysql      | 3306           |
+| playground-postgresql | 5342           |
+| playground-trino      | 8080           |
 
 ## Start playground
 
@@ -121,3 +122,56 @@ FROM "metalake_demo.catalog_postgres".hr.employees AS e,
 WHERE e.employee_id = p.employee_id AND p.employee_id = s.employee_id
 GROUP BY e.employee_id, given_name, family_name;
 ```
+
+### Using Iceberg REST service
+
+If you want to migrate gradually from Hive to Iceberg, some tables will stay in Hive while other tables move to Iceberg.
+Gravitino also provides an Iceberg REST catalog service. You can use Spark to access the REST catalog and write the table data.
+Then you can use Trino to query the Hive table and the Iceberg table together.
+
+The `spark-defaults.conf` is as follows (it's already configured in the playground):
+
+```text
+spark.sql.extensions org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
+spark.sql.catalog.catalog_iceberg org.apache.iceberg.spark.SparkCatalog
+spark.sql.catalog.catalog_iceberg.type rest
+spark.sql.catalog.catalog_iceberg.uri http://gravitino:9001/iceberg/
+spark.locality.wait.node 0
+```
+
+1. Log in to the Spark container and execute the following steps.
+
+```shell
+docker exec -it playground-spark bash
+```
+
+```shell
+spark@7a495f27b92e:/$ cd /opt/spark && /bin/bash bin/spark-sql
+```
+
+```SQL
+USE catalog_iceberg;
+CREATE DATABASE sales;
+USE sales;
+CREATE TABLE customers (customer_id INT, customer_name VARCHAR(100), customer_email VARCHAR(100));
+DESCRIBE EXTENDED customers;
+INSERT INTO customers (customer_id, customer_name, customer_email) VALUES (11, 'Rory Brown', 'rory@123.com');
+INSERT INTO customers (customer_id, customer_name, customer_email) VALUES (12, 'Jerry Washington', 'jerry@dt.com');
+```
+
+2. Log in to the Trino container and execute the following steps.
+You can retrieve all the customers from both the Hive and Iceberg tables.
+
+```shell
+docker exec -it playground-trino bash
+```
+
+```shell
+trino@d2bbfccc7432:/$ trino
+```
+
+```SQL
+SELECT * FROM "metalake_demo.catalog_hive".sales.customers
+UNION
+SELECT * FROM "metalake_demo.catalog_iceberg".sales.customers;
+```
diff --git a/docs/iceberg-rest-service.md b/docs/iceberg-rest-service.md
index ee98c475546..95cb83c95f6 100644
--- a/docs/iceberg-rest-service.md
+++ b/docs/iceberg-rest-service.md
@@ -170,3 +170,18 @@ DESCRIBE TABLE EXTENDED dml.test;
 INSERT INTO dml.test VALUES (1), (2);
 SELECT * FROM dml.test
 ```
+
+You can try Spark with the Gravitino Iceberg REST catalog service in our [playground](./how-to-use-the-playground.md#using-iceberg-rest-service).
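+
+For a quick test outside the playground, you can also pass the catalog settings to `spark-sql` directly on the command line. The snippet below is a minimal sketch rather than part of the official setup: the catalog name `rest_catalog`, the `localhost:9001` endpoint, and the Iceberg runtime version are assumptions you should adapt to your own deployment.
+
+```shell
+# Assumes the Gravitino Iceberg REST service is reachable at localhost:9001 and
+# that the iceberg-spark-runtime artifact matches your Spark/Scala version.
+./bin/spark-sql \
+  --packages org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.3.1 \
+  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
+  --conf spark.sql.catalog.rest_catalog=org.apache.iceberg.spark.SparkCatalog \
+  --conf spark.sql.catalog.rest_catalog.type=rest \
+  --conf spark.sql.catalog.rest_catalog.uri=http://localhost:9001/iceberg/
+```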