[#1857] improvement: Add an Iceberg REST service demo (#2021)
### What changes were proposed in this pull request?

Add an Iceberg REST service demo in the playground.

This pr is related to
apache/gravitino-playground#26

### Why are the changes needed?

Fix: #1857 #1699 

### Does this PR introduce _any_ user-facing change?
No, just doc.

### How was this patch tested?
By hand.

---------

Co-authored-by: Heng Qin <qqtt@123.com>
qqqttt123 and Heng Qin authored Feb 4, 2024
1 parent 2f703db commit 801df01
Showing 2 changed files with 61 additions and 6 deletions.
65 changes: 59 additions & 6 deletions docs/how-to-use-the-playground.md
@@ -20,12 +20,12 @@ You first need to install git and docker-compose.

The playground runs a number of services. The TCP ports used may clash with existing services you run, such as MySQL or Postgres.

| Docker container      | Ports used     |
|-----------------------|----------------|
| playground-gravitino  | 8090 9001      |
| playground-hive       | 3307 9000 9083 |
| playground-mysql      | 3306           |
| playground-postgresql | 5432           |
| playground-trino      | 8080           |

## Start playground

@@ -121,3 +121,56 @@ FROM "metalake_demo.catalog_postgres".hr.employees AS e,
WHERE e.employee_id = p.employee_id AND p.employee_id = s.employee_id
GROUP BY e.employee_id, given_name, family_name;
```

### Using Iceberg REST service

When migrating your business from Hive to Iceberg, some tables may stay in Hive while others move to Iceberg.
Gravitino also provides an Iceberg REST catalog service. You can use Spark to access the REST catalog and write table data,
then use Trino to read the data from the Hive table and join it with the Iceberg table.

The relevant `spark-defaults.conf` settings are as follows (already configured in the playground):
```text
spark.sql.extensions org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.catalog.catalog_iceberg org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.catalog_iceberg.type rest
spark.sql.catalog.catalog_iceberg.uri http://gravitino:9001/iceberg/
spark.locality.wait.node 0
```
1. Log in to the Spark container and run the following steps.
```shell
docker exec -it playground-spark bash
```
```shell
spark@7a495f27b92e:/$ cd /opt/spark && /bin/bash bin/spark-sql
```
```SQL
use catalog_iceberg;
create database sales;
use sales;
create table customers (customer_id int, customer_name varchar(100), customer_email varchar(100));
describe extended customers;
insert into customers (customer_id, customer_name, customer_email) values (11,'Rory Brown','rory@123.com');
insert into customers (customer_id, customer_name, customer_email) values (12,'Jerry Washington','jerry@dt.com');
```
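To sanity-check the inserts, you can query the rows back in the same Spark SQL session (an optional check, not part of the original steps):
```SQL
-- still using catalog_iceberg.sales from the steps above
select * from customers;
```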
2. Log in to the Trino container and run the query below, which returns all the customers from both the Hive and Iceberg tables.
```shell
docker exec -it playground-trino bash
```
```shell
trino@d2bbfccc7432:/$ trino
```
```SQL
select * from "metalake_demo.catalog_hive".sales.customers
union
select * from "metalake_demo.catalog_iceberg".sales.customers;
```
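Besides the `UNION` above, Trino can also join across the two catalogs, as the migration scenario describes. A minimal sketch, assuming both tables share the `customer_id` column (the sample rows inserted above use distinct IDs, so adjust the data to see matching rows):
```SQL
select h.customer_id, h.customer_name, i.customer_email
from "metalake_demo.catalog_hive".sales.customers as h
join "metalake_demo.catalog_iceberg".sales.customers as i
  on h.customer_id = i.customer_id;
```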
2 changes: 2 additions & 0 deletions docs/iceberg-rest-service.md
@@ -170,3 +170,5 @@ DESCRIBE TABLE EXTENDED dml.test;
INSERT INTO dml.test VALUES (1), (2);
SELECT * FROM dml.test
```

You can try Spark with the Gravitino Iceberg REST catalog service in our [playground](./how-to-use-the-playground.md#using-iceberg-rest-service).
