[#1857] improvement: Add an Iceberg REST service demo (#2021)
### What changes were proposed in this pull request?

Add an Iceberg REST service demo in the playground.

This pr is related to
apache/gravitino-playground#26

### Why are the changes needed?

Fix: #1857 #1699 

### Does this PR introduce _any_ user-facing change?
No, just doc.

### How was this patch tested?
By hand.

---------

Co-authored-by: Heng Qin <qqtt@123.com>
qqqttt123 and Heng Qin authored Feb 4, 2024
1 parent 2f703db commit 801df01
Showing 2 changed files with 61 additions and 6 deletions.
65 changes: 59 additions & 6 deletions docs/how-to-use-the-playground.md
@@ -20,12 +20,12 @@ You first need to install git and docker-compose.

The playground runs a number of services. The TCP ports used may clash with existing services you run, such as MySQL or Postgres.

| Docker container      | Ports used     |
|-----------------------|----------------|
| playground-gravitino  | 8090 9001      |
| playground-hive       | 3307 9000 9083 |
| playground-mysql      | 3306           |
| playground-postgresql | 5432           |
| playground-trino      | 8080           |

## Start playground

@@ -121,3 +121,56 @@ FROM "metalake_demo.catalog_postgres".hr.employees AS e,
WHERE e.employee_id = p.employee_id AND p.employee_id = s.employee_id
GROUP BY e.employee_id, given_name, family_name;
```

### Using Iceberg REST service

When migrating your business from Hive to Iceberg, some tables may stay in Hive while others move to Iceberg.
Gravitino also provides an Iceberg REST catalog service. You can use Spark to access the REST catalog and write table data,
then use Trino to read the data from the Hive table and join it with the Iceberg table.

The relevant `spark-defaults.conf` settings are as follows (already configured in the playground):
```text
spark.sql.extensions org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.catalog.catalog_iceberg org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.catalog_iceberg.type rest
spark.sql.catalog.catalog_iceberg.uri http://gravitino:9001/iceberg/
spark.locality.wait.node 0
```
1. Log in to the Spark container and run the following steps.
```shell
docker exec -it playground-spark bash
```
```shell
spark@7a495f27b92e:/$ cd /opt/spark && /bin/bash bin/spark-sql
```
```SQL
use catalog_iceberg;
create database sales;
use sales;
create table customers (customer_id int, customer_name varchar(100), customer_email varchar(100));
describe extended customers;
insert into customers (customer_id, customer_name, customer_email) values (11,'Rory Brown','rory@123.com');
insert into customers (customer_id, customer_name, customer_email) values (12,'Jerry Washington','jerry@dt.com');
```
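To sanity-check the inserts, you can query the rows back in the same Spark SQL session (an optional check, not part of the original steps):
```SQL
-- still using catalog_iceberg.sales from the steps above
select * from customers;
```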
2. Log in to the Trino container and run the query below, which returns all the customers from both the Hive and Iceberg tables.
```shell
docker exec -it playground-trino bash
```
```shell
trino@d2bbfccc7432:/$ trino
```
```SQL
select * from "metalake_demo.catalog_hive".sales.customers
union
select * from "metalake_demo.catalog_iceberg".sales.customers;
```
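Besides the `UNION` above, Trino can also join across the two catalogs, as the migration scenario describes. A minimal sketch, assuming both tables share the `customer_id` column (the sample rows inserted above use distinct IDs, so adjust the data to see matching rows):
```SQL
select h.customer_id, h.customer_name, i.customer_email
from "metalake_demo.catalog_hive".sales.customers as h
join "metalake_demo.catalog_iceberg".sales.customers as i
  on h.customer_id = i.customer_id;
```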
2 changes: 2 additions & 0 deletions docs/iceberg-rest-service.md
@@ -170,3 +170,5 @@ DESCRIBE TABLE EXTENDED dml.test;
INSERT INTO dml.test VALUES (1), (2);
SELECT * FROM dml.test
```

You can try Spark with the Gravitino Iceberg REST catalog service in our [playground](./how-to-use-the-playground.md#using-iceberg-rest-service).
