On an Ubuntu Linux machine, install both docker and docker-compose.
Install various utilities
sudo apt install maven
sudo apt install curl
sudo apt install jq
sudo apt install httpie
sudo apt install unzip
Install the datahub CLI
python3 -m pip install --upgrade pip wheel setuptools
python3 -m pip install --upgrade acryl-datahub
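To verify that the datahub CLI was installed correctly, print its version:
datahub version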
First clone the GitHub project
git clone https://github.com/trivadispf/data-mesh-hackathon
Navigate to the project folder
cd data-mesh-hackathon
Set the DATAPLATFORM_HOME environment variable to the infra/platys folder.
export DATAPLATFORM_HOME=${PWD}/infra/platys
Run the provision script to copy the data into the platys platform folder
./provision.sh
Start the platys platform. Set PUBLIC_IP to the public IP address of the machine and DOCKER_HOST_IP to the IP address of the Docker host (one way to derive them is sketched below).
export PUBLIC_IP=xxxx
export DOCKER_HOST_IP=xxx
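# Optional sketch for deriving the two addresses automatically. Assumptions:
# a single primary network interface and outbound internet access; adjust to your environment.
export DOCKER_HOST_IP=$(hostname -I | awk '{print $1}')   # first IP address of the host
export PUBLIC_IP=$(curl -s ifconfig.me)                   # public IP as seen from the internet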
cd ./infra/platys
docker-compose up -d
Wait for the platform to start
...
Creating hive-metastore-db ... done
Creating zeppelin ... done
Creating kafka-connect-1 ... done
Creating akhq ... done
Creating file-browser ... done
Creating kafka-1 ... done
Creating kafka-3 ... done
Creating kafka-2 ... done
Creating ksqldb-cli ... done
Creating spark-worker-1 ... done
Creating spark-worker-2 ... done
docker@ubuntu ~/d/i/platys (main)>
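To check that all containers came up, you can list their state from the infra/platys folder:
docker-compose ps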
Once the platform is started successfully, we can deploy the various artefacts.
The Kafka topics have been automatically created by jikkou.
To deploy the Avro schemas to the schema registry, run
export DATAPLATFORM_IP=${DOCKER_HOST_IP}
cd $DATAPLATFORM_HOME/../../implementation/global/java/ecommerce-meta/
mvn clean install schema-registry:register
cd $DATAPLATFORM_HOME/../../implementation/salesorder/java/ecommerce-salesorder-meta/
mvn clean install schema-registry:register
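To double-check the registration, you can list the registered subjects via the Schema Registry REST API (port 8081 is an assumption; adjust it to your platform configuration):
curl -s http://${DATAPLATFORM_IP}:8081/subjects | jq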
Now let's add the StreamSets pipelines. First log in to StreamSets and link the instance to a user. After that you can deploy the StreamSets pipelines by running
cd $DATAPLATFORM_HOME/../../implementation/scripts
./import-streamsets-pipelines.sh ${DATAPLATFORM_HOME}/../..
Finish the import script by hitting <ENTER>
To deploy the Kafka Connect connector instances
cd $DATAPLATFORM_HOME/../../implementation/scripts
./create-kafka-connect.sh ${DATAPLATFORM_HOME}/../..
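To verify the deployed connectors, you can query the Kafka Connect REST API (port 8083 is the Connect default and an assumption here; adjust if your platform maps it differently):
curl -s http://${DATAPLATFORM_IP}:8083/connectors | jq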
Upload the Avro schemas to S3 for the Hive tables
cd $DATAPLATFORM_HOME/../../implementation/scripts
./upload-avro-to-s3.sh ${DATAPLATFORM_HOME}/../..
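Optionally verify the upload by listing the object store. A minimal sketch using the MinIO client, assuming the stack runs a client container named minio-mc with a preconfigured alias minio-1 (both names are assumptions; adjust to your setup):
docker exec -ti minio-mc mc ls -r minio-1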
To deploy the Hive tables for Trino
cd $DATAPLATFORM_HOME/../../implementation/scripts
./create-hive-tables.sh ${DATAPLATFORM_HOME}/../..
To deploy the Trino objects
cd $DATAPLATFORM_HOME/../../implementation/scripts
./create-trino-objects.sh ${DATAPLATFORM_HOME}/../..
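To verify, you can run a quick statement with the Trino CLI bundled in the Trino image (the container name trino-1 is an assumption; check docker ps for the actual name):
docker exec -it trino-1 trino --execute "SHOW CATALOGS;"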
To deploy the ksqlDB objects
cd $DATAPLATFORM_HOME/../../implementation/scripts
./create-ksql-objects.sh ${DATAPLATFORM_HOME}/../..
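To verify the ksqlDB objects, you can list the streams via the ksqlDB REST API (port 8088 is the ksqlDB default and an assumption here):
curl -s -X POST http://${DATAPLATFORM_IP}:8088/ksql -H "Content-Type: application/vnd.ksql.v1+json" -d '{"ksql":"SHOW STREAMS;","streamsProperties":{}}' | jq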
To deploy the DataHub (catalog) artefacts, i.e. the domains, glossaries, metadata ingestion sources and data products
export DATAPLATFORM_IP=${DOCKER_HOST_IP}
export DATAHUB_GMS_URL=http://$DATAPLATFORM_IP:28142
cd $DATAPLATFORM_HOME/../../implementation/datahub/domain/
./create-domains.sh
cd $DATAPLATFORM_HOME/../../implementation/datahub/glossary/
datahub ingest deploy -n IngestBusinessGlossary -c business_glossary_recipe.yml
cd $DATAPLATFORM_HOME/../../implementation/datahub/glossary/
datahub ingest deploy -n IngestPIIGlossary -c pii_glossary_recipe.yml
cd $DATAPLATFORM_HOME/../../implementation/datahub/kafka/
datahub ingest deploy -n IngestKafka -c kafka_recipe.yml
cd $DATAPLATFORM_HOME/../../implementation/datahub/kafka/
datahub ingest deploy -n IngestKafkaLineage -c kafka-lineage_recipe.yml
cd $DATAPLATFORM_HOME/../../implementation/datahub/kafka/
datahub ingest deploy -n IngestKafkaConnect -c kafka-connect_recipe.yml
cd $DATAPLATFORM_HOME/../../implementation/datahub/s3/
datahub ingest deploy -n IngestS3 -c s3_recipe.yml
cd $DATAPLATFORM_HOME/../../implementation/datahub/data-product/
datahub dataproduct upsert -f customer-dp.yml
datahub dataproduct upsert -f order-dp.yml
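To confirm that the DataHub backend is reachable, you can ping the GMS config endpoint; the created ingestion sources also show up under Ingestion in the DataHub UI:
curl -s ${DATAHUB_GMS_URL}/config | jq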
Navigate to StreamSets at http://dataplatform:18630 and first start the init pipelines (you may filter by label "init")
ref_init
customer_init
salesorder_init
These run for only a few seconds and load the initial data.
Now start the dataflow pipelines (you may filter by label "dataflow")
customer_customerstate-to-kafka
customer_customeraddresschanged-to-kafka
customer_replicate-country-from-ref
These will remain running all the time.
Now you can run the simulator pipelines (you may filter by label "simulator")
product_simulate-product
customer_simulate-person-and-address
salesorder_simulate-order-online
"salesorder_simulate-order-online" will run all the time. The others will terminate after a while.
Oracle database connection details:
- JDBC URL: jdbc:oracle:thin:@oracledb-xe:1521/XEPDB1
- Class Name: oracle.jdbc.driver.OracleDriver
- Username: ECOMM_CUSTOMER_PUB_V1
- Oracle Driver Download Link: ojdbc8-23.3.0.23.09.jar
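To sanity-check these connection details, one option is sqlplus inside the database container. This is only a sketch: the container name oracledb-xe and the <password> placeholder are assumptions, to be replaced with the values used in your environment:
docker exec -it oracledb-xe sqlplus ECOMM_CUSTOMER_PUB_V1/<password>@//localhost:1521/XEPDB1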