Installation and deployment instructions (using Postgres as example)

Below are the instructions for connecting a Postgres server. The installation steps should be the same for all kinds of servers; only the configuration in the .yaml or DAG files differs. See https://docs.open-metadata.org/integrations/connectors for the configuration of your server.

Goal: To run Postgres metadata ingestion and quality tests with OpenMetadata using the Airflow scheduler

Note: This procedure does not support Windows, because Windows does not implement "signal.SIGALRM". It is highly recommended to use WSL 2 if you are on Windows.
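
A quick way to check whether your platform is affected is to ask Python directly. This is only a small sketch; the function name `supports_sigalrm` is my own and not part of OpenMetadata.

```python
import signal


def supports_sigalrm() -> bool:
    """Return True when this platform's signal module exposes SIGALRM."""
    return hasattr(signal, "SIGALRM")


if __name__ == "__main__":
    if supports_sigalrm():
        print("signal.SIGALRM is available; this procedure should work here.")
    else:
        print("signal.SIGALRM is missing; consider running under WSL 2.")
```

Run it with the same Python interpreter you plan to use for the ingestion package.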

Requirements:

See the "Requirements" section of https://docs.open-metadata.org/overview/run-openmetadata-with-prefect

Installation:

Clone the GitHub repo: git clone https://github.com/open-metadata/OpenMetadata.git
Change directory to ~/.../openmetadata/docker/metadata
Start the OpenMetadata containers. This lets you run OpenMetadata in Docker: docker compose up -d

To check the status of services, run docker compose ps
To access the UI: http://localhost:8585
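
The containers can take a while to come up, so it may help to poll the UI address before moving on. The sketch below uses only the standard library. The helper names (`is_up`, `wait_for_server`), the number of attempts, and the delay are my own assumptions and not OpenMetadata settings.

```python
import time
import urllib.error
import urllib.request


def is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP GET to url succeeds (urlopen raises on 4xx/5xx)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        return False


def wait_for_server(url: str, attempts: int = 30, delay: float = 5.0, probe=is_up) -> bool:
    """Call probe(url) up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if probe(url):
            return True
        time.sleep(delay)
    return False


# Example call (hypothetical values, adjust to your setup):
# wait_for_server("http://localhost:8585")
```

If it returns False, docker compose ps is the next place to look.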

Install the OpenMetadata ingestion package.

(optional but highly recommended): Before installing this package, create and activate a virtual environment. To do this, run python -m venv env and then source env/bin/activate
To install the OpenMetadata ingestion package: pip install --upgrade "openmetadata-ingestion[docker]==0.10.3" (specify the release version to ensure compatibility)
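
Since the packages in this guide are all pinned to 0.10.3, a small check of what is actually installed in the active environment can catch mismatches early. This is a sketch using the standard library's `importlib.metadata`; the helper names are my own.

```python
from importlib import metadata
from typing import Optional

EXPECTED_VERSION = "0.10.3"  # the release pinned in this guide


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pin(package: str, expected: str = EXPECTED_VERSION) -> str:
    """Describe whether the installed version matches the expected pin."""
    found = installed_version(package)
    if found is None:
        return f"{package}: not installed"
    if found != expected:
        return f"{package}: found {found}, expected {expected}"
    return f"{package}: OK ({found})"


if __name__ == "__main__":
    print(check_pin("openmetadata-ingestion"))
```

The same call can be repeated for any other pinned package you install later.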

Install Airflow:

5A: Install Airflow Lineage Backend: pip3 install "openmetadata-ingestion[airflow-container]"==0.10.3
5B: Install Airflow postgres connector module: pip3 install "openmetadata-ingestion[postgres]"==0.10.3
5C: Install Airflow APIs: pip3 install "openmetadata-airflow-managed-apis"==0.10.3
5D: Install necessary Airflow plugins:
- 1. Download the latest openmetadata-airflow-apis-plugins release from https://github.com/open-metadata/OpenMetadata/releases
- 2. Untar it under your {AIRFLOW_HOME} directory (usually c/Users/Yourname/airflow). This will create and set up a plugins directory under {AIRFLOW_HOME}.
- 3. cp -r {AIRFLOW_HOME}/plugins/dag_templates {AIRFLOW_HOME}
- 4. mkdir -p {AIRFLOW_HOME}/dag_generated_configs
- 5. (Re)start the Airflow webserver and scheduler
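
If you prefer to script the two directory steps (the cp -r and the mkdir -p), they could look roughly like this in Python. The function name `prepare_airflow_home` is my own, and the sketch assumes the plugins archive has already been untarred under your Airflow home. It needs Python 3.8 or newer for `dirs_exist_ok`.

```python
import shutil
from pathlib import Path


def prepare_airflow_home(airflow_home: Path) -> None:
    """Copy the DAG templates next to the plugins and create the configs folder."""
    templates_src = airflow_home / "plugins" / "dag_templates"
    templates_dst = airflow_home / "dag_templates"

    # Same effect as: cp -r {AIRFLOW_HOME}/plugins/dag_templates {AIRFLOW_HOME}
    shutil.copytree(templates_src, templates_dst, dirs_exist_ok=True)

    # Same effect as: mkdir -p {AIRFLOW_HOME}/dag_generated_configs
    (airflow_home / "dag_generated_configs").mkdir(parents=True, exist_ok=True)


# Example call (hypothetical path, use your own AIRFLOW_HOME):
# prepare_airflow_home(Path.home() / "airflow")
```

Running it twice is harmless, which makes it convenient to keep in a setup script.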

Configure Airflow:

6A: Configure airflow.cfg in your AIRFLOW_HOME directory. Check that all the folder settings point to the right places. For instance, dags_folder = YOUR_AIRFLOW_HOME/dags
6B: Configure openmetadata.yaml and update the airflowConfiguration section. See: https://docs.open-metadata.org/integrations/airflow/configure-airflow-in-the-openmetadata-server
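
For step 6A, a small script can report folder settings that do not point at an existing directory. This is a sketch only: it checks dags_folder and plugins_folder in the [core] section, which is my own choice of settings to look at, and the function name is hypothetical.

```python
import configparser
from pathlib import Path
from typing import List, Optional, Tuple

# Settings from the [core] section that should point at existing folders.
FOLDER_KEYS = ("dags_folder", "plugins_folder")


def missing_folders(cfg_text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (setting, value) pairs that are unset or not an existing directory."""
    # interpolation=None because airflow.cfg values may contain "%" characters.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)

    problems = []
    for key in FOLDER_KEYS:
        value = parser.get("core", key, fallback=None)
        if value is None or not Path(value).is_dir():
            problems.append((key, value))
    return problems


# Example call (replace YOUR_AIRFLOW_HOME with your own path):
# print(missing_folders(Path("YOUR_AIRFLOW_HOME/airflow.cfg").read_text()))
```

An empty list means both settings resolve to real directories.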

To run a metadata ingestion workflow with Airflow ingestion DAGs on Postgres data:

Prepare the ingestion DAG. For a more complete tutorial on ingestion DAGs, see https://docs.open-metadata.org/integrations/connectors/postgres/run-postgres-connector-with-the-airflow-sdk. In brief, below is my own DAG. Copy and paste the following into a Python file (postgres_demo.py):

import pathlib
import json
from datetime import timedelta
from airflow import DAG

try:
    from airflow.operators.python import PythonOperator
except ModuleNotFoundError:
    from airflow.operators.python_operator import PythonOperator

from metadata.config.common import load_config_file
from metadata.ingestion.api.workflow import Workflow
from airflow.utils.dates import days_ago

default_args = {
    "owner": "user_name",
    "email": ["username@org.com"],
    "email_on_failure": False,
    "retries": 3,
    "retry_delay": timedelta(minutes=5),
    "execution_timeout": timedelta(minutes=60)
}

# Change username, password, hostPort, and database to your own values.
# Every option under sourceConfig can be set to "true" or "false"; the values shown are examples.
# Keep notes like these outside the string: json.loads() only accepts valid JSON.
config = """
{
    "source":{
        "type": "postgres",
        "serviceName": "postgres_demo",
        "serviceConnection": {
            "config": {
                "type": "Postgres",
                "username": "postgres",
                "password": "postgres",
                "hostPort": "192.168.1.55:5432",
                "database": "surveillance_hub"
            }
        },
        "sourceConfig":{
            "config": {
                "enableDataProfiler": "false",
                "markDeletedTables": "false",
                "includeTables": "true",
                "includeViews": "true",
                "generateSampleData": "false"
            }
        }
    },
    "sink":{
        "type": "metadata-rest",
        "config": {}
    },
    "workflowConfig": {
        "openMetadataServerConfig": {
            "hostPort": "http://localhost:8585/api",
            "authProvider": "no-auth"
        }
    }
}
"""

def metadata_ingestion_workflow():
    workflow_config = json.loads(config)
    workflow = Workflow.create(workflow_config)
    workflow.execute()
    workflow.raise_from_status()
    workflow.print_status()
    workflow.stop()


with DAG(
    "sample_data",
    default_args=default_args,
    description="An example DAG which runs an OpenMetadata ingestion workflow",
    start_date=days_ago(1),
    is_paused_upon_creation=False,
    schedule_interval='*/5 * * * *', 
    catchup=False,
) as dag:
    ingest_task = PythonOperator(
        task_id="ingest_using_recipe",
        python_callable=metadata_ingestion_workflow,
    )

if __name__ == "__main__":
    metadata_ingestion_workflow()

Run the DAG: python postgres_demo.py
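
Because the DAG passes `config` to `json.loads`, any text in the string that is not valid JSON (for example a leftover note after a value) makes the task fail. A small pre-flight check can catch that before Airflow schedules the DAG. This is a sketch: the list of required sections is simply what the config above contains, not an official schema, and `check_config` is a hypothetical helper.

```python
import json

# Top-level sections used by the config in this guide (an assumption, not a schema).
REQUIRED_SECTIONS = ("source", "sink", "workflowConfig")


def check_config(raw: str) -> dict:
    """Parse the config string and make sure the expected sections are present."""
    parsed = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    missing = [name for name in REQUIRED_SECTIONS if name not in parsed]
    if missing:
        raise ValueError(f"missing sections: {', '.join(missing)}")
    return parsed
```

Calling `check_config(config)` at the top of `metadata_ingestion_workflow()` would be one place to use it.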

Alternatively, you can run the ingestion without the Airflow SDK, using OpenMetadata's own CLI. Run metadata ingest -c /Your_Path_To_Json/.json The JSON configuration is exactly the same as the JSON configuration in the DAG.
Or you can run it with a YAML file: metadata ingest -c /Your_Path_To_Yaml/.yaml The YAML configuration has the same content, written without the curly brackets and the double quotes.
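
Since the JSON file is the same as the string in the DAG, one way to avoid maintaining two copies is to export the string to a file. A minimal sketch, where `write_config_file` and the file name postgres_demo.json are my own inventions:

```python
import json
from pathlib import Path


def write_config_file(raw_config: str, target: Path) -> Path:
    """Parse the DAG's config string and save it as an indented JSON file."""
    parsed = json.loads(raw_config)  # fails early if the string is not valid JSON
    target.write_text(json.dumps(parsed, indent=2), encoding="utf-8")
    return target


# Example call (hypothetical file name):
# write_config_file(config, Path("postgres_demo.json"))
```

The resulting file can then be passed to metadata ingest -c.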

To run a profiler workflow on Postgres data:

Prepare the DAG or configure the YAML/JSON:

To configure the quality tests in json/yaml, see https://docs.open-metadata.org/data-quality/data-quality-overview/tests
To prepare the DAG, see https://github.com/open-metadata/OpenMetadata/tree/0.10.3-release/data-quality/data-quality-overview

Example yaml I was using:

source:
  type: postgres
  serviceName: your_service_name
  serviceConnection:
    config:
      type: Postgres
      username: your_username
      password: your_password
      hostPort: your_host:your_port
      database: your_database
  sourceConfig:
    config:
      type: Profiler

processor:
  type: orm-profiler
  config:
    test_suite:
      name: demo_test
      tests:
        - table: your_table_name  # must be the fully qualified name (FQN)
          column_tests:
            - columnName: id
              testCase:
                columnTestType: columnValuesToBeBetween
                config:
                  minValue: 0
                  maxValue: 10
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: no-auth

Note that the table name must be the fully qualified name (FQN) and must match the table path shown in the OpenMetadata UI exactly.
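
A rough sanity check on the table name can save a failed run. The sketch below assumes the FQN has four dot-separated parts (service, database, schema, table); that layout is my assumption about this release, so confirm it against the path shown in the UI. It also does not handle names that themselves contain dots. The helper name and the example values `public` and `my_table` are hypothetical.

```python
def looks_like_table_fqn(name: str, expected_parts: int = 4) -> bool:
    """Return True if name splits into the expected number of non-empty parts."""
    parts = name.split(".")
    return len(parts) == expected_parts and all(part.strip() for part in parts)


if __name__ == "__main__":
    # Assumed layout: service.database.schema.table
    print(looks_like_table_fqn("postgres_demo.surveillance_hub.public.my_table"))
    print(looks_like_table_fqn("my_table"))
```

This only checks the shape of the name; it cannot tell whether the table exists in OpenMetadata.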

Run it with metadata profile -c /path_to_yaml/.yaml

Make sure to refresh the OpenMetadata UI and click on the Data Quality tab to see the results.
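
To read the result of the example test, it helps to keep in mind what columnValuesToBeBetween with minValue 0 and maxValue 10 is asking. The snippet below is only an illustration of that idea in plain Python, not OpenMetadata's implementation; treating both bounds as inclusive is my assumption.

```python
from typing import Iterable


def column_values_between(values: Iterable[float], min_value: float, max_value: float) -> bool:
    """Return True when every value lies within [min_value, max_value]."""
    # Assumption: both bounds are inclusive.
    return all(min_value <= value <= max_value for value in values)


if __name__ == "__main__":
    print(column_values_between([0, 3, 10], 0, 10))  # every id is within the range
    print(column_values_between([0, 3, 11], 0, 10))  # 11 is above maxValue
```

If the test fails in the UI, a value outside the configured range in the id column is the first thing to look for.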
