MINOR: Added external workflow missing docs for usage/lineage #15223

Merged (4 commits) on Feb 16, 2024
openmetadata-docs/content/partials/v1.3/connectors/yaml/lineage.md (new file: 167 additions, 0 deletions)
@@ -0,0 +1,167 @@
## Lineage

After running a Metadata Ingestion workflow, we can run a Lineage workflow. The `serviceName` should be the same as the one used in the Metadata Ingestion workflow, so the ingestion bot can retrieve the `serviceConnection` details from the server.


### 1. Define the YAML Config

This is a sample config for BigQuery Lineage:

{% codePreview %}

{% codeInfoContainer %}

{% codeInfo srNumber=40 %}
#### Source Configuration - Source Config

You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceQueryLineagePipeline.json).

{% /codeInfo %}

{% codeInfo srNumber=41 %}

**queryLogDuration**: Configuration to tune how far back, in days, we want to look in the query logs to process lineage data.

{% /codeInfo %}

{% codeInfo srNumber=42 %}

**parsingTimeoutLimit**: Configuration to set the timeout, in seconds, for parsing a single query.

{% /codeInfo %}

{% codeInfo srNumber=43 %}

**filterCondition**: Condition to filter the query history.

{% /codeInfo %}

{% codeInfo srNumber=44 %}

**resultLimit**: Configuration to set the limit on the number of query log entries fetched.

{% /codeInfo %}

{% codeInfo srNumber=45 %}

**queryLogFilePath**: Configuration to set the file path of a query log file, if you want to process queries from a file instead of fetching them from the database.

{% /codeInfo %}

{% codeInfo srNumber=46 %}

**databaseFilterPattern**: Regex to only fetch databases that match the pattern.

{% /codeInfo %}

{% codeInfo srNumber=47 %}

**schemaFilterPattern**: Regex to only fetch schemas that match the pattern.

{% /codeInfo %}

{% codeInfo srNumber=48 %}

**tableFilterPattern**: Regex to only fetch tables that match the pattern.

{% /codeInfo %}


{% codeInfo srNumber=49 %}

#### Sink Configuration

To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`.
{% /codeInfo %}


{% codeInfo srNumber=50 %}

#### Workflow Configuration

The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation.

For a simple, local installation using our docker containers, this looks like:

{% /codeInfo %}

{% /codeInfoContainer %}

{% codeBlock fileName="filename.yaml" %}


```yaml {% srNumber=40 %}
source:
type: {% $connector %}-lineage
serviceName: <serviceName (same as metadata ingestion service name)>
sourceConfig:
config:
type: DatabaseLineage
```

```yaml {% srNumber=41 %}
# Number of days to look back
queryLogDuration: 1
```
```yaml {% srNumber=42 %}
parsingTimeoutLimit: 300
```
```yaml {% srNumber=43 %}
# filterCondition: query_text not ilike '--- metabase query %'
```
```yaml {% srNumber=44 %}
resultLimit: 1000
```
```yaml {% srNumber=45 %}
# If instead of getting the query logs from the database we want to pass a file with the queries
# queryLogFilePath: /tmp/query_log/file_path
```
```yaml {% srNumber=46 %}
# databaseFilterPattern:
# includes:
# - database1
# - database2
# excludes:
# - database3
# - database4
```
```yaml {% srNumber=47 %}
# schemaFilterPattern:
# includes:
# - schema1
# - schema2
# excludes:
# - schema3
# - schema4
```
```yaml {% srNumber=48 %}
# tableFilterPattern:
# includes:
# - table1
# - table2
# excludes:
# - table3
# - table4
```

```yaml {% srNumber=49 %}
sink:
type: metadata-rest
config: {}
```

{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%}

{% /codeBlock %}

{% /codePreview %}

- You can learn more about how to configure and run the Lineage Workflow to extract Lineage data [here](/connectors/ingestion/workflows/lineage).

### 2. Run with the CLI

After saving the YAML config, we will run the command the same way we did for the metadata ingestion:

```bash
metadata ingest -c <path-to-yaml>
```
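
For reference, here is how the snippets above might look once assembled into a single file for BigQuery. This is a minimal sketch, not a definitive template: the service name and JWT token are placeholders, and the `workflowConfig` follows the standard `openMetadataServerConfig` structure referenced above, shown here for a simple local Docker installation.

```yaml
source:
  type: bigquery-lineage
  serviceName: my_bigquery_service   # placeholder: same name used for metadata ingestion
  sourceConfig:
    config:
      type: DatabaseLineage
      queryLogDuration: 1            # days of query logs to look back
      parsingTimeoutLimit: 300       # seconds allowed for parsing a single query
      resultLimit: 1000              # maximum number of query log entries to fetch
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  openMetadataServerConfig:
    hostPort: "http://localhost:8585/api"    # simple local Docker installation
    authProvider: openmetadata
    securityConfig:
      jwtToken: "<ingestion-bot-jwt-token>"  # placeholder: your ingestion bot token
```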
@@ -35,8 +35,8 @@ Configure and schedule BigQuery metadata and profiler workflows from the OpenMet
 - [Requirements](#requirements)
 - [Metadata Ingestion](#metadata-ingestion)
 - [Query Usage](#query-usage)
-- [Data Profiler](#data-profiler)
 - [Lineage](#lineage)
+- [Data Profiler](#data-profiler)
 - [dbt Integration](#dbt-integration)
 
 {% partial file="/v1.3/connectors/external-ingestion-deployment.md" /%}
@@ -255,11 +255,9 @@ source:
 
 {% partial file="/v1.3/connectors/yaml/query-usage.md" variables={connector: "bigquery"} /%}
 
-{% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "bigquery"} /%}
+{% partial file="/v1.3/connectors/yaml/lineage.md" variables={connector: "bigquery"} /%}
 
-## Lineage
-
-You can learn more about how to ingest lineage [here](/connectors/ingestion/workflows/lineage).
+{% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "bigquery"} /%}
 
 ## dbt Integration
 
@@ -34,8 +34,8 @@ Configure and schedule Clickhouse metadata and profiler workflows from the OpenM
 - [Requirements](#requirements)
 - [Metadata Ingestion](#metadata-ingestion)
 - [Query Usage](#query-usage)
-- [Data Profiler](#data-profiler)
 - [Lineage](#lineage)
+- [Data Profiler](#data-profiler)
 - [dbt Integration](#dbt-integration)
 
 {% partial file="/v1.3/connectors/external-ingestion-deployment.md" /%}
@@ -255,11 +255,9 @@ source:
 
 {% partial file="/v1.3/connectors/yaml/query-usage.md" variables={connector: "clickhouse"} /%}
 
-{% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "clickhouse"} /%}
+{% partial file="/v1.3/connectors/yaml/lineage.md" variables={connector: "clickhouse"} /%}
 
-## Lineage
-
-You can learn more about how to ingest lineage [here](/connectors/ingestion/workflows/lineage).
+{% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "clickhouse"} /%}
 
 ## dbt Integration
 
@@ -34,8 +34,8 @@ Configure and schedule Databricks metadata and profiler workflows from the OpenM
 - [Requirements](#requirements)
 - [Metadata Ingestion](#metadata-ingestion)
 - [Query Usage](#query-usage)
-- [Data Profiler](#data-profiler)
 - [Lineage](#lineage)
+- [Data Profiler](#data-profiler)
 - [dbt Integration](#dbt-integration)
 
 {% partial file="/v1.3/connectors/external-ingestion-deployment.md" /%}
@@ -193,11 +193,9 @@ source:
 
 {% partial file="/v1.3/connectors/yaml/query-usage.md" variables={connector: "databricks"} /%}
 
-{% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "databricks"} /%}
+{% partial file="/v1.3/connectors/yaml/lineage.md" variables={connector: "databricks"} /%}
 
-## Lineage
-
-You can learn more about how to ingest lineage [here](/connectors/ingestion/workflows/lineage).
+{% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "databricks"} /%}
 
 ## dbt Integration
 
@@ -35,8 +35,8 @@ Configure and schedule MSSQL metadata and profiler workflows from the OpenMetada
 - [Requirements](#requirements)
 - [Metadata Ingestion](#metadata-ingestion)
 - [Query Usage](#query-usage)
-- [Data Profiler](#data-profiler)
 - [Lineage](#lineage)
+- [Data Profiler](#data-profiler)
 - [dbt Integration](#dbt-integration)
 
 {% partial file="/v1.3/connectors/external-ingestion-deployment.md" /%}
@@ -229,11 +229,9 @@ source:
 
 {% partial file="/v1.3/connectors/yaml/query-usage.md" variables={connector: "mssql"} /%}
 
-{% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "mssql"} /%}
+{% partial file="/v1.3/connectors/yaml/lineage.md" variables={connector: "mssql"} /%}
 
-## Lineage
-
-You can learn more about how to ingest lineage [here](/connectors/ingestion/workflows/lineage).
+{% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "mssql"} /%}
 
 ## dbt Integration
 
@@ -35,8 +35,8 @@ Configure and schedule Postgres metadata and profiler workflows from the OpenMet
 - [Requirements](#requirements)
 - [Metadata Ingestion](#metadata-ingestion)
 - [Query Usage](#query-usage)
-- [Data Profiler](#data-profiler)
 - [Lineage](#lineage)
+- [Data Profiler](#data-profiler)
 - [dbt Integration](#dbt-integration)
 
 {% partial file="/v1.3/connectors/external-ingestion-deployment.md" /%}
@@ -283,11 +283,9 @@ source:
 
 {% partial file="/v1.3/connectors/yaml/query-usage.md" variables={connector: "postgres"} /%}
 
-{% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "postgres"} /%}
+{% partial file="/v1.3/connectors/yaml/lineage.md" variables={connector: "postgres"} /%}
 
-## Lineage
-
-You can learn more about how to ingest lineage [here](/connectors/ingestion/workflows/lineage).
+{% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "postgres"} /%}
 
 ## dbt Integration
 
@@ -35,8 +35,8 @@ Configure and schedule Redshift metadata and profiler workflows from the OpenMet
 - [Requirements](#requirements)
 - [Metadata Ingestion](#metadata-ingestion)
 - [Query Usage](#query-usage)
-- [Data Profiler](#data-profiler)
 - [Lineage](#lineage)
+- [Data Profiler](#data-profiler)
 - [dbt Integration](#dbt-integration)
 
 {% partial file="/v1.3/connectors/external-ingestion-deployment.md" /%}
@@ -204,12 +204,9 @@ source:
 
 {% partial file="/v1.3/connectors/yaml/query-usage.md" variables={connector: "redshift"} /%}
 
-{% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "redshift"} /%}
-
-
-## Lineage
+{% partial file="/v1.3/connectors/yaml/lineage.md" variables={connector: "redshift"} /%}
 
-You can learn more about how to ingest lineage [here](/connectors/ingestion/workflows/lineage).
+{% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "redshift"} /%}
 
 ## dbt Integration
 
@@ -35,8 +35,8 @@ Configure and schedule Snowflake metadata and profiler workflows from the OpenMe
 - [Requirements](#requirements)
 - [Metadata Ingestion](#metadata-ingestion)
 - [Query Usage](#query-usage)
-- [Data Profiler](#data-profiler)
 - [Lineage](#lineage)
+- [Data Profiler](#data-profiler)
 - [dbt Integration](#dbt-integration)
 
 {% partial file="/v1.3/connectors/external-ingestion-deployment.md" /%}
@@ -289,11 +289,9 @@ source:
 
 {% partial file="/v1.3/connectors/yaml/query-usage.md" variables={connector: "snowflake"} /%}
 
-{% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "snowflake"} /%}
+{% partial file="/v1.3/connectors/yaml/lineage.md" variables={connector: "snowflake"} /%}
 
-## Lineage
-
-You can learn more about how to ingest lineage [here](/connectors/ingestion/workflows/lineage).
+{% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "snowflake"} /%}
 
 ## dbt Integration
 
@@ -71,7 +71,7 @@ Here you can enter the Lineage Ingestion details:
 
 **Query Log Duration**
 
-Specify the duration in days for which the profiler should capture lineage data from the query logs. For example, if you specify 2 as the value for the duration, the data profiler will capture lineage information for 48 hours prior to when the ingestion workflow is run.
+Specify the duration in days for which the lineage workflow should capture lineage data from the query logs. For example, if you specify 2 as the value for the duration, the lineage workflow will capture lineage information for the 48 hours prior to when the ingestion workflow is run.
 
 **Result Limit**
 
@@ -88,3 +88,10 @@ After clicking Next, you will be redirected to the Scheduling form. This will be
caption="View Service Ingestion pipelines"
/%}

## YAML Configuration

In the [connectors](/connectors) section we showcase how to run the metadata ingestion from a JSON/YAML file using the Airflow SDK or the CLI via `metadata ingest`. Running a lineage workflow is also possible from a JSON/YAML configuration file.

This is a good option if you wish to execute your workflow via the Airflow SDK or the CLI; with the CLI, a lineage workflow can be triggered with the command `metadata ingest -c FILENAME.yaml`. The `serviceConnection` config will be specific to your connector (you can find more information in the [connectors](/connectors) section), though the `sourceConfig` for lineage will be similar across all connectors, as sketched below.
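
As a rough sketch of how those pieces fit together (the service name is a placeholder, and `workflowConfig` is omitted for brevity; since the ingestion bot can fetch the `serviceConnection` from the server, it can be left out when the service already exists):

```yaml
source:
  type: bigquery-lineage      # <connector>-lineage
  serviceName: my_service     # placeholder: same name used for metadata ingestion
  sourceConfig:
    config:
      type: DatabaseLineage
      queryLogDuration: 2     # the "Query Log Duration" option above, in days
sink:
  type: metadata-rest
  config: {}
```

The partial rendered below walks through each of these options in detail.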

{% partial file="/v1.3/connectors/yaml/lineage.md" variables={connector: "bigquery"} /%}
@@ -72,7 +72,7 @@ Here you can enter the Usage Ingestion details:
 
 **Query Log Duration**
 
-Specify the duration in days for which the profiler should capture usage data from the query logs. For example, if you specify 2 as the value for the duration, the data profiler will capture usage information for 48 hours prior to when the ingestion workflow is run.
+Specify the duration in days for which the usage workflow should capture usage data from the query logs. For example, if you specify 2 as the value for the duration, the usage workflow will capture usage information for the 48 hours prior to when the ingestion workflow is run.
 
 **Stage File Location**
 
@@ -93,4 +93,10 @@ After clicking Next, you will be redirected to the Scheduling form. This will be
caption="View Service Ingestion pipelines"
/%}

## YAML Configuration

In the [connectors](/connectors) section we showcase how to run the metadata ingestion from a JSON/YAML file using the Airflow SDK or the CLI via `metadata ingest`. Running a usage workflow is also possible from a JSON/YAML configuration file.

This is a good option if you wish to execute your workflow via the Airflow SDK or the CLI; with the CLI, a usage workflow can be triggered with the command `metadata usage -c FILENAME.yaml`. The `serviceConnection` config will be specific to your connector (you can find more information in the [connectors](/connectors) section), though the `sourceConfig` for usage will be similar across all connectors, as sketched below.
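
A rough sketch of a usage config (the service name and file paths are placeholders; usage workflows use a `processor`/`stage`/`bulkSink` pipeline rather than a plain sink, and `stageFileLocation` is assumed here to map to the Stage File Location option above):

```yaml
source:
  type: bigquery-usage            # <connector>-usage
  serviceName: my_service         # placeholder: same name used for metadata ingestion
  sourceConfig:
    config:
      type: DatabaseUsage
      queryLogDuration: 2         # the "Query Log Duration" option above, in days
      stageFileLocation: /tmp/query_log_stage  # assumed key for "Stage File Location"
processor:
  type: query-parser
  config: {}
stage:
  type: table-usage
  config:
    filename: /tmp/openmetadata-usage   # placeholder temp file for parsed usage data
bulkSink:
  type: metadata-usage
  config:
    filename: /tmp/openmetadata-usage   # same temp file, read back and pushed to the server
```

The partial rendered below walks through each of these options in detail.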

{% partial file="/v1.3/connectors/yaml/query-usage.md" variables={connector: "bigquery"} /%}
@@ -66,6 +66,10 @@ After clicking Next, you will be redirected to the Scheduling form. This will be
caption="Schedule and Deploy the Lineage Ingestion"
/%}

## Run Lineage Workflow Externally

{% partial file="/v1.3/connectors/yaml/lineage.md" variables={connector: "bigquery"} /%}

## dbt Ingestion

We can also generate lineage through [dbt ingestion](/connectors/ingestion/workflows/dbt/ingest-dbt-ui). The dbt workflow can fetch queries that carry lineage information. For a dbt ingestion pipeline, the path to the Catalog and Manifest files must be specified. We also fetch the column level lineage through dbt.
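
As a loose illustration of that requirement, a local-file dbt source config might look like the following sketch (paths and service name are placeholders; see the dbt ingestion docs linked above for the authoritative options):

```yaml
source:
  type: dbt
  serviceName: my_service                   # placeholder: service the dbt assets belong to
  sourceConfig:
    config:
      type: DBT
      dbtConfigSource:
        dbtConfigType: local
        dbtCatalogFilePath: ./catalog.json    # placeholder path to the dbt Catalog file
        dbtManifestFilePath: ./manifest.json  # placeholder path to the dbt Manifest file
sink:
  type: metadata-rest
  config: {}
```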