diff --git a/openmetadata-docs/content/partials/v1.3/connectors/yaml/data-quality.md b/openmetadata-docs/content/partials/v1.3/connectors/yaml/data-quality.md
new file mode 100644
index 000000000000..d04d64130ccd
--- /dev/null
+++ b/openmetadata-docs/content/partials/v1.3/connectors/yaml/data-quality.md
@@ -0,0 +1,115 @@
+## Data Quality
+
+### Adding Data Quality Test Cases from YAML Config
+
+When creating a YAML config for a test workflow, the source configuration is very simple.
+```yaml
+source:
+  type: TestSuite
+  serviceName: 
+  sourceConfig:
+    config:
+      type: TestSuite
+      entityFullyQualifiedName: 
+```
+The only sections you need to modify here are the `serviceName` (this name needs to be unique) and `entityFullyQualifiedName` (the entity against which the tests will be executed) keys.
+
+Once you have defined your source configuration, you'll need to define the processor configuration.
+
+```yaml
+processor:
+  type: "orm-test-runner"
+  config:
+    forceUpdate: 
+    testCases:
+      - name: 
+        testDefinitionName: columnValueLengthsToBeBetween
+        columnName: 
+        parameterValues:
+          - name: minLength
+            value: 10
+          - name: maxLength
+            value: 25
+      - name: 
+        testDefinitionName: tableRowCountToEqual
+        parameterValues:
+          - name: value
+            value: 10
+```
+
+The processor type should be set to `orm-test-runner`. For accepted test definition names and parameter value names, refer to the [tests page](/connectors/ingestion/workflows/data-quality/tests).
+
+{% note %}
+
+Note that while you can define tests directly in this YAML configuration, running the
+workflow will execute ALL THE TESTS present in the table, regardless of what you define in the YAML.
+
+This makes it easy for any user to contribute tests via the UI, while keeping the test execution external.
+
+{% /note %}
+
+If the table already has tests, you can keep your YAML config as simple as follows.
+
+```yaml
+processor:
+  type: "orm-test-runner"
+  config: {}
+```
+
+### Key Reference
+
+- `forceUpdate`: if the test case already exists for the entity (based on the test case name), defines the strategy to follow when running the test (i.e. whether or not to update the parameters)
+- `testCases`: list of test cases to add to the referenced entity. Note that we will execute all the tests present in the table.
+- `name`: test case name
+- `testDefinitionName`: test definition
+- `columnName`: only applies to column tests; the name of the column to run the test against
+- `parameterValues`: parameter values of the test
+
+
+The `sink` and `workflowConfig` sections use the same settings as the ingestion and profiler workflows.
+
+### Full `yaml` config example
+
+```yaml
+source:
+  type: TestSuite
+  serviceName: MyAwesomeTestSuite
+  sourceConfig:
+    config:
+      type: TestSuite
+      entityFullyQualifiedName: MySQL.default.openmetadata_db.tag_usage
+
+processor:
+  type: "orm-test-runner"
+  config:
+    forceUpdate: false
+    testCases:
+      - name: column_value_length_tagFQN
+        testDefinitionName: columnValueLengthsToBeBetween
+        columnName: tagFQN
+        parameterValues:
+          - name: minLength
+            value: 10
+          - name: maxLength
+            value: 25
+      - name: table_row_count_test
+        testDefinitionName: tableRowCountToEqual
+        parameterValues:
+          - name: value
+            value: 10
+
+sink:
+  type: metadata-rest
+  config: {}
+workflowConfig:
+  openMetadataServerConfig:
+    hostPort: 
+    authProvider: 
+```
+
+### How to Run Tests
+
+To run the tests from the CLI, execute the following command:
+```
+metadata test -c /path/to/my/config.yaml
+```
diff --git a/openmetadata-docs/content/partials/v1.3/connectors/yaml/lineage.md b/openmetadata-docs/content/partials/v1.3/connectors/yaml/lineage.md
new file mode 100644
index 000000000000..6330a3670e96
--- /dev/null
+++ b/openmetadata-docs/content/partials/v1.3/connectors/yaml/lineage.md
@@ -0,0 +1,167 @@
+## Lineage
+
+After running a Metadata Ingestion workflow, we can run the Lineage workflow.
+The `serviceName` should be the same as the one used in the Metadata Ingestion workflow, so the ingestion bot can get the `serviceConnection` details from the server.
+
+
+### 1. Define the YAML Config
+
+This is a sample config for the Lineage workflow:
+
+{% codePreview %}
+
+{% codeInfoContainer %}
+
+{% codeInfo srNumber=40 %}
+#### Source Configuration - Source Config
+
+You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceQueryLineagePipeline.json).
+
+{% /codeInfo %}
+
+{% codeInfo srNumber=41 %}
+
+**queryLogDuration**: Configuration to tune how far back (in days) we want to look in the query logs to process lineage data.
+
+{% /codeInfo %}
+
+{% codeInfo srNumber=42 %}
+
+**parsingTimeoutLimit**: Configuration to set the timeout (in seconds) for parsing the queries.
+{% /codeInfo %}
+
+{% codeInfo srNumber=43 %}
+
+**filterCondition**: Condition to filter the query history.
+
+{% /codeInfo %}
+
+{% codeInfo srNumber=44 %}
+
+**resultLimit**: Configuration to set the limit for the query logs.
+
+{% /codeInfo %}
+
+{% codeInfo srNumber=45 %}
+
+**queryLogFilePath**: Configuration to set the file path for the query logs.
+
+{% /codeInfo %}
+
+{% codeInfo srNumber=46 %}
+
+**databaseFilterPattern**: Regex to only fetch databases that match the pattern.
+
+{% /codeInfo %}
+
+{% codeInfo srNumber=47 %}
+
+**schemaFilterPattern**: Regex to only fetch schemas that match the pattern.
+
+{% /codeInfo %}
+
+{% codeInfo srNumber=48 %}
+
+**tableFilterPattern**: Regex to only fetch tables that match the pattern.
+
+{% /codeInfo %}
+
+
+{% codeInfo srNumber=49 %}
+
+#### Sink Configuration
+
+To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`.
+{% /codeInfo %}
+
+
+{% codeInfo srNumber=50 %}
+
+#### Workflow Configuration
+
+The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation.
+
+For a simple, local installation using our docker containers, this looks like:
+
+{% /codeInfo %}
+
+{% /codeInfoContainer %}
+
+{% codeBlock fileName="filename.yaml" %}
+
+
+```yaml {% srNumber=40 %}
+source:
+  type: {% $connector %}-lineage
+  serviceName: 
+  sourceConfig:
+    config:
+      type: DatabaseLineage
+```
+
+```yaml {% srNumber=41 %}
+      # Number of days to look back
+      queryLogDuration: 1
+```
+```yaml {% srNumber=42 %}
+      parsingTimeoutLimit: 300
+```
+```yaml {% srNumber=43 %}
+      # filterCondition: query_text not ilike '--- metabase query %'
+```
+```yaml {% srNumber=44 %}
+      resultLimit: 1000
+```
+```yaml {% srNumber=45 %}
+      # If instead of getting the query logs from the database we want to pass a file with the queries
+      # queryLogFilePath: /tmp/query_log/file_path
+```
+```yaml {% srNumber=46 %}
+      # databaseFilterPattern:
+      #   includes:
+      #     - database1
+      #     - database2
+      #   excludes:
+      #     - database3
+      #     - database4
+```
+```yaml {% srNumber=47 %}
+      # schemaFilterPattern:
+      #   includes:
+      #     - schema1
+      #     - schema2
+      #   excludes:
+      #     - schema3
+      #     - schema4
+```
+```yaml {% srNumber=48 %}
+      # tableFilterPattern:
+      #   includes:
+      #     - table1
+      #     - table2
+      #   excludes:
+      #     - table3
+      #     - table4
+```
+
+```yaml {% srNumber=49 %}
+sink:
+  type: metadata-rest
+  config: {}
+```
+
+{% partial file="/v1.3/connectors/yaml/workflow-config.md" /%}
+
+{% /codeBlock %}
+
+{% /codePreview %}
+
+- You can learn more about how to configure and run the Lineage Workflow to extract lineage data [here](/connectors/ingestion/workflows/lineage)
+
+### 2. Run with the CLI
+
+After saving the YAML config, we will run the command the same way we did for the metadata ingestion:
+
+```bash
+metadata ingest -c 
+```
\ No newline at end of file
diff --git a/openmetadata-docs/content/v1.3.x/connectors/database/athena/yaml.md b/openmetadata-docs/content/v1.3.x/connectors/database/athena/yaml.md
index 319bc08978f5..9940b5d41580 100644
--- a/openmetadata-docs/content/v1.3.x/connectors/database/athena/yaml.md
+++ b/openmetadata-docs/content/v1.3.x/connectors/database/athena/yaml.md
@@ -34,8 +34,9 @@ Configure and schedule Athena metadata and profiler workflows from the OpenMetad
 - [Requirements](#requirements)
 - [Metadata Ingestion](#metadata-ingestion)
 - [Query Usage](#query-usage)
-- [Data Profiler](#data-profiler)
 - [Lineage](#lineage)
+- [Data Profiler](#data-profiler)
+- [Data Quality](#data-quality)
 - [dbt Integration](#dbt-integration)
 
 {% partial file="/v1.3/connectors/external-ingestion-deployment.md" /%}
@@ -359,11 +360,11 @@ source:
 
 {% partial file="/v1.3/connectors/yaml/query-usage.md" variables={connector: "athena"} /%}
 
-{% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "athena"} /%}
+{% partial file="/v1.3/connectors/yaml/lineage.md" variables={connector: "athena"} /%}
 
-## Lineage
+{% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "athena"} /%}
 
-You can learn more about how to ingest lineage [here](/connectors/ingestion/workflows/lineage).
+{% partial file="/v1.3/connectors/yaml/data-quality.md" /%} ## dbt Integration diff --git a/openmetadata-docs/content/v1.3.x/connectors/database/azuresql/yaml.md b/openmetadata-docs/content/v1.3.x/connectors/database/azuresql/yaml.md index 4be33d0e8657..0c0e08976d7c 100644 --- a/openmetadata-docs/content/v1.3.x/connectors/database/azuresql/yaml.md +++ b/openmetadata-docs/content/v1.3.x/connectors/database/azuresql/yaml.md @@ -35,6 +35,7 @@ Configure and schedule AzureSQL metadata and profiler workflows from the OpenMet - [Requirements](#requirements) - [Metadata Ingestion](#metadata-ingestion) - [Data Profiler](#data-profiler) +- [Data Quality](#data-quality) - [dbt Integration](#dbt-integration) {% partial file="/v1.3/connectors/external-ingestion-deployment.md" /%} @@ -192,6 +193,8 @@ source: {% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "azuresql"} /%} +{% partial file="/v1.3/connectors/yaml/data-quality.md" /%} + ## dbt Integration {% tilesContainer %} diff --git a/openmetadata-docs/content/v1.3.x/connectors/database/bigquery/yaml.md b/openmetadata-docs/content/v1.3.x/connectors/database/bigquery/yaml.md index f2be194ed032..e638d32fc02c 100644 --- a/openmetadata-docs/content/v1.3.x/connectors/database/bigquery/yaml.md +++ b/openmetadata-docs/content/v1.3.x/connectors/database/bigquery/yaml.md @@ -35,8 +35,9 @@ Configure and schedule BigQuery metadata and profiler workflows from the OpenMet - [Requirements](#requirements) - [Metadata Ingestion](#metadata-ingestion) - [Query Usage](#query-usage) -- [Data Profiler](#data-profiler) - [Lineage](#lineage) +- [Data Profiler](#data-profiler) +- [Data Quality](#data-quality) - [dbt Integration](#dbt-integration) {% partial file="/v1.3/connectors/external-ingestion-deployment.md" /%} @@ -255,11 +256,11 @@ source: {% partial file="/v1.3/connectors/yaml/query-usage.md" variables={connector: "bigquery"} /%} -{% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: 
"bigquery"} /%} +{% partial file="/v1.3/connectors/yaml/lineage.md" variables={connector: "bigquery"} /%} -## Lineage +{% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "bigquery"} /%} -You can learn more about how to ingest lineage [here](/connectors/ingestion/workflows/lineage). +{% partial file="/v1.3/connectors/yaml/data-quality.md" /%} ## dbt Integration diff --git a/openmetadata-docs/content/v1.3.x/connectors/database/clickhouse/yaml.md b/openmetadata-docs/content/v1.3.x/connectors/database/clickhouse/yaml.md index 1e909c070fe6..69b0e39677e8 100644 --- a/openmetadata-docs/content/v1.3.x/connectors/database/clickhouse/yaml.md +++ b/openmetadata-docs/content/v1.3.x/connectors/database/clickhouse/yaml.md @@ -34,8 +34,9 @@ Configure and schedule Clickhouse metadata and profiler workflows from the OpenM - [Requirements](#requirements) - [Metadata Ingestion](#metadata-ingestion) - [Query Usage](#query-usage) -- [Data Profiler](#data-profiler) - [Lineage](#lineage) +- [Data Profiler](#data-profiler) +- [Data Quality](#data-quality) - [dbt Integration](#dbt-integration) {% partial file="/v1.3/connectors/external-ingestion-deployment.md" /%} @@ -255,11 +256,11 @@ source: {% partial file="/v1.3/connectors/yaml/query-usage.md" variables={connector: "clickhouse"} /%} -{% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "clickhouse"} /%} +{% partial file="/v1.3/connectors/yaml/lineage.md" variables={connector: "clickhouse"} /%} -## Lineage +{% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "clickhouse"} /%} -You can learn more about how to ingest lineage [here](/connectors/ingestion/workflows/lineage). 
+{% partial file="/v1.3/connectors/yaml/data-quality.md" /%} ## dbt Integration diff --git a/openmetadata-docs/content/v1.3.x/connectors/database/databricks/yaml.md b/openmetadata-docs/content/v1.3.x/connectors/database/databricks/yaml.md index 3960a74f8c21..3aa646e5cde9 100644 --- a/openmetadata-docs/content/v1.3.x/connectors/database/databricks/yaml.md +++ b/openmetadata-docs/content/v1.3.x/connectors/database/databricks/yaml.md @@ -34,8 +34,9 @@ Configure and schedule Databricks metadata and profiler workflows from the OpenM - [Requirements](#requirements) - [Metadata Ingestion](#metadata-ingestion) - [Query Usage](#query-usage) -- [Data Profiler](#data-profiler) - [Lineage](#lineage) +- [Data Profiler](#data-profiler) +- [Data Quality](#data-quality) - [dbt Integration](#dbt-integration) {% partial file="/v1.3/connectors/external-ingestion-deployment.md" /%} @@ -193,11 +194,11 @@ source: {% partial file="/v1.3/connectors/yaml/query-usage.md" variables={connector: "databricks"} /%} -{% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "databricks"} /%} +{% partial file="/v1.3/connectors/yaml/lineage.md" variables={connector: "databricks"} /%} -## Lineage +{% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "databricks"} /%} -You can learn more about how to ingest lineage [here](/connectors/ingestion/workflows/lineage). 
+{% partial file="/v1.3/connectors/yaml/data-quality.md" /%} ## dbt Integration diff --git a/openmetadata-docs/content/v1.3.x/connectors/database/db2/yaml.md b/openmetadata-docs/content/v1.3.x/connectors/database/db2/yaml.md index defa9a5bbace..2c89b6821fe8 100644 --- a/openmetadata-docs/content/v1.3.x/connectors/database/db2/yaml.md +++ b/openmetadata-docs/content/v1.3.x/connectors/database/db2/yaml.md @@ -35,6 +35,7 @@ Configure and schedule DB2 metadata and profiler workflows from the OpenMetadata - [Requirements](#requirements) - [Metadata Ingestion](#metadata-ingestion) - [Data Profiler](#data-profiler) +- [Data Quality](#data-quality) - [dbt Integration](#dbt-integration) {% partial file="/v1.3/connectors/external-ingestion-deployment.md" /%} @@ -198,6 +199,8 @@ source: {% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "db2"} /%} +{% partial file="/v1.3/connectors/yaml/data-quality.md" /%} + ## dbt Integration {% tilesContainer %} diff --git a/openmetadata-docs/content/v1.3.x/connectors/database/doris/yaml.md b/openmetadata-docs/content/v1.3.x/connectors/database/doris/yaml.md index fd03eafbfc22..0b50fbbe3193 100644 --- a/openmetadata-docs/content/v1.3.x/connectors/database/doris/yaml.md +++ b/openmetadata-docs/content/v1.3.x/connectors/database/doris/yaml.md @@ -35,6 +35,7 @@ Configure and schedule Doris metadata and profiler workflows from the OpenMetada - [Requirements](#requirements) - [Metadata Ingestion](#metadata-ingestion) - [Data Profiler](#data-profiler) +- [Data Quality](#data-quality) - [dbt Integration](#dbt-integration) {% partial file="/v1.2/connectors/external-ingestion-deployment.md" /%} @@ -174,6 +175,8 @@ source: {% partial file="/v1.2/connectors/yaml/data-profiler.md" variables={connector: "doris"} /%} +{% partial file="/v1.3/connectors/yaml/data-quality.md" /%} + ## Lineage You can learn more about how to ingest lineage [here](/connectors/ingestion/workflows/lineage). 
diff --git a/openmetadata-docs/content/v1.3.x/connectors/database/druid/yaml.md b/openmetadata-docs/content/v1.3.x/connectors/database/druid/yaml.md index 520434ef928a..cbbb9542dcc1 100644 --- a/openmetadata-docs/content/v1.3.x/connectors/database/druid/yaml.md +++ b/openmetadata-docs/content/v1.3.x/connectors/database/druid/yaml.md @@ -35,6 +35,7 @@ Configure and schedule Druid metadata and profiler workflows from the OpenMetada - [Requirements](#requirements) - [Metadata Ingestion](#metadata-ingestion) - [Data Profiler](#data-profiler) +- [Data Quality](#data-quality) - [dbt Integration](#dbt-integration) {% partial file="/v1.3/connectors/external-ingestion-deployment.md" /%} @@ -173,6 +174,8 @@ source: {% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "druid"} /%} +{% partial file="/v1.3/connectors/yaml/data-quality.md" /%} + ## dbt Integration {% tilesContainer %} diff --git a/openmetadata-docs/content/v1.3.x/connectors/database/greenplum/yaml.md b/openmetadata-docs/content/v1.3.x/connectors/database/greenplum/yaml.md index 0ffc346973fd..1e91110d02eb 100644 --- a/openmetadata-docs/content/v1.3.x/connectors/database/greenplum/yaml.md +++ b/openmetadata-docs/content/v1.3.x/connectors/database/greenplum/yaml.md @@ -37,6 +37,7 @@ Configure and schedule Greenplum metadata and profiler workflows from the OpenMe - [Metadata Ingestion](#metadata-ingestion) - [Query Usage](#query-usage) - [Data Profiler](#data-profiler) +- [Data Quality](#data-quality) - [Lineage](#lineage) - [dbt Integration](#dbt-integration) @@ -261,6 +262,8 @@ source: {% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "greenplum"} /%} +{% partial file="/v1.3/connectors/yaml/data-quality.md" /%} + ## Lineage You can learn more about how to ingest lineage [here](/connectors/ingestion/workflows/lineage). 
diff --git a/openmetadata-docs/content/v1.3.x/connectors/database/hive/yaml.md b/openmetadata-docs/content/v1.3.x/connectors/database/hive/yaml.md index 6817cf60e62a..52bb1cb65e50 100644 --- a/openmetadata-docs/content/v1.3.x/connectors/database/hive/yaml.md +++ b/openmetadata-docs/content/v1.3.x/connectors/database/hive/yaml.md @@ -34,6 +34,7 @@ Configure and schedule Hive metadata and profiler workflows from the OpenMetadat - [Requirements](#requirements) - [Metadata Ingestion](#metadata-ingestion) - [Data Profiler](#data-profiler) +- [Data Quality](#data-quality) - [dbt Integration](#dbt-integration) {% partial file="/v1.3/connectors/external-ingestion-deployment.md" /%} @@ -219,6 +220,8 @@ source: {% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "hive"} /%} +{% partial file="/v1.3/connectors/yaml/data-quality.md" /%} + ## dbt Integration {% tilesContainer %} diff --git a/openmetadata-docs/content/v1.3.x/connectors/database/impala/yaml.md b/openmetadata-docs/content/v1.3.x/connectors/database/impala/yaml.md index 3483f131f105..f81bebaf9f16 100644 --- a/openmetadata-docs/content/v1.3.x/connectors/database/impala/yaml.md +++ b/openmetadata-docs/content/v1.3.x/connectors/database/impala/yaml.md @@ -33,6 +33,7 @@ Configure and schedule Impala metadata and profiler workflows from the OpenMetad - [Requirements](#requirements) - [Metadata Ingestion](#metadata-ingestion) - [Data Profiler](#data-profiler) +- [Data Quality](#data-quality) - [dbt Integration](#dbt-integration) {% partial file="/v1.3/connectors/external-ingestion-deployment.md" /%} @@ -180,6 +181,8 @@ source: {% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "impala"} /%} +{% partial file="/v1.3/connectors/yaml/data-quality.md" /%} + ## dbt Integration {% tilesContainer %} diff --git a/openmetadata-docs/content/v1.3.x/connectors/database/mariadb/yaml.md b/openmetadata-docs/content/v1.3.x/connectors/database/mariadb/yaml.md index 
ebab3da3ed77..7b64a14fa422 100644 --- a/openmetadata-docs/content/v1.3.x/connectors/database/mariadb/yaml.md +++ b/openmetadata-docs/content/v1.3.x/connectors/database/mariadb/yaml.md @@ -33,6 +33,7 @@ Configure and schedule MariaDB metadata and profiler workflows from the OpenMeta - [Requirements](#requirements) - [Metadata Ingestion](#metadata-ingestion) - [Data Profiler](#data-profiler) +- [Data Quality](#data-quality) - [dbt Integration](#dbt-integration) {% partial file="/v1.3/connectors/external-ingestion-deployment.md" /%} @@ -176,6 +177,8 @@ source: {% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "mariadb"} /%} +{% partial file="/v1.3/connectors/yaml/data-quality.md" /%} + ## dbt Integration {% tilesContainer %} diff --git a/openmetadata-docs/content/v1.3.x/connectors/database/mssql/yaml.md b/openmetadata-docs/content/v1.3.x/connectors/database/mssql/yaml.md index d9ac84c04e79..eb5152a2d655 100644 --- a/openmetadata-docs/content/v1.3.x/connectors/database/mssql/yaml.md +++ b/openmetadata-docs/content/v1.3.x/connectors/database/mssql/yaml.md @@ -35,8 +35,9 @@ Configure and schedule MSSQL metadata and profiler workflows from the OpenMetada - [Requirements](#requirements) - [Metadata Ingestion](#metadata-ingestion) - [Query Usage](#query-usage) -- [Data Profiler](#data-profiler) - [Lineage](#lineage) +- [Data Profiler](#data-profiler) +- [Data Quality](#data-quality) - [dbt Integration](#dbt-integration) {% partial file="/v1.3/connectors/external-ingestion-deployment.md" /%} @@ -229,11 +230,11 @@ source: {% partial file="/v1.3/connectors/yaml/query-usage.md" variables={connector: "mssql"} /%} -{% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "mssql"} /%} +{% partial file="/v1.3/connectors/yaml/lineage.md" variables={connector: "mssql"} /%} -## Lineage +{% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "mssql"} /%} -You can learn more about how to ingest lineage 
[here](/connectors/ingestion/workflows/lineage). +{% partial file="/v1.3/connectors/yaml/data-quality.md" /%} ## dbt Integration diff --git a/openmetadata-docs/content/v1.3.x/connectors/database/mysql/yaml.md b/openmetadata-docs/content/v1.3.x/connectors/database/mysql/yaml.md index 6ab3a39704af..81b3a6c41c77 100644 --- a/openmetadata-docs/content/v1.3.x/connectors/database/mysql/yaml.md +++ b/openmetadata-docs/content/v1.3.x/connectors/database/mysql/yaml.md @@ -35,6 +35,7 @@ Configure and schedule MySQL metadata and profiler workflows from the OpenMetada - [Requirements](#requirements) - [Metadata Ingestion](#metadata-ingestion) - [Data Profiler](#data-profiler) +- [Data Quality](#data-quality) - [dbt Integration](#dbt-integration) {% partial file="/v1.3/connectors/external-ingestion-deployment.md" /%} @@ -252,6 +253,8 @@ source: {% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "mysql"} /%} +{% partial file="/v1.3/connectors/yaml/data-quality.md" /%} + ## dbt Integration {% tilesContainer %} diff --git a/openmetadata-docs/content/v1.3.x/connectors/database/oracle/yaml.md b/openmetadata-docs/content/v1.3.x/connectors/database/oracle/yaml.md index 75205d32d957..27c21f21877a 100644 --- a/openmetadata-docs/content/v1.3.x/connectors/database/oracle/yaml.md +++ b/openmetadata-docs/content/v1.3.x/connectors/database/oracle/yaml.md @@ -36,6 +36,7 @@ Configure and schedule Oracle metadata and profiler workflows from the OpenMetad - [Requirements](#requirements) - [Metadata Ingestion](#metadata-ingestion) - [Data Profiler](#data-profiler) +- [Data Quality](#data-quality) - [Lineage](#lineage) - [dbt Integration](#dbt-integration) @@ -219,6 +220,8 @@ source: {% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "oracle"} /%} +{% partial file="/v1.3/connectors/yaml/data-quality.md" /%} + ## Lineage You can learn more about how to ingest lineage [here](/connectors/ingestion/workflows/lineage). 
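Similarly, for the data-quality partial being wired into these connector pages: when the tests are already attached to the table via the UI, the whole workflow file stays minimal — an empty processor config runs everything defined on the table. A sketch, with placeholder service and entity names:

```yaml
source:
  type: TestSuite
  serviceName: my_test_suite        # placeholder; must be unique
  sourceConfig:
    config:
      type: TestSuite
      entityFullyQualifiedName: MyService.default.mydb.my_table   # placeholder entity
processor:
  type: "orm-test-runner"
  config: {}    # no inline tests: all tests already present on the table will be executed
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api   # placeholder for a local installation
    authProvider: openmetadata
```

It is run with `metadata test -c /path/to/my/config.yaml`, as the partial's "How to Run Tests" section shows.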
diff --git a/openmetadata-docs/content/v1.3.x/connectors/database/pinotdb/yaml.md b/openmetadata-docs/content/v1.3.x/connectors/database/pinotdb/yaml.md index b51bd05c7b08..ab84925b09e1 100644 --- a/openmetadata-docs/content/v1.3.x/connectors/database/pinotdb/yaml.md +++ b/openmetadata-docs/content/v1.3.x/connectors/database/pinotdb/yaml.md @@ -34,6 +34,7 @@ Configure and schedule PinotDB metadata and profiler workflows from the OpenMeta - [Requirements](#requirements) - [Metadata Ingestion](#metadata-ingestion) - [Data Profiler](#data-profiler) +- [Data Quality](#data-quality) - [dbt Integration](#dbt-integration) {% partial file="/v1.3/connectors/external-ingestion-deployment.md" /%} @@ -170,6 +171,8 @@ source: {% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "pinotdb"} /%} +{% partial file="/v1.3/connectors/yaml/data-quality.md" /%} + ## dbt Integration {% tilesContainer %} diff --git a/openmetadata-docs/content/v1.3.x/connectors/database/postgres/yaml.md b/openmetadata-docs/content/v1.3.x/connectors/database/postgres/yaml.md index 20a42793e25c..c659e847081f 100644 --- a/openmetadata-docs/content/v1.3.x/connectors/database/postgres/yaml.md +++ b/openmetadata-docs/content/v1.3.x/connectors/database/postgres/yaml.md @@ -35,8 +35,9 @@ Configure and schedule Postgres metadata and profiler workflows from the OpenMet - [Requirements](#requirements) - [Metadata Ingestion](#metadata-ingestion) - [Query Usage](#query-usage) -- [Data Profiler](#data-profiler) - [Lineage](#lineage) +- [Data Profiler](#data-profiler) +- [Data Quality](#data-quality) - [dbt Integration](#dbt-integration) {% partial file="/v1.3/connectors/external-ingestion-deployment.md" /%} @@ -283,11 +284,11 @@ source: {% partial file="/v1.3/connectors/yaml/query-usage.md" variables={connector: "postgres"} /%} -{% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "postgres"} /%} +{% partial file="/v1.3/connectors/yaml/lineage.md" variables={connector: 
"postgres"} /%} -## Lineage +{% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "postgres"} /%} -You can learn more about how to ingest lineage [here](/connectors/ingestion/workflows/lineage). +{% partial file="/v1.3/connectors/yaml/data-quality.md" /%} ## dbt Integration diff --git a/openmetadata-docs/content/v1.3.x/connectors/database/presto/yaml.md b/openmetadata-docs/content/v1.3.x/connectors/database/presto/yaml.md index a9b7446d1559..6296460cfb73 100644 --- a/openmetadata-docs/content/v1.3.x/connectors/database/presto/yaml.md +++ b/openmetadata-docs/content/v1.3.x/connectors/database/presto/yaml.md @@ -34,6 +34,7 @@ Configure and schedule Presto metadata and profiler workflows from the OpenMetad - [Requirements](#requirements) - [Metadata Ingestion](#metadata-ingestion) - [Data Profiler](#data-profiler) +- [Data Quality](#data-quality) - [dbt Integration](#dbt-integration) {% partial file="/v1.3/connectors/external-ingestion-deployment.md" /%} @@ -179,6 +180,8 @@ source: {% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "presto"} /%} +{% partial file="/v1.3/connectors/yaml/data-quality.md" /%} + ## dbt Integration {% tilesContainer %} diff --git a/openmetadata-docs/content/v1.3.x/connectors/database/redshift/yaml.md b/openmetadata-docs/content/v1.3.x/connectors/database/redshift/yaml.md index 4782ccea9498..f5e04afe58e8 100644 --- a/openmetadata-docs/content/v1.3.x/connectors/database/redshift/yaml.md +++ b/openmetadata-docs/content/v1.3.x/connectors/database/redshift/yaml.md @@ -35,8 +35,9 @@ Configure and schedule Redshift metadata and profiler workflows from the OpenMet - [Requirements](#requirements) - [Metadata Ingestion](#metadata-ingestion) - [Query Usage](#query-usage) -- [Data Profiler](#data-profiler) - [Lineage](#lineage) +- [Data Profiler](#data-profiler) +- [Data Quality](#data-quality) - [dbt Integration](#dbt-integration) {% partial file="/v1.3/connectors/external-ingestion-deployment.md" 
/%} @@ -204,12 +205,11 @@ source: {% partial file="/v1.3/connectors/yaml/query-usage.md" variables={connector: "redshift"} /%} -{% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "redshift"} /%} +{% partial file="/v1.3/connectors/yaml/lineage.md" variables={connector: "redshift"} /%} +{% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "redshift"} /%} -## Lineage - -You can learn more about how to ingest lineage [here](/connectors/ingestion/workflows/lineage). +{% partial file="/v1.3/connectors/yaml/data-quality.md" /%} ## dbt Integration diff --git a/openmetadata-docs/content/v1.3.x/connectors/database/sap-hana/yaml.md b/openmetadata-docs/content/v1.3.x/connectors/database/sap-hana/yaml.md index bc66c7527869..6d3f5a77fb35 100644 --- a/openmetadata-docs/content/v1.3.x/connectors/database/sap-hana/yaml.md +++ b/openmetadata-docs/content/v1.3.x/connectors/database/sap-hana/yaml.md @@ -34,6 +34,7 @@ Configure and schedule SAP Hana metadata and profiler workflows from the OpenMet - [Requirements](#requirements) - [Metadata Ingestion](#metadata-ingestion) - [Data Profiler](#data-profiler) +- [Data Quality](#data-quality) - [dbt Integration](#dbt-integration) {% partial file="/v1.3/connectors/external-ingestion-deployment.md" /%} @@ -208,6 +209,8 @@ source: {% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "sapHana"} /%} +{% partial file="/v1.3/connectors/yaml/data-quality.md" /%} + ## dbt Integration {% tilesContainer %} diff --git a/openmetadata-docs/content/v1.3.x/connectors/database/singlestore/yaml.md b/openmetadata-docs/content/v1.3.x/connectors/database/singlestore/yaml.md index 651a3df683de..4f244cd8ae9c 100644 --- a/openmetadata-docs/content/v1.3.x/connectors/database/singlestore/yaml.md +++ b/openmetadata-docs/content/v1.3.x/connectors/database/singlestore/yaml.md @@ -34,6 +34,7 @@ Configure and schedule Singlestore metadata and profiler workflows from the Open - 
[Requirements](#requirements) - [Metadata Ingestion](#metadata-ingestion) - [Data Profiler](#data-profiler) +- [Data Quality](#data-quality) - [dbt Integration](#dbt-integration) {% partial file="/v1.3/connectors/external-ingestion-deployment.md" /%} @@ -171,6 +172,8 @@ source: {% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "singlestore"} /%} +{% partial file="/v1.3/connectors/yaml/data-quality.md" /%} + ## dbt Integration {% tilesContainer %} diff --git a/openmetadata-docs/content/v1.3.x/connectors/database/snowflake/yaml.md b/openmetadata-docs/content/v1.3.x/connectors/database/snowflake/yaml.md index 6e46810e5052..8edf4e8074ca 100644 --- a/openmetadata-docs/content/v1.3.x/connectors/database/snowflake/yaml.md +++ b/openmetadata-docs/content/v1.3.x/connectors/database/snowflake/yaml.md @@ -35,8 +35,9 @@ Configure and schedule Snowflake metadata and profiler workflows from the OpenMe - [Requirements](#requirements) - [Metadata Ingestion](#metadata-ingestion) - [Query Usage](#query-usage) -- [Data Profiler](#data-profiler) - [Lineage](#lineage) +- [Data Profiler](#data-profiler) +- [Data Quality](#data-quality) - [dbt Integration](#dbt-integration) {% partial file="/v1.3/connectors/external-ingestion-deployment.md" /%} @@ -289,11 +290,11 @@ source: {% partial file="/v1.3/connectors/yaml/query-usage.md" variables={connector: "snowflake"} /%} -{% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "snowflake"} /%} +{% partial file="/v1.3/connectors/yaml/lineage.md" variables={connector: "snowflake"} /%} -## Lineage +{% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "snowflake"} /%} -You can learn more about how to ingest lineage [here](/connectors/ingestion/workflows/lineage). 
+{% partial file="/v1.3/connectors/yaml/data-quality.md" /%} ## dbt Integration diff --git a/openmetadata-docs/content/v1.3.x/connectors/database/sqlite/yaml.md b/openmetadata-docs/content/v1.3.x/connectors/database/sqlite/yaml.md index d666ffa40282..6c542ed96d17 100644 --- a/openmetadata-docs/content/v1.3.x/connectors/database/sqlite/yaml.md +++ b/openmetadata-docs/content/v1.3.x/connectors/database/sqlite/yaml.md @@ -36,6 +36,7 @@ Configure and schedule SQLite metadata and profiler workflows from the OpenMetad - [Metadata Ingestion](#metadata-ingestion) - [Query Usage](#query-usage) - [Data Profiler](#data-profiler) +- [Data Quality](#data-quality) - [Lineage](#lineage) - [dbt Integration](#dbt-integration) @@ -181,6 +182,8 @@ source: {% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "sqlite"} /%} +{% partial file="/v1.3/connectors/yaml/data-quality.md" /%} + ## Lineage You can learn more about how to ingest lineage [here](/connectors/ingestion/workflows/lineage). 
diff --git a/openmetadata-docs/content/v1.3.x/connectors/database/trino/yaml.md b/openmetadata-docs/content/v1.3.x/connectors/database/trino/yaml.md index b374287a70e0..0d76fb7f317e 100644 --- a/openmetadata-docs/content/v1.3.x/connectors/database/trino/yaml.md +++ b/openmetadata-docs/content/v1.3.x/connectors/database/trino/yaml.md @@ -34,6 +34,7 @@ Configure and schedule Trino metadata and profiler workflows from the OpenMetada - [Requirements](#requirements) - [Metadata Ingestion](#metadata-ingestion) - [Data Profiler](#data-profiler) +- [Data Quality](#data-quality) - [dbt Integration](#dbt-integration) {% partial file="/v1.3/connectors/external-ingestion-deployment.md" /%} @@ -212,6 +213,8 @@ source: {% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "trino"} /%} +{% partial file="/v1.3/connectors/yaml/data-quality.md" /%} + ## SSL Configuration In order to integrate SSL in the Metadata Ingestion Config, the user will have to add the SSL config under **connectionArguments** which is placed in source. 
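To illustrate the SSL paragraph above, an SSL-enabled Trino source might look like the following sketch. The service name and certificate path are placeholders, and the `verify` key is an assumption based on common arguments accepted by the underlying Trino python driver; check the Trino connector page for the exact keys supported:

```yaml
source:
  type: trino
  serviceName: my_trino_service        # hypothetical service name
  serviceConnection:
    config:
      type: Trino
      hostPort: trino.example.com:443  # placeholder host
      username: openmetadata_user
      connectionArguments:
        # SSL settings are passed through to the driver; pointing
        # `verify` at a CA bundle is an illustrative example only
        verify: /path/to/ca-bundle.crt
```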
diff --git a/openmetadata-docs/content/v1.3.x/connectors/database/vertica/yaml.md b/openmetadata-docs/content/v1.3.x/connectors/database/vertica/yaml.md index d5774fd1aa48..5bddaf1485de 100644 --- a/openmetadata-docs/content/v1.3.x/connectors/database/vertica/yaml.md +++ b/openmetadata-docs/content/v1.3.x/connectors/database/vertica/yaml.md @@ -34,6 +34,7 @@ Configure and schedule Vertica metadata and profiler workflows from the OpenMeta - [Requirements](#requirements) - [Metadata Ingestion](#metadata-ingestion) - [Data Profiler](#data-profiler) +- [Data Quality](#data-quality) - [dbt Integration](#dbt-integration) {% partial file="/v1.3/connectors/external-ingestion-deployment.md" /%} @@ -217,6 +218,8 @@ source: {% partial file="/v1.3/connectors/yaml/data-profiler.md" variables={connector: "vertica"} /%} +{% partial file="/v1.3/connectors/yaml/data-quality.md" /%} + ## dbt Integration {% tilesContainer %} diff --git a/openmetadata-docs/content/v1.3.x/connectors/ingestion/workflows/data-quality/index.md b/openmetadata-docs/content/v1.3.x/connectors/ingestion/workflows/data-quality/index.md index b5a07eb9b704..ce60045a996d 100644 --- a/openmetadata-docs/content/v1.3.x/connectors/ingestion/workflows/data-quality/index.md +++ b/openmetadata-docs/content/v1.3.x/connectors/ingestion/workflows/data-quality/index.md @@ -121,160 +121,7 @@ On the next page, you will be able to add existing test cases from different ent /%} -## Adding Tests with the YAML Config -When creating a JSON config for a test workflow the source configuration is very simple. -```yaml -source: - type: TestSuite - serviceName: - sourceConfig: - config: - type: TestSuite - entityFullyQualifiedName: -``` -The only sections you need to modify here are the `serviceName` (this name needs to be unique) and `entityFullyQualifiedName` (the entity for which we'll be executing tests against) keys. - -Once you have defined your source configuration you'll need to define te processor configuration. 
-```yaml -processor: - type: "orm-test-runner" - config: - forceUpdate: - testCases: - - name: - testDefinitionName: columnValueLengthsToBeBetween - columnName: - parameterValues: - - name: minLength - value: 10 - - name: maxLength - value: 25 - - name: - testDefinitionName: tableRowCountToEqual - parameterValues: - - name: value - value: 10 -``` -The processor type should be set to ` "orm-test-runner"`. For accepted test definition names and parameter value names refer to the [tests page](/connectors/ingestion/workflows/data-quality/tests). - -### Key reference: -- `forceUpdate`: if the test case exists (base on the test case name) for the entity, implements the strategy to follow when running the test (i.e. whether or not to update parameters) -- `testCases`: list of test cases to execute against the entity referenced -- `name`: test case name -- `testDefinitionName`: test definition -- `columnName`: only applies to column test. The name of the column to run the test against -- `parameterValues`: parameter values of the test - - -`sink` and `workflowConfig` will have the same settings than the ingestion and profiler workflow. 
- -### Full `yaml` config example - -```yaml -source: - type: TestSuite - serviceName: MyAwesomeTestSuite - sourceConfig: - config: - type: TestSuite - entityFullyQualifiedName: MySQL.default.openmetadata_db.tag_usage - -processor: - type: "orm-test-runner" - config: - forceUpdate: false - testCases: - - name: column_value_length_tagFQN - testDefinitionName: columnValueLengthsToBeBetween - columnName: tagFQN - parameterValues: - - name: minLength - value: 10 - - name: maxLength - value: 25 - - name: table_row_count_test - testDefinitionName: tableRowCountToEqual - parameterValues: - - name: value - value: 10 - -sink: - type: metadata-rest - config: {} -workflowConfig: - openMetadataServerConfig: - hostPort: - authProvider: -``` - -### How to Run Tests - -To run the tests from the CLI execute the following command -``` -metadata test -c /path/to/my/config.yaml -``` - -### Schedule Test Suite runs with Airflow - -As with the Ingestion or Profiler workflow, you can as well execute a Test Suite directly from Python. We are -going to use Airflow as an example, but any orchestrator would achieve the same goal. 
- -Let's prepare the DAG as usual, but importing a different Workflow class: - -```python -import pathlib -import yaml -from datetime import timedelta -from airflow import DAG - -try: - from airflow.operators.python import PythonOperator -except ModuleNotFoundError: - from airflow.operators.python_operator import PythonOperator - -from metadata.config.common import load_config_file -from metadata.workflow.data_quality import TestSuiteWorkflow -from metadata.workflow.workflow_output_handler import print_status -from airflow.utils.dates import days_ago - -default_args = { - "owner": "user_name", - "email": ["username@org.com"], - "email_on_failure": False, - "retries": 3, - "retry_delay": timedelta(minutes=5), - "execution_timeout": timedelta(minutes=60) -} - -config = """ - -""" - -def metadata_ingestion_workflow(): - workflow_config = yaml.safe_load(config) - workflow = TestSuiteWorkflow.create(workflow_config) - workflow.execute() - workflow.raise_from_status() - print_status(workflow) - workflow.stop() - -with DAG( - "test_suite_workflow", - default_args=default_args, - description="An example DAG which runs a OpenMetadata ingestion workflow", - start_date=days_ago(1), - is_paused_upon_creation=False, - schedule_interval='*/5 * * * *', - catchup=False, -) as dag: - ingest_task = PythonOperator( - task_id="test_using_recipe", - python_callable=metadata_ingestion_workflow, - ) -``` - -Note how we are using the `TestSuiteWorkflow` class to load and execute the tests based on the YAML -configurations specified above. +{% partial file="/v1.3/connectors/yaml/data-quality.md" /%} ## How to Visualize Test Results ### From the Quality Page @@ -334,7 +181,7 @@ The next step for a user is to mark the new failure as `ack` (acknowledged) sign caption="Test suite results table" /%} - Then user are able to mark a test as `resolved`. 
We made it mandatory for users to 1) select a reason and 2) add a comment when resolving failed test so that knowdledge can be maintain inside the platform. + Then users are able to mark a test as `resolved`. We made it mandatory for users to 1) select a reason and 2) add a comment when resolving a failed test so that knowledge can be maintained inside the platform. {% image src="/images/v1.3/features/ingestion/workflows/data-quality/resolution-workflow-resolved-form.png.png" diff --git a/openmetadata-docs/content/v1.3.x/connectors/ingestion/workflows/lineage/index.md b/openmetadata-docs/content/v1.3.x/connectors/ingestion/workflows/lineage/index.md index 77cafbc076c1..23bf0b2cff95 100644 --- a/openmetadata-docs/content/v1.3.x/connectors/ingestion/workflows/lineage/index.md +++ b/openmetadata-docs/content/v1.3.x/connectors/ingestion/workflows/lineage/index.md @@ -71,7 +71,7 @@ Here you can enter the Lineage Ingestion details: **Query Log Duration** -Specify the duration in days for which the profiler should capture lineage data from the query logs. For example, if you specify 2 as the value for the duration, the data profiler will capture lineage information for 48 hours prior to when the ingestion workflow is run. +Specify the duration in days for which the lineage workflow should capture lineage data from the query logs. For example, if you specify 2 as the value for the duration, the lineage workflow will capture lineage information for the 48 hours prior to when the ingestion workflow is run. **Result Limit** @@ -88,3 +88,10 @@ After clicking Next, you will be redirected to the Scheduling form. This will be caption="View Service Ingestion pipelines" /%} +## YAML Configuration + +In the [connectors](/connectors) section we showcase how to run the metadata ingestion from a JSON/YAML file using the Airflow SDK or the CLI via `metadata ingest`. Running a lineage workflow is also possible using a JSON/YAML configuration file.
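A minimal lineage workflow config might look like this sketch. The service name and server settings are placeholders, and the `<connector>-lineage` source type with `queryLogDuration`/`resultLimit` options follows the pattern of the other workflow configs shown in these docs; the connector partials list the exact keys:

```yaml
source:
  type: bigquery-lineage            # <connector>-lineage, shown here for BigQuery
  serviceName: my_bigquery_service  # placeholder service name
  sourceConfig:
    config:
      type: DatabaseLineage
      queryLogDuration: 1           # days of query logs to scan
      resultLimit: 1000
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  openMetadataServerConfig:
    hostPort: <OpenMetadata host and port>
    authProvider: <OpenMetadata auth provider>
```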
+ +This is a good option if you wish to execute your workflow via the Airflow SDK or using the CLI; if you use the CLI a lineage workflow can be triggered with the command `metadata ingest -c FILENAME.yaml`. The `serviceConnection` config will be specific to your connector (you can find more information in the [connectors](/connectors) section), though the `sourceConfig` for the lineage workflow will be similar across all connectors. + +{% partial file="/v1.3/connectors/yaml/lineage.md" variables={connector: "bigquery"} /%} diff --git a/openmetadata-docs/content/v1.3.x/connectors/ingestion/workflows/usage/index.md b/openmetadata-docs/content/v1.3.x/connectors/ingestion/workflows/usage/index.md index 59cd453f320c..4c9882e4f9fe 100644 --- a/openmetadata-docs/content/v1.3.x/connectors/ingestion/workflows/usage/index.md +++ b/openmetadata-docs/content/v1.3.x/connectors/ingestion/workflows/usage/index.md @@ -72,7 +72,7 @@ Here you can enter the Usage Ingestion details: **Query Log Duration** -Specify the duration in days for which the profiler should capture usage data from the query logs. For example, if you specify 2 as the value for the duration, the data profiler will capture usage information for 48 hours prior to when the ingestion workflow is run. +Specify the duration in days for which the usage workflow should capture usage data from the query logs. For example, if you specify 2 as the value for the duration, the usage workflow will capture usage information for the 48 hours prior to when the ingestion workflow is run. **Stage File Location** @@ -93,4 +93,10 @@ After clicking Next, you will be redirected to the Scheduling form. This will be caption="View Service Ingestion pipelines" /%} +## YAML Configuration + +In the [connectors](/connectors) section we showcase how to run the metadata ingestion from a JSON/YAML file using the Airflow SDK or the CLI via `metadata ingest`. Running a usage workflow is also possible using a JSON/YAML configuration file.
+This is a good option if you wish to execute your workflow via the Airflow SDK or using the CLI; if you use the CLI a usage workflow can be triggered with the command `metadata usage -c FILENAME.yaml`. The `serviceConnection` config will be specific to your connector (you can find more information in the [connectors](/connectors) section), though the `sourceConfig` for the usage workflow will be similar across all connectors. + +{% partial file="/v1.3/connectors/yaml/query-usage.md" variables={connector: "bigquery"} /%} \ No newline at end of file diff --git a/openmetadata-docs/content/v1.3.x/how-to-guides/data-lineage/workflow.md b/openmetadata-docs/content/v1.3.x/how-to-guides/data-lineage/workflow.md index 699301d6ba03..18c8b8879fe9 100644 --- a/openmetadata-docs/content/v1.3.x/how-to-guides/data-lineage/workflow.md +++ b/openmetadata-docs/content/v1.3.x/how-to-guides/data-lineage/workflow.md @@ -66,6 +66,10 @@ After clicking Next, you will be redirected to the Scheduling form. This will be caption="Schedule and Deploy the Lineage Ingestion" /%} +## Run Lineage Workflow Externally + +{% partial file="/v1.3/connectors/yaml/lineage.md" variables={connector: "bigquery"} /%} + ## dbt Ingestion We can also generate lineage through [dbt ingestion](/connectors/ingestion/workflows/dbt/ingest-dbt-ui). The dbt workflow can fetch queries that carry lineage information. For a dbt ingestion pipeline, the path to the Catalog and Manifest files must be specified. We also fetch the column level lineage through dbt.
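For reference, the Catalog and Manifest file paths mentioned above are typically supplied in the dbt source config. The sketch below assumes locally stored dbt artifacts; the `dbtConfigSource` field names are an assumption and may differ slightly between OpenMetadata versions, so verify them against the dbt workflow page:

```yaml
source:
  type: dbt
  serviceName: my_service            # must match an existing database service
  sourceConfig:
    config:
      type: DBT
      dbtConfigSource:
        dbtConfigType: local
        # paths to the dbt artifacts that carry lineage information
        dbtCatalogFilePath: /path/to/catalog.json
        dbtManifestFilePath: /path/to/manifest.json
```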