MINOR: Added external workflow missing docs for usage/lineage (#15223)
* Added ext ing workflow docs
* Added dq run ext docs
* Nit
* Nit

Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
Showing 31 changed files with 394 additions and 190 deletions.
openmetadata-docs/content/partials/v1.3/connectors/yaml/data-quality.md (115 additions, 0 deletions)
## Data Quality

### Adding Data Quality Test Cases from yaml config

When creating a YAML config for a test workflow, the source configuration is very simple:

```yaml
source:
  type: TestSuite
  serviceName: <your_service_name>
  sourceConfig:
    config:
      type: TestSuite
      entityFullyQualifiedName: <entityFqn>
```

The only sections you need to modify here are the `serviceName` (this name needs to be unique) and `entityFullyQualifiedName` (the entity against which we'll be executing the tests) keys.
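For intuition, the `entityFullyQualifiedName` is a dotted FQN such as `MySQL.default.openmetadata_db.tag_usage` (service, database, schema, table). A naive, illustrative helper for splitting one (not part of the OpenMetadata SDK; real FQNs may quote parts that contain dots):

```python
def split_fqn(fqn: str) -> dict:
    """Naively split a dotted table FQN into its four parts.

    Illustrative only: assumes exactly service.database.schema.table
    with no quoted segments.
    """
    service, database, schema, table = fqn.split(".")
    return {"service": service, "database": database, "schema": schema, "table": table}


print(split_fqn("MySQL.default.openmetadata_db.tag_usage")["table"])  # → tag_usage
```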
Once you have defined your source configuration, you'll need to define the processor configuration:

```yaml
processor:
  type: "orm-test-runner"
  config:
    forceUpdate: <false|true>
    testCases:
      - name: <testCaseName>
        testDefinitionName: columnValueLengthsToBeBetween
        columnName: <columnName>
        parameterValues:
          - name: minLength
            value: 10
          - name: maxLength
            value: 25
      - name: <testCaseName>
        testDefinitionName: tableRowCountToEqual
        parameterValues:
          - name: value
            value: 10
```

The processor type should be set to `"orm-test-runner"`. For accepted test definition names and parameter value names, refer to the [tests page](/connectors/ingestion/workflows/data-quality/tests).
{% note %}

Note that while you can define tests directly in this YAML configuration, running the workflow will execute ALL the tests present in the table, regardless of what you define in the YAML.

This makes it easy for any user to contribute tests via the UI, while keeping the test execution external.

{% /note %}

If the table already has tests, you can keep your YAML config as simple as:

```yaml
processor:
  type: "orm-test-runner"
  config: {}
```
### Key reference:

- `forceUpdate`: if the test case already exists for the entity (based on the test case name), defines the strategy to follow when running the test (i.e. whether or not to update its parameters)
- `testCases`: list of test cases to add to the referenced entity. Note that we will execute all the tests present in the table.
- `name`: test case name
- `testDefinitionName`: test definition
- `columnName`: only applies to column tests; the name of the column to run the test against
- `parameterValues`: parameter values of the test
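The `forceUpdate` behavior can be pictured as follows. This is a hypothetical sketch (the function name and dict shapes are illustrative, not the actual ingestion code): when a test case with the same name already exists, `forceUpdate: true` overwrites its parameters, while `false` leaves the existing definition untouched.

```python
def resolve_test_case(existing, incoming, force_update):
    """Illustrative merge: decide which definition wins when the name collides.

    existing: the test case already registered for the entity (or None).
    incoming: the test case defined in the YAML config.
    """
    if existing is None:
        return incoming  # new test case: always created
    if force_update:
        # keep identity, replace the parameters from the YAML
        return {**existing, "parameterValues": incoming["parameterValues"]}
    return existing  # keep the server-side definition as-is


existing = {"name": "table_row_count_test",
            "parameterValues": [{"name": "value", "value": 10}]}
incoming = {"name": "table_row_count_test",
            "parameterValues": [{"name": "value", "value": 50}]}

print(resolve_test_case(existing, incoming, force_update=False)["parameterValues"][0]["value"])  # → 10
print(resolve_test_case(existing, incoming, force_update=True)["parameterValues"][0]["value"])   # → 50
```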
The `sink` and `workflowConfig` sections have the same settings as the ingestion and profiler workflows.
### Full `yaml` config example

```yaml
source:
  type: TestSuite
  serviceName: MyAwesomeTestSuite
  sourceConfig:
    config:
      type: TestSuite
      entityFullyQualifiedName: MySQL.default.openmetadata_db.tag_usage

processor:
  type: "orm-test-runner"
  config:
    forceUpdate: false
    testCases:
      - name: column_value_length_tagFQN
        testDefinitionName: columnValueLengthsToBeBetween
        columnName: tagFQN
        parameterValues:
          - name: minLength
            value: 10
          - name: maxLength
            value: 25
      - name: table_row_count_test
        testDefinitionName: tableRowCountToEqual
        parameterValues:
          - name: value
            value: 10

sink:
  type: metadata-rest
  config: {}

workflowConfig:
  openMetadataServerConfig:
    hostPort: <OpenMetadata host and port>
    authProvider: <OpenMetadata auth provider>
```
### How to Run Tests

To run the tests from the CLI, execute the following command:

```bash
metadata test -c /path/to/my/config.yaml
```
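Before invoking the CLI, it can help to sanity-check the config's structure. A minimal sketch, using plain Python dicts to stand in for the parsed YAML (the validation function itself is illustrative, not part of the `metadata` CLI; the key names follow the example above):

```python
def validate_test_suite_config(config: dict) -> list:
    """Return a list of human-readable problems found in a TestSuite config."""
    problems = []
    source = config.get("source", {})
    if source.get("type") != "TestSuite":
        problems.append("source.type must be 'TestSuite'")
    if not source.get("serviceName"):
        problems.append("source.serviceName is required and must be unique")
    src_cfg = source.get("sourceConfig", {}).get("config", {})
    if not src_cfg.get("entityFullyQualifiedName"):
        problems.append("sourceConfig.config.entityFullyQualifiedName is required")
    processor = config.get("processor", {})
    if processor.get("type") != "orm-test-runner":
        problems.append("processor.type must be 'orm-test-runner'")
    return problems


config = {
    "source": {
        "type": "TestSuite",
        "serviceName": "MyAwesomeTestSuite",
        "sourceConfig": {
            "config": {
                "type": "TestSuite",
                "entityFullyQualifiedName": "MySQL.default.openmetadata_db.tag_usage",
            }
        },
    },
    "processor": {"type": "orm-test-runner", "config": {}},
}
print(validate_test_suite_config(config))  # → []
```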
openmetadata-docs/content/partials/v1.3/connectors/yaml/lineage.md (167 additions, 0 deletions)
## Lineage

After running a Metadata Ingestion workflow, we can run a Lineage workflow. The `serviceName` will be the same as the one used in the Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server.

### 1. Define the YAML Config

This is a sample config for BigQuery Lineage:
{% codePreview %}

{% codeInfoContainer %}

{% codeInfo srNumber=40 %}

#### Source Configuration - Source Config

You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceQueryLineagePipeline.json).

{% /codeInfo %}

{% codeInfo srNumber=41 %}

**queryLogDuration**: Configuration to tune how far back (in days) we want to look in the query logs to process lineage data.

{% /codeInfo %}

{% codeInfo srNumber=42 %}

**parsingTimeoutLimit**: Configuration to set the timeout (in seconds) for parsing the queries.

{% /codeInfo %}

{% codeInfo srNumber=43 %}

**filterCondition**: Condition to filter the query history.

{% /codeInfo %}

{% codeInfo srNumber=44 %}

**resultLimit**: Configuration to set the limit for the query log results.

{% /codeInfo %}

{% codeInfo srNumber=45 %}

**queryLogFilePath**: Configuration to set the file path for the query logs.

{% /codeInfo %}

{% codeInfo srNumber=46 %}

**databaseFilterPattern**: Regex to only fetch databases that match the pattern.

{% /codeInfo %}

{% codeInfo srNumber=47 %}

**schemaFilterPattern**: Regex to only fetch schemas that match the pattern.

{% /codeInfo %}

{% codeInfo srNumber=48 %}

**tableFilterPattern**: Regex to only fetch tables that match the pattern.

{% /codeInfo %}
{% codeInfo srNumber=49 %}

#### Sink Configuration

To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`.

{% /codeInfo %}

{% codeInfo srNumber=50 %}

#### Workflow Configuration

The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation.

For a simple, local installation using our Docker containers, this looks like:

{% /codeInfo %}

{% /codeInfoContainer %}

{% codeBlock fileName="filename.yaml" %}
```yaml {% srNumber=40 %}
source:
  type: {% $connector %}-lineage
  serviceName: <serviceName (same as metadata ingestion service name)>
  sourceConfig:
    config:
      type: DatabaseLineage
```
```yaml {% srNumber=41 %}
      # Number of days to look back
      queryLogDuration: 1
```
```yaml {% srNumber=42 %}
      parsingTimeoutLimit: 300
```
```yaml {% srNumber=43 %}
      # filterCondition: query_text not ilike '--- metabase query %'
```
```yaml {% srNumber=44 %}
      resultLimit: 1000
```
```yaml {% srNumber=45 %}
      # If instead of getting the query logs from the database we want to pass a file with the queries
      # queryLogFilePath: /tmp/query_log/file_path
```
```yaml {% srNumber=46 %}
      # databaseFilterPattern:
      #   includes:
      #     - database1
      #     - database2
      #   excludes:
      #     - database3
      #     - database4
```
```yaml {% srNumber=47 %}
      # schemaFilterPattern:
      #   includes:
      #     - schema1
      #     - schema2
      #   excludes:
      #     - schema3
      #     - schema4
```
```yaml {% srNumber=48 %}
      # tableFilterPattern:
      #   includes:
      #     - table1
      #     - table2
      #   excludes:
      #     - table3
      #     - table4
```

```yaml {% srNumber=49 %}
sink:
  type: metadata-rest
  config: {}
```
{% partial file="/v1.3/connectors/yaml/workflow-config.md" /%}
{% /codeBlock %}

{% /codePreview %}
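For intuition, `queryLogDuration: 1` translates into a one-day lookback window over the query history. A hypothetical sketch of how such a window could be computed (illustrative only, not the actual ingestion code):

```python
from datetime import datetime, timedelta, timezone


def lookback_window(query_log_duration_days, now=None):
    """Return the (start, end) interval covered by the query-log lookback."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(days=query_log_duration_days)
    return start, end


start, end = lookback_window(1, now=datetime(2024, 1, 10, tzinfo=timezone.utc))
print(start.isoformat())  # → 2024-01-09T00:00:00+00:00
```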
You can learn more about how to configure and run the Lineage Workflow to extract lineage data [here](/connectors/ingestion/workflows/lineage).

### 2. Run with the CLI

After saving the YAML config, we will run the command the same way we did for the metadata ingestion:

```bash
metadata ingest -c <path-to-yaml>
```
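The include/exclude filter patterns shown in the config are regexes. A minimal sketch of how such patterns are typically evaluated (illustrative only; the exact matching semantics of the OpenMetadata ingestion may differ, e.g. full match vs. substring search):

```python
import re


def is_included(name, includes=None, excludes=None):
    """Illustrative filter: excludes win over includes; no includes means 'fetch all'."""
    if excludes and any(re.search(p, name) for p in excludes):
        return False
    if includes:
        return any(re.search(p, name) for p in includes)
    return True


tables = ["table1", "table2", "table3", "staging_table1"]
kept = [t for t in tables if is_included(t, includes=["table1", "table2"], excludes=["table3"])]
print(kept)  # → ['table1', 'table2', 'staging_table1']
```

Note that `staging_table1` survives here because `re.search` matches substrings; anchoring the pattern (`^table1$`) would exclude it.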