diff --git a/openmetadata-docs/content/partials/v1.2/connectors/storage/manifest.md b/openmetadata-docs/content/partials/v1.2/connectors/storage/manifest.md
index 354480f1ba88..a4e86545b1ba 100644
--- a/openmetadata-docs/content/partials/v1.2/connectors/storage/manifest.md
+++ b/openmetadata-docs/content/partials/v1.2/connectors/storage/manifest.md
@@ -111,6 +111,11 @@ Again, this information will be added on top of the inferred schema from the dat
 }
 ```
+{% /codeBlock %}
+
+{% /codePreview %}
+
+
 ### Global Manifest
 
 You can also manage a **single** manifest file to centralize the ingestion process for any container. In that case,
diff --git a/openmetadata-docs/content/partials/v1.2/connectors/yaml/dashboard/source-config-def.md b/openmetadata-docs/content/partials/v1.2/connectors/yaml/dashboard/source-config-def.md
new file mode 100644
index 000000000000..4fcb9a0e9da7
--- /dev/null
+++ b/openmetadata-docs/content/partials/v1.2/connectors/yaml/dashboard/source-config-def.md
@@ -0,0 +1,15 @@
+#### Source Configuration - Source Config
+
+{% codeInfo srNumber=100 %}
+
+The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/dashboardServiceMetadataPipeline.json):
+
+- **dbServiceNames**: Database Service Names for ingesting lineage if the source supports it.
+- **dashboardFilterPattern**, **chartFilterPattern**, **dataModelFilterPattern**: Note that all of them support regex as include or exclude. E.g., "My dashboard, My dash.*, .*Dashboard".
+- **projectFilterPattern**: Filter the dashboards, charts and data sources by projects. Note that all of them support regex as include or exclude. E.g., "My project, My proj.*, .*Project".
+- **includeOwners**: Set the 'Include Owners' toggle to control whether to include owners for the ingested entity if the owner email matches a user stored in the OM server as part of metadata ingestion. If the ingested entity already exists and has an owner, the owner will not be overwritten.
+- **includeTags**: Set the 'Include Tags' toggle to control whether to include tags in metadata ingestion.
+- **includeDataModels**: Set the 'Include Data Models' toggle to control whether to include data models as part of metadata ingestion.
+- **markDeletedDashboards**: Set the 'Mark Deleted Dashboards' toggle to flag dashboards as soft-deleted if they are not present anymore in the source system.
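+
+For example, the filter patterns above take the regexes directly as YAML lists. A minimal sketch
+(the pattern values are purely illustrative, not defaults):
+
+```yaml
+dashboardFilterPattern:
+  includes:
+    - "My dash.*"
+  excludes:
+    - ".*Deprecated.*"
+```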
+
+{% /codeInfo %}
\ No newline at end of file
diff --git a/openmetadata-docs/content/partials/v1.2/connectors/yaml/dashboard/source-config.md b/openmetadata-docs/content/partials/v1.2/connectors/yaml/dashboard/source-config.md
new file mode 100644
index 000000000000..37ea0fa42c50
--- /dev/null
+++ b/openmetadata-docs/content/partials/v1.2/connectors/yaml/dashboard/source-config.md
@@ -0,0 +1,30 @@
+```yaml {% srNumber=100 %}
+  sourceConfig:
+    config:
+      type: DashboardMetadata
+      overrideOwner: True
+      # dbServiceNames:
+      #   - service1
+      #   - service2
+      # dashboardFilterPattern:
+      #   includes:
+      #     - dashboard1
+      #     - dashboard2
+      #   excludes:
+      #     - dashboard3
+      #     - dashboard4
+      # chartFilterPattern:
+      #   includes:
+      #     - chart1
+      #     - chart2
+      #   excludes:
+      #     - chart3
+      #     - chart4
+      # projectFilterPattern:
+      #   includes:
+      #     - project1
+      #     - project2
+      #   excludes:
+      #     - project3
+      #     - project4
+```
\ No newline at end of file
diff --git a/openmetadata-docs/content/partials/v1.2/connectors/yaml/data-profiler.md b/openmetadata-docs/content/partials/v1.2/connectors/yaml/data-profiler.md
new file mode 100644
index 000000000000..1f04ad6b7210
--- /dev/null
+++ b/openmetadata-docs/content/partials/v1.2/connectors/yaml/data-profiler.md
@@ -0,0 +1,224 @@
+## Data Profiler
+
+The Data Profiler workflow will be using the `orm-profiler` processor.
+
+After running a Metadata Ingestion workflow, we can run the Data Profiler workflow.
+The `serviceName` should be the same as the one used in the Metadata Ingestion workflow, so the ingestion bot can get the `serviceConnection` details from the server.
+
+
+### 1. Define the YAML Config
+
+This is a sample config for the profiler:
+
+{% codePreview %}
+
+{% codeInfoContainer %}
+
+{% codeInfo srNumber=13 %}
+#### Source Configuration - Source Config
+
+You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json).
+
+**generateSampleData**: Option to turn on/off generating sample data.
+
+{% /codeInfo %}
+
+{% codeInfo srNumber=14 %}
+
+**profileSample**: Percentage of data or number of rows on which we want to execute the profiler and tests.
+
+{% /codeInfo %}
+
+{% codeInfo srNumber=15 %}
+
+**threadCount**: Number of threads to use during metric computations.
+
+{% /codeInfo %}
+
+{% codeInfo srNumber=16 %}
+
+**processPiiSensitive**: Optional configuration to automatically tag columns that might contain sensitive information.
+
+{% /codeInfo %}
+
+{% codeInfo srNumber=17 %}
+
+**confidence**: Set the confidence value for the PII detection above which you want the column to be marked.
+
+{% /codeInfo %}
+
+
+{% codeInfo srNumber=18 %}
+
+**timeoutSeconds**: Profiler timeout in seconds.
+
+{% /codeInfo %}
+
+{% codeInfo srNumber=19 %}
+
+**databaseFilterPattern**: Regex to only fetch databases that match the pattern.
+
+{% /codeInfo %}
+
+{% codeInfo srNumber=20 %}
+
+**schemaFilterPattern**: Regex to only fetch schemas that match the pattern.
+
+{% /codeInfo %}
+
+{% codeInfo srNumber=21 %}
+
+**tableFilterPattern**: Regex to only fetch tables that match the pattern.
+
+{% /codeInfo %}
+
+{% codeInfo srNumber=22 %}
+
+#### Processor Configuration
+
+Choose the `orm-profiler`. Its config can also be updated to define tests from the YAML itself instead of the UI:
+
+**tableConfig**: `tableConfig` allows you to set up some configuration at the table level.
+{% /codeInfo %}
+
+
+{% codeInfo srNumber=23 %}
+
+#### Sink Configuration
+
+To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`.
+{% /codeInfo %}
+
+
+{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%}
+
+{% /codeInfoContainer %}
+
+{% codeBlock fileName="filename.yaml" %}
+
+
+```yaml
+source:
+  type: {% $connector %}
+  serviceName: local_athena
+  sourceConfig:
+    config:
+      type: Profiler
+```
+
+```yaml {% srNumber=13 %}
+      generateSampleData: true
+```
+```yaml {% srNumber=14 %}
+      # profileSample: 85
+```
+```yaml {% srNumber=15 %}
+      # threadCount: 5
+```
+```yaml {% srNumber=16 %}
+      processPiiSensitive: false
+```
+```yaml {% srNumber=17 %}
+      # confidence: 80
+```
+```yaml {% srNumber=18 %}
+      # timeoutSeconds: 43200
+```
+```yaml {% srNumber=19 %}
+      # databaseFilterPattern:
+      #   includes:
+      #     - database1
+      #     - database2
+      #   excludes:
+      #     - database3
+      #     - database4
+```
+```yaml {% srNumber=20 %}
+      # schemaFilterPattern:
+      #   includes:
+      #     - schema1
+      #     - schema2
+      #   excludes:
+      #     - schema3
+      #     - schema4
+```
+```yaml {% srNumber=21 %}
+      # tableFilterPattern:
+      #   includes:
+      #     - table1
+      #     - table2
+      #   excludes:
+      #     - table3
+      #     - table4
+```
+
+```yaml {% srNumber=22 %}
+processor:
+  type: orm-profiler
+  config: {}  # Remove braces if adding properties
+  # tableConfig:
+  #   - fullyQualifiedName:
+  #     profileSample: # default will be 100 if omitted
+  #     profileQuery:
+  #     columnConfig:
+  #       excludeColumns:
+  #         -
+  #       includeColumns:
+  #         - columnName:
+  #         - metrics:
+  #           - MEAN
+  #           - MEDIAN
+  #           - ...
+  #     partitionConfig:
+  #       enablePartitioning:
+  #       partitionColumnName:
+  #       partitionIntervalType:
+  #       Pick one of the variations shown below
+  #       ----'TIME-UNIT' or 'INGESTION-TIME'-------
+  #       partitionInterval:
+  #       partitionIntervalUnit:
+  #       ------------'INTEGER-RANGE'---------------
+  #       partitionIntegerRangeStart:
+  #       partitionIntegerRangeEnd:
+  #       -----------'COLUMN-VALUE'----------------
+  #       partitionValues:
+  #         -
+  #         -
+
+```
+
+```yaml {% srNumber=23 %}
+sink:
+  type: metadata-rest
+  config: {}
+```
+
+{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%}
+
+{% /codeBlock %}
+
+{% /codePreview %}
+
+- You can learn more about how to configure and run the Profiler Workflow to extract profiler data and execute data quality tests [here](/connectors/ingestion/workflows/profiler).
+
+### 2. Run with the CLI
+
+After saving the YAML config, we will run the command the same way we did for the metadata ingestion:
+
+```bash
+metadata profile -c 
+```
+
+Note that now, instead of running `ingest`, we are using the `profile` command to select the Profiler workflow.
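+
+As a concrete illustration of the commented `tableConfig` skeleton above, a minimal filled-in
+processor config might look like the following sketch. The table FQN, column names, and sample
+percentage are hypothetical, purely for illustration:
+
+```yaml
+processor:
+  type: orm-profiler
+  config:
+    tableConfig:
+      # Hypothetical table: profile 50% of rows, compute only two metrics on one column
+      - fullyQualifiedName: my_service.my_database.my_schema.my_table
+        profileSample: 50
+        columnConfig:
+          excludeColumns:
+            - internal_audit_column
+          includeColumns:
+            - columnName: amount
+              metrics:
+                - MEAN
+                - MEDIAN
+```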
+
+{% tilesContainer %}
+
+{% tile
+title="Data Profiler"
+description="Find more information about the Data Profiler here"
+link="/connectors/ingestion/workflows/profiler"
+/ %}
+
+{% /tilesContainer %}
diff --git a/openmetadata-docs/content/partials/v1.2/connectors/yaml/database/source-config-def.md b/openmetadata-docs/content/partials/v1.2/connectors/yaml/database/source-config-def.md
new file mode 100644
index 000000000000..bdcb8a81ef48
--- /dev/null
+++ b/openmetadata-docs/content/partials/v1.2/connectors/yaml/database/source-config-def.md
@@ -0,0 +1,15 @@
+#### Source Configuration - Source Config
+
+{% codeInfo srNumber=100 %}
+
+The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json):
+
+**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system.
+
+**includeTables**: true or false, to ingest table data. Default is true.
+
+**includeViews**: true or false, to ingest view definitions.
+
+**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database).
+
+{% /codeInfo %}
\ No newline at end of file
diff --git a/openmetadata-docs/content/partials/v1.2/connectors/yaml/database/source-config.md b/openmetadata-docs/content/partials/v1.2/connectors/yaml/database/source-config.md
new file mode 100644
index 000000000000..6046ebe9d35d
--- /dev/null
+++ b/openmetadata-docs/content/partials/v1.2/connectors/yaml/database/source-config.md
@@ -0,0 +1,30 @@
+```yaml {% srNumber=100 %}
+  sourceConfig:
+    config:
+      type: DatabaseMetadata
+      markDeletedTables: true
+      includeTables: true
+      includeViews: true
+      # includeTags: true
+      # databaseFilterPattern:
+      #   includes:
+      #     - database1
+      #     - database2
+      #   excludes:
+      #     - database3
+      #     - database4
+      # schemaFilterPattern:
+      #   includes:
+      #     - schema1
+      #     - schema2
+      #   excludes:
+      #     - schema3
+      #     - schema4
+      # tableFilterPattern:
+      #   includes:
+      #     - users
+      #     - type_test
+      #   excludes:
+      #     - table3
+      #     - table4
+```
\ No newline at end of file
diff --git a/openmetadata-docs/content/partials/v1.2/connectors/yaml/ingestion-cli.md b/openmetadata-docs/content/partials/v1.2/connectors/yaml/ingestion-cli.md
new file mode 100644
index 000000000000..4e2102657977
--- /dev/null
+++ b/openmetadata-docs/content/partials/v1.2/connectors/yaml/ingestion-cli.md
@@ -0,0 +1,10 @@
+### 2. Run with the CLI
+
+First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run:
+
+```bash
+metadata ingest -c 
+```
+
+Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration,
+you will be able to extract metadata from different sources.
\ No newline at end of file
diff --git a/openmetadata-docs/content/partials/v1.2/connectors/yaml/ingestion-sink-def.md b/openmetadata-docs/content/partials/v1.2/connectors/yaml/ingestion-sink-def.md
new file mode 100644
index 000000000000..b79ed45da914
--- /dev/null
+++ b/openmetadata-docs/content/partials/v1.2/connectors/yaml/ingestion-sink-def.md
@@ -0,0 +1,7 @@
+#### Sink Configuration
+
+{% codeInfo srNumber=200 %}
+
+To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`.
+
+{% /codeInfo %}
diff --git a/openmetadata-docs/content/partials/v1.2/connectors/yaml/ingestion-sink.md b/openmetadata-docs/content/partials/v1.2/connectors/yaml/ingestion-sink.md
new file mode 100644
index 000000000000..5a908ab95726
--- /dev/null
+++ b/openmetadata-docs/content/partials/v1.2/connectors/yaml/ingestion-sink.md
@@ -0,0 +1,5 @@
+```yaml {% srNumber=200 %}
+sink:
+  type: metadata-rest
+  config: {}
+```
\ No newline at end of file
diff --git a/openmetadata-docs/content/partials/v1.2/connectors/yaml/messaging/source-config-def.md b/openmetadata-docs/content/partials/v1.2/connectors/yaml/messaging/source-config-def.md
new file mode 100644
index 000000000000..43a77ddad71e
--- /dev/null
+++ b/openmetadata-docs/content/partials/v1.2/connectors/yaml/messaging/source-config-def.md
@@ -0,0 +1,11 @@
+#### Source Configuration - Source Config
+
+{% codeInfo srNumber=100 %}
+
+The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/messagingServiceMetadataPipeline.json):
+
+**generateSampleData:** Option to turn on/off generating sample data during metadata extraction.
+
+**topicFilterPattern:** Note that the `topicFilterPattern` supports regex as include or exclude.
+
+{% /codeInfo %}
\ No newline at end of file
diff --git a/openmetadata-docs/content/partials/v1.2/connectors/yaml/messaging/source-config.md b/openmetadata-docs/content/partials/v1.2/connectors/yaml/messaging/source-config.md
new file mode 100644
index 000000000000..cb9bd5e145b0
--- /dev/null
+++ b/openmetadata-docs/content/partials/v1.2/connectors/yaml/messaging/source-config.md
@@ -0,0 +1,11 @@
+```yaml {% srNumber=100 %}
+  sourceConfig:
+    config:
+      type: MessagingMetadata
+      topicFilterPattern:
+        excludes:
+          - _confluent.*
+        # includes:
+        #   - topic1
+      # generateSampleData: true
+```
\ No newline at end of file
diff --git a/openmetadata-docs/content/partials/v1.2/connectors/yaml/ml-model/source-config-def.md b/openmetadata-docs/content/partials/v1.2/connectors/yaml/ml-model/source-config-def.md
new file mode 100644
index 000000000000..5675d97cf919
--- /dev/null
+++ b/openmetadata-docs/content/partials/v1.2/connectors/yaml/ml-model/source-config-def.md
@@ -0,0 +1,9 @@
+#### Source Configuration - Source Config
+
+{% codeInfo srNumber=100 %}
+
+The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/mlmodelServiceMetadataPipeline.json):
+
+**markDeletedMlModels**: Set the 'Mark Deleted ML Models' toggle to flag ML models as soft-deleted if they are not present anymore in the source system.
+
+{% /codeInfo %}
\ No newline at end of file
diff --git a/openmetadata-docs/content/partials/v1.2/connectors/yaml/ml-model/source-config.md b/openmetadata-docs/content/partials/v1.2/connectors/yaml/ml-model/source-config.md
new file mode 100644
index 000000000000..f15cd4b0b814
--- /dev/null
+++ b/openmetadata-docs/content/partials/v1.2/connectors/yaml/ml-model/source-config.md
@@ -0,0 +1,6 @@
+```yaml {% srNumber=100 %}
+  sourceConfig:
+    config:
+      type: MlModelMetadata
+      # markDeletedMlModels: true
+```
\ No newline at end of file
diff --git a/openmetadata-docs/content/partials/v1.2/connectors/yaml/pipeline/source-config-def.md b/openmetadata-docs/content/partials/v1.2/connectors/yaml/pipeline/source-config-def.md
new file mode 100644
index 000000000000..23663f17079a
--- /dev/null
+++ b/openmetadata-docs/content/partials/v1.2/connectors/yaml/pipeline/source-config-def.md
@@ -0,0 +1,15 @@
+#### Source Configuration - Source Config
+
+{% codeInfo srNumber=100 %}
+
+The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/pipelineServiceMetadataPipeline.json):
+
+**dbServiceNames**: Database Service Names for the creation of lineage, if the source supports it.
+
+**includeTags**: Set the 'Include Tags' toggle to control whether to include tags as part of metadata ingestion.
+
+**markDeletedPipelines**: Set the 'Mark Deleted Pipelines' toggle to flag pipelines as soft-deleted if they are not present anymore in the source system.
+
+**pipelineFilterPattern** and **chartFilterPattern**: Note that the `pipelineFilterPattern` and `chartFilterPattern` both support regex as include or exclude.
+
+{% /codeInfo %}
\ No newline at end of file
diff --git a/openmetadata-docs/content/partials/v1.2/connectors/yaml/pipeline/source-config.md b/openmetadata-docs/content/partials/v1.2/connectors/yaml/pipeline/source-config.md
new file mode 100644
index 000000000000..b1afdcc1a725
--- /dev/null
+++ b/openmetadata-docs/content/partials/v1.2/connectors/yaml/pipeline/source-config.md
@@ -0,0 +1,15 @@
+```yaml {% srNumber=100 %}
+  sourceConfig:
+    config:
+      type: PipelineMetadata
+      # markDeletedPipelines: True
+      # includeTags: True
+      # includeLineage: true
+      # pipelineFilterPattern:
+      #   includes:
+      #     - pipeline1
+      #     - pipeline2
+      #   excludes:
+      #     - pipeline3
+      #     - pipeline4
+```
\ No newline at end of file
diff --git a/openmetadata-docs/content/partials/v1.2/connectors/yaml/query-usage.md b/openmetadata-docs/content/partials/v1.2/connectors/yaml/query-usage.md
new file mode 100644
index 000000000000..9e7f3e370e09
--- /dev/null
+++ b/openmetadata-docs/content/partials/v1.2/connectors/yaml/query-usage.md
@@ -0,0 +1,114 @@
+## Query Usage
+
+The Query Usage workflow will be using the `query-parser` processor.
+
+After running a Metadata Ingestion workflow, we can run the Query Usage workflow.
+The `serviceName` should be the same as the one used in the Metadata Ingestion workflow, so the ingestion bot can get the `serviceConnection` details from the server.
+
+
+### 1. Define the YAML Config
+
+This is a sample config for the Query Usage workflow:
+
+{% codePreview %}
+
+{% codeInfoContainer %}
+
+{% codeInfo srNumber=25 %}
+
+#### Source Configuration - Source Config
+
+You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceQueryUsagePipeline.json).
+
+**queryLogDuration**: Configuration to tune how far we want to look back in query logs to process usage data.
+
+{% /codeInfo %}
+
+{% codeInfo srNumber=26 %}
+
+**stageFileLocation**: Temporary file name to store the query logs before processing. Absolute file path required.
+
+{% /codeInfo %}
+
+{% codeInfo srNumber=27 %}
+
+**resultLimit**: Configuration to set the limit for query logs.
+
+{% /codeInfo %}
+
+{% codeInfo srNumber=28 %}
+
+**queryLogFilePath**: Configuration to set the file path for query logs.
+
+{% /codeInfo %}
+
+
+{% codeInfo srNumber=29 %}
+
+#### Processor, Stage and Bulk Sink Configuration
+
+Here we specify where the staging files will be located.
+
+Note that the location is a directory that will be cleaned at the end of the ingestion.
+
+{% /codeInfo %}
+
+{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%}
+
+{% /codeInfoContainer %}
+
+{% codeBlock fileName="filename.yaml" %}
+
+```yaml
+source:
+  type: {% $connector %}-usage
+  serviceName: 
+  sourceConfig:
+    config:
+      type: DatabaseUsage
+```
+```yaml {% srNumber=25 %}
+      # Number of days to look back
+      queryLogDuration: 7
+```
+
+```yaml {% srNumber=26 %}
+      # This is a directory that will be DELETED after the usage runs
+      stageFileLocation: 
+```
+
+```yaml {% srNumber=27 %}
+      # resultLimit: 1000
+```
+
+```yaml {% srNumber=28 %}
+      # If instead of getting the query logs from the database we want to pass a file with the queries
+      # queryLogFilePath: path-to-file
+```
+
+```yaml {% srNumber=29 %}
+processor:
+  type: query-parser
+  config: {}
+stage:
+  type: table-usage
+  config:
+    filename: /tmp/athena_usage
+bulkSink:
+  type: metadata-usage
+  config:
+    filename: /tmp/athena_usage
+```
+
+{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%}
+
+{% /codeBlock %}
+{% /codePreview %}
+
+### 2. Run with the CLI
+
+After saving the YAML config, we will run the command the same way we did for the metadata ingestion:
+
+```bash
+metadata usage -c 
+```
\ No newline at end of file
diff --git a/openmetadata-docs/content/partials/v1.2/connectors/yaml/search/source-config-def.md b/openmetadata-docs/content/partials/v1.2/connectors/yaml/search/source-config-def.md
new file mode 100644
index 000000000000..b6c571a77aa3
--- /dev/null
+++ b/openmetadata-docs/content/partials/v1.2/connectors/yaml/search/source-config-def.md
@@ -0,0 +1,15 @@
+#### Source Configuration - Source Config
+
+{% codeInfo srNumber=100 %}
+
+The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/searchServiceMetadataPipeline.json):
+
+**includeSampleData**: Set the Ingest Sample Data toggle to control whether to ingest sample data as part of metadata ingestion.
+
+**sampleSize**: If include sample data is enabled, 10 records will be ingested by default. Using this field you can customize the size of the sample data.
+
+**markDeletedSearchIndexes**: Optional configuration to soft delete `search indexes` in OpenMetadata if the source `search indexes` are deleted. After deleting, all the entities associated with that `search index` (lineage, etc.) will be deleted as well.
+
+**searchIndexFilterPattern**: Note that the `searchIndexFilterPattern` supports regex to include or exclude search indexes during the metadata ingestion process.
+
+{% /codeInfo %}
\ No newline at end of file
diff --git a/openmetadata-docs/content/partials/v1.2/connectors/yaml/search/source-config.md b/openmetadata-docs/content/partials/v1.2/connectors/yaml/search/source-config.md
new file mode 100644
index 000000000000..f848a0fc654d
--- /dev/null
+++ b/openmetadata-docs/content/partials/v1.2/connectors/yaml/search/source-config.md
@@ -0,0 +1,15 @@
+```yaml {% srNumber=100 %}
+  sourceConfig:
+    config:
+      type: SearchMetadata
+      # markDeletedSearchIndexes: True
+      # includeSampleData: True
+      # sampleSize: 10
+      # searchIndexFilterPattern:
+      #   includes:
+      #     - index1
+      #     - index2
+      #   excludes:
+      #     - index4
+      #     - index3
+```
\ No newline at end of file
diff --git a/openmetadata-docs/content/partials/v1.2/connectors/yaml/storage/source-config-def.md b/openmetadata-docs/content/partials/v1.2/connectors/yaml/storage/source-config-def.md
new file mode 100644
index 000000000000..2fe9405484ce
--- /dev/null
+++ b/openmetadata-docs/content/partials/v1.2/connectors/yaml/storage/source-config-def.md
@@ -0,0 +1,11 @@
+#### Source Configuration - Source Config
+
+{% codeInfo srNumber=100 %}
+
+The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/storageServiceMetadataPipeline.json):
+
+**containerFilterPattern**: Note that the filter supports regex as include or exclude. You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database).
+
+**storageMetadataConfigSource**: Path to the `openmetadata_storage_manifest.json` global manifest file. It can be located in S3, at a local path, or at a URL.
+
+{% /codeInfo %}
\ No newline at end of file
diff --git a/openmetadata-docs/content/partials/v1.2/connectors/yaml/storage/source-config.md b/openmetadata-docs/content/partials/v1.2/connectors/yaml/storage/source-config.md
new file mode 100644
index 000000000000..f80f4d3fe749
--- /dev/null
+++ b/openmetadata-docs/content/partials/v1.2/connectors/yaml/storage/source-config.md
@@ -0,0 +1,25 @@
+```yaml {% srNumber=100 %}
+  sourceConfig:
+    config:
+      type: StorageMetadata
+      # containerFilterPattern:
+      #   includes:
+      #     - container1
+      #     - container2
+      #   excludes:
+      #     - container3
+      #     - container4
+      # storageMetadataConfigSource:
+      ## For S3
+        # securityConfig:
+        #   awsAccessKeyId: ...
+        #   awsSecretAccessKey: ...
+        #   awsRegion: ...
+        # prefixConfig:
+        #   containerName: om-glue-test
+        #   objectPrefix:
+      ## For HTTP
+        # manifestHttpPath: http://...
+      ## For Local
+        # manifestFilePath: /path/to/openmetadata_storage_manifest.json
+```
\ No newline at end of file
diff --git a/openmetadata-docs/content/partials/v1.2/connectors/workflow-config.md b/openmetadata-docs/content/partials/v1.2/connectors/yaml/workflow-config-def.md
similarity index 97%
rename from openmetadata-docs/content/partials/v1.2/connectors/workflow-config.md
rename to openmetadata-docs/content/partials/v1.2/connectors/yaml/workflow-config-def.md
index 71f6cd5607c7..97e97599ddd6 100644
--- a/openmetadata-docs/content/partials/v1.2/connectors/workflow-config.md
+++ b/openmetadata-docs/content/partials/v1.2/connectors/yaml/workflow-config-def.md
@@ -1,6 +1,6 @@
 #### Workflow Configuration
 
-{% codeInfo srNumber=99 %}
+{% codeInfo srNumber=300 %}
 
 The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation.
diff --git a/openmetadata-docs/content/partials/v1.2/connectors/workflow-config-yaml.md b/openmetadata-docs/content/partials/v1.2/connectors/yaml/workflow-config.md similarity index 93% rename from openmetadata-docs/content/partials/v1.2/connectors/workflow-config-yaml.md rename to openmetadata-docs/content/partials/v1.2/connectors/yaml/workflow-config.md index 1cd41649b2a8..c577f415feb1 100644 --- a/openmetadata-docs/content/partials/v1.2/connectors/workflow-config-yaml.md +++ b/openmetadata-docs/content/partials/v1.2/connectors/yaml/workflow-config.md @@ -1,4 +1,4 @@ -```yaml {% srNumber=99 %} +```yaml {% srNumber=300 %} workflowConfig: loggerLevel: INFO # DEBUG, INFO, WARNING or ERROR openMetadataServerConfig: diff --git a/openmetadata-docs/content/v1.2.x/connectors/dashboard/domo-dashboard/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/dashboard/domo-dashboard/yaml.md index beeeae789bf9..7fa2debcf7e9 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/dashboard/domo-dashboard/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/dashboard/domo-dashboard/yaml.md @@ -94,31 +94,11 @@ This is a sample config for Domo-Dashboard: {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/dashboard/source-config-def.md" /%} -{% codeInfo srNumber=6 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/dashboardServiceMetadataPipeline.json): - -- **dbServiceNames**: Database Service Names for ingesting lineage if the source supports it. -- **dashboardFilterPattern**, **chartFilterPattern**, **dataModelFilterPattern**: Note that all of them support regex as include or exclude. E.g., "My dashboard, My dash.*, .*Dashboard". -- **projectFilterPattern**: Filter the dashboards, charts and data sources by projects. Note that all of them support regex as include or exclude. E.g., "My project, My proj.*, .*Project". -- **includeOwners**: Set the 'Include Owners' toggle to control whether to include owners to the ingested entity if the owner email matches with a user stored in the OM server as part of metadata ingestion. If the ingested entity already exists and has an owner, the owner will not be overwritten. -- **includeTags**: Set the 'Include Tags' toggle to control whether to include tags in metadata ingestion. -- **includeDataModels**: Set the 'Include Data Models' toggle to control whether to include tags as part of metadata ingestion. -- **markDeletedDashboards**: Set the 'Mark Deleted Dashboards' toggle to flag dashboards as soft-deleted if they are not present anymore in the source system. - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=7 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} {% /codeInfoContainer %} @@ -147,55 +127,15 @@ source: ```yaml {% srNumber=5 %} instanceDomain: https://.domo.com ``` -```yaml {% srNumber=6 %} - sourceConfig: - config: - type: DashboardMetadata - overrideOwner: True - # dbServiceNames: - # - service1 - # - service2 - # dashboardFilterPattern: - # includes: - # - dashboard1 - # - dashboard2 - # excludes: - # - dashboard3 - # - dashboard4 - # chartFilterPattern: - # includes: - # - chart1 - # - chart2 - # excludes: - # - chart3 - # - chart4 - # projectFilterPattern: - # includes: - # - project1 - # - project2 - # excludes: - # - project3 - # - project4 -``` -```yaml {% srNumber=7 %} -sink: - type: metadata-rest - config: {} -``` -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/dashboard/source-config.md" /%} -{% /codeBlock %} - -{% /codePreview %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -### 2. Run with the CLI +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: +{% /codeBlock %} -```bash -metadata ingest -c -``` +{% /codePreview %} -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} diff --git a/openmetadata-docs/content/v1.2.x/connectors/dashboard/looker/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/dashboard/looker/yaml.md index d4b10afaf876..0eb93bad9959 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/dashboard/looker/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/dashboard/looker/yaml.md @@ -117,31 +117,11 @@ When configuring, give repository access to `Only select repositories` and choos {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/dashboard/source-config-def.md" /%} -{% codeInfo srNumber=5 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/dashboardServiceMetadataPipeline.json): - -- **dbServiceNames**: Database Service Names for ingesting lineage if the source supports it. -- **dashboardFilterPattern**, **chartFilterPattern**, **dataModelFilterPattern**: Note that all of them support regex as include or exclude. E.g., "My dashboard, My dash.*, .*Dashboard". -- **projectFilterPattern**: Filter the dashboards, charts and data sources by projects. Note that all of them support regex as include or exclude. E.g., "My project, My proj.*, .*Project". -- **includeOwners**: Set the 'Include Owners' toggle to control whether to include owners to the ingested entity if the owner email matches with a user stored in the OM server as part of metadata ingestion. If the ingested entity already exists and has an owner, the owner will not be overwritten. -- **includeTags**: Set the 'Include Tags' toggle to control whether to include tags in metadata ingestion. -- **includeDataModels**: Set the 'Include Data Models' toggle to control whether to include tags as part of metadata ingestion. 
-- **markDeletedDashboards**: Set the 'Mark Deleted Dashboards' toggle to flag dashboards as soft-deleted if they are not present anymore in the source system. - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=6 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. - -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} {% /codeInfoContainer %} @@ -171,56 +151,15 @@ source: repositoryName: OpenMetadata token: XYZ ``` -```yaml {% srNumber=5 %} - sourceConfig: - config: - type: DashboardMetadata - overrideOwner: True - # dbServiceNames: - # - service1 - # - service2 - # dashboardFilterPattern: - # includes: - # - dashboard1 - # - dashboard2 - # excludes: - # - dashboard3 - # - dashboard4 - # chartFilterPattern: - # includes: - # - chart1 - # - chart2 - # excludes: - # - chart3 - # - chart4 - # projectFilterPattern: - # includes: - # - project1 - # - project2 - # excludes: - # - project3 - # - project4 -``` -```yaml {% srNumber=6 %} -sink: - type: metadata-rest - config: {} -``` +{% partial file="/v1.2/connectors/yaml/dashboard/source-config.md" /%} -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} + +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} diff --git a/openmetadata-docs/content/v1.2.x/connectors/dashboard/metabase/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/dashboard/metabase/yaml.md index 5febbfea13d1..27f6433a5728 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/dashboard/metabase/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/dashboard/metabase/yaml.md @@ -83,31 +83,12 @@ This is a sample config for Metabase: {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/dashboard/source-config-def.md" /%} -{% codeInfo srNumber=4 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/dashboardServiceMetadataPipeline.json): +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} -- **dbServiceNames**: Database Service Names for ingesting lineage if the source supports it. -- **dashboardFilterPattern**, **chartFilterPattern**, **dataModelFilterPattern**: Note that all of them support regex as include or exclude. E.g., "My dashboard, My dash.*, .*Dashboard". -- **projectFilterPattern**: Filter the Metabase dashboards and charts by projects (In case of Metabase, projects corresponds to Collections). Note that all of them support regex as include or exclude. E.g., "My project, My proj.*, .*Project". -- **includeOwners**: Set the 'Include Owners' toggle to control whether to include owners to the ingested entity if the owner email matches with a user stored in the OM server as part of metadata ingestion. 
If the ingested entity already exists and has an owner, the owner will not be overwritten. -- **includeTags**: Set the 'Include Tags' toggle to control whether to include tags in metadata ingestion. -- **includeDataModels**: Set the 'Include Data Models' toggle to control whether to include tags as part of metadata ingestion. -- **markDeletedDashboards**: Set the 'Mark Deleted Dashboards' toggle to flag dashboards as soft-deleted if they are not present anymore in the source system. - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=5 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. - -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} {% /codeInfoContainer %} @@ -130,56 +111,15 @@ source: ```yaml {% srNumber=3 %} hostPort: ``` -```yaml {% srNumber=4 %} - sourceConfig: - config: - type: DashboardMetadata - markDeletedDashboards: True - # dbServiceNames: - # - service1 - # - service2 - # dashboardFilterPattern: - # includes: - # - dashboard1 - # - dashboard2 - # excludes: - # - dashboard3 - # - dashboard4 - # chartFilterPattern: - # includes: - # - chart1 - # - chart2 - # excludes: - # - chart3 - # - chart4 - # projectFilterPattern: - # includes: - # - project1 - # - project2 - # excludes: - # - project3 - # - project4 -``` -```yaml {% srNumber=5 %} -sink: - type: metadata-rest - config: {} -``` +{% partial file="/v1.2/connectors/yaml/dashboard/source-config.md" /%} -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} + +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} diff --git a/openmetadata-docs/content/v1.2.x/connectors/dashboard/mode/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/dashboard/mode/yaml.md index 03fd097b0a36..c5376e6d3b2f 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/dashboard/mode/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/dashboard/mode/yaml.md @@ -104,31 +104,11 @@ Name of the mode workspace from where the metadata is to be fetched. {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/dashboard/source-config-def.md" /%} -{% codeInfo srNumber=5 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/dashboardServiceMetadataPipeline.json): - -- **dbServiceNames**: Database Service Names for ingesting lineage if the source supports it. -- **dashboardFilterPattern**, **chartFilterPattern**, **dataModelFilterPattern**: Note that all of them support regex as include or exclude. E.g., "My dashboard, My dash.*, .*Dashboard". -- **projectFilterPattern**: Filter the dashboards, charts and data sources by projects. Note that all of them support regex as include or exclude. E.g., "My project, My proj.*, .*Project". 
-- **includeOwners**: Set the 'Include Owners' toggle to control whether to include owners to the ingested entity if the owner email matches with a user stored in the OM server as part of metadata ingestion. If the ingested entity already exists and has an owner, the owner will not be overwritten. -- **includeTags**: Set the 'Include Tags' toggle to control whether to include tags in metadata ingestion. -- **includeDataModels**: Set the 'Include Data Models' toggle to control whether to include tags as part of metadata ingestion. -- **markDeletedDashboards**: Set the 'Mark Deleted Dashboards' toggle to flag dashboards as soft-deleted if they are not present anymore in the source system. - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=6 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. - -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} {% /codeInfoContainer %} @@ -154,55 +134,16 @@ source: ```yaml {% srNumber=4 %} workspace_name: workspace_name ``` -```yaml {% srNumber=5 %} - sourceConfig: - config: - type: DashboardMetadata - # dbServiceNames: - # - service1 - # - service2 - # dashboardFilterPattern: - # includes: - # - dashboard1 - # - dashboard2 - # excludes: - # - dashboard3 - # - dashboard4 - # chartFilterPattern: - # includes: - # - chart1 - # - chart2 - # excludes: - # - chart3 - # - chart4 - # projectFilterPattern: - # includes: - # - project1 - # - project2 - # excludes: - # - project3 - # - project4 -``` -```yaml {% srNumber=6 %} -sink: - type: metadata-rest - config: {} -``` -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/dashboard/source-config.md" /%} -{% /codeBlock %} - -{% /codePreview %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -### 2. Run with the CLI +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: +{% /codeBlock %} -```bash -metadata ingest -c -``` +{% /codePreview %} -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} diff --git a/openmetadata-docs/content/v1.2.x/connectors/dashboard/powerbi/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/dashboard/powerbi/yaml.md index 6379c30ecc62..dd95572e1cb4 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/dashboard/powerbi/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/dashboard/powerbi/yaml.md @@ -199,31 +199,11 @@ For more information please visit the PowerBI official documentation [here](http {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/dashboard/source-config-def.md" /%} -{% codeInfo srNumber=9 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/dashboardServiceMetadataPipeline.json): - -- **dbServiceNames**: Database Service Names for ingesting lineage if the source supports it. -- **dashboardFilterPattern**, **chartFilterPattern**, **dataModelFilterPattern**: Note that all of them support regex as include or exclude. 
E.g., "My dashboard, My dash.*, .*Dashboard". -- **projectFilterPattern**: Filter the PowerBI dashboards, reports, tiles and data sources by projects(In case of PowerBI, projects correspond to workspaces). Note that all of them support regex as include or exclude. E.g., "My project, My proj.*, .*Project". -- **includeOwners**: Set the 'Include Owners' toggle to control whether to include owners to the ingested entity if the owner email matches with a user stored in the OM server as part of metadata ingestion. If the ingested entity already exists and has an owner, the owner will not be overwritten. -- **includeTags**: Set the 'Include Tags' toggle to control whether to include tags in metadata ingestion. -- **includeDataModels**: Set the 'Include Data Models' toggle to control whether to include tags as part of metadata ingestion. -- **markDeletedDashboards**: Set the 'Mark Deleted Dashboards' toggle to flag dashboards as soft-deleted if they are not present anymore in the source system. - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=10 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. - -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} {% /codeInfoContainer %} @@ -262,54 +242,15 @@ source: ```yaml {% srNumber=8 %} # useAdminApis: true (default) ``` -```yaml {% srNumber=9 %} - sourceConfig: - config: - type: DashboardMetadata - # dbServiceNames: - # - service1 - # - service2 - # dashboardFilterPattern: - # includes: - # - dashboard1 - # - dashboard2 - # excludes: - # - dashboard3 - # - dashboard4 - # chartFilterPattern: - # includes: - # - chart1 - # - chart2 - # excludes: - # - chart3 - # - chart4 - # projectFilterPattern: - # includes: - # - project1 - # - project2 - # excludes: - # - project3 - # - project4 -``` -```yaml {% srNumber=10 %} -sink: - type: metadata-rest - config: {} -``` -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/dashboard/source-config.md" /%} -{% /codeBlock %} - -{% /codePreview %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -### 2. Run with the CLI +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: +{% /codeBlock %} -```bash -metadata ingest -c -``` +{% /codePreview %} -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. 
+{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} diff --git a/openmetadata-docs/content/v1.2.x/connectors/dashboard/qliksense/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/dashboard/qliksense/yaml.md index 38af801d3f95..f7d62adb5635 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/dashboard/qliksense/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/dashboard/qliksense/yaml.md @@ -138,31 +138,11 @@ You will have to replace new lines with `\n` and the final private key that you {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/dashboard/source-config-def.md" /%} -{% codeInfo srNumber=9 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/dashboardServiceMetadataPipeline.json): - -- **dbServiceNames**: Database Service Names for ingesting lineage if the source supports it. -- **dashboardFilterPattern**, **chartFilterPattern**, **dataModelFilterPattern**: Note that all of them support regex as include or exclude. E.g., "My dashboard, My dash.*, .*Dashboard". -- **projectFilterPattern**: Filter the dashboards, charts and data sources by projects. Note that all of them support regex as include or exclude. E.g., "My project, My proj.*, .*Project". -- **includeOwners**: Set the 'Include Owners' toggle to control whether to include owners to the ingested entity if the owner email matches with a user stored in the OM server as part of metadata ingestion. If the ingested entity already exists and has an owner, the owner will not be overwritten. -- **includeTags**: Set the 'Include Tags' toggle to control whether to include tags in metadata ingestion. -- **includeDataModels**: Set the 'Include Data Models' toggle to control whether to include tags as part of metadata ingestion. -- **markDeletedDashboards**: Set the 'Mark Deleted Dashboards' toggle to flag dashboards as soft-deleted if they are not present anymore in the source system. - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=10 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. - -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} {% /codeInfoContainer %} @@ -202,54 +182,15 @@ source: ```yaml {% srNumber=6 %} userDirectory: user_dir ``` -```yaml {% srNumber=7 %} - sourceConfig: - config: - type: DashboardMetadata - # dbServiceNames: - # - service1 - # - service2 - # dashboardFilterPattern: - # includes: - # - dashboard1 - # - dashboard2 - # excludes: - # - dashboard3 - # - dashboard4 - # chartFilterPattern: - # includes: - # - chart1 - # - chart2 - # excludes: - # - chart3 - # - chart4 - # projectFilterPattern: - # includes: - # - project1 - # - project2 - # excludes: - # - project3 - # - project4 -``` -```yaml {% srNumber=8 %} -sink: - type: metadata-rest - config: {} -``` -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/dashboard/source-config.md" /%} -{% /codeBlock %} - -{% /codePreview %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -### 2. Run with the CLI +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} -First, we will need to save the YAML file. 
Afterward, and with all requirements installed, we can run: +{% /codeBlock %} -```bash -metadata ingest -c -``` +{% /codePreview %} -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} diff --git a/openmetadata-docs/content/v1.2.x/connectors/dashboard/quicksight/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/dashboard/quicksight/yaml.md index ac18df2fd7bd..c13917ab17a2 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/dashboard/quicksight/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/dashboard/quicksight/yaml.md @@ -138,31 +138,11 @@ This is a sample config for QuickSight: {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/dashboard/source-config-def.md" /%} -{% codeInfo srNumber=5 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/dashboardServiceMetadataPipeline.json): - -- **dbServiceNames**: Database Service Names for ingesting lineage if the source supports it. -- **dashboardFilterPattern**, **chartFilterPattern**, **dataModelFilterPattern**: Note that all of them support regex as include or exclude. E.g., "My dashboard, My dash.*, .*Dashboard". -- **projectFilterPattern**: Filter the dashboards, charts and data sources by projects. Note that all of them support regex as include or exclude. E.g., "My project, My proj.*, .*Project". -- **includeOwners**: Set the 'Include Owners' toggle to control whether to include owners to the ingested entity if the owner email matches with a user stored in the OM server as part of metadata ingestion. If the ingested entity already exists and has an owner, the owner will not be overwritten. -- **includeTags**: Set the 'Include Tags' toggle to control whether to include tags in metadata ingestion. -- **includeDataModels**: Set the 'Include Data Models' toggle to control whether to include tags as part of metadata ingestion. -- **markDeletedDashboards**: Set the 'Mark Deleted Dashboards' toggle to flag dashboards as soft-deleted if they are not present anymore in the source system. - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=6 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} {% /codeInfoContainer %} @@ -192,55 +172,15 @@ source: ```yaml {% srNumber=4 %} namespace: #to be provided if identityType is Anonymous ``` -```yaml {% srNumber=5 %} - sourceConfig: - config: - type: DashboardMetadata - markDeletedDashboards: True - # dbServiceNames: - # - service1 - # - service2 - # dashboardFilterPattern: - # includes: - # - dashboard1 - # - dashboard2 - # excludes: - # - dashboard3 - # - dashboard4 - # chartFilterPattern: - # includes: - # - chart1 - # - chart2 - # excludes: - # - chart3 - # - chart4 - # projectFilterPattern: - # includes: - # - project1 - # - project2 - # excludes: - # - project3 - # - project4 - -```yaml {% srNumber=6 %} -sink: - type: metadata-rest - config: {} -``` -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/dashboard/source-config.md" /%} -{% /codeBlock %} - -{% /codePreview %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -### 2. Run with the CLI +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: +{% /codeBlock %} -```bash -metadata ingest -c -``` +{% /codePreview %} -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} diff --git a/openmetadata-docs/content/v1.2.x/connectors/dashboard/redash/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/dashboard/redash/yaml.md index 67f3b7f8d831..d8ed42040b69 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/dashboard/redash/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/dashboard/redash/yaml.md @@ -85,31 +85,11 @@ Can be found on a user profile page. {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/dashboard/source-config-def.md" /%} -{% codeInfo srNumber=5 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/dashboardServiceMetadataPipeline.json): - -- **dbServiceNames**: Database Service Names for ingesting lineage if the source supports it. -- **dashboardFilterPattern**, **chartFilterPattern**, **dataModelFilterPattern**: Note that all of them support regex as include or exclude. E.g., "My dashboard, My dash.*, .*Dashboard". -- **projectFilterPattern**: Filter the dashboards, charts and data sources by projects. Note that all of them support regex as include or exclude. E.g., "My project, My proj.*, .*Project". -- **includeOwners**: Set the 'Include Owners' toggle to control whether to include owners to the ingested entity if the owner email matches with a user stored in the OM server as part of metadata ingestion. If the ingested entity already exists and has an owner, the owner will not be overwritten. -- **includeTags**: Set the 'Include Tags' toggle to control whether to include tags in metadata ingestion. -- **includeDataModels**: Set the 'Include Data Models' toggle to control whether to include tags as part of metadata ingestion. 
-- **markDeletedDashboards**: Set the 'Mark Deleted Dashboards' toggle to flag dashboards as soft-deleted if they are not present anymore in the source system. - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=6 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. - -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} {% /codeInfoContainer %} @@ -135,54 +115,15 @@ source: ```yaml {% srNumber=4 %} redashVersion: 10.0.0 ``` -```yaml {% srNumber=5 %} - sourceConfig: - config: - type: DashboardMetadata - # dbServiceNames: - # - service1 - # - service2 - # dashboardFilterPattern: - # includes: - # - dashboard1 - # - dashboard2 - # excludes: - # - dashboard3 - # - dashboard4 - # chartFilterPattern: - # includes: - # - chart1 - # - chart2 - # excludes: - # - chart3 - # - chart4 - # projectFilterPattern: - # includes: - # - project1 - # - project2 - # excludes: - # - project3 - # - project4 -``` -```yaml {% srNumber=6 %} -sink: - type: metadata-rest - config: {} -``` -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/dashboard/source-config.md" /%} -{% /codeBlock %} - -{% /codePreview %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -### 2. Run with the CLI +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: +{% /codeBlock %} -```bash -metadata ingest -c -``` +{% /codePreview %} -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} diff --git a/openmetadata-docs/content/v1.2.x/connectors/dashboard/superset/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/dashboard/superset/yaml.md index c6e37cee9063..30180f93b3ba 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/dashboard/superset/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/dashboard/superset/yaml.md @@ -134,31 +134,11 @@ You can use Postgres Connection when you have SSO enabled and your Superset is b {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/dashboard/source-config-def.md" /%} -{% codeInfo srNumber=4 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/dashboardServiceMetadataPipeline.json): - -- **dbServiceNames**: Database Service Names for ingesting lineage if the source supports it. -- **dashboardFilterPattern**, **chartFilterPattern**, **dataModelFilterPattern**: Note that all of them support regex as include or exclude. E.g., "My dashboard, My dash.*, .*Dashboard". -- **projectFilterPattern**: Filter the dashboards, charts and data sources by projects. Note that all of them support regex as include or exclude. E.g., "My project, My proj.*, .*Project". -- **includeOwners**: Set the 'Include Owners' toggle to control whether to include owners to the ingested entity if the owner email matches with a user stored in the OM server as part of metadata ingestion. If the ingested entity already exists and has an owner, the owner will not be overwritten. 
-- **includeTags**: Set the 'Include Tags' toggle to control whether to include tags in metadata ingestion. -- **includeDataModels**: Set the 'Include Data Models' toggle to control whether to include tags as part of metadata ingestion. -- **markDeletedDashboards**: Set the 'Mark Deleted Dashboards' toggle to flag dashboards as soft-deleted if they are not present anymore in the source system. - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=5 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. - -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} {% /codeInfoContainer %} @@ -198,57 +178,17 @@ source: # hostPort: localhost:5432 # database: superset ``` -```yaml {% srNumber=4 %} - sourceConfig: - config: - type: DashboardMetadata - overrideOwner: True - # dbServiceNames: - # - service1 - # - service2 - # dashboardFilterPattern: - # includes: - # - dashboard1 - # - dashboard2 - # excludes: - # - dashboard3 - # - dashboard4 - # chartFilterPattern: - # includes: - # - chart1 - # - chart2 - # excludes: - # - chart3 - # - chart4 - # projectFilterPattern: - # includes: - # - project1 - # - project2 - # excludes: - # - project3 - # - project4 -``` -```yaml {% srNumber=5 %} -sink: - type: metadata-rest - config: {} -``` -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/dashboard/source-config.md" /%} -{% /codeBlock %} - -{% /codePreview %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} -### 2. Run with the CLI +{% /codeBlock %} -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: +{% /codePreview %} -```bash -metadata ingest -c -``` -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. 
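The per-connector "Run with the CLI" text removed above now lives in the shared `ingestion-cli.md` partial referenced below. As a quick reference, the invocation those pages described is a sketch along these lines, assuming the recipe was saved as `ingestion.yaml` (the filename is just a placeholder):

```bash
# Run the metadata ingestion workflow described by the YAML recipe.
# "ingestion.yaml" is a hypothetical path; point it at wherever you saved the config.
metadata ingest -c ingestion.yaml
```

The command is identical across connectors; only the YAML contents change.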
+{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} diff --git a/openmetadata-docs/content/v1.2.x/connectors/dashboard/tableau/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/dashboard/tableau/yaml.md index 0925e2fcd845..766f34941dc4 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/dashboard/tableau/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/dashboard/tableau/yaml.md @@ -146,7 +146,7 @@ To send the metadata to OpenMetadata, it needs to be specified as `type: metadat {% /codeInfo %} -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} {% /codeInfoContainer %} @@ -188,53 +188,12 @@ source: ```yaml {% srNumber=11 %} paginationLimit: pagination_limit ``` -```yaml {% srNumber=8 %} - sourceConfig: - config: - type: DashboardMetadata - includeOwners: True - markDeletedDashboards: True - includeTags: True - includeDataModels: True - # dbServiceNames: - # - service1 - # - service2 - # dashboardFilterPattern: - # includes: - # - dashboard1 - # - dashboard2 - # excludes: - # - dashboard3 - # - dashboard4 - # chartFilterPattern: - # includes: - # - chart1 - # - chart2 - # excludes: - # - chart3 - # - chart4 - # dataModelFilterPattern: - # includes: - # - datamodel1 - # - datamodel2 - # excludes: - # - datamodel3 - # - datamodel4 - # projectFilterPattern: - # includes: - # - project1 - # - project2 - # excludes: - # - project3 - # - project4 -``` -```yaml {% srNumber=9 %} -sink: - type: metadata-rest - config: {} -``` -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/dashboard/source-config.md" /%} + +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} + +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} @@ -363,13 +322,4 @@ workflowConfig: ``` -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/athena/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/athena/yaml.md index a5609f794919..f6d917c2475c 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/athena/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/athena/yaml.md @@ -285,32 +285,11 @@ Find more information about [Source Identity](https://docs.aws.amazon.com/STS/la {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -{% codeInfo srNumber=13 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. 
You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=14 %} - - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. - -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration @@ -379,417 +358,21 @@ source: # key: value ``` -```yaml {% srNumber=13 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - users - # - type_test - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=14 %} -sink: - type: metadata-rest - config: {} -``` - -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} - -{% /codeBlock %} - -{% /codePreview %} - -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. - -## Query Usage - -The Query Usage workflow will be using the `query-parser` processor. - -After running a Metadata Ingestion workflow, we can run Query Usage workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - - -### 1. Define the YAML Config - -This is a sample config for BigQuery Usage: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=25 %} - -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceQueryUsagePipeline.json). - -**queryLogDuration**: Configuration to tune how far we want to look back in query logs to process usage data. - -{% /codeInfo %} - -{% codeInfo srNumber=26 %} - -**stageFileLocation**: Temporary file name to store the query logs before processing. Absolute file path required. - -{% /codeInfo %} - -{% codeInfo srNumber=27 %} - -**resultLimit**: Configuration to set the limit for query logs - -{% /codeInfo %} - -{% codeInfo srNumber=28 %} - -**queryLogFilePath**: Configuration to set the file path for query logs - -{% /codeInfo %} - - -{% codeInfo srNumber=29 %} - -#### Processor, Stage and Bulk Sink Configuration - -To specify where the staging files will be located. - -Note that the location is a directory that will be cleaned at the end of the ingestion. - -{% /codeInfo %} - -{% codeInfo srNumber=30 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. 
- -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - -```yaml -source: - type: athena-usage - serviceName: - sourceConfig: - config: - type: DatabaseUsage -``` -```yaml {% srNumber=25 %} - # Number of days to look back - queryLogDuration: 7 -``` - -```yaml {% srNumber=26 %} - # This is a directory that will be DELETED after the usage runs - stageFileLocation: -``` - -```yaml {% srNumber=27 %} - # resultLimit: 1000 -``` - -```yaml {% srNumber=28 %} - # If instead of getting the query logs from the database we want to pass a file with the queries - # queryLogFilePath: path-to-file -``` - -```yaml {% srNumber=29 %} -processor: - type: query-parser - config: {} -stage: - type: table-usage - config: - filename: /tmp/athena_usage -bulkSink: - type: metadata-usage - config: - filename: /tmp/athena_usage -``` - -```yaml {% srNumber=30 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` - -{% /codeBlock %} -{% /codePreview %} - -### 2. Run with the CLI - -After saving the YAML config, we will run the command the same way we did for the metadata ingestion: - -```bash -metadata ingest -c -``` - -## Data Profiler - -The Data Profiler workflow will be using the `orm-profiler` processor. - -After running a Metadata Ingestion workflow, we can run Data Profiler workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - - -### 1. Define the YAML Config - -This is a sample config for the profiler: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=13 %} -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json). - -**generateSampleData**: Option to turn on/off generating sample data. - -{% /codeInfo %} - -{% codeInfo srNumber=14 %} - -**profileSample**: Percentage of data or no. of rows we want to execute the profiler and tests on. - -{% /codeInfo %} - -{% codeInfo srNumber=15 %} - -**threadCount**: Number of threads to use during metric computations. - -{% /codeInfo %} - -{% codeInfo srNumber=16 %} - -**processPiiSensitive**: Optional configuration to automatically tag columns that might contain sensitive information. - -{% /codeInfo %} - -{% codeInfo srNumber=17 %} - -**confidence**: Set the Confidence value for which you want the column to be marked - -{% /codeInfo %} - - -{% codeInfo srNumber=18 %} - -**timeoutSeconds**: Profiler Timeout in Seconds - -{% /codeInfo %} - -{% codeInfo srNumber=19 %} - -**databaseFilterPattern**: Regex to only fetch databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=20 %} - -**schemaFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=21 %} - -**tableFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=22 %} - -#### Processor Configuration - -Choose the `orm-profiler`. Its config can also be updated to define tests from the YAML itself instead of the UI: - -**tableConfig**: `tableConfig` allows you to set up some configuration at the table level. 
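To make the `tableConfig` shape easier to read than the commented skeleton below, here is a minimal sketch of a single entry; the table FQN and column name are hypothetical placeholders, and the FQN follows the `service.database.schema.table` pattern OpenMetadata uses for tables:

```yaml
processor:
  type: orm-profiler
  config:
    tableConfig:
      - fullyQualifiedName: local_athena.default.demo.users  # hypothetical table
        profileSample: 50            # profile roughly half the rows of this table
        columnConfig:
          includeColumns:
            - columnName: age        # hypothetical column
              metrics:
                - MEAN
                - MEDIAN
```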
-{% /codeInfo %} - - -{% codeInfo srNumber=23 %} - -#### Sink Configuration - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. -{% /codeInfo %} - - -{% codeInfo srNumber=24 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. - -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - - -```yaml -source: - type: athena - serviceName: local_athena - sourceConfig: - config: - type: Profiler -``` - -```yaml {% srNumber=13 %} - generateSampleData: true -``` -```yaml {% srNumber=14 %} - # profileSample: 85 -``` -```yaml {% srNumber=15 %} - # threadCount: 5 -``` -```yaml {% srNumber=16 %} - processPiiSensitive: false -``` -```yaml {% srNumber=17 %} - # confidence: 80 -``` -```yaml {% srNumber=18 %} - # timeoutSeconds: 43200 -``` -```yaml {% srNumber=19 %} - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 -``` -```yaml {% srNumber=20 %} - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 -``` -```yaml {% srNumber=21 %} - # tableFilterPattern: - # includes: - # - table1 - # - table2 - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=22 %} -processor: - type: orm-profiler - config: {} # Remove braces if adding properties - # tableConfig: - # - fullyQualifiedName:
- # profileSample: # default - - # profileSample: # default will be 100 if omitted - # profileQuery: - # columnConfig: - # excludeColumns: - # - - # includeColumns: - # - columnName: - # - metrics: - # - MEAN - # - MEDIAN - # - ... - # partitionConfig: - # enablePartitioning: - # partitionColumnName: - # partitionIntervalType: - # Pick one of the variation shown below - # ----'TIME-UNIT' or 'INGESTION-TIME'------- - # partitionInterval: - # partitionIntervalUnit: - # ------------'INTEGER-RANGE'--------------- - # partitionIntegerRangeStart: - # partitionIntegerRangeEnd: - # -----------'COLUMN-VALUE'---------------- - # partitionValues: - # - - # - +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -``` +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -```yaml {% srNumber=23 %} -sink: - type: metadata-rest - config: {} -``` - -```yaml {% srNumber=24 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -- You can learn more about how to configure and run the Profiler Workflow to extract Profiler data and execute the Data Quality from [here](/connectors/ingestion/workflows/profiler) - -### 2. Run with the CLI - -After saving the YAML config, we will run the command the same way we did for the metadata ingestion: +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} -```bash -metadata profile -c -``` +{% partial file="/v1.2/connectors/yaml/query-usage.md" variables={connector: "athena"} /%} -Note now instead of running `ingest`, we are using the `profile` command to select the Profiler workflow. +{% partial file="/v1.2/connectors/yaml/data-profiler.md" variables={connector: "athena"} /%} ## Lineage @@ -806,15 +389,3 @@ You can learn more about how to ingest lineage [here](/connectors/ingestion/work link="/connectors/ingestion/workflows/dbt" /%} {% /tilesContainer %} - -## Related - -{% tilesContainer %} - -{% tile - title="Ingest with Airflow" - description="Configure the ingestion using Airflow SDK" - link="/connectors/database/athena/airflow" - / %} - -{% /tilesContainer %} diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/azuresql/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/azuresql/yaml.md index cb540694893d..93bd33cdc448 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/azuresql/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/azuresql/yaml.md @@ -55,9 +55,6 @@ CREATE USER Mary WITH PASSWORD = '********'; GRANT SELECT TO Mary; ``` - - - ### Python Requirements To run the AzureSQL ingestion, you will need to install: @@ -121,33 +118,11 @@ You can download the ODBC driver from [here](https://learn.microsoft.com/en-us/s {% /codeInfo %} -#### Source Configuration - Source Config - -{% codeInfo srNumber=8 %} - -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -**includeTables**: true or false, to ingest table data. Default is true. - -**ingestAllDatabases**: Ingest data from all databases in Azuresql. 
You can use databaseFilterPattern on top of this. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=9 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. - -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration @@ -201,288 +176,19 @@ source: # key: value ``` -```yaml {% srNumber=8 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - users - # - type_test - # excludes: - # - table3 - # - table4 -``` +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -```yaml {% srNumber=9 %} -sink: - type: metadata-rest - config: {} -``` +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. - -## Data Profiler - -The Data Profiler workflow will be using the `orm-profiler` processor. - -After running a Metadata Ingestion workflow, we can run Data Profiler workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - - -### 1. Define the YAML Config - -This is a sample config for the profiler: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=11 %} -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json). - -**generateSampleData**: Option to turn on/off generating sample data. - -{% /codeInfo %} - -{% codeInfo srNumber=12 %} - -**profileSample**: Percentage of data or no. of rows we want to execute the profiler and tests on. - -{% /codeInfo %} - -{% codeInfo srNumber=13 %} - -**threadCount**: Number of threads to use during metric computations. - -{% /codeInfo %} - -{% codeInfo srNumber=14 %} - -**processPiiSensitive**: Optional configuration to automatically tag columns that might contain sensitive information. 
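Since `processPiiSensitive` works together with the `confidence` threshold described next, here is a small sketch of how the two sit in the profiler `sourceConfig` (values are illustrative only):

```yaml
  sourceConfig:
    config:
      type: Profiler
      processPiiSensitive: true  # auto-tag columns that look like they hold PII
      confidence: 80             # only tag when the classifier is at least 80% confident
```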
- -{% /codeInfo %} - -{% codeInfo srNumber=15 %} - -**confidence**: Set the Confidence value for which you want the column to be marked - -{% /codeInfo %} - - -{% codeInfo srNumber=16 %} - -**timeoutSeconds**: Profiler Timeout in Seconds - -{% /codeInfo %} - -{% codeInfo srNumber=17 %} - -**databaseFilterPattern**: Regex to only fetch databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=18 %} - -**schemaFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=19 %} - -**tableFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=20 %} - -#### Processor Configuration - -Choose the `orm-profiler`. Its config can also be updated to define tests from the YAML itself instead of the UI: - -**tableConfig**: `tableConfig` allows you to set up some configuration at the table level. -{% /codeInfo %} - - -{% codeInfo srNumber=21 %} - -#### Sink Configuration - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. -{% /codeInfo %} - - -{% codeInfo srNumber=22 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. - -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - - -```yaml -source: - type: azuresql - serviceName: local_azuresql - sourceConfig: - config: - type: Profiler -``` - -```yaml {% srNumber=11 %} - generateSampleData: true -``` -```yaml {% srNumber=12 %} - # profileSample: 85 -``` -```yaml {% srNumber=13 %} - # threadCount: 5 -``` -```yaml {% srNumber=14 %} - processPiiSensitive: false -``` -```yaml {% srNumber=15 %} - # confidence: 80 -``` -```yaml {% srNumber=16 %} - # timeoutSeconds: 43200 -``` -```yaml {% srNumber=17 %} - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 -``` -```yaml {% srNumber=18 %} - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 -``` -```yaml {% srNumber=19 %} - # tableFilterPattern: - # includes: - # - table1 - # - table2 - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=20 %} -processor: - type: orm-profiler - config: {} # Remove braces if adding properties - # tableConfig: - # - fullyQualifiedName:
- # profileSample: # default - - # profileSample: # default will be 100 if omitted - # profileQuery: - # columnConfig: - # excludeColumns: - # - - # includeColumns: - # - columnName: - # - metrics: - # - MEAN - # - MEDIAN - # - ... - # partitionConfig: - # enablePartitioning: - # partitionColumnName: - # partitionIntervalType: - # Pick one of the variation shown below - # ----'TIME-UNIT' or 'INGESTION-TIME'------- - # partitionInterval: - # partitionIntervalUnit: - # ------------'INTEGER-RANGE'--------------- - # partitionIntegerRangeStart: - # partitionIntegerRangeEnd: - # -----------'COLUMN-VALUE'---------------- - # partitionValues: - # - - # - - -``` - -```yaml {% srNumber=21 %} -sink: - type: metadata-rest - config: {} -``` - -```yaml {% srNumber=22 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` - -{% /codeBlock %} - -{% /codePreview %} - -- You can learn more about how to configure and run the Profiler Workflow to extract Profiler data and execute the Data Quality from [here](/connectors/ingestion/workflows/profiler) - -### 2. Run with the CLI - -After saving the YAML config, we will run the command the same way we did for the metadata ingestion: - -```bash -metadata profile -c -``` +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} -Note now instead of running `ingest`, we are using the `profile` command to select the Profiler workflow. +{% partial file="/v1.2/connectors/yaml/data-profiler.md" variables={connector: "azuresql"} /%} ## dbt Integration diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/bigquery/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/bigquery/yaml.md index 0ebbea4024f4..913c36736b1b 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/bigquery/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/bigquery/yaml.md @@ -169,31 +169,11 @@ the GCP credentials empty. This is why they are not marked as required. {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -{% codeInfo srNumber=4 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=5 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
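For reference, the sink block these pages previously spelled out, presumably what the new `ingestion-sink.md` partial centralizes, is the same minimal snippet throughout:

```yaml
sink:
  type: metadata-rest
  config: {}
```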
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration @@ -252,537 +232,21 @@ source: # key: value ``` -```yaml {% srNumber=4 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - users - # - type_test - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=5 %} -sink: - type: metadata-rest - config: {} -``` - -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} - -{% /codeBlock %} - -{% /codePreview %} - -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. - - -## Query Usage - -The Query Usage workflow will be using the `query-parser` processor. - -After running a Metadata Ingestion workflow, we can run Query Usage workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - - -### 1. Define the YAML Config - -This is a sample config for BigQuery Usage: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=7 %} - -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceQueryUsagePipeline.json). - -**queryLogDuration**: Configuration to tune how far we want to look back in query logs to process usage data. - -{% /codeInfo %} - -{% codeInfo srNumber=8 %} - -**stageFileLocation**: Temporary file name to store the query logs before processing. Absolute file path required. - -{% /codeInfo %} - -{% codeInfo srNumber=9 %} - -**resultLimit**: Configuration to set the limit for query logs - -{% /codeInfo %} - -{% codeInfo srNumber=10 %} - -**queryLogFilePath**: Configuration to set the file path for query logs - -{% /codeInfo %} - - -{% codeInfo srNumber=11 %} - -#### Processor, Stage and Bulk Sink Configuration - -To specify where the staging files will be located. - -Note that the location is a directory that will be cleaned at the end of the ingestion. - -{% /codeInfo %} - -{% codeInfo srNumber=12 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. 
- -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - -```yaml -source: - type: bigquery-usage - serviceName: - sourceConfig: - config: - type: DatabaseUsage -``` -```yaml {% srNumber=7 %} - # Number of days to look back - queryLogDuration: 7 -``` - -```yaml {% srNumber=8 %} - # This is a directory that will be DELETED after the usage runs - stageFileLocation: -``` - -```yaml {% srNumber=9 %} - # resultLimit: 1000 -``` - -```yaml {% srNumber=10 %} - # If instead of getting the query logs from the database we want to pass a file with the queries - # queryLogFilePath: path-to-file -``` - -```yaml {% srNumber=11 %} -processor: - type: query-parser - config: {} -stage: - type: table-usage - config: - filename: /tmp/bigquery_usage -bulkSink: - type: metadata-usage - config: - filename: /tmp/bigquery_usage -``` - -```yaml {% srNumber=12 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` - -{% /codeBlock %} -{% /codePreview %} - -### 2. Run with the CLI - -There is an extra requirement to run the Usage pipelines. You will need to install: - -```bash -pip3 install --upgrade 'openmetadata-ingestion[bigquery-usage]' -``` - -After saving the YAML config, we will run the command the same way we did for the metadata ingestion: - -```bash -metadata ingest -c -``` - -## Data Profiler - -The Data Profiler workflow will be using the `orm-profiler` processor. - -After running a Metadata Ingestion workflow, we can run Data Profiler workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - - -### 1. Define the YAML Config - -This is a sample config for the profiler: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=13 %} -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json). - -**generateSampleData**: Option to turn on/off generating sample data. - -{% /codeInfo %} - -{% codeInfo srNumber=14 %} - -**profileSample**: Percentage of data or no. of rows we want to execute the profiler and tests on. - -{% /codeInfo %} - -{% codeInfo srNumber=15 %} - -**threadCount**: Number of threads to use during metric computations. - -{% /codeInfo %} - -{% codeInfo srNumber=16 %} - -**processPiiSensitive**: Optional configuration to automatically tag columns that might contain sensitive information. - -{% /codeInfo %} - -{% codeInfo srNumber=17 %} - -**confidence**: Set the Confidence value for which you want the column to be marked - -{% /codeInfo %} - - -{% codeInfo srNumber=18 %} - -**timeoutSeconds**: Profiler Timeout in Seconds - -{% /codeInfo %} - -{% codeInfo srNumber=19 %} - -**databaseFilterPattern**: Regex to only fetch databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=20 %} - -**schemaFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=21 %} - -**tableFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=22 %} - -#### Processor Configuration - -Choose the `orm-profiler`. 
Its config can also be updated to define tests from the YAML itself instead of the UI: - -**tableConfig**: `tableConfig` allows you to set up some configuration at the table level. -{% /codeInfo %} - - -{% codeInfo srNumber=23 %} - -#### Sink Configuration - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. -{% /codeInfo %} - - -{% codeInfo srNumber=24 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. - -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - - -```yaml -source: - type: bigquery - serviceName: local_bigquery - sourceConfig: - config: - type: Profiler -``` - -```yaml {% srNumber=13 %} - generateSampleData: true -``` -```yaml {% srNumber=14 %} - # profileSample: 85 -``` -```yaml {% srNumber=15 %} - # threadCount: 5 -``` -```yaml {% srNumber=16 %} - processPiiSensitive: false -``` -```yaml {% srNumber=17 %} - # confidence: 80 -``` -```yaml {% srNumber=18 %} - # timeoutSeconds: 43200 -``` -```yaml {% srNumber=19 %} - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 -``` -```yaml {% srNumber=20 %} - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 -``` -```yaml {% srNumber=21 %} - # tableFilterPattern: - # includes: - # - table1 - # - table2 - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=22 %} -processor: - type: orm-profiler - config: {} # Remove braces if adding properties - # tableConfig: - # - fullyQualifiedName:
- # profileSample: # default - - # profileSample: # default will be 100 if omitted - # profileQuery: - # columnConfig: - # excludeColumns: - # - - # includeColumns: - # - columnName: - # - metrics: - # - MEAN - # - MEDIAN - # - ... - # partitionConfig: - # enablePartitioning: - # partitionColumnName: - # partitionIntervalType: - # Pick one of the variation shown below - # ----'TIME-UNIT' or 'INGESTION-TIME'------- - # partitionInterval: - # partitionIntervalUnit: - # ------------'INTEGER-RANGE'--------------- - # partitionIntegerRangeStart: - # partitionIntegerRangeEnd: - # -----------'COLUMN-VALUE'---------------- - # partitionValues: - # - - # - +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -``` - -```yaml {% srNumber=23 %} -sink: - type: metadata-rest - config: {} -``` +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -```yaml {% srNumber=24 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -- You can learn more about how to configure and run the Profiler Workflow to extract Profiler data and execute the Data Quality from [here](/connectors/ingestion/workflows/profiler) - - - -### 2. Prepare the Profiler DAG - -Here, we follow a similar approach as with the metadata and usage pipelines, although we will use a different Workflow class: - - - - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=25 %} - -#### Import necessary modules - -The `ProfilerWorkflow` class that is being imported is a part of a metadata orm_profiler framework, which defines a process of extracting Profiler data. - -Here we are also importing all the basic requirements to parse YAMLs, handle dates and build our DAG. - -{% /codeInfo %} - -{% codeInfo srNumber=26 %} - -**Default arguments for all tasks in the Airflow DAG.** -- Default arguments dictionary contains default arguments for tasks in the DAG, including the owner's name, email address, number of retries, retry delay, and execution timeout. - -{% /codeInfo %} - - -{% codeInfo srNumber=27 %} - -- **config**: Specifies config for the profiler as we prepare above. - -{% /codeInfo %} - -{% codeInfo srNumber=28 %} - -- **metadata_ingestion_workflow()**: This code defines a function `metadata_ingestion_workflow()` that loads a YAML configuration, creates a `ProfilerWorkflow` object, executes the workflow, checks its status, prints the status to the console, and stops the workflow. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} -{% /codeInfo %} - -{% codeInfo srNumber=29 %} - -- **DAG**: creates a DAG using the Airflow framework, and tune the DAG configurations to whatever fits with your requirements -- For more Airflow DAGs creation details visit [here](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html#declaring-a-dag). 
- -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.py" %} - -```python {% srNumber=26 %} -import yaml -from datetime import timedelta -from airflow import DAG -from metadata.workflow.profiler import ProfilerWorkflow -from metadata.workflow.workflow_output_handler import print_status - -try: - from airflow.operators.python import PythonOperator -except ModuleNotFoundError: - from airflow.operators.python_operator import PythonOperator - -from airflow.utils.dates import days_ago - - -``` -```python {% srNumber=27 %} -default_args = { - "owner": "user_name", - "email_on_failure": False, - "retries": 3, - "retry_delay": timedelta(seconds=10), - "execution_timeout": timedelta(minutes=60), -} - - -``` - -```python {% srNumber=28 %} -config = """ - -""" - - -``` - -```python {% srNumber=29 %} -def metadata_ingestion_workflow(): - workflow_config = yaml.safe_load(config) - workflow = ProfilerWorkflow.create(workflow_config) - workflow.execute() - workflow.raise_from_status() - print_status(workflow) - workflow.stop() - - -``` - -```python {% srNumber=30 %} -with DAG( - "profiler_example", - default_args=default_args, - description="An example DAG which runs a OpenMetadata ingestion workflow", - start_date=days_ago(1), - is_paused_upon_creation=False, - catchup=False, -) as dag: - ingest_task = PythonOperator( - task_id="profile_and_test_using_recipe", - python_callable=metadata_ingestion_workflow, - ) - - -``` - -{% /codeBlock %} +{% partial file="/v1.2/connectors/yaml/query-usage.md" variables={connector: "bigquery"} /%} -{% /codePreview %} +{% partial file="/v1.2/connectors/yaml/data-profiler.md" variables={connector: "bigquery"} /%} ## Lineage diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/clickhouse/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/clickhouse/yaml.md index 2a027a4c5a54..e43472beeb56 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/clickhouse/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/clickhouse/yaml.md @@ -170,31 +170,11 @@ This is a sample config for Clickhouse: {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -{% codeInfo srNumber=9 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=10 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration @@ -259,538 +239,22 @@ source: # connectionArguments: # key: value ``` -```yaml {% srNumber=9 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - users - # - type_test - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=10 %} -sink: - type: metadata-rest - config: {} -``` - -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} - -{% /codeBlock %} - -{% /codePreview %} - -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. - - -## Query Usage - -The Query Usage workflow will be using the `query-parser` processor. - -After running a Metadata Ingestion workflow, we can run Query Usage workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - - -### 1. Define the YAML Config - -This is a sample config for Clickhouse Usage: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=12 %} - -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceQueryUsagePipeline.json). - -**queryLogDuration**: Configuration to tune how far we want to look back in query logs to process usage data. - -{% /codeInfo %} - -{% codeInfo srNumber=13 %} - -**stageFileLocation**: Temporary file name to store the query logs before processing. Absolute file path required. - -{% /codeInfo %} - -{% codeInfo srNumber=14 %} - -**resultLimit**: Configuration to set the limit for query logs - -{% /codeInfo %} - -{% codeInfo srNumber=15 %} - -**queryLogFilePath**: Configuration to set the file path for query logs - -{% /codeInfo %} - - -{% codeInfo srNumber=16 %} - -#### Processor, Stage and Bulk Sink Configuration - -To specify where the staging files will be located. - -Note that the location is a directory that will be cleaned at the end of the ingestion. - -{% /codeInfo %} - -{% codeInfo srNumber=17 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. 
- -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - -```yaml -source: - type: clickhouse-usage - serviceName: - sourceConfig: - config: - type: DatabaseUsage -``` -```yaml {% srNumber=12 %} - # Number of days to look back - queryLogDuration: 7 -``` -```yaml {% srNumber=13 %} - # This is a directory that will be DELETED after the usage runs - stageFileLocation: -``` - -```yaml {% srNumber=14 %} - # resultLimit: 1000 -``` - -```yaml {% srNumber=15 %} - # If instead of getting the query logs from the database we want to pass a file with the queries - # queryLogFilePath: path-to-file -``` +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -```yaml {% srNumber=16 %} -processor: - type: query-parser - config: {} -stage: - type: table-usage - config: - filename: /tmp/clickhouse_usage -bulkSink: - type: metadata-usage - config: - filename: /tmp/clickhouse_usage -``` +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -```yaml {% srNumber=17 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` - -{% /codeBlock %} -{% /codePreview %} - - -### 2. Run with the CLI - -There is an extra requirement to run the Usage pipelines. You will need to install: - -```bash -pip3 install --upgrade 'openmetadata-ingestion[clickhouse-usage]' -``` - -After saving the YAML config, we will run the command the same way we did for the metadata ingestion: - -```bash -metadata ingest -c -``` - -## Data Profiler - -The Data Profiler workflow will be using the `orm-profiler` processor. - -After running a Metadata Ingestion workflow, we can run Data Profiler workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - - -### 1. Define the YAML Config - -This is a sample config for the profiler: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=18 %} -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json). - -**generateSampleData**: Option to turn on/off generating sample data. - -{% /codeInfo %} - -{% codeInfo srNumber=19 %} - -**profileSample**: Percentage of data or no. of rows we want to execute the profiler and tests on. - -{% /codeInfo %} - -{% codeInfo srNumber=20 %} - -**threadCount**: Number of threads to use during metric computations. - -{% /codeInfo %} - -{% codeInfo srNumber=21 %} - -**processPiiSensitive**: Optional configuration to automatically tag columns that might contain sensitive information. - -{% /codeInfo %} - -{% codeInfo srNumber=22 %} - -**confidence**: Set the Confidence value for which you want the column to be marked - -{% /codeInfo %} - - -{% codeInfo srNumber=23 %} - -**timeoutSeconds**: Profiler Timeout in Seconds - -{% /codeInfo %} - -{% codeInfo srNumber=24 %} - -**databaseFilterPattern**: Regex to only fetch databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=25 %} - -**schemaFilterPattern**: Regex to only fetch tables or databases that matches the pattern. 
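As an illustration of the include/exclude shape these filter patterns take elsewhere in this page, a schema filter restricted by two patterns might look like this (the names are placeholders; regex is supported):

```yaml
  sourceConfig:
    config:
      type: Profiler
      schemaFilterPattern:
        includes:
          - sales.*          # hypothetical regex: any schema starting with "sales"
        excludes:
          - internal_schema  # hypothetical literal name to skip
```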
- -{% /codeInfo %} - -{% codeInfo srNumber=26 %} - -**tableFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=27 %} - -#### Processor Configuration - -Choose the `orm-profiler`. Its config can also be updated to define tests from the YAML itself instead of the UI: - -**tableConfig**: `tableConfig` allows you to set up some configuration at the table level. -{% /codeInfo %} - - -{% codeInfo srNumber=28 %} - -#### Sink Configuration - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. -{% /codeInfo %} - - -{% codeInfo srNumber=29 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. - -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - - -```yaml -source: - type: clickhouse - serviceName: local_clickhouse - sourceConfig: - config: - type: Profiler -``` - -```yaml {% srNumber=18 %} - generateSampleData: true -``` -```yaml {% srNumber=19 %} - # profileSample: 85 -``` -```yaml {% srNumber=20 %} - # threadCount: 5 -``` -```yaml {% srNumber=21 %} - processPiiSensitive: false -``` -```yaml {% srNumber=22 %} - # confidence: 80 -``` -```yaml {% srNumber=23 %} - # timeoutSeconds: 43200 -``` -```yaml {% srNumber=24 %} - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 -``` -```yaml {% srNumber=25 %} - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 -``` -```yaml {% srNumber=26 %} - # tableFilterPattern: - # includes: - # - table1 - # - table2 - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=27 %} -processor: - type: orm-profiler - config: {} # Remove braces if adding properties - # tableConfig: - # - fullyQualifiedName:
- # profileSample: # default - - # profileSample: # default will be 100 if omitted - # profileQuery: - # columnConfig: - # excludeColumns: - # - - # includeColumns: - # - columnName: - # - metrics: - # - MEAN - # - MEDIAN - # - ... - # partitionConfig: - # enablePartitioning: - # partitionColumnName: - # partitionIntervalType: - # Pick one of the variation shown below - # ----'TIME-UNIT' or 'INGESTION-TIME'------- - # partitionInterval: - # partitionIntervalUnit: - # ------------'INTEGER-RANGE'--------------- - # partitionIntegerRangeStart: - # partitionIntegerRangeEnd: - # -----------'COLUMN-VALUE'---------------- - # partitionValues: - # - - # - - -``` - -```yaml {% srNumber=28 %} -sink: - type: metadata-rest - config: {} -``` - -```yaml {% srNumber=29 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -- You can learn more about how to configure and run the Profiler Workflow to extract Profiler data and execute the Data Quality from [here](/connectors/ingestion/workflows/profiler) - - - -### 2. Prepare the Profiler DAG - -Here, we follow a similar approach as with the metadata and usage pipelines, although we will use a different Workflow class: - - - - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=30 %} - -#### Import necessary modules +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} -The `ProfilerWorkflow` class that is being imported is a part of a metadata orm_profiler framework, which defines a process of extracting Profiler data. - -Here we are also importing all the basic requirements to parse YAMLs, handle dates and build our DAG. - -{% /codeInfo %} - -{% codeInfo srNumber=31 %} - -**Default arguments for all tasks in the Airflow DAG.** -- Default arguments dictionary contains default arguments for tasks in the DAG, including the owner's name, email address, number of retries, retry delay, and execution timeout. - -{% /codeInfo %} - - -{% codeInfo srNumber=32 %} - -- **config**: Specifies config for the profiler as we prepare above. - -{% /codeInfo %} - -{% codeInfo srNumber=33 %} - -- **metadata_ingestion_workflow()**: This code defines a function `metadata_ingestion_workflow()` that loads a YAML configuration, creates a `ProfilerWorkflow` object, executes the workflow, checks its status, prints the status to the console, and stops the workflow. - -{% /codeInfo %} +{% partial file="/v1.2/connectors/yaml/query-usage.md" variables={connector: "clickhouse"} /%} -{% codeInfo srNumber=34 %} - -- **DAG**: creates a DAG using the Airflow framework, and tune the DAG configurations to whatever fits with your requirements -- For more Airflow DAGs creation details visit [here](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html#declaring-a-dag). 
- -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.py" %} - -```python {% srNumber=30 %} -import yaml -from datetime import timedelta -from airflow import DAG -from metadata.workflow.profiler import ProfilerWorkflow -from metadata.workflow.workflow_output_handler import print_status - -try: - from airflow.operators.python import PythonOperator -except ModuleNotFoundError: - from airflow.operators.python_operator import PythonOperator - -from airflow.utils.dates import days_ago - - -``` -```python {% srNumber=31 %} -default_args = { - "owner": "user_name", - "email_on_failure": False, - "retries": 3, - "retry_delay": timedelta(seconds=10), - "execution_timeout": timedelta(minutes=60), -} - - -``` - -```python {% srNumber=32 %} -config = """ - -""" - - -``` - -```python {% srNumber=33 %} -def metadata_ingestion_workflow(): - workflow_config = yaml.safe_load(config) - workflow = ProfilerWorkflow.create(workflow_config) - workflow.execute() - workflow.raise_from_status() - print_status(workflow) - workflow.stop() - - -``` - -```python {% srNumber=34 %} -with DAG( - "profiler_example", - default_args=default_args, - description="An example DAG which runs a OpenMetadata ingestion workflow", - start_date=days_ago(1), - is_paused_upon_creation=False, - catchup=False, -) as dag: - ingest_task = PythonOperator( - task_id="profile_and_test_using_recipe", - python_callable=metadata_ingestion_workflow, - ) - - -``` - -{% /codeBlock %} - -{% /codePreview %} +{% partial file="/v1.2/connectors/yaml/data-profiler.md" variables={connector: "clickhouse"} /%} ## Lineage diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/couchbase/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/couchbase/yaml.md index bf0dcca2bbac..bb88222c4231 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/couchbase/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/couchbase/yaml.md @@ -98,31 +98,11 @@ This is a sample config for Couchbase: {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -{% codeInfo srNumber=5 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilternPattern**: Note that the filter supports regex as include or exclude. You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=6 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration @@ -144,7 +124,7 @@ source: serviceConnection: config: type: Couchbase - +``` ```yaml {% srNumber=1 %} username: username ``` @@ -159,59 +139,17 @@ source: bucket: custom_bucket_name ``` -```yaml {% srNumber=5 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - users - # - type_test - # excludes: - # - table3 - # - table4 -``` +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -```yaml {% srNumber=6 %} -sink: - type: metadata-rest - config: {} -``` +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} ## Related diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/databricks/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/databricks/yaml.md index dac26982f82f..1822457b582b 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/databricks/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/databricks/yaml.md @@ -122,31 +122,11 @@ This is a sample config for Databricks: {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -{% codeInfo srNumber=9 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=10 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration @@ -207,538 +187,21 @@ source: ``` -```yaml {% srNumber=9 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - users - # - type_test - # excludes: - # - table3 - # - table4 -``` +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -```yaml {% srNumber=10 %} -sink: - type: metadata-rest - config: {} -``` +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. - - -## Query Usage - -The Query Usage workflow will be using the `query-parser` processor. - -After running a Metadata Ingestion workflow, we can run Query Usage workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - -**Note**: To get Query Usage and Lineage details, need a Azure Databricks Premium account. - -### 1. Define the YAML Config - -This is a sample config for Databricks Usage: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=12 %} - -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceQueryUsagePipeline.json). - -**queryLogDuration**: Configuration to tune how far we want to look back in query logs to process usage data. - -{% /codeInfo %} - -{% codeInfo srNumber=13 %} - -**stageFileLocation**: Temporary file name to store the query logs before processing. Absolute file path required. - -{% /codeInfo %} - -{% codeInfo srNumber=14 %} - -**resultLimit**: Configuration to set the limit for query logs - -{% /codeInfo %} - -{% codeInfo srNumber=15 %} - -**queryLogFilePath**: Configuration to set the file path for query logs - -{% /codeInfo %} - - -{% codeInfo srNumber=16 %} - -#### Processor, Stage and Bulk Sink Configuration - -To specify where the staging files will be located. - -Note that the location is a directory that will be cleaned at the end of the ingestion. - -{% /codeInfo %} - -{% codeInfo srNumber=17 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. 
- -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - -```yaml -source: - type: databricks-usage - serviceName: - sourceConfig: - config: - type: DatabaseUsage -``` -```yaml {% srNumber=12 %} - # Number of days to look back - queryLogDuration: 7 -``` - -```yaml {% srNumber=13 %} - # This is a directory that will be DELETED after the usage runs - stageFileLocation: -``` - -```yaml {% srNumber=14 %} - # resultLimit: 1000 -``` - -```yaml {% srNumber=15 %} - # If instead of getting the query logs from the database we want to pass a file with the queries - # queryLogFilePath: path-to-file -``` - -```yaml {% srNumber=16 %} -processor: - type: query-parser - config: {} -stage: - type: table-usage - config: - filename: /tmp/databricks_usage -bulkSink: - type: metadata-usage - config: - filename: /tmp/databricks_usage -``` - -```yaml {% srNumber=17 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` - -{% /codeBlock %} -{% /codePreview %} - -### 2. Run with the CLI - -There is an extra requirement to run the Usage pipelines. You will need to install: - -```bash -pip3 install --upgrade 'openmetadata-ingestion[databricks]' -``` - -After saving the YAML config, we will run the command the same way we did for the metadata ingestion: - -```bash -metadata ingest -c -``` - -## Data Profiler - -The Data Profiler workflow will be using the `orm-profiler` processor. - -After running a Metadata Ingestion workflow, we can run Data Profiler workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - - -### 1. Define the YAML Config - -This is a sample config for the profiler: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=18 %} -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json). - -**generateSampleData**: Option to turn on/off generating sample data. - -{% /codeInfo %} - -{% codeInfo srNumber=19 %} - -**profileSample**: Percentage of data or no. of rows we want to execute the profiler and tests on. - -{% /codeInfo %} - -{% codeInfo srNumber=20 %} - -**threadCount**: Number of threads to use during metric computations. - -{% /codeInfo %} - -{% codeInfo srNumber=21 %} - -**processPiiSensitive**: Optional configuration to automatically tag columns that might contain sensitive information. - -{% /codeInfo %} - -{% codeInfo srNumber=22 %} - -**confidence**: Set the Confidence value for which you want the column to be marked - -{% /codeInfo %} - +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} -{% codeInfo srNumber=23 %} +{% partial file="/v1.2/connectors/yaml/query-usage.md" variables={connector: "databricks"} /%} -**timeoutSeconds**: Profiler Timeout in Seconds - -{% /codeInfo %} - -{% codeInfo srNumber=24 %} - -**databaseFilterPattern**: Regex to only fetch databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=25 %} - -**schemaFilterPattern**: Regex to only fetch tables or databases that matches the pattern. 
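Putting the profiler options above together — a sketch of how they sit under `sourceConfig` (all values and the schema pattern are placeholders; every key is optional):

```yaml
  sourceConfig:
    config:
      type: Profiler
      generateSampleData: true
      profileSample: 85          # run the profiler and tests on ~85% of the rows
      threadCount: 5
      processPiiSensitive: true
      confidence: 80             # only tag columns as PII above 80% confidence
      timeoutSeconds: 43200
      schemaFilterPattern:
        includes:
          - sales_.*
```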
- -{% /codeInfo %} - -{% codeInfo srNumber=26 %} - -**tableFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=27 %} - -#### Processor Configuration - -Choose the `orm-profiler`. Its config can also be updated to define tests from the YAML itself instead of the UI: - -**tableConfig**: `tableConfig` allows you to set up some configuration at the table level. -{% /codeInfo %} - - -{% codeInfo srNumber=28 %} - -#### Sink Configuration - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. -{% /codeInfo %} - - -{% codeInfo srNumber=29 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. - -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - - -```yaml -source: - type: databricks - serviceName: local_databricks - sourceConfig: - config: - type: Profiler -``` - -```yaml {% srNumber=18 %} - generateSampleData: true -``` -```yaml {% srNumber=19 %} - # profileSample: 85 -``` -```yaml {% srNumber=20 %} - # threadCount: 5 -``` -```yaml {% srNumber=21 %} - processPiiSensitive: false -``` -```yaml {% srNumber=22 %} - # confidence: 80 -``` -```yaml {% srNumber=23 %} - # timeoutSeconds: 43200 -``` -```yaml {% srNumber=24 %} - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 -``` -```yaml {% srNumber=25 %} - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 -``` -```yaml {% srNumber=26 %} - # tableFilterPattern: - # includes: - # - table1 - # - table2 - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=27 %} -processor: - type: orm-profiler - config: {} # Remove braces if adding properties - # tableConfig: - # - fullyQualifiedName:
- # profileSample: # default - - # profileSample: # default will be 100 if omitted - # profileQuery: - # columnConfig: - # excludeColumns: - # - - # includeColumns: - # - columnName: - # - metrics: - # - MEAN - # - MEDIAN - # - ... - # partitionConfig: - # enablePartitioning: - # partitionColumnName: - # partitionIntervalType: - # Pick one of the variation shown below - # ----'TIME-UNIT' or 'INGESTION-TIME'------- - # partitionInterval: - # partitionIntervalUnit: - # ------------'INTEGER-RANGE'--------------- - # partitionIntegerRangeStart: - # partitionIntegerRangeEnd: - # -----------'COLUMN-VALUE'---------------- - # partitionValues: - # - - # - - -``` - -```yaml {% srNumber=28 %} -sink: - type: metadata-rest - config: {} -``` - -```yaml {% srNumber=29 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` - -{% /codeBlock %} - -{% /codePreview %} - -- You can learn more about how to configure and run the Profiler Workflow to extract Profiler data and execute the Data Quality from [here](/connectors/ingestion/workflows/profiler) - - - -### 2. Prepare the Profiler DAG - -Here, we follow a similar approach as with the metadata and usage pipelines, although we will use a different Workflow class: - - - - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=30 %} - -#### Import necessary modules - -The `ProfilerWorkflow` class that is being imported is a part of a metadata orm_profiler framework, which defines a process of extracting Profiler data. - -Here we are also importing all the basic requirements to parse YAMLs, handle dates and build our DAG. - -{% /codeInfo %} - -{% codeInfo srNumber=31 %} - -**Default arguments for all tasks in the Airflow DAG.** -- Default arguments dictionary contains default arguments for tasks in the DAG, including the owner's name, email address, number of retries, retry delay, and execution timeout. - -{% /codeInfo %} - - -{% codeInfo srNumber=32 %} - -- **config**: Specifies config for the profiler as we prepare above. - -{% /codeInfo %} - -{% codeInfo srNumber=33 %} - -- **metadata_ingestion_workflow()**: This code defines a function `metadata_ingestion_workflow()` that loads a YAML configuration, creates a `ProfilerWorkflow` object, executes the workflow, checks its status, prints the status to the console, and stops the workflow. - -{% /codeInfo %} - -{% codeInfo srNumber=34 %} - -- **DAG**: creates a DAG using the Airflow framework, and tune the DAG configurations to whatever fits with your requirements -- For more Airflow DAGs creation details visit [here](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html#declaring-a-dag). 
- -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.py" %} - -```python {% srNumber=30 %} -import yaml -from datetime import timedelta -from airflow import DAG -from metadata.workflow.profiler import ProfilerWorkflow -from metadata.workflow.workflow_output_handler import print_status - -try: - from airflow.operators.python import PythonOperator -except ModuleNotFoundError: - from airflow.operators.python_operator import PythonOperator - -from airflow.utils.dates import days_ago - - -``` -```python {% srNumber=31 %} -default_args = { - "owner": "user_name", - "email_on_failure": False, - "retries": 3, - "retry_delay": timedelta(seconds=10), - "execution_timeout": timedelta(minutes=60), -} - - -``` - -```python {% srNumber=32 %} -config = """ - -""" - - -``` - -```python {% srNumber=33 %} -def metadata_ingestion_workflow(): - workflow_config = yaml.safe_load(config) - workflow = ProfilerWorkflow.create(workflow_config) - workflow.execute() - workflow.raise_from_status() - print_status(workflow) - workflow.stop() - - -``` - -```python {% srNumber=34 %} -with DAG( - "profiler_example", - default_args=default_args, - description="An example DAG which runs a OpenMetadata ingestion workflow", - start_date=days_ago(1), - is_paused_upon_creation=False, - catchup=False, -) as dag: - ingest_task = PythonOperator( - task_id="profile_and_test_using_recipe", - python_callable=metadata_ingestion_workflow, - ) - - -``` - -{% /codeBlock %} - -{% /codePreview %} +{% partial file="/v1.2/connectors/yaml/data-profiler.md" variables={connector: "databricks"} /%} ## Lineage diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/datalake/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/datalake/yaml.md index 4c6e7e73fa29..fbe7892bec1b 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/datalake/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/datalake/yaml.md @@ -137,31 +137,11 @@ The workflow is modeled around the following JSON Schema. {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -{% codeInfo srNumber=2 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=3 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
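With the shared partials in place, the assembled recipe keeps the familiar three-part shape — a minimal sketch (the service name and host are placeholders; when the service is not yet registered in OpenMetadata, the `serviceConnection` block shown earlier also goes under `source`):

```yaml
source:
  type: datalake
  serviceName: local_datalake
  sourceConfig:
    config:
      type: DatabaseMetadata
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: openmetadata
```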
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} {% /codeInfoContainer %} @@ -185,44 +165,12 @@ source: bucketName: bucket name prefix: prefix ``` -```yaml {% srNumber=2 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - users - # - type_test - # excludes: - # - table3 - # - table4 -``` -```yaml {% srNumber=3 %} -sink: - type: metadata-rest - config: {} -``` +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} + +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} @@ -255,31 +203,11 @@ sink: {% /codeInfo %} -#### Source Configuration - Source Config - -{% codeInfo srNumber=6 %} - -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=7 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} {% /codeInfoContainer %} @@ -310,44 +238,12 @@ source: bucketName: bucket name prefix: prefix ``` -```yaml {% srNumber=6 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - users - # - type_test - # excludes: - # - table3 - # - table4 -``` -```yaml {% srNumber=7 %} -sink: - type: metadata-rest - config: {} -``` +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} + +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} @@ -371,31 +267,11 @@ sink: {% /codeInfo %} -#### Source Configuration - Source Config - -{% codeInfo srNumber=10 %} - -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=11 %} +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. - -{% /codeInfo %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} {% /codeInfoContainer %} @@ -419,59 +295,18 @@ source: accountName: account-name prefix: prefix ``` -```yaml {% srNumber=10 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - users - # - type_test - # excludes: - # - table3 - # - table4 -``` -```yaml {% srNumber=11 %} -sink: - type: metadata-rest - config: {} -``` +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} + +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -### 2. Run with the CLI - -First, we will need to save the YAML file. 
Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} ## dbt Integration diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/db2/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/db2/yaml.md index 47d109bd9994..57cd7fd6128f 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/db2/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/db2/yaml.md @@ -126,31 +126,11 @@ This is a sample config for DB2: {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -{% codeInfo srNumber=7 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=8 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. - -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration @@ -202,288 +182,19 @@ source: ``` -```yaml {% srNumber=7 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - users - # - type_test - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=8 %} -sink: - type: metadata-rest - config: {} -``` - -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} - -{% /codeBlock %} - -{% /codePreview %} - -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. - -## Data Profiler - -The Data Profiler workflow will be using the `orm-profiler` processor. - -After running a Metadata Ingestion workflow, we can run Data Profiler workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - - -### 1. 
Define the YAML Config - -This is a sample config for the profiler: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=10 %} -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json). - -**generateSampleData**: Option to turn on/off generating sample data. - -{% /codeInfo %} - -{% codeInfo srNumber=11 %} - -**profileSample**: Percentage of data or no. of rows we want to execute the profiler and tests on. - -{% /codeInfo %} - -{% codeInfo srNumber=12 %} - -**threadCount**: Number of threads to use during metric computations. - -{% /codeInfo %} - -{% codeInfo srNumber=13 %} - -**processPiiSensitive**: Optional configuration to automatically tag columns that might contain sensitive information. - -{% /codeInfo %} - -{% codeInfo srNumber=14 %} - -**confidence**: Set the Confidence value for which you want the column to be marked - -{% /codeInfo %} - - -{% codeInfo srNumber=15 %} - -**timeoutSeconds**: Profiler Timeout in Seconds - -{% /codeInfo %} - -{% codeInfo srNumber=16 %} - -**databaseFilterPattern**: Regex to only fetch databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=17 %} - -**schemaFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=18 %} - -**tableFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=19 %} - -#### Processor Configuration - -Choose the `orm-profiler`. Its config can also be updated to define tests from the YAML itself instead of the UI: - -**tableConfig**: `tableConfig` allows you to set up some configuration at the table level. -{% /codeInfo %} - - -{% codeInfo srNumber=20 %} - -#### Sink Configuration - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. -{% /codeInfo %} - - -{% codeInfo srNumber=21 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. - -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - - -```yaml -source: - type: db2 - serviceName: local_db2 - sourceConfig: - config: - type: Profiler -``` - -```yaml {% srNumber=10 %} - generateSampleData: true -``` -```yaml {% srNumber=11 %} - # profileSample: 85 -``` -```yaml {% srNumber=12 %} - # threadCount: 5 -``` -```yaml {% srNumber=13 %} - processPiiSensitive: false -``` -```yaml {% srNumber=14 %} - # confidence: 80 -``` -```yaml {% srNumber=15 %} - # timeoutSeconds: 43200 -``` -```yaml {% srNumber=16 %} - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 -``` -```yaml {% srNumber=17 %} - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 -``` -```yaml {% srNumber=18 %} - # tableFilterPattern: - # includes: - # - table1 - # - table2 - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=19 %} -processor: - type: orm-profiler - config: {} # Remove braces if adding properties - # tableConfig: - # - fullyQualifiedName:
- # profileSample: # default - - # profileSample: # default will be 100 if omitted - # profileQuery: - # columnConfig: - # excludeColumns: - # - - # includeColumns: - # - columnName: - # - metrics: - # - MEAN - # - MEDIAN - # - ... - # partitionConfig: - # enablePartitioning: - # partitionColumnName: - # partitionIntervalType: - # Pick one of the variation shown below - # ----'TIME-UNIT' or 'INGESTION-TIME'------- - # partitionInterval: - # partitionIntervalUnit: - # ------------'INTEGER-RANGE'--------------- - # partitionIntegerRangeStart: - # partitionIntegerRangeEnd: - # -----------'COLUMN-VALUE'---------------- - # partitionValues: - # - - # - +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -``` - -```yaml {% srNumber=20 %} -sink: - type: metadata-rest - config: {} -``` +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -```yaml {% srNumber=21 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -- You can learn more about how to configure and run the Profiler Workflow to extract Profiler data and execute the Data Quality from [here](/connectors/ingestion/workflows/profiler) - -### 2. Run with the CLI - -After saving the YAML config, we will run the command the same way we did for the metadata ingestion: - -```bash -metadata profile -c -``` +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} -Note now instead of running `ingest`, we are using the `profile` command to select the Profiler workflow. +{% partial file="/v1.2/connectors/yaml/data-profiler.md" variables={connector: "db2"} /%} ## dbt Integration @@ -496,15 +207,3 @@ Note now instead of running `ingest`, we are using the `profile` command to sele link="/connectors/ingestion/workflows/dbt" /%} {% /tilesContainer %} - -## Related - -{% tilesContainer %} - -{% tile - title="Ingest with Airflow" - description="Configure the ingestion using Airflow SDK" - link="/connectors/database/db2/airflow" - / %} - -{% /tilesContainer %} diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/deltalake/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/deltalake/yaml.md index 3b19a65162f6..e681c3f42fa1 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/deltalake/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/deltalake/yaml.md @@ -132,31 +132,11 @@ You will need to provide the driver to the ingestion image, and pass the `classp {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -{% codeInfo srNumber=4 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. 
You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=5 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. - -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration @@ -210,60 +190,17 @@ source: # key: value ``` -```yaml {% srNumber=4 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - users - # - type_test - # excludes: - # - table3 - # - table4 -``` +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -```yaml {% srNumber=5 %} -sink: - type: metadata-rest - config: {} -``` +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} - -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} ## dbt Integration @@ -276,15 +213,3 @@ you will be able to extract metadata from different sources. link="/connectors/ingestion/workflows/dbt" /%} {% /tilesContainer %} - -## Related - -{% tilesContainer %} - -{% tile - title="Ingest with Airflow" - description="Configure the ingestion using Airflow SDK" - link="/connectors/database/deltalake/airflow" - / %} - -{% /tilesContainer %} diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/domo-database/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/domo-database/yaml.md index 731d683f6c28..7b1eb689f207 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/domo-database/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/domo-database/yaml.md @@ -118,31 +118,11 @@ This is a sample config for DomoDatabase: {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -{% codeInfo srNumber=7 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. 
You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=8 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. - -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration @@ -201,59 +181,17 @@ source: ``` -```yaml {% srNumber=9 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - users - # - type_test - # excludes: - # - table3 - # - table4 -``` +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -```yaml {% srNumber=10 %} -sink: - type: metadata-rest - config: {} -``` +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} ## dbt Integration @@ -266,15 +204,3 @@ you will be able to extract metadata from different sources. link="/connectors/ingestion/workflows/dbt" /%} {% /tilesContainer %} - -## Related - -{% tilesContainer %} - -{% tile - title="Ingest with Airflow" - description="Configure the ingestion using Airflow SDK" - link="/connectors/database/domo-database/airflow" - / %} - -{% /tilesContainer %} \ No newline at end of file diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/druid/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/druid/yaml.md index a47d4b9b009c..786f6827e49a 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/druid/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/druid/yaml.md @@ -100,31 +100,11 @@ This is a sample config for Druid: {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -{% codeInfo srNumber=7 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. 
You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=8 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. - -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration @@ -176,289 +156,20 @@ source: ``` -```yaml {% srNumber=7 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - users - # - type_test - # excludes: - # - table3 - # - table4 -``` +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -```yaml {% srNumber=8 %} -sink: - type: metadata-rest - config: {} -``` +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. - -## Data Profiler - -The Data Profiler workflow will be using the `orm-profiler` processor. - -After running a Metadata Ingestion workflow, we can run Data Profiler workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - - -### 1. Define the YAML Config - -This is a sample config for the profiler: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=10 %} -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json). - -**generateSampleData**: Option to turn on/off generating sample data. - -{% /codeInfo %} - -{% codeInfo srNumber=11 %} - -**profileSample**: Percentage of data or no. of rows we want to execute the profiler and tests on. - -{% /codeInfo %} - -{% codeInfo srNumber=12 %} - -**threadCount**: Number of threads to use during metric computations. - -{% /codeInfo %} - -{% codeInfo srNumber=13 %} - -**processPiiSensitive**: Optional configuration to automatically tag columns that might contain sensitive information. - -{% /codeInfo %} - -{% codeInfo srNumber=14 %} - -**confidence**: Set the Confidence value for which you want the column to be marked - -{% /codeInfo %} - - -{% codeInfo srNumber=15 %} - -**timeoutSeconds**: Profiler Timeout in Seconds - -{% /codeInfo %} - -{% codeInfo srNumber=16 %} - -**databaseFilterPattern**: Regex to only fetch databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=17 %} - -**schemaFilterPattern**: Regex to only fetch tables or databases that matches the pattern. 
- -{% /codeInfo %} - -{% codeInfo srNumber=18 %} +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} -**tableFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=19 %} - -#### Processor Configuration - -Choose the `orm-profiler`. Its config can also be updated to define tests from the YAML itself instead of the UI: - -**tableConfig**: `tableConfig` allows you to set up some configuration at the table level. -{% /codeInfo %} - - -{% codeInfo srNumber=20 %} - -#### Sink Configuration - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. -{% /codeInfo %} - - -{% codeInfo srNumber=21 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. - -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - - -```yaml -source: - type: druid - serviceName: local_druid - sourceConfig: - config: - type: Profiler -``` - -```yaml {% srNumber=10 %} - generateSampleData: true -``` -```yaml {% srNumber=11 %} - # profileSample: 85 -``` -```yaml {% srNumber=12 %} - # threadCount: 5 -``` -```yaml {% srNumber=13 %} - processPiiSensitive: false -``` -```yaml {% srNumber=14 %} - # confidence: 80 -``` -```yaml {% srNumber=15 %} - # timeoutSeconds: 43200 -``` -```yaml {% srNumber=16 %} - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 -``` -```yaml {% srNumber=17 %} - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 -``` -```yaml {% srNumber=18 %} - # tableFilterPattern: - # includes: - # - table1 - # - table2 - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=19 %} -processor: - type: orm-profiler - config: {} # Remove braces if adding properties - # tableConfig: - # - fullyQualifiedName:
- # profileSample: # default - - # profileSample: # default will be 100 if omitted - # profileQuery: - # columnConfig: - # excludeColumns: - # - - # includeColumns: - # - columnName: - # - metrics: - # - MEAN - # - MEDIAN - # - ... - # partitionConfig: - # enablePartitioning: - # partitionColumnName: - # partitionIntervalType: - # Pick one of the variation shown below - # ----'TIME-UNIT' or 'INGESTION-TIME'------- - # partitionInterval: - # partitionIntervalUnit: - # ------------'INTEGER-RANGE'--------------- - # partitionIntegerRangeStart: - # partitionIntegerRangeEnd: - # -----------'COLUMN-VALUE'---------------- - # partitionValues: - # - - # - - -``` - -```yaml {% srNumber=20 %} -sink: - type: metadata-rest - config: {} -``` - -```yaml {% srNumber=21 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` - -{% /codeBlock %} - -{% /codePreview %} - -- You can learn more about how to configure and run the Profiler Workflow to extract Profiler data and execute the Data Quality from [here](/connectors/ingestion/workflows/profiler) - -### 2. Run with the CLI - -After saving the YAML config, we will run the command the same way we did for the metadata ingestion: - -```bash -metadata profile -c -``` - -Note now instead of running `ingest`, we are using the `profile` command to select the Profiler workflow. +{% partial file="/v1.2/connectors/yaml/data-profiler.md" variables={connector: "druid"} /%} ## dbt Integration @@ -471,15 +182,3 @@ Note now instead of running `ingest`, we are using the `profile` command to sele link="/connectors/ingestion/workflows/dbt" /%} {% /tilesContainer %} - -## Related - -{% tilesContainer %} - -{% tile - title="Ingest with Airflow" - description="Configure the ingestion using Airflow SDK" - link="/connectors/database/druid/airflow" - / %} - -{% /tilesContainer %} diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/dynamodb/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/dynamodb/yaml.md index 07b4482d237b..86b507765e79 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/dynamodb/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/dynamodb/yaml.md @@ -133,31 +133,11 @@ This is a sample config for DynamoDB: {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -{% codeInfo srNumber=9 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=10 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration @@ -215,59 +195,18 @@ source: # key: value ``` -```yaml {% srNumber=9 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - users - # - type_test - # excludes: - # - table3 - # - table4 -``` -```yaml {% srNumber=10 %} -sink: - type: metadata-rest - config: {} -``` +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} + +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} ## dbt Integration @@ -281,15 +220,3 @@ you will be able to extract metadata from different sources. link="/connectors/ingestion/workflows/dbt" /%} {% /tilesContainer %} - -## Related - -{% tilesContainer %} - -{% tile - title="Ingest with Airflow" - description="Configure the ingestion using Airflow SDK" - link="/connectors/database/dynamodb/airflow" - / %} - -{% /tilesContainer %} diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/glue/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/glue/yaml.md index c577b2f1dfc3..c707b9d3dcfa 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/glue/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/glue/yaml.md @@ -114,31 +114,11 @@ This is a sample config for Glue: {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -{% codeInfo srNumber=9 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=10 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration @@ -197,59 +177,17 @@ source: ``` -```yaml {% srNumber=9 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - users - # - type_test - # excludes: - # - table3 - # - table4 -``` +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -```yaml {% srNumber=10 %} -sink: - type: metadata-rest - config: {} -``` +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} ## dbt Integration @@ -263,15 +201,3 @@ you will be able to extract metadata from different sources. link="/connectors/ingestion/workflows/dbt" /%} {% /tilesContainer %} - -## Related - -{% tilesContainer %} - -{% tile - title="Ingest with Airflow" - description="Configure the ingestion using Airflow SDK" - link="/connectors/database/azuresql/airflow" - / %} - -{% /tilesContainer %} diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/greenplum/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/greenplum/yaml.md index 7c048d682609..926214b13459 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/greenplum/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/greenplum/yaml.md @@ -179,31 +179,11 @@ Find more information about [Source Identity](https://docs.aws.amazon.com/STS/la {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -{% codeInfo srNumber=9 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=10 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration @@ -265,392 +245,19 @@ source: # key: value ``` -```yaml {% srNumber=9 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - users - # - type_test - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=10 %} -sink: - type: metadata-rest - config: {} -``` - -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} - -{% /codeBlock %} - -{% /codePreview %} - -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. - - -## Data Profiler - -The Data Profiler workflow will be using the `orm-profiler` processor. - -After running a Metadata Ingestion workflow, we can run Data Profiler workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - - -### 1. Define the YAML Config - -This is a sample config for the profiler: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=17 %} -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json). - -**generateSampleData**: Option to turn on/off generating sample data. - -{% /codeInfo %} - -{% codeInfo srNumber=18 %} - -**profileSample**: Percentage of data or no. of rows we want to execute the profiler and tests on. - -{% /codeInfo %} - -{% codeInfo srNumber=19 %} - -**threadCount**: Number of threads to use during metric computations. - -{% /codeInfo %} - -{% codeInfo srNumber=20 %} - -**processPiiSensitive**: Optional configuration to automatically tag columns that might contain sensitive information. - -{% /codeInfo %} - -{% codeInfo srNumber=21 %} - -**confidence**: Set the Confidence value for which you want the column to be marked - -{% /codeInfo %} - - -{% codeInfo srNumber=22 %} - -**timeoutSeconds**: Profiler Timeout in Seconds - -{% /codeInfo %} - -{% codeInfo srNumber=23 %} - -**databaseFilterPattern**: Regex to only fetch databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=24 %} - -**schemaFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=25 %} - -**tableFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=26 %} - -#### Processor Configuration - -Choose the `orm-profiler`. Its config can also be updated to define tests from the YAML itself instead of the UI: - -**tableConfig**: `tableConfig` allows you to set up some configuration at the table level. 
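For example — a sketch of one table-level entry (the fully qualified name, sample size, and column names are placeholders; the commented template in the YAML below lists the full set of supported keys):

```yaml
processor:
  type: orm-profiler
  config:
    tableConfig:
      - fullyQualifiedName: local_greenplum.mydb.public.users
        profileSample: 50          # override the global sample just for this table
        columnConfig:
          includeColumns:
            - columnName: signup_date
            - columnName: revenue
              metrics:             # restrict this column to selected metrics
                - MEAN
                - MEDIAN
```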
-{% /codeInfo %} - - -{% codeInfo srNumber=27 %} - -#### Sink Configuration - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. -{% /codeInfo %} - - -{% codeInfo srNumber=28 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. - -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - - -```yaml -source: - type: greenplum - serviceName: local_greenplum - sourceConfig: - config: - type: Profiler -``` - -```yaml {% srNumber=17 %} - generateSampleData: true -``` -```yaml {% srNumber=18 %} - # profileSample: 85 -``` -```yaml {% srNumber=19 %} - # threadCount: 5 -``` -```yaml {% srNumber=20 %} - processPiiSensitive: false -``` -```yaml {% srNumber=21 %} - # confidence: 80 -``` -```yaml {% srNumber=22 %} - # timeoutSeconds: 43200 -``` -```yaml {% srNumber=23 %} - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 -``` -```yaml {% srNumber=24 %} - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 -``` -```yaml {% srNumber=25 %} - # tableFilterPattern: - # includes: - # - table1 - # - table2 - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=26 %} -processor: - type: orm-profiler - config: {} # Remove braces if adding properties - # tableConfig: - # - fullyQualifiedName:
- # profileSample: # default - - # profileSample: # default will be 100 if omitted - # profileQuery: - # columnConfig: - # excludeColumns: - # - - # includeColumns: - # - columnName: - # - metrics: - # - MEAN - # - MEDIAN - # - ... - # partitionConfig: - # enablePartitioning: - # partitionColumnName: - # partitionInterval: - # partitionIntervalUnit: +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -``` - -```yaml {% srNumber=27 %} -sink: - type: metadata-rest - config: {} -``` +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -```yaml {% srNumber=28 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -- You can learn more about how to configure and run the Profiler Workflow to extract Profiler data and execute the Data Quality from [here](/connectors/ingestion/workflows/profiler) - - - -### 2. Prepare the Profiler DAG - -Here, we follow a similar approach as with the metadata and usage pipelines, although we will use a different Workflow class: - +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} - - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=29 %} - -#### Import necessary modules - -The `ProfilerWorkflow` class that is being imported is a part of a metadata orm_profiler framework, which defines a process of extracting Profiler data. - -Here we are also importing all the basic requirements to parse YAMLs, handle dates and build our DAG. - -{% /codeInfo %} - -{% codeInfo srNumber=30 %} - -**Default arguments for all tasks in the Airflow DAG.** -- Default arguments dictionary contains default arguments for tasks in the DAG, including the owner's name, email address, number of retries, retry delay, and execution timeout. - -{% /codeInfo %} - - -{% codeInfo srNumber=31 %} - -- **config**: Specifies config for the profiler as we prepare above. - -{% /codeInfo %} - -{% codeInfo srNumber=32 %} - -- **metadata_ingestion_workflow()**: This code defines a function `metadata_ingestion_workflow()` that loads a YAML configuration, creates a `ProfilerWorkflow` object, executes the workflow, checks its status, prints the status to the console, and stops the workflow. - -{% /codeInfo %} - -{% codeInfo srNumber=33 %} - -- **DAG**: creates a DAG using the Airflow framework, and tune the DAG configurations to whatever fits with your requirements -- For more Airflow DAGs creation details visit [here](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html#declaring-a-dag). 
- -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.py" %} - -```python {% srNumber=30 %} -import yaml -from datetime import timedelta -from airflow import DAG -from metadata.workflow.profiler import ProfilerWorkflow -from metadata.workflow.workflow_output_handler import print_status - -try: - from airflow.operators.python import PythonOperator -except ModuleNotFoundError: - from airflow.operators.python_operator import PythonOperator - -from airflow.utils.dates import days_ago - - -``` -```python {% srNumber=31 %} -default_args = { - "owner": "user_name", - "email_on_failure": False, - "retries": 3, - "retry_delay": timedelta(seconds=10), - "execution_timeout": timedelta(minutes=60), -} - - -``` - -```python {% srNumber=32 %} -config = """ - -""" - - -``` - -```python {% srNumber=33 %} -def metadata_ingestion_workflow(): - workflow_config = yaml.safe_load(config) - workflow = ProfilerWorkflow.create(workflow_config) - workflow.execute() - workflow.raise_from_status() - print_status(workflow) - workflow.stop() - - -``` - -```python {% srNumber=34 %} -with DAG( - "profiler_example", - default_args=default_args, - description="An example DAG which runs a OpenMetadata ingestion workflow", - start_date=days_ago(1), - is_paused_upon_creation=False, - catchup=False, -) as dag: - ingest_task = PythonOperator( - task_id="profile_and_test_using_recipe", - python_callable=metadata_ingestion_workflow, - ) - - -``` - -{% /codeBlock %} - -{% /codePreview %} +{% partial file="/v1.2/connectors/yaml/data-profiler.md" variables={connector: "greenplum"} /%} ## Lineage diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/hive/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/hive/yaml.md index cd6a233615c0..f071444020a5 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/hive/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/hive/yaml.md @@ -126,31 +126,11 @@ You can also ingest the metadata using Postgres metastore. This step is optional {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -{% codeInfo srNumber=7 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=8 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
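-
-For reference, the sink section is the same few lines of YAML for every connector:
-
-```yaml
-sink:
-  type: metadata-rest
-  config: {}
-```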
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration @@ -222,287 +202,18 @@ source: # key: value ``` -```yaml {% srNumber=7 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - users - # - type_test - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=8 %} -sink: - type: metadata-rest - config: {} -``` +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. - -## Data Profiler - -The Data Profiler workflow will be using the `orm-profiler` processor. - -After running a Metadata Ingestion workflow, we can run Data Profiler workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - -### 1. Define the YAML Config - -This is a sample config for the profiler: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=10 %} - -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json). - -**generateSampleData**: Option to turn on/off generating sample data. - -{% /codeInfo %} - -{% codeInfo srNumber=11 %} - -**profileSample**: Percentage of data or no. of rows we want to execute the profiler and tests on. - -{% /codeInfo %} - -{% codeInfo srNumber=12 %} - -**threadCount**: Number of threads to use during metric computations. - -{% /codeInfo %} - -{% codeInfo srNumber=13 %} - -**processPiiSensitive**: Optional configuration to automatically tag columns that might contain sensitive information. - -{% /codeInfo %} - -{% codeInfo srNumber=14 %} - -**confidence**: Set the Confidence value for which you want the column to be marked - -{% /codeInfo %} - - -{% codeInfo srNumber=15 %} - -**timeoutSeconds**: Profiler Timeout in Seconds - -{% /codeInfo %} - -{% codeInfo srNumber=16 %} - -**databaseFilterPattern**: Regex to only fetch databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=17 %} - -**schemaFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=18 %} +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} -**tableFilterPattern**: Regex to only fetch tables or databases that matches the pattern. 
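-
-For example, a hypothetical pattern that profiles only tables starting with `dim_` while skipping a staging table could look like:
-
-```yaml
-  tableFilterPattern:
-    includes:
-      - dim_.*
-    excludes:
-      - dim_staging
-```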
- -{% /codeInfo %} - -{% codeInfo srNumber=19 %} - -#### Processor Configuration - -Choose the `orm-profiler`. Its config can also be updated to define tests from the YAML itself instead of the UI: - -**tableConfig**: `tableConfig` allows you to set up some configuration at the table level. -{% /codeInfo %} - - -{% codeInfo srNumber=20 %} - -#### Sink Configuration - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. -{% /codeInfo %} - - -{% codeInfo srNumber=21 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. - -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - - -```yaml -source: - type: hive - serviceName: local_hive - sourceConfig: - config: - type: Profiler -``` - -```yaml {% srNumber=10 %} - generateSampleData: true -``` -```yaml {% srNumber=11 %} - # profileSample: 85 -``` -```yaml {% srNumber=12 %} - # threadCount: 5 -``` -```yaml {% srNumber=13 %} - processPiiSensitive: false -``` -```yaml {% srNumber=14 %} - # confidence: 80 -``` -```yaml {% srNumber=15 %} - # timeoutSeconds: 43200 -``` -```yaml {% srNumber=16 %} - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 -``` -```yaml {% srNumber=17 %} - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 -``` -```yaml {% srNumber=18 %} - # tableFilterPattern: - # includes: - # - table1 - # - table2 - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=19 %} -processor: - type: orm-profiler - config: {} # Remove braces if adding properties - # tableConfig: - # - fullyQualifiedName:
- # profileSample: # default - # profileSample: # default will be 100 if omitted - # profileQuery: - # columnConfig: - # excludeColumns: - # - - # includeColumns: - # - columnName: - # - metrics: - # - MEAN - # - MEDIAN - # - ... - # partitionConfig: - # enablePartitioning: - # partitionColumnName: - # partitionIntervalType: - # Pick one of the variation shown below - # ----'TIME-UNIT' or 'INGESTION-TIME'------- - # partitionInterval: - # partitionIntervalUnit: - # ------------'INTEGER-RANGE'--------------- - # partitionIntegerRangeStart: - # partitionIntegerRangeEnd: - # -----------'COLUMN-VALUE'---------------- - # partitionValues: - # - - # - - -``` - -```yaml {% srNumber=20 %} -sink: - type: metadata-rest - config: {} -``` - -```yaml {% srNumber=21 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` - -{% /codeBlock %} - -{% /codePreview %} - -- You can learn more about how to configure and run the Profiler Workflow to extract Profiler data and execute the Data Quality from [here](/connectors/ingestion/workflows/profiler) - -### 2. Run with the CLI - -After saving the YAML config, we will run the command the same way we did for the metadata ingestion: - -```bash -metadata profile -c -``` - -Note now instead of running `ingest`, we are using the `profile` command to select the Profiler workflow. +{% partial file="/v1.2/connectors/yaml/data-profiler.md" variables={connector: "hive"} /%} ## dbt Integration @@ -515,15 +226,3 @@ Note now instead of running `ingest`, we are using the `profile` command to sele link="/connectors/ingestion/workflows/dbt" /%} {% /tilesContainer %} - -## Related - -{% tilesContainer %} - -{% tile - title="Ingest with Airflow" - description="Configure the ingestion using Airflow SDK" - link="/connectors/database/hive/airflow" - / %} - -{% /tilesContainer %} diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/impala/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/impala/yaml.md index f0e0d2346238..037f21f9fb96 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/impala/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/impala/yaml.md @@ -109,27 +109,11 @@ This is a sample config for Hive: {% codeInfo srNumber=8 %} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=9 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration @@ -181,287 +165,19 @@ source: # key: value ``` -```yaml {% srNumber=8 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - users - # - type_test - # excludes: - # - table3 - # - table4 -``` +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -```yaml {% srNumber=9 %} -sink: - type: metadata-rest - config: {} -``` +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. - -## Data Profiler - -The Data Profiler workflow will be using the `orm-profiler` processor. - -After running a Metadata Ingestion workflow, we can run Data Profiler workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - -### 1. Define the YAML Config - -This is a sample config for the profiler: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=10 %} - -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json). - -**generateSampleData**: Option to turn on/off generating sample data. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} -{% /codeInfo %} - -{% codeInfo srNumber=11 %} - -**profileSample**: Percentage of data or no. of rows we want to execute the profiler and tests on. - -{% /codeInfo %} - -{% codeInfo srNumber=12 %} - -**threadCount**: Number of threads to use during metric computations. - -{% /codeInfo %} - -{% codeInfo srNumber=13 %} - -**processPiiSensitive**: Optional configuration to automatically tag columns that might contain sensitive information. - -{% /codeInfo %} - -{% codeInfo srNumber=14 %} - -**confidence**: Set the Confidence value for which you want the column to be marked - -{% /codeInfo %} - - -{% codeInfo srNumber=15 %} - -**timeoutSeconds**: Profiler Timeout in Seconds - -{% /codeInfo %} - -{% codeInfo srNumber=16 %} - -**databaseFilterPattern**: Regex to only fetch databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=17 %} - -**schemaFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=18 %} - -**tableFilterPattern**: Regex to only fetch tables or databases that matches the pattern. 
- -{% /codeInfo %} - -{% codeInfo srNumber=19 %} - -#### Processor Configuration - -Choose the `orm-profiler`. Its config can also be updated to define tests from the YAML itself instead of the UI: - -**tableConfig**: `tableConfig` allows you to set up some configuration at the table level. -{% /codeInfo %} - - -{% codeInfo srNumber=20 %} - -#### Sink Configuration - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. -{% /codeInfo %} - - -{% codeInfo srNumber=21 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. - -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - - -```yaml -source: - type: impala - serviceName: local_impala - sourceConfig: - config: - type: Profiler -``` - -```yaml {% srNumber=10 %} - generateSampleData: true -``` -```yaml {% srNumber=11 %} - # profileSample: 85 -``` -```yaml {% srNumber=12 %} - # threadCount: 5 -``` -```yaml {% srNumber=13 %} - processPiiSensitive: false -``` -```yaml {% srNumber=14 %} - # confidence: 80 -``` -```yaml {% srNumber=15 %} - # timeoutSeconds: 43200 -``` -```yaml {% srNumber=16 %} - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 -``` -```yaml {% srNumber=17 %} - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 -``` -```yaml {% srNumber=18 %} - # tableFilterPattern: - # includes: - # - table1 - # - table2 - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=19 %} -processor: - type: orm-profiler - config: {} # Remove braces if adding properties - # tableConfig: - # - fullyQualifiedName:
- # profileSample: # default - # profileSample: # default will be 100 if omitted - # profileQuery: - # columnConfig: - # excludeColumns: - # - - # includeColumns: - # - columnName: - # - metrics: - # - MEAN - # - MEDIAN - # - ... - # partitionConfig: - # enablePartitioning: - # partitionColumnName: - # partitionIntervalType: - # Pick one of the variation shown below - # ----'TIME-UNIT' or 'INGESTION-TIME'------- - # partitionInterval: - # partitionIntervalUnit: - # ------------'INTEGER-RANGE'--------------- - # partitionIntegerRangeStart: - # partitionIntegerRangeEnd: - # -----------'COLUMN-VALUE'---------------- - # partitionValues: - # - - # - - -``` - -```yaml {% srNumber=20 %} -sink: - type: metadata-rest - config: {} -``` - -```yaml {% srNumber=21 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` - -{% /codeBlock %} - -{% /codePreview %} - -- You can learn more about how to configure and run the Profiler Workflow to extract Profiler data and execute the Data Quality from [here](/connectors/ingestion/workflows/profiler) - -### 2. Run with the CLI - -After saving the YAML config, we will run the command the same way we did for the metadata ingestion: - -```bash -metadata profile -c -``` - -Note how instead of running `ingest`, we are using the `profile` command to select the Profiler workflow. +{% partial file="/v1.2/connectors/yaml/data-profiler.md" variables={connector: "impala"} /%} ## dbt Integration @@ -474,15 +190,3 @@ description="Learn more about how to ingest dbt models' definitions and their li link="/connectors/ingestion/workflows/dbt" /%} {% /tilesContainer %} - -## Related - -{% tilesContainer %} - -{% tile -title="Ingest with Airflow" -description="Configure the ingestion using Airflow SDK" -link="/connectors/database/impala/airflow" -/ %} - -{% /tilesContainer %} diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/mariadb/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/mariadb/yaml.md index 09ae6de3eec0..f937392d918f 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/mariadb/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/mariadb/yaml.md @@ -103,31 +103,11 @@ The workflow is modeled around the following {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -{% codeInfo srNumber=8 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=9 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration @@ -181,289 +161,19 @@ source: # key: value ``` +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -```yaml {% srNumber=8 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - users - # - type_test - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=9 %} -sink: - type: metadata-rest - config: {} -``` +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. - -## Data Profiler - -The Data Profiler workflow will be using the `orm-profiler` processor. - -After running a Metadata Ingestion workflow, we can run Data Profiler workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - - -### 1. Define the YAML Config - -This is a sample config for the profiler: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=11 %} -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json). - -**generateSampleData**: Option to turn on/off generating sample data. - -{% /codeInfo %} - -{% codeInfo srNumber=12 %} - -**profileSample**: Percentage of data or no. of rows we want to execute the profiler and tests on. - -{% /codeInfo %} - -{% codeInfo srNumber=13 %} - -**threadCount**: Number of threads to use during metric computations. - -{% /codeInfo %} - -{% codeInfo srNumber=14 %} - -**processPiiSensitive**: Optional configuration to automatically tag columns that might contain sensitive information. - -{% /codeInfo %} - -{% codeInfo srNumber=15 %} - -**confidence**: Set the Confidence value for which you want the column to be marked - -{% /codeInfo %} - - -{% codeInfo srNumber=16 %} - -**timeoutSeconds**: Profiler Timeout in Seconds - -{% /codeInfo %} - -{% codeInfo srNumber=17 %} - -**databaseFilterPattern**: Regex to only fetch databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=18 %} - -**schemaFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=19 %} +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} -**tableFilterPattern**: Regex to only fetch tables or databases that matches the pattern. 
- -{% /codeInfo %} - -{% codeInfo srNumber=20 %} - -#### Processor Configuration - -Choose the `orm-profiler`. Its config can also be updated to define tests from the YAML itself instead of the UI: - -**tableConfig**: `tableConfig` allows you to set up some configuration at the table level. -{% /codeInfo %} - - -{% codeInfo srNumber=21 %} - -#### Sink Configuration - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. -{% /codeInfo %} - - -{% codeInfo srNumber=22 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. - -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - - -```yaml -source: - type: mariadb - serviceName: local_mariadb - sourceConfig: - config: - type: Profiler -``` - -```yaml {% srNumber=11 %} - generateSampleData: true -``` -```yaml {% srNumber=12 %} - # profileSample: 85 -``` -```yaml {% srNumber=13 %} - # threadCount: 5 -``` -```yaml {% srNumber=14 %} - processPiiSensitive: false -``` -```yaml {% srNumber=15 %} - # confidence: 80 -``` -```yaml {% srNumber=16 %} - # timeoutSeconds: 43200 -``` -```yaml {% srNumber=17 %} - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 -``` -```yaml {% srNumber=18 %} - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 -``` -```yaml {% srNumber=19 %} - # tableFilterPattern: - # includes: - # - table1 - # - table2 - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=20 %} -processor: - type: orm-profiler - config: {} # Remove braces if adding properties - # tableConfig: - # - fullyQualifiedName:
- # profileSample: # default - - # profileSample: # default will be 100 if omitted - # profileQuery: - # columnConfig: - # excludeColumns: - # - - # includeColumns: - # - columnName: - # - metrics: - # - MEAN - # - MEDIAN - # - ... - # partitionConfig: - # enablePartitioning: - # partitionColumnName: - # partitionIntervalType: - # Pick one of the variation shown below - # ----'TIME-UNIT' or 'INGESTION-TIME'------- - # partitionInterval: - # partitionIntervalUnit: - # ------------'INTEGER-RANGE'--------------- - # partitionIntegerRangeStart: - # partitionIntegerRangeEnd: - # -----------'COLUMN-VALUE'---------------- - # partitionValues: - # - - # - - -``` - -```yaml {% srNumber=21 %} -sink: - type: metadata-rest - config: {} -``` - -```yaml {% srNumber=22 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` - -{% /codeBlock %} - -{% /codePreview %} - -- You can learn more about how to configure and run the Profiler Workflow to extract Profiler data and execute the Data Quality from [here](/connectors/ingestion/workflows/profiler) - -### 2. Run with the CLI - -After saving the YAML config, we will run the command the same way we did for the metadata ingestion: - -```bash -metadata profile -c -``` - -Note now instead of running `ingest`, we are using the `profile` command to select the Profiler workflow. +{% partial file="/v1.2/connectors/yaml/data-profiler.md" variables={connector: "mariadb"} /%} ## dbt Integration @@ -476,15 +186,3 @@ Note now instead of running `ingest`, we are using the `profile` command to sele link="/connectors/ingestion/workflows/dbt" /%} {% /tilesContainer %} - -## Related - -{% tilesContainer %} - -{% tile - title="Ingest with Airflow" - description="Configure the ingestion using Airflow SDK" - link="/connectors/database/mariadb/airflow" - / %} - -{% /tilesContainer %} diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/mongodb/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/mongodb/yaml.md index 17a7ee07c4cb..7a0d989702dc 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/mongodb/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/mongodb/yaml.md @@ -106,31 +106,11 @@ This is a sample config for MongoDB: {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -{% codeInfo srNumber=9 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=10 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration @@ -174,68 +154,26 @@ source: database: custom_database_name ``` -```yaml {% srNumber=9 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - users - # - type_test - # excludes: - # - table3 - # - table4 -``` +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -```yaml {% srNumber=10 %} -sink: - type: metadata-rest - config: {} -``` +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} -## Related +## dbt Integration {% tilesContainer %} {% tile - title="Ingest with Airflow" - description="Configure the ingestion using Airflow SDK" - link="/connectors/database/mongodb/airflow" - / %} +icon="mediation" +title="dbt Integration" +description="Learn more about how to ingest dbt models' definitions and their lineage." +link="/connectors/ingestion/workflows/dbt" /%} {% /tilesContainer %} diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/mssql/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/mssql/yaml.md index c6a2e98212eb..c4cc96e93718 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/mssql/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/mssql/yaml.md @@ -131,31 +131,11 @@ This is a sample config for MSSQL: {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -{% codeInfo srNumber=9 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=10 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration @@ -224,538 +204,21 @@ source: # key: value ``` +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -```yaml {% srNumber=9 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - users - # - type_test - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=10 %} -sink: - type: metadata-rest - config: {} -``` +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. - - -## Query Usage - -The Query Usage workflow will be using the `query-parser` processor. - -After running a Metadata Ingestion workflow, we can run Query Usage workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - - -### 1. Define the YAML Config - -This is a sample config for MSSQL Usage: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=12 %} - -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceQueryUsagePipeline.json). - -**queryLogDuration**: Configuration to tune how far we want to look back in query logs to process usage data. - -{% /codeInfo %} - -{% codeInfo srNumber=13 %} - -**stageFileLocation**: Temporary file name to store the query logs before processing. Absolute file path required. - -{% /codeInfo %} - -{% codeInfo srNumber=14 %} - -**resultLimit**: Configuration to set the limit for query logs - -{% /codeInfo %} - -{% codeInfo srNumber=15 %} - -**queryLogFilePath**: Configuration to set the file path for query logs - -{% /codeInfo %} - - -{% codeInfo srNumber=16 %} - -#### Processor, Stage and Bulk Sink Configuration - -To specify where the staging files will be located. - -Note that the location is a directory that will be cleaned at the end of the ingestion. - -{% /codeInfo %} - -{% codeInfo srNumber=17 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. 
- -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - -```yaml -source: - type: mssql-usage - serviceName: - sourceConfig: - config: - type: DatabaseUsage -``` -```yaml {% srNumber=12 %} - # Number of days to look back - queryLogDuration: 7 -``` - -```yaml {% srNumber=13 %} - # This is a directory that will be DELETED after the usage runs - stageFileLocation: -``` - -```yaml {% srNumber=14 %} - # resultLimit: 1000 -``` - -```yaml {% srNumber=15 %} - # If instead of getting the query logs from the database we want to pass a file with the queries - # queryLogFilePath: path-to-file -``` - -```yaml {% srNumber=16 %} -processor: - type: query-parser - config: {} -stage: - type: table-usage - config: - filename: /tmp/mssql_usage -bulkSink: - type: metadata-usage - config: - filename: /tmp/mssql_usage -``` - -```yaml {% srNumber=17 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` - -{% /codeBlock %} -{% /codePreview %} - -### 2. Run with the CLI - -There is an extra requirement to run the Usage pipelines. You will need to install: - -```bash -pip3 install --upgrade 'openmetadata-ingestion[mssql-usage]' -``` - -After saving the YAML config, we will run the command the same way we did for the metadata ingestion: - -```bash -metadata ingest -c -``` - -## Data Profiler - -The Data Profiler workflow will be using the `orm-profiler` processor. - -After running a Metadata Ingestion workflow, we can run Data Profiler workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - - -### 1. Define the YAML Config - -This is a sample config for the profiler: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=18 %} -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json). - -**generateSampleData**: Option to turn on/off generating sample data. - -{% /codeInfo %} - -{% codeInfo srNumber=19 %} - -**profileSample**: Percentage of data or no. of rows we want to execute the profiler and tests on. - -{% /codeInfo %} - -{% codeInfo srNumber=20 %} - -**threadCount**: Number of threads to use during metric computations. - -{% /codeInfo %} - -{% codeInfo srNumber=21 %} - -**processPiiSensitive**: Optional configuration to automatically tag columns that might contain sensitive information. - -{% /codeInfo %} - -{% codeInfo srNumber=22 %} - -**confidence**: Set the Confidence value for which you want the column to be marked - -{% /codeInfo %} +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} +{% partial file="/v1.2/connectors/yaml/query-usage.md" variables={connector: "mssql"} /%} -{% codeInfo srNumber=23 %} - -**timeoutSeconds**: Profiler Timeout in Seconds - -{% /codeInfo %} - -{% codeInfo srNumber=24 %} - -**databaseFilterPattern**: Regex to only fetch databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=25 %} - -**schemaFilterPattern**: Regex to only fetch tables or databases that matches the pattern. 
- -{% /codeInfo %} - -{% codeInfo srNumber=26 %} - -**tableFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=27 %} - -#### Processor Configuration - -Choose the `orm-profiler`. Its config can also be updated to define tests from the YAML itself instead of the UI: - -**tableConfig**: `tableConfig` allows you to set up some configuration at the table level. -{% /codeInfo %} - - -{% codeInfo srNumber=28 %} - -#### Sink Configuration - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. -{% /codeInfo %} - - -{% codeInfo srNumber=29 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. - -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - - -```yaml -source: - type: mssql - serviceName: local_mssql - sourceConfig: - config: - type: Profiler -``` - -```yaml {% srNumber=18 %} - generateSampleData: true -``` -```yaml {% srNumber=19 %} - # profileSample: 85 -``` -```yaml {% srNumber=20 %} - # threadCount: 5 -``` -```yaml {% srNumber=21 %} - processPiiSensitive: false -``` -```yaml {% srNumber=22 %} - # confidence: 80 -``` -```yaml {% srNumber=23 %} - # timeoutSeconds: 43200 -``` -```yaml {% srNumber=24 %} - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 -``` -```yaml {% srNumber=25 %} - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 -``` -```yaml {% srNumber=26 %} - # tableFilterPattern: - # includes: - # - table1 - # - table2 - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=27 %} -processor: - type: orm-profiler - config: {} # Remove braces if adding properties - # tableConfig: - # - fullyQualifiedName:
- # profileSample: # default - - # profileSample: # default will be 100 if omitted - # profileQuery: - # columnConfig: - # excludeColumns: - # - - # includeColumns: - # - columnName: - # - metrics: - # - MEAN - # - MEDIAN - # - ... - # partitionConfig: - # enablePartitioning: - # partitionColumnName: - # partitionIntervalType: - # Pick one of the variation shown below - # ----'TIME-UNIT' or 'INGESTION-TIME'------- - # partitionInterval: - # partitionIntervalUnit: - # ------------'INTEGER-RANGE'--------------- - # partitionIntegerRangeStart: - # partitionIntegerRangeEnd: - # -----------'COLUMN-VALUE'---------------- - # partitionValues: - # - - # - - -``` - -```yaml {% srNumber=28 %} -sink: - type: metadata-rest - config: {} -``` - -```yaml {% srNumber=29 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` - -{% /codeBlock %} - -{% /codePreview %} - -- You can learn more about how to configure and run the Profiler Workflow to extract Profiler data and execute the Data Quality from [here](/connectors/ingestion/workflows/profiler) - - - -### 2. Prepare the Profiler DAG - -Here, we follow a similar approach as with the metadata and usage pipelines, although we will use a different Workflow class: - - - - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=30 %} - -#### Import necessary modules - -The `ProfilerWorkflow` class that is being imported is a part of a metadata orm_profiler framework, which defines a process of extracting Profiler data. - -Here we are also importing all the basic requirements to parse YAMLs, handle dates and build our DAG. - -{% /codeInfo %} - -{% codeInfo srNumber=31 %} - -**Default arguments for all tasks in the Airflow DAG.** -- Default arguments dictionary contains default arguments for tasks in the DAG, including the owner's name, email address, number of retries, retry delay, and execution timeout. - -{% /codeInfo %} - - -{% codeInfo srNumber=32 %} - -- **config**: Specifies config for the profiler as we prepare above. - -{% /codeInfo %} - -{% codeInfo srNumber=33 %} - -- **metadata_ingestion_workflow()**: This code defines a function `metadata_ingestion_workflow()` that loads a YAML configuration, creates a `ProfilerWorkflow` object, executes the workflow, checks its status, prints the status to the console, and stops the workflow. - -{% /codeInfo %} - -{% codeInfo srNumber=34 %} - -- **DAG**: creates a DAG using the Airflow framework, and tune the DAG configurations to whatever fits with your requirements -- For more Airflow DAGs creation details visit [here](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html#declaring-a-dag). 
- -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.py" %} - -```python {% srNumber=35 %} -import yaml -from datetime import timedelta -from airflow import DAG -from metadata.workflow.profiler import ProfilerWorkflow -from metadata.workflow.workflow_output_handler import print_status - -try: - from airflow.operators.python import PythonOperator -except ModuleNotFoundError: - from airflow.operators.python_operator import PythonOperator - -from airflow.utils.dates import days_ago - - -``` -```python {% srNumber=36 %} -default_args = { - "owner": "user_name", - "email_on_failure": False, - "retries": 3, - "retry_delay": timedelta(seconds=10), - "execution_timeout": timedelta(minutes=60), -} - - -``` - -```python {% srNumber=37 %} -config = """ - -""" - - -``` - -```python {% srNumber=38 %} -def metadata_ingestion_workflow(): - workflow_config = yaml.safe_load(config) - workflow = ProfilerWorkflow.create(workflow_config) - workflow.execute() - workflow.raise_from_status() - print_status(workflow) - workflow.stop() - - -``` - -```python {% srNumber=39 %} -with DAG( - "profiler_example", - default_args=default_args, - description="An example DAG which runs a OpenMetadata ingestion workflow", - start_date=days_ago(1), - is_paused_upon_creation=False, - catchup=False, -) as dag: - ingest_task = PythonOperator( - task_id="profile_and_test_using_recipe", - python_callable=metadata_ingestion_workflow, - ) - - -``` - -{% /codeBlock %} - -{% /codePreview %} +{% partial file="/v1.2/connectors/yaml/data-profiler.md" variables={connector: "mssql"} /%} ## Lineage diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/mysql/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/mysql/yaml.md index 3fd0cb1558e8..ad3d04ecb01e 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/mysql/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/mysql/yaml.md @@ -172,32 +172,11 @@ Find more information about [Source Identity](https://docs.aws.amazon.com/STS/la {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -{% codeInfo srNumber=8 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - - -#### Sink Configuration - -{% codeInfo srNumber=9 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration @@ -256,337 +235,20 @@ source: # connectionArguments: # key: value ``` -```yaml {% srNumber=8 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - table1 - # - table2 - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=9 %} -sink: - type: metadata-rest - config: {} -``` - -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} - -{% /codeBlock %} - -{% /codePreview %} - -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. - -## Data Profiler - -The Data Profiler workflow will be using the `orm-profiler` processor. - -After running a Metadata Ingestion workflow, we can run Data Profiler workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - - -### 1. Define the YAML Config - -This is a sample config for the profiler: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=15 %} -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json). - -**generateSampleData**: Option to turn on/off generating sample data. - -{% /codeInfo %} - -{% codeInfo srNumber=16 %} - -**profileSample**: Percentage of data or no. of rows we want to execute the profiler and tests on. - -{% /codeInfo %} - -{% codeInfo srNumber=17 %} - -**threadCount**: Number of threads to use during metric computations. - -{% /codeInfo %} - -{% codeInfo srNumber=18 %} - -**processPiiSensitive**: Optional configuration to automatically tag columns that might contain sensitive information. - -{% /codeInfo %} - -{% codeInfo srNumber=19 %} - -**confidence**: Set the Confidence value for which you want the column to be marked - -{% /codeInfo %} - - -{% codeInfo srNumber=20 %} - -**timeoutSeconds**: Profiler Timeout in Seconds - -{% /codeInfo %} - -{% codeInfo srNumber=21 %} - -**databaseFilterPattern**: Regex to only fetch databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=22 %} - -**schemaFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=23 %} - -**tableFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=24 %} - -#### Processor Configuration - -Choose the `orm-profiler`. Its config can also be updated to define tests from the YAML itself instead of the UI: - -**tableConfig**: `tableConfig` allows you to set up some configuration at the table level. 
-{% /codeInfo %} - - -{% codeInfo srNumber=25 %} - -#### Sink Configuration - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. -{% /codeInfo %} - - -{% codeInfo srNumber=26 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. - -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - - -```yaml -source: - type: mysql - serviceName: - sourceConfig: - config: - type: Profiler -``` - -```yaml {% srNumber=15 %} - generateSampleData: true -``` -```yaml {% srNumber=16 %} - # profileSample: 85 -``` -```yaml {% srNumber=17 %} - # threadCount: 5 -``` -```yaml {% srNumber=18 %} - processPiiSensitive: false -``` -```yaml {% srNumber=19 %} - # confidence: 80 -``` -```yaml {% srNumber=20 %} - # timeoutSeconds: 43200 -``` -```yaml {% srNumber=21 %} - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 -``` -```yaml {% srNumber=22 %} - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 -``` -```yaml {% srNumber=23 %} - # tableFilterPattern: - # includes: - # - table1 - # - table2 - # excludes: - # - table3 - # - table4 -``` -```yaml {% srNumber=24 %} -processor: - type: orm-profiler - config: {} # Remove braces if adding properties - # tableConfig: - # - fullyQualifiedName:
- # profileSample: # default - - # profileSample: # default will be 100 if omitted - # profileQuery: - # columnConfig: - # excludeColumns: - # - - # includeColumns: - # - columnName: - # - metrics: - # - MEAN - # - MEDIAN - # - ... - # partitionConfig: - # enablePartitioning: - # partitionColumnName: - # partitionIntervalType: - # Pick one of the variation shown below - # ----'TIME-UNIT' or 'INGESTION-TIME'------- - # partitionInterval: - # partitionIntervalUnit: - # ------------'INTEGER-RANGE'--------------- - # partitionIntegerRangeStart: - # partitionIntegerRangeEnd: - # -----------'COLUMN-VALUE'---------------- - # partitionValues: - # - - # - +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -``` +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -```yaml {% srNumber=25 %} -sink: - type: metadata-rest - config: {} -``` - -```yaml {% srNumber=26 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -- You can learn more about how to configure and run the Profiler Workflow to extract Profiler data and execute the Data Quality from [here](/connectors/ingestion/workflows/profiler) - -### 2. Run with the CLI - -After saving the YAML config, we will run the command the same way we did for the metadata ingestion: - -```bash -metadata profile -c -``` - -Note how instead of running `ingest`, we are using the `profile` command to select the Profiler workflow. - -## SSL Configuration - -In order to integrate SSL in the Metadata Ingestion Config, the user will have to add the SSL config under connectionArguments which is placed in the source. - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=27 %} - -**ssl**: A dict of arguments which contains: - - **ssl_ca**: Path to the file that contains a PEM-formatted CA certificate. - - **ssl_cert**: Path to the file that contains a PEM-formatted client certificate. - - **ssl_disabled**: A boolean value that disables usage of TLS. - - **ssl_key**: Path to the file that contains a PEM-formatted private key for the client certificate. - - **ssl_verify_cert**: Set to true to check the server certificate's validity. - - **ssl_verify_identity**: Set to true to check the server's identity. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - -```yaml {% srNumber=27 %} -source: - type: mysql - serviceName: "" - serviceConnection: - config: - type: Mysql - username: - password: - hostPort: - ... - ... 
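-      # As the section above notes, the SSL options are supplied through connectionArguments: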
- connectionArguments: - ssl: - ssl_ca: /path/to/client-ssl/ca.pem, - ssl_cert: /path/to/client-ssl/client-cert.pem - ssl_key: /path/to/client-ssl/client-key.pem - #ssl_disabled: True #boolean - #ssl_verify_cert: True #boolean - #ssl_verify_identity: True #boolean - -``` -{% /codeBlock %} -{% /codePreview %} +{% partial file="/v1.2/connectors/yaml/data-profiler.md" variables={connector: "mysql"} /%} ## dbt Integration @@ -599,15 +261,3 @@ source: link="/connectors/ingestion/workflows/dbt" /%} {% /tilesContainer %} - -## Related - -{% tilesContainer %} - -{% tile - title="Ingest with Airflow" - description="Configure the ingestion using Airflow SDK" - link="/connectors/database/mysql/airflow" - / %} - -{% /tilesContainer %} diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/oracle/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/oracle/yaml.md index 2b67085654ff..76bdcf39c330 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/oracle/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/oracle/yaml.md @@ -142,31 +142,11 @@ This is a sample config for Oracle: {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -{% codeInfo srNumber=7 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=8 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. - -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration @@ -222,288 +202,20 @@ source: # connectionArguments: # key: value ``` -```yaml {% srNumber=7 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - users - # - type_test - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=8 %} -sink: - type: metadata-rest - config: {} -``` - -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} - -{% /codeBlock %} - -{% /codePreview %} - -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. 
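Since the inline YAML above now lives behind shared partials, a consolidated view helps when reviewing this refactor. Below is a minimal sketch of the recipe that the `source-config.md`, `ingestion-sink.md`, and `workflow-config.md` partials are expected to reassemble for a connector such as Oracle; it is stitched together from the snippets removed in this diff. The connection details and server values are placeholders, not content taken from the partials themselves:

```yaml
# Minimal end-to-end sketch assembled from the snippets removed above.
# Angle-bracketed values are placeholders, not defaults.
source:
  type: oracle
  serviceName: local_oracle
  serviceConnection:
    config:
      type: Oracle
      # connector-specific credentials go here
  sourceConfig:
    config:
      type: DatabaseMetadata
      markDeletedTables: true
      includeTables: true
      includeViews: true
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  openMetadataServerConfig:
    hostPort: "<OpenMetadata host:port>"
    authProvider: "<auth provider>"
```

As the removed section notes, the same shape is run with `metadata ingest -c <path-to-yaml>` regardless of connector; only the `source` block differs.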
- -## Data Profiler - -The Data Profiler workflow will be using the `orm-profiler` processor. - -After running a Metadata Ingestion workflow, we can run Data Profiler workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - - -### 1. Define the YAML Config - -This is a sample config for the profiler: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=11 %} -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json). - -**generateSampleData**: Option to turn on/off generating sample data. - -{% /codeInfo %} - -{% codeInfo srNumber=12 %} - -**profileSample**: Percentage of data or no. of rows we want to execute the profiler and tests on. - -{% /codeInfo %} - -{% codeInfo srNumber=13 %} - -**threadCount**: Number of threads to use during metric computations. - -{% /codeInfo %} - -{% codeInfo srNumber=14 %} - -**processPiiSensitive**: Optional configuration to automatically tag columns that might contain sensitive information. - -{% /codeInfo %} - -{% codeInfo srNumber=15 %} - -**confidence**: Set the Confidence value for which you want the column to be marked - -{% /codeInfo %} - - -{% codeInfo srNumber=16 %} - -**timeoutSeconds**: Profiler Timeout in Seconds - -{% /codeInfo %} - -{% codeInfo srNumber=17 %} - -**databaseFilterPattern**: Regex to only fetch databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=18 %} - -**schemaFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=19 %} - -**tableFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=20 %} - -#### Processor Configuration - -Choose the `orm-profiler`. Its config can also be updated to define tests from the YAML itself instead of the UI: - -**tableConfig**: `tableConfig` allows you to set up some configuration at the table level. -{% /codeInfo %} - - -{% codeInfo srNumber=21 %} - -#### Sink Configuration - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. -{% /codeInfo %} - - -{% codeInfo srNumber=22 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. 
- -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - - -```yaml -source: - type: oracle - serviceName: local_oracle - sourceConfig: - config: - type: Profiler -``` - -```yaml {% srNumber=11 %} - generateSampleData: true -``` -```yaml {% srNumber=12 %} - # profileSample: 85 -``` -```yaml {% srNumber=13 %} - # threadCount: 5 -``` -```yaml {% srNumber=14 %} - processPiiSensitive: false -``` -```yaml {% srNumber=15 %} - # confidence: 80 -``` -```yaml {% srNumber=16 %} - # timeoutSeconds: 43200 -``` -```yaml {% srNumber=17 %} - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 -``` -```yaml {% srNumber=18 %} - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 -``` -```yaml {% srNumber=19 %} - # tableFilterPattern: - # includes: - # - table1 - # - table2 - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=20 %} -processor: - type: orm-profiler - config: {} # Remove braces if adding properties - # tableConfig: - # - fullyQualifiedName:
- # profileSample: # default - - # profileSample: # default will be 100 if omitted - # profileQuery: - # columnConfig: - # excludeColumns: - # - - # includeColumns: - # - columnName: - # - metrics: - # - MEAN - # - MEDIAN - # - ... - # partitionConfig: - # enablePartitioning: - # partitionColumnName: - # partitionIntervalType: - # Pick one of the variation shown below - # ----'TIME-UNIT' or 'INGESTION-TIME'------- - # partitionInterval: - # partitionIntervalUnit: - # ------------'INTEGER-RANGE'--------------- - # partitionIntegerRangeStart: - # partitionIntegerRangeEnd: - # -----------'COLUMN-VALUE'---------------- - # partitionValues: - # - - # - -``` +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -```yaml {% srNumber=21 %} -sink: - type: metadata-rest - config: {} -``` +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -```yaml {% srNumber=22 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -- You can learn more about how to configure and run the Profiler Workflow to extract Profiler data and execute the Data Quality from [here](/connectors/ingestion/workflows/profiler) - -### 2. Run with the CLI - -After saving the YAML config, we will run the command the same way we did for the metadata ingestion: - -```bash -metadata profile -c -``` +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} -Note now instead of running `ingest`, we are using the `profile` command to select the Profiler workflow. +{% partial file="/v1.2/connectors/yaml/data-profiler.md" variables={connector: "oracle"} /%} ## Lineage @@ -521,14 +233,3 @@ You can learn more about how to ingest lineage [here](/connectors/ingestion/work {% /tilesContainer %} -## Related - -{% tilesContainer %} - -{% tile - title="Ingest with Airflow" - description="Configure the ingestion using Airflow SDK" - link="/connectors/database/oracle/airflow" - / %} - -{% /tilesContainer %} diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/pinotdb/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/pinotdb/yaml.md index bcb7d7a21c19..c1b4a7ed90fa 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/pinotdb/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/pinotdb/yaml.md @@ -100,32 +100,11 @@ This is a sample config for PinotDB: {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -{% codeInfo srNumber=7 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. 
You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - - -#### Sink Configuration - -{% codeInfo srNumber=8 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. - -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration @@ -175,337 +154,20 @@ source: # connectionArguments: # key: value ``` -```yaml {% srNumber=7 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - table1 - # - table2 - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=8 %} -sink: - type: metadata-rest - config: {} -``` - -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} - -{% /codeBlock %} - -{% /codePreview %} - -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. - -## Data Profiler - -The Data Profiler workflow will be using the `orm-profiler` processor. - -After running a Metadata Ingestion workflow, we can run Data Profiler workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - - -### 1. Define the YAML Config - -This is a sample config for the profiler: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=15 %} -#### Source Configuration - Source Config -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json). +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -**generateSampleData**: Option to turn on/off generating sample data. - -{% /codeInfo %} - -{% codeInfo srNumber=16 %} - -**profileSample**: Percentage of data or no. of rows we want to execute the profiler and tests on. - -{% /codeInfo %} - -{% codeInfo srNumber=17 %} - -**threadCount**: Number of threads to use during metric computations. - -{% /codeInfo %} - -{% codeInfo srNumber=18 %} - -**processPiiSensitive**: Optional configuration to automatically tag columns that might contain sensitive information. - -{% /codeInfo %} - -{% codeInfo srNumber=19 %} - -**confidence**: Set the Confidence value for which you want the column to be marked - -{% /codeInfo %} - - -{% codeInfo srNumber=20 %} - -**timeoutSeconds**: Profiler Timeout in Seconds - -{% /codeInfo %} - -{% codeInfo srNumber=21 %} - -**databaseFilterPattern**: Regex to only fetch databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=22 %} - -**schemaFilterPattern**: Regex to only fetch tables or databases that matches the pattern. 
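As a quick illustration of the definition above, a hypothetical `schemaFilterPattern` fragment for the profiler might look like the sketch below; the schema names and regexes are invented for the example:

```yaml
      # Hypothetical example: profile only schemas starting with "sales",
      # and always skip temporary ones. Both lists accept regex.
      schemaFilterPattern:
        includes:
          - sales.*
        excludes:
          - .*_tmp
```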
- -{% /codeInfo %} - -{% codeInfo srNumber=23 %} - -**tableFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=24 %} - -#### Processor Configuration - -Choose the `orm-profiler`. Its config can also be updated to define tests from the YAML itself instead of the UI: - -**tableConfig**: `tableConfig` allows you to set up some configuration at the table level. -{% /codeInfo %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} - -{% codeInfo srNumber=25 %} - -#### Sink Configuration - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. -{% /codeInfo %} - - -{% codeInfo srNumber=26 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. - -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - - -```yaml -source: - type: pinotdb - serviceName: - sourceConfig: - config: - type: Profiler -``` - -```yaml {% srNumber=15 %} - generateSampleData: true -``` -```yaml {% srNumber=16 %} - # profileSample: 85 -``` -```yaml {% srNumber=17 %} - # threadCount: 5 -``` -```yaml {% srNumber=18 %} - processPiiSensitive: false -``` -```yaml {% srNumber=19 %} - # confidence: 80 -``` -```yaml {% srNumber=20 %} - # timeoutSeconds: 43200 -``` -```yaml {% srNumber=21 %} - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 -``` -```yaml {% srNumber=22 %} - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 -``` -```yaml {% srNumber=23 %} - # tableFilterPattern: - # includes: - # - table1 - # - table2 - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=24 %} -processor: - type: orm-profiler - config: {} # Remove braces if adding properties - # tableConfig: - # - fullyQualifiedName:
- # profileSample: # default - - # profileSample: # default will be 100 if omitted - # profileQuery: - # columnConfig: - # excludeColumns: - # - - # includeColumns: - # - columnName: - # - metrics: - # - MEAN - # - MEDIAN - # - ... - # partitionConfig: - # enablePartitioning: - # partitionColumnName: - # partitionIntervalType: - # Pick one of the variation shown below - # ----'TIME-UNIT' or 'INGESTION-TIME'------- - # partitionInterval: - # partitionIntervalUnit: - # ------------'INTEGER-RANGE'--------------- - # partitionIntegerRangeStart: - # partitionIntegerRangeEnd: - # -----------'COLUMN-VALUE'---------------- - # partitionValues: - # - - # - - -``` - -```yaml {% srNumber=25 %} -sink: - type: metadata-rest - config: {} -``` - -```yaml {% srNumber=26 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -- You can learn more about how to configure and run the Profiler Workflow to extract Profiler data and execute the Data Quality from [here](/connectors/ingestion/workflows/profiler) +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} -### 2. Run with the CLI - -After saving the YAML config, we will run the command the same way we did for the metadata ingestion: - -```bash -metadata profile -c -``` - -Note how instead of running `ingest`, we are using the `profile` command to select the Profiler workflow. - -## SSL Configuration - -In order to integrate SSL in the Metadata Ingestion Config, the user will have to add the SSL config under connectionArguments which is placed in the source. - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=27 %} - -**ssl**: A dict of arguments which contains: - - **ssl_ca**: Path to the file that contains a PEM-formatted CA certificate. - - **ssl_cert**: Path to the file that contains a PEM-formatted client certificate. - - **ssl_disabled**: A boolean value that disables usage of TLS. - - **ssl_key**: Path to the file that contains a PEM-formatted private key for the client certificate. - - **ssl_verify_cert**: Set to true to check the server certificate's validity. - - **ssl_verify_identity**: Set to true to check the server's identity. - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - -```yaml {% srNumber=27 %} -source: - type: pinotdb - serviceName: "" - serviceConnection: - config: - type: PinotDB - username: - password: - hostPort: - ... - ... 
- connectionArguments: - ssl: - ssl_ca: /path/to/client-ssl/ca.pem, - ssl_cert: /path/to/client-ssl/client-cert.pem - ssl_key: /path/to/client-ssl/client-key.pem - #ssl_disabled: True #boolean - #ssl_verify_cert: True #boolean - #ssl_verify_identity: True #boolean - -``` -{% /codeBlock %} -{% /codePreview %} +{% partial file="/v1.2/connectors/yaml/data-profiler.md" variables={connector: "pinotdb"} /%} ## dbt Integration @@ -518,15 +180,3 @@ source: link="/connectors/ingestion/workflows/dbt" /%} {% /tilesContainer %} - -## Related - -{% tilesContainer %} - -{% tile - title="Ingest with Airflow" - description="Configure the ingestion using Airflow SDK" - link="/connectors/database/pinotdb/airflow" - / %} - -{% /tilesContainer %} diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/postgres/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/postgres/yaml.md index c0c1f171e4cb..d46ccef8cb6c 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/postgres/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/postgres/yaml.md @@ -201,31 +201,11 @@ Find more information about [Source Identity](https://docs.aws.amazon.com/STS/la {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -{% codeInfo srNumber=9 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=10 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. - -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration @@ -287,537 +267,21 @@ source: # key: value ``` -```yaml {% srNumber=9 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - users - # - type_test - # excludes: - # - table3 - # - table4 -``` +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -```yaml {% srNumber=10 %} -sink: - type: metadata-rest - config: {} -``` +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -### 2. Run with the CLI - -First, we will need to save the YAML file. 
Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. - - -## Query Usage - -The Query Usage workflow will be using the `query-parser` processor. - -After running a Metadata Ingestion workflow, we can run Query Usage workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - - -### 1. Define the YAML Config - -This is a sample config for Postgres Usage: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=11 %} - -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceQueryUsagePipeline.json). - -**queryLogDuration**: Configuration to tune how far we want to look back in query logs to process usage data. - -{% /codeInfo %} - -{% codeInfo srNumber=12 %} - -**stageFileLocation**: Temporary file name to store the query logs before processing. Absolute file path required. - -{% /codeInfo %} - -{% codeInfo srNumber=13 %} - -**resultLimit**: Configuration to set the limit for query logs - -{% /codeInfo %} - -{% codeInfo srNumber=14 %} - -**queryLogFilePath**: Configuration to set the file path for query logs - -{% /codeInfo %} - - -{% codeInfo srNumber=15 %} - -#### Processor, Stage and Bulk Sink Configuration - -To specify where the staging files will be located. - -Note that the location is a directory that will be cleaned at the end of the ingestion. - -{% /codeInfo %} - -{% codeInfo srNumber=16 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. - -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - -```yaml -source: - type: postgres-usage - serviceName: - sourceConfig: - config: - type: DatabaseUsage -``` -```yaml {% srNumber=11 %} - # Number of days to look back - queryLogDuration: 7 -``` - -```yaml {% srNumber=12 %} - # This is a directory that will be DELETED after the usage runs - stageFileLocation: -``` - -```yaml {% srNumber=13 %} - # resultLimit: 1000 -``` - -```yaml {% srNumber=14 %} - # If instead of getting the query logs from the database we want to pass a file with the queries - # queryLogFilePath: path-to-file -``` - -```yaml {% srNumber=15 %} -processor: - type: query-parser - config: {} -stage: - type: table-usage - config: - filename: /tmp/postgres_usage -bulkSink: - type: metadata-usage - config: - filename: /tmp/postgres_usage -``` - -```yaml {% srNumber=16 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` - -{% /codeBlock %} -{% /codePreview %} - -### 2. Run with the CLI - -There is an extra requirement to run the Usage pipelines. 
You will need to install: - -```bash -pip3 install --upgrade 'openmetadata-ingestion[postgres]' -``` - -After saving the YAML config, we will run the command the same way we did for the metadata ingestion: - -```bash -metadata ingest -c -``` - -## Data Profiler - -The Data Profiler workflow will be using the `orm-profiler` processor. - -After running a Metadata Ingestion workflow, we can run Data Profiler workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - - -### 1. Define the YAML Config - -This is a sample config for the profiler: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=17 %} -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json). - -**generateSampleData**: Option to turn on/off generating sample data. - -{% /codeInfo %} - -{% codeInfo srNumber=18 %} - -**profileSample**: Percentage of data or no. of rows we want to execute the profiler and tests on. - -{% /codeInfo %} - -{% codeInfo srNumber=19 %} - -**threadCount**: Number of threads to use during metric computations. - -{% /codeInfo %} - -{% codeInfo srNumber=20 %} - -**processPiiSensitive**: Optional configuration to automatically tag columns that might contain sensitive information. - -{% /codeInfo %} - -{% codeInfo srNumber=21 %} - -**confidence**: Set the Confidence value for which you want the column to be marked - -{% /codeInfo %} - +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} -{% codeInfo srNumber=22 %} +{% partial file="/v1.2/connectors/yaml/query-usage.md" variables={connector: "postgres"} /%} -**timeoutSeconds**: Profiler Timeout in Seconds - -{% /codeInfo %} - -{% codeInfo srNumber=23 %} - -**databaseFilterPattern**: Regex to only fetch databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=24 %} - -**schemaFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=25 %} - -**tableFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=26 %} - -#### Processor Configuration - -Choose the `orm-profiler`. Its config can also be updated to define tests from the YAML itself instead of the UI: - -**tableConfig**: `tableConfig` allows you to set up some configuration at the table level. -{% /codeInfo %} - - -{% codeInfo srNumber=27 %} - -#### Sink Configuration - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. -{% /codeInfo %} - - -{% codeInfo srNumber=28 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. 
- -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - - -```yaml -source: - type: postgres - serviceName: local_postgres - sourceConfig: - config: - type: Profiler -``` - -```yaml {% srNumber=17 %} - generateSampleData: true -``` -```yaml {% srNumber=18 %} - # profileSample: 85 -``` -```yaml {% srNumber=19 %} - # threadCount: 5 -``` -```yaml {% srNumber=20 %} - processPiiSensitive: false -``` -```yaml {% srNumber=21 %} - # confidence: 80 -``` -```yaml {% srNumber=22 %} - # timeoutSeconds: 43200 -``` -```yaml {% srNumber=23 %} - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 -``` -```yaml {% srNumber=24 %} - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 -``` -```yaml {% srNumber=25 %} - # tableFilterPattern: - # includes: - # - table1 - # - table2 - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=26 %} -processor: - type: orm-profiler - config: {} # Remove braces if adding properties - # tableConfig: - # - fullyQualifiedName:
- # profileSample: # default - - # profileSample: # default will be 100 if omitted - # profileQuery: - # columnConfig: - # excludeColumns: - # - - # includeColumns: - # - columnName: - # - metrics: - # - MEAN - # - MEDIAN - # - ... - # partitionConfig: - # enablePartitioning: - # partitionColumnName: - # partitionIntervalType: - # Pick one of the variation shown below - # ----'TIME-UNIT' or 'INGESTION-TIME'------- - # partitionInterval: - # partitionIntervalUnit: - # ------------'INTEGER-RANGE'--------------- - # partitionIntegerRangeStart: - # partitionIntegerRangeEnd: - # -----------'COLUMN-VALUE'---------------- - # partitionValues: - # - - # - - -``` - -```yaml {% srNumber=27 %} -sink: - type: metadata-rest - config: {} -``` - -```yaml {% srNumber=28 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` - -{% /codeBlock %} - -{% /codePreview %} - -- You can learn more about how to configure and run the Profiler Workflow to extract Profiler data and execute the Data Quality from [here](/connectors/ingestion/workflows/profiler) - - - -### 2. Prepare the Profiler DAG - -Here, we follow a similar approach as with the metadata and usage pipelines, although we will use a different Workflow class: - - - - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=29 %} - -#### Import necessary modules - -The `ProfilerWorkflow` class that is being imported is a part of a metadata orm_profiler framework, which defines a process of extracting Profiler data. - -Here we are also importing all the basic requirements to parse YAMLs, handle dates and build our DAG. - -{% /codeInfo %} - -{% codeInfo srNumber=30 %} - -**Default arguments for all tasks in the Airflow DAG.** -- Default arguments dictionary contains default arguments for tasks in the DAG, including the owner's name, email address, number of retries, retry delay, and execution timeout. - -{% /codeInfo %} - - -{% codeInfo srNumber=31 %} - -- **config**: Specifies config for the profiler as we prepare above. - -{% /codeInfo %} - -{% codeInfo srNumber=32 %} - -- **metadata_ingestion_workflow()**: This code defines a function `metadata_ingestion_workflow()` that loads a YAML configuration, creates a `ProfilerWorkflow` object, executes the workflow, checks its status, prints the status to the console, and stops the workflow. - -{% /codeInfo %} - -{% codeInfo srNumber=33 %} - -- **DAG**: creates a DAG using the Airflow framework, and tune the DAG configurations to whatever fits with your requirements -- For more Airflow DAGs creation details visit [here](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html#declaring-a-dag). 
- -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.py" %} - -```python {% srNumber=30 %} -import yaml -from datetime import timedelta -from airflow import DAG -from metadata.workflow.profiler import ProfilerWorkflow -from metadata.workflow.workflow_output_handler import print_status - -try: - from airflow.operators.python import PythonOperator -except ModuleNotFoundError: - from airflow.operators.python_operator import PythonOperator - -from airflow.utils.dates import days_ago - - -``` -```python {% srNumber=31 %} -default_args = { - "owner": "user_name", - "email_on_failure": False, - "retries": 3, - "retry_delay": timedelta(seconds=10), - "execution_timeout": timedelta(minutes=60), -} - - -``` - -```python {% srNumber=32 %} -config = """ - -""" - - -``` - -```python {% srNumber=33 %} -def metadata_ingestion_workflow(): - workflow_config = yaml.safe_load(config) - workflow = ProfilerWorkflow.create(workflow_config) - workflow.execute() - workflow.raise_from_status() - print_status(workflow) - workflow.stop() - - -``` - -```python {% srNumber=34 %} -with DAG( - "profiler_example", - default_args=default_args, - description="An example DAG which runs a OpenMetadata ingestion workflow", - start_date=days_ago(1), - is_paused_upon_creation=False, - catchup=False, -) as dag: - ingest_task = PythonOperator( - task_id="profile_and_test_using_recipe", - python_callable=metadata_ingestion_workflow, - ) - - -``` - -{% /codeBlock %} - -{% /codePreview %} +{% partial file="/v1.2/connectors/yaml/data-profiler.md" variables={connector: "postgres"} /%} ## Lineage diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/presto/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/presto/yaml.md index b339758e06d9..d5444ae25a05 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/presto/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/presto/yaml.md @@ -105,31 +105,11 @@ This is a sample config for Presto: {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -{% codeInfo srNumber=8 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=9 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
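For reference, the sink block this definition describes, and which the new `ingestion-sink.md` partial presumably renders, is the same two-key snippet removed throughout this diff:

```yaml
# The sink needs no further configuration; the server coordinates
# come from workflowConfig, not from the sink itself.
sink:
  type: metadata-rest
  config: {}
```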
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration @@ -184,288 +164,19 @@ source: # key: value ``` -```yaml {% srNumber=8 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - users - # - type_test - # excludes: - # - table3 - # - table4 -``` +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -```yaml {% srNumber=9 %} -sink: - type: metadata-rest - config: {} -``` +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. - -## Data Profiler - -The Data Profiler workflow will be using the `orm-profiler` processor. - -After running a Metadata Ingestion workflow, we can run Data Profiler workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - - -### 1. Define the YAML Config - -This is a sample config for the profiler: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=11 %} -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json). - -**generateSampleData**: Option to turn on/off generating sample data. - -{% /codeInfo %} - -{% codeInfo srNumber=12 %} - -**profileSample**: Percentage of data or no. of rows we want to execute the profiler and tests on. - -{% /codeInfo %} - -{% codeInfo srNumber=13 %} - -**threadCount**: Number of threads to use during metric computations. - -{% /codeInfo %} - -{% codeInfo srNumber=14 %} - -**processPiiSensitive**: Optional configuration to automatically tag columns that might contain sensitive information. - -{% /codeInfo %} - -{% codeInfo srNumber=15 %} - -**confidence**: Set the Confidence value for which you want the column to be marked - -{% /codeInfo %} - - -{% codeInfo srNumber=16 %} - -**timeoutSeconds**: Profiler Timeout in Seconds - -{% /codeInfo %} - -{% codeInfo srNumber=17 %} - -**databaseFilterPattern**: Regex to only fetch databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=18 %} - -**schemaFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=19 %} - -**tableFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=20 %} - -#### Processor Configuration - -Choose the `orm-profiler`. 
Its config can also be updated to define tests from the YAML itself instead of the UI: - -**tableConfig**: `tableConfig` allows you to set up some configuration at the table level. -{% /codeInfo %} - - -{% codeInfo srNumber=21 %} - -#### Sink Configuration - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. -{% /codeInfo %} - - -{% codeInfo srNumber=22 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. - -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - - -```yaml -source: - type: presto - serviceName: local_presto - sourceConfig: - config: - type: Profiler -``` - -```yaml {% srNumber=11 %} - generateSampleData: true -``` -```yaml {% srNumber=12 %} - # profileSample: 85 -``` -```yaml {% srNumber=13 %} - # threadCount: 5 -``` -```yaml {% srNumber=14 %} - processPiiSensitive: false -``` -```yaml {% srNumber=15 %} - # confidence: 80 -``` -```yaml {% srNumber=16 %} - # timeoutSeconds: 43200 -``` -```yaml {% srNumber=17 %} - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 -``` -```yaml {% srNumber=18 %} - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 -``` -```yaml {% srNumber=19 %} - # tableFilterPattern: - # includes: - # - table1 - # - table2 - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=20 %} -processor: - type: orm-profiler - config: {} # Remove braces if adding properties - # tableConfig: - # - fullyQualifiedName:
- # profileSample: # default - - # profileSample: # default will be 100 if omitted - # profileQuery: - # columnConfig: - # excludeColumns: - # - - # includeColumns: - # - columnName: - # - metrics: - # - MEAN - # - MEDIAN - # - ... - # partitionConfig: - # enablePartitioning: - # partitionColumnName: - # partitionIntervalType: - # Pick one of the variation shown below - # ----'TIME-UNIT' or 'INGESTION-TIME'------- - # partitionInterval: - # partitionIntervalUnit: - # ------------'INTEGER-RANGE'--------------- - # partitionIntegerRangeStart: - # partitionIntegerRangeEnd: - # -----------'COLUMN-VALUE'---------------- - # partitionValues: - # - - # - - -``` - -```yaml {% srNumber=21 %} -sink: - type: metadata-rest - config: {} -``` - -```yaml {% srNumber=22 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` - -{% /codeBlock %} - -{% /codePreview %} - -- You can learn more about how to configure and run the Profiler Workflow to extract Profiler data and execute the Data Quality from [here](/connectors/ingestion/workflows/profiler) - -### 2. Run with the CLI - -After saving the YAML config, we will run the command the same way we did for the metadata ingestion: - -```bash -metadata profile -c -``` +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} -Note now instead of running `ingest`, we are using the `profile` command to select the Profiler workflow. +{% partial file="/v1.2/connectors/yaml/data-profiler.md" variables={connector: "presto"} /%} ## dbt Integration diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/redshift/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/redshift/yaml.md index 2e32f438d1a1..434d724c70a4 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/redshift/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/redshift/yaml.md @@ -129,31 +129,11 @@ This is a sample config for Redshift: -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -{% codeInfo srNumber=8 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=9 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration @@ -207,594 +187,22 @@ source: # key: value ``` -```yaml {% srNumber=8 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - users - # - type_test - # excludes: - # - table3 - # - table4 -``` +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -```yaml {% srNumber=9 %} -sink: - type: metadata-rest - config: {} -``` +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. - - -## Query Usage - -The Query Usage workflow will be using the `query-parser` processor. - -After running a Metadata Ingestion workflow, we can run Query Usage workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - - -### 1. Define the YAML Config - -This is a sample config for Redshift Usage: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=11 %} - -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceQueryUsagePipeline.json). - -**queryLogDuration**: Configuration to tune how far we want to look back in query logs to process usage data. - -{% /codeInfo %} - -{% codeInfo srNumber=12 %} - -**stageFileLocation**: Temporary file name to store the query logs before processing. Absolute file path required. - -{% /codeInfo %} - -{% codeInfo srNumber=13 %} - -**resultLimit**: Configuration to set the limit for query logs - -{% /codeInfo %} - -{% codeInfo srNumber=14 %} - -**queryLogFilePath**: Configuration to set the file path for query logs - -{% /codeInfo %} - - -{% codeInfo srNumber=15 %} - -#### Processor, Stage and Bulk Sink Configuration - -To specify where the staging files will be located. - -Note that the location is a directory that will be cleaned at the end of the ingestion. - -{% /codeInfo %} - -{% codeInfo srNumber=16 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. 
- -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - -```yaml -source: - type: redshift-usage - serviceName: - sourceConfig: - config: - type: DatabaseUsage -``` -```yaml {% srNumber=11 %} - # Number of days to look back - queryLogDuration: 7 -``` - -```yaml {% srNumber=12 %} - # This is a directory that will be DELETED after the usage runs - stageFileLocation: -``` - -```yaml {% srNumber=13 %} - # resultLimit: 1000 -``` - -```yaml {% srNumber=14 %} - # If instead of getting the query logs from the database we want to pass a file with the queries - # queryLogFilePath: path-to-file -``` - -```yaml {% srNumber=15 %} -processor: - type: query-parser - config: {} -stage: - type: table-usage - config: - filename: /tmp/redshift_usage -bulkSink: - type: metadata-usage - config: - filename: /tmp/redshift_usage -``` - -```yaml {% srNumber=16 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` - -{% /codeBlock %} -{% /codePreview %} - -### 2. Run with the CLI - -There is an extra requirement to run the Usage pipelines. You will need to install: - -```bash -pip3 install --upgrade 'openmetadata-ingestion[redshift-usage]' -``` - -After saving the YAML config, we will run the command the same way we did for the metadata ingestion: - -```bash -metadata ingest -c -``` - -## Data Profiler - -The Data Profiler workflow will be using the `orm-profiler` processor. - -After running a Metadata Ingestion workflow, we can run Data Profiler workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - - -### 1. Define the YAML Config - -This is a sample config for the profiler: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=17 %} -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json). - -**generateSampleData**: Option to turn on/off generating sample data. - -{% /codeInfo %} - -{% codeInfo srNumber=18 %} - -**profileSample**: Percentage of data or no. of rows we want to execute the profiler and tests on. - -{% /codeInfo %} - -{% codeInfo srNumber=19 %} - -**threadCount**: Number of threads to use during metric computations. - -{% /codeInfo %} - -{% codeInfo srNumber=20 %} - -**processPiiSensitive**: Optional configuration to automatically tag columns that might contain sensitive information. - -{% /codeInfo %} - -{% codeInfo srNumber=21 %} - -**confidence**: Set the Confidence value for which you want the column to be marked - -{% /codeInfo %} - - -{% codeInfo srNumber=22 %} - -**timeoutSeconds**: Profiler Timeout in Seconds - -{% /codeInfo %} - -{% codeInfo srNumber=23 %} - -**databaseFilterPattern**: Regex to only fetch databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=24 %} - -**schemaFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=25 %} - -**tableFilterPattern**: Regex to only fetch tables or databases that matches the pattern. 
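To ground the three filter definitions above, here is a small illustrative profiler `sourceConfig` fragment that combines them with the sampling options defined earlier. The regexes and values are examples only, not recommended defaults:

```yaml
  # Illustrative only: keys are taken from the snippets removed above,
  # the sampling values and table regexes are made up for the example.
  sourceConfig:
    config:
      type: Profiler
      generateSampleData: true
      profileSample: 85
      timeoutSeconds: 43200
      tableFilterPattern:
        includes:
          - fact_.*
        excludes:
          - .*_audit
```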
- -{% /codeInfo %} - -{% codeInfo srNumber=26 %} - -#### Processor Configuration +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} -Choose the `orm-profiler`. Its config can also be updated to define tests from the YAML itself instead of the UI: -**tableConfig**: `tableConfig` allows you to set up some configuration at the table level. -{% /codeInfo %} - - -{% codeInfo srNumber=27 %} - -#### Sink Configuration - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. -{% /codeInfo %} - - -{% codeInfo srNumber=28 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. - -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - - -```yaml -source: - type: redshift - serviceName: local_redshift - sourceConfig: - config: - type: Profiler -``` - -```yaml {% srNumber=17 %} - generateSampleData: true -``` -```yaml {% srNumber=18 %} - # profileSample: 85 -``` -```yaml {% srNumber=19 %} - # threadCount: 5 -``` -```yaml {% srNumber=20 %} - processPiiSensitive: false -``` -```yaml {% srNumber=21 %} - # confidence: 80 -``` -```yaml {% srNumber=22 %} - # timeoutSeconds: 43200 -``` -```yaml {% srNumber=23 %} - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 -``` -```yaml {% srNumber=24 %} - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 -``` -```yaml {% srNumber=25 %} - # tableFilterPattern: - # includes: - # - table1 - # - table2 - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=26 %} -processor: - type: orm-profiler - config: {} # Remove braces if adding properties - # tableConfig: - # - fullyQualifiedName:
- # profileSample: # default - - # profileSample: # default will be 100 if omitted - # profileQuery: - # columnConfig: - # excludeColumns: - # - - # includeColumns: - # - columnName: - # - metrics: - # - MEAN - # - MEDIAN - # - ... - # partitionConfig: - # enablePartitioning: - # partitionColumnName: - # partitionIntervalType: - # Pick one of the variation shown below - # ----'TIME-UNIT' or 'INGESTION-TIME'------- - # partitionInterval: - # partitionIntervalUnit: - # ------------'INTEGER-RANGE'--------------- - # partitionIntegerRangeStart: - # partitionIntegerRangeEnd: - # -----------'COLUMN-VALUE'---------------- - # partitionValues: - # - - # - - -``` - -```yaml {% srNumber=27 %} -sink: - type: metadata-rest - config: {} -``` - -```yaml {% srNumber=28 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` - -{% /codeBlock %} - -{% /codePreview %} - -- You can learn more about how to configure and run the Profiler Workflow to extract Profiler data and execute the Data Quality from [here](/connectors/ingestion/workflows/profiler) - - - -### 2. Prepare the Profiler DAG - -Here, we follow a similar approach as with the metadata and usage pipelines, although we will use a different Workflow class: - - - - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=29 %} - -#### Import necessary modules - -The `ProfilerWorkflow` class that is being imported is a part of a metadata orm_profiler framework, which defines a process of extracting Profiler data. - -Here we are also importing all the basic requirements to parse YAMLs, handle dates and build our DAG. - -{% /codeInfo %} - -{% codeInfo srNumber=30 %} - -**Default arguments for all tasks in the Airflow DAG.** -- Default arguments dictionary contains default arguments for tasks in the DAG, including the owner's name, email address, number of retries, retry delay, and execution timeout. - -{% /codeInfo %} - - -{% codeInfo srNumber=31 %} - -- **config**: Specifies config for the profiler as we prepare above. - -{% /codeInfo %} - -{% codeInfo srNumber=32 %} - -- **metadata_ingestion_workflow()**: This code defines a function `metadata_ingestion_workflow()` that loads a YAML configuration, creates a `ProfilerWorkflow` object, executes the workflow, checks its status, prints the status to the console, and stops the workflow. - -{% /codeInfo %} - -{% codeInfo srNumber=33 %} - -- **DAG**: creates a DAG using the Airflow framework, and tune the DAG configurations to whatever fits with your requirements -- For more Airflow DAGs creation details visit [here](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html#declaring-a-dag). 
- -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.py" %} - -```python {% srNumber=29 %} -import yaml -from datetime import timedelta -from airflow import DAG -from metadata.workflow.profiler import ProfilerWorkflow -from metadata.workflow.workflow_output_handler import print_status - -try: - from airflow.operators.python import PythonOperator -except ModuleNotFoundError: - from airflow.operators.python_operator import PythonOperator - -from airflow.utils.dates import days_ago - - -``` -```python {% srNumber=30 %} -default_args = { - "owner": "user_name", - "email_on_failure": False, - "retries": 3, - "retry_delay": timedelta(seconds=10), - "execution_timeout": timedelta(minutes=60), -} - - -``` - -```python {% srNumber=31 %} -config = """ - -""" - - -``` - -```python {% srNumber=32 %} -def metadata_ingestion_workflow(): - workflow_config = yaml.safe_load(config) - workflow = ProfilerWorkflow.create(workflow_config) - workflow.execute() - workflow.raise_from_status() - print_status(workflow) - workflow.stop() - - -``` - -```python {% srNumber=33 %} -with DAG( - "profiler_example", - default_args=default_args, - description="An example DAG which runs a OpenMetadata ingestion workflow", - start_date=days_ago(1), - is_paused_upon_creation=False, - catchup=False, -) as dag: - ingest_task = PythonOperator( - task_id="profile_and_test_using_recipe", - python_callable=metadata_ingestion_workflow, - ) - - -``` - -{% /codeBlock %} - -{% /codePreview %} - - -## SSL Configuration - -In order to integrate SSL in the Metadata Ingestion Config, the user will have to add the SSL config under connectionArguments which is placed in the source. - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=34 %} - -### SSL Modes - -There are couple of types of SSL modes that Redshift supports which can be added to ConnectionArguments, they are as follows: -- **disable**: SSL is disabled and the connection is not encrypted. - -- **allow**: SSL is used if the server requires it. - -- **prefer**: SSL is used if the server supports it. Amazon Redshift supports SSL, so SSL is used when you set sslmode to prefer. - -- **require**: SSL is required. - -- **verify-ca**: SSL must be used and the server certificate must be verified. - -- **verify-full**: SSL must be used. The server certificate must be verified and the server hostname must match the hostname attribute on the certificate. - -For more information, you can visit [Redshift SSL documentation](https://docs.aws.amazon.com/redshift/latest/mgmt/connecting-ssl-support.html) - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - -```yaml {% srNumber=34 %} -source: - type: redshift - serviceName: - serviceConnection: - config: - type: Redshift - hostPort: cluster.name.region.redshift.amazonaws.com:5439 - username: username - ... - ... - ... 
- connectionArguments: - sslmode: - - - - -``` -{% /codeBlock %} -{% /codePreview %} +{% partial file="/v1.2/connectors/yaml/query-usage.md" variables={connector: "redshift"} /%} +{% partial file="/v1.2/connectors/yaml/data-profiler.md" variables={connector: "redshift"} /%} ## Lineage diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/salesforce/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/salesforce/yaml.md index d95153329716..6c910684031c 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/salesforce/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/salesforce/yaml.md @@ -116,31 +116,11 @@ By default, the domain `login` is used for accessing Salesforce. {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -{% codeInfo srNumber=10 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=11 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. - -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration @@ -197,69 +177,14 @@ source: # key: value ``` +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -```yaml {% srNumber=10 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - users - # - type_test - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=11 %} -sink: - type: metadata-rest - config: {} -``` +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. 
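For reference, the sink block described above is identical across connectors; this is the minimal form the shared `ingestion-sink` partial is expected to standardize (the same values used throughout these pages):

```yaml
sink:
  type: metadata-rest   # push entities to the OpenMetadata REST API
  config: {}            # no extra options are needed for this sink
```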
- -## Related - -{% tilesContainer %} - -{% tile - title="Ingest with Airflow" - description="Configure the ingestion using Airflow SDK" - link="/connectors/database/salesforce/airflow" - / %} - -{% /tilesContainer %} +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/sap-hana/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/sap-hana/yaml.md index 2f8d289166e9..8684c932e3d5 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/sap-hana/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/sap-hana/yaml.md @@ -129,33 +129,11 @@ If you have a User Store configured, then: {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -{% codeInfo srNumber=9 %} - -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - - -#### Sink Configuration - -{% codeInfo srNumber=10 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. - -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration {% codeInfo srNumber=7 %} @@ -213,288 +191,20 @@ source: # connectionArguments: # key: value ``` -```yaml {% srNumber=9 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - table1 - # - table2 - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=10 %} -sink: - type: metadata-rest - config: {} -``` - -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} -{% /codeBlock %} - -{% /codePreview %} - -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. - -## Data Profiler - -The Data Profiler workflow will be using the `orm-profiler` processor. - -After running a Metadata Ingestion workflow, we can run Data Profiler workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - - -### 1. 
Define the YAML Config +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -This is a sample config for the profiler: +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=15 %} -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json). - -**generateSampleData**: Option to turn on/off generating sample data. - -{% /codeInfo %} - -{% codeInfo srNumber=16 %} - -**profileSample**: Percentage of data or no. of rows we want to execute the profiler and tests on. - -{% /codeInfo %} - -{% codeInfo srNumber=17 %} - -**threadCount**: Number of threads to use during metric computations. - -{% /codeInfo %} - -{% codeInfo srNumber=18 %} - -**processPiiSensitive**: Optional configuration to automatically tag columns that might contain sensitive information. - -{% /codeInfo %} - -{% codeInfo srNumber=19 %} - -**confidence**: Set the Confidence value for which you want the column to be marked - -{% /codeInfo %} - - -{% codeInfo srNumber=20 %} - -**timeoutSeconds**: Profiler Timeout in Seconds - -{% /codeInfo %} - -{% codeInfo srNumber=21 %} - -**databaseFilterPattern**: Regex to only fetch databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=22 %} - -**schemaFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=23 %} - -**tableFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=24 %} - -#### Processor Configuration - -Choose the `orm-profiler`. Its config can also be updated to define tests from the YAML itself instead of the UI: - -**tableConfig**: `tableConfig` allows you to set up some configuration at the table level. -{% /codeInfo %} - - -{% codeInfo srNumber=25 %} - -#### Sink Configuration - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. -{% /codeInfo %} - - -{% codeInfo srNumber=26 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. 
- -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - - -```yaml -source: - type: sapHana - serviceName: - sourceConfig: - config: - type: Profiler -``` - -```yaml {% srNumber=15 %} - generateSampleData: true -``` -```yaml {% srNumber=16 %} - # profileSample: 85 -``` -```yaml {% srNumber=17 %} - # threadCount: 5 -``` -```yaml {% srNumber=18 %} - processPiiSensitive: false -``` -```yaml {% srNumber=19 %} - # confidence: 80 -``` -```yaml {% srNumber=20 %} - # timeoutSeconds: 43200 -``` -```yaml {% srNumber=21 %} - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 -``` -```yaml {% srNumber=22 %} - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 -``` -```yaml {% srNumber=23 %} - # tableFilterPattern: - # includes: - # - table1 - # - table2 - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=24 %} -processor: - type: orm-profiler - config: {} # Remove braces if adding properties - # tableConfig: - # - fullyQualifiedName:
- # profileSample: # default - - # profileSample: # default will be 100 if omitted - # profileQuery: - # columnConfig: - # excludeColumns: - # - - # includeColumns: - # - columnName: - # - metrics: - # - MEAN - # - MEDIAN - # - ... - # partitionConfig: - # enablePartitioning: - # partitionColumnName: - # partitionIntervalType: - # Pick one of the variation shown below - # ----'TIME-UNIT' or 'INGESTION-TIME'------- - # partitionInterval: - # partitionIntervalUnit: - # ------------'INTEGER-RANGE'--------------- - # partitionIntegerRangeStart: - # partitionIntegerRangeEnd: - # -----------'COLUMN-VALUE'---------------- - # partitionValues: - # - - # - - -``` - -```yaml {% srNumber=25 %} -sink: - type: metadata-rest - config: {} -``` - -```yaml {% srNumber=26 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -- You can learn more about how to configure and run the Profiler Workflow to extract Profiler data and execute the Data Quality from [here](/connectors/ingestion/workflows/profiler) - -### 2. Run with the CLI - -After saving the YAML config, we will run the command the same way we did for the metadata ingestion: - -```bash -metadata profile -c -``` +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} -Note how instead of running `ingest`, we are using the `profile` command to select the Profiler workflow. +{% partial file="/v1.2/connectors/yaml/data-profiler.md" variables={connector: "sapHana"} /%} ## dbt Integration @@ -507,15 +217,3 @@ Note how instead of running `ingest`, we are using the `profile` command to sele link="/connectors/ingestion/workflows/dbt" /%} {% /tilesContainer %} - -## Related - -{% tilesContainer %} - -{% tile - title="Ingest with Airflow" - description="Configure the ingestion using Airflow SDK" - link="/connectors/database/sap-hana/airflow" - / %} - -{% /tilesContainer %} diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/singlestore/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/singlestore/yaml.md index 261beac75926..6a47a5b01f83 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/singlestore/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/singlestore/yaml.md @@ -100,31 +100,11 @@ This is a sample config for Singlestore: {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -{% codeInfo srNumber=7 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=8 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration @@ -176,288 +156,19 @@ source: # key: value ``` -```yaml {% srNumber=7 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - users - # - type_test - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=8 %} -sink: - type: metadata-rest - config: {} -``` - -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} - -{% /codeBlock %} - -{% /codePreview %} - -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. - -## Data Profiler - -The Data Profiler workflow will be using the `orm-profiler` processor. - -After running a Metadata Ingestion workflow, we can run Data Profiler workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - - -### 1. Define the YAML Config - -This is a sample config for the profiler: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=11 %} -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json). - -**generateSampleData**: Option to turn on/off generating sample data. - -{% /codeInfo %} - -{% codeInfo srNumber=12 %} - -**profileSample**: Percentage of data or no. of rows we want to execute the profiler and tests on. - -{% /codeInfo %} - -{% codeInfo srNumber=13 %} - -**threadCount**: Number of threads to use during metric computations. - -{% /codeInfo %} - -{% codeInfo srNumber=14 %} - -**processPiiSensitive**: Optional configuration to automatically tag columns that might contain sensitive information. - -{% /codeInfo %} - -{% codeInfo srNumber=15 %} - -**confidence**: Set the Confidence value for which you want the column to be marked - -{% /codeInfo %} - - -{% codeInfo srNumber=16 %} - -**timeoutSeconds**: Profiler Timeout in Seconds - -{% /codeInfo %} - -{% codeInfo srNumber=17 %} - -**databaseFilterPattern**: Regex to only fetch databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=18 %} - -**schemaFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=19 %} - -**tableFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=20 %} - -#### Processor Configuration - -Choose the `orm-profiler`. Its config can also be updated to define tests from the YAML itself instead of the UI: - -**tableConfig**: `tableConfig` allows you to set up some configuration at the table level. 
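Since the commented template that follows only shows the shape of `tableConfig`, here is a filled-in sketch of a table-level override; the fully qualified name, sample size, and column names are hypothetical:

```yaml
processor:
  type: orm-profiler
  config:
    tableConfig:
      - fullyQualifiedName: my_service.my_db.my_schema.customers  # hypothetical table FQN
        profileSample: 50            # profile roughly half of the rows for this table only
        columnConfig:
          includeColumns:
            - columnName: amount     # hypothetical column
              metrics:
                - MEAN
                - MEDIAN
```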
-{% /codeInfo %} - - -{% codeInfo srNumber=21 %} - -#### Sink Configuration - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. -{% /codeInfo %} - - -{% codeInfo srNumber=22 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. - -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - - -```yaml -source: - type: singlestore - serviceName: local_singlestore - sourceConfig: - config: - type: Profiler -``` - -```yaml {% srNumber=11 %} - generateSampleData: true -``` -```yaml {% srNumber=12 %} - # profileSample: 85 -``` -```yaml {% srNumber=13 %} - # threadCount: 5 -``` -```yaml {% srNumber=14 %} - processPiiSensitive: false -``` -```yaml {% srNumber=15 %} - # confidence: 80 -``` -```yaml {% srNumber=16 %} - # timeoutSeconds: 43200 -``` -```yaml {% srNumber=17 %} - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 -``` -```yaml {% srNumber=18 %} - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 -``` -```yaml {% srNumber=19 %} - # tableFilterPattern: - # includes: - # - table1 - # - table2 - # excludes: - # - table3 - # - table4 -``` +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -```yaml {% srNumber=20 %} -processor: - type: orm-profiler - config: {} # Remove braces if adding properties - # tableConfig: - # - fullyQualifiedName:
- # profileSample: # default - - # profileSample: # default will be 100 if omitted - # profileQuery: - # columnConfig: - # excludeColumns: - # - - # includeColumns: - # - columnName: - # - metrics: - # - MEAN - # - MEDIAN - # - ... - # partitionConfig: - # enablePartitioning: - # partitionColumnName: - # partitionIntervalType: - # Pick one of the variation shown below - # ----'TIME-UNIT' or 'INGESTION-TIME'------- - # partitionInterval: - # partitionIntervalUnit: - # ------------'INTEGER-RANGE'--------------- - # partitionIntegerRangeStart: - # partitionIntegerRangeEnd: - # -----------'COLUMN-VALUE'---------------- - # partitionValues: - # - - # - +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -``` - -```yaml {% srNumber=21 %} -sink: - type: metadata-rest - config: {} -``` - -```yaml {% srNumber=22 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -- You can learn more about how to configure and run the Profiler Workflow to extract Profiler data and execute the Data Quality from [here](/connectors/ingestion/workflows/profiler) - -### 2. Run with the CLI - -After saving the YAML config, we will run the command the same way we did for the metadata ingestion: - -```bash -metadata profile -c -``` +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} -Note now instead of running `ingest`, we are using the `profile` command to select the Profiler workflow. +{% partial file="/v1.2/connectors/yaml/data-profiler.md" variables={connector: "singlestore"} /%} ## dbt Integration @@ -470,15 +181,3 @@ Note now instead of running `ingest`, we are using the `profile` command to sele link="/connectors/ingestion/workflows/dbt" /%} {% /tilesContainer %} - -## Related - -{% tilesContainer %} - -{% tile - title="Ingest with Airflow" - description="Configure the ingestion using Airflow SDK" - link="/connectors/database/singlestore/airflow" - / %} - -{% /tilesContainer %} diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/snowflake/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/snowflake/yaml.md index 899587254cbb..04f1aa36f204 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/snowflake/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/snowflake/yaml.md @@ -198,31 +198,11 @@ This is a sample config for Snowflake: {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -{% codeInfo srNumber=12 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. 
You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=13 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. - -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration @@ -291,539 +271,23 @@ source: # key: value ``` +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -```yaml {% srNumber=12 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - users - # - type_test - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=13 %} -sink: - type: metadata-rest - config: {} -``` +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. - - -## Query Usage - -The Query Usage workflow will be using the `query-parser` processor. - -After running a Metadata Ingestion workflow, we can run Query Usage workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - - -### 1. Define the YAML Config - -This is a sample config for Snowflake Usage: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=15 %} - -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceQueryUsagePipeline.json). - -**queryLogDuration**: Configuration to tune how far we want to look back in query logs to process usage data. - -{% /codeInfo %} - -{% codeInfo srNumber=16 %} - -**stageFileLocation**: Temporary file name to store the query logs before processing. Absolute file path required. - -{% /codeInfo %} - -{% codeInfo srNumber=17 %} - -**resultLimit**: Configuration to set the limit for query logs - -{% /codeInfo %} - -{% codeInfo srNumber=18 %} - -**queryLogFilePath**: Configuration to set the file path for query logs - -{% /codeInfo %} - - -{% codeInfo srNumber=19 %} - -#### Processor, Stage and Bulk Sink Configuration - -To specify where the staging files will be located. - -Note that the location is a directory that will be cleaned at the end of the ingestion. - -{% /codeInfo %} - -{% codeInfo srNumber=20 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. 
- -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - -```yaml -source: - type: snowflake-usage - serviceName: - sourceConfig: - config: - type: DatabaseUsage -``` -```yaml {% srNumber=15 %} - # Number of days to look back - queryLogDuration: 7 -``` - -```yaml {% srNumber=16 %} - # This is a directory that will be DELETED after the usage runs - stageFileLocation: -``` - -```yaml {% srNumber=17 %} - # resultLimit: 1000 -``` - -```yaml {% srNumber=18 %} - # If instead of getting the query logs from the database we want to pass a file with the queries - # queryLogFilePath: path-to-file -``` - -```yaml {% srNumber=19 %} -processor: - type: query-parser - config: {} -stage: - type: table-usage - config: - filename: /tmp/snowflake_usage -bulkSink: - type: metadata-usage - config: - filename: /tmp/snowflake_usage -``` - -```yaml {% srNumber=20 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` - -{% /codeBlock %} -{% /codePreview %} - -### 2. Run with the CLI - -There is an extra requirement to run the Usage pipelines. You will need to install: - -```bash -pip3 install --upgrade 'openmetadata-ingestion[snowflake-usage]' -``` - -After saving the YAML config, we will run the command the same way we did for the metadata ingestion: - -```bash -metadata ingest -c -``` - -## Data Profiler - -The Data Profiler workflow will be using the `orm-profiler` processor. - -After running a Metadata Ingestion workflow, we can run Data Profiler workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - - -### 1. Define the YAML Config - -This is a sample config for the profiler: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=21 %} -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json). - -**generateSampleData**: Option to turn on/off generating sample data. - -{% /codeInfo %} - -{% codeInfo srNumber=22 %} - -**profileSample**: Percentage of data or no. of rows we want to execute the profiler and tests on. - -{% /codeInfo %} - -{% codeInfo srNumber=23 %} - -**threadCount**: Number of threads to use during metric computations. - -{% /codeInfo %} - -{% codeInfo srNumber=24 %} - -**processPiiSensitive**: Optional configuration to automatically tag columns that might contain sensitive information. - -{% /codeInfo %} - -{% codeInfo srNumber=25 %} +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} -**confidence**: Set the Confidence value for which you want the column to be marked - -{% /codeInfo %} +{% partial file="/v1.2/connectors/yaml/query-usage.md" variables={connector: "snowflake"} /%} -{% codeInfo srNumber=26 %} - -**timeoutSeconds**: Profiler Timeout in Seconds - -{% /codeInfo %} - -{% codeInfo srNumber=27 %} - -**databaseFilterPattern**: Regex to only fetch databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=28 %} - -**schemaFilterPattern**: Regex to only fetch tables or databases that matches the pattern. 
- -{% /codeInfo %} - -{% codeInfo srNumber=29 %} - -**tableFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=30 %} - -#### Processor Configuration - -Choose the `orm-profiler`. Its config can also be updated to define tests from the YAML itself instead of the UI: - -**tableConfig**: `tableConfig` allows you to set up some configuration at the table level. -{% /codeInfo %} - - -{% codeInfo srNumber=31 %} - -#### Sink Configuration - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. -{% /codeInfo %} - - -{% codeInfo srNumber=32 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. - -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - - -```yaml -source: - type: snowflake - serviceName: local_snowflake - sourceConfig: - config: - type: Profiler -``` - -```yaml {% srNumber=21 %} - generateSampleData: true -``` -```yaml {% srNumber=22 %} - # profileSample: 85 -``` -```yaml {% srNumber=23 %} - # threadCount: 5 -``` -```yaml {% srNumber=24 %} - processPiiSensitive: false -``` -```yaml {% srNumber=25 %} - # confidence: 80 -``` -```yaml {% srNumber=26 %} - # timeoutSeconds: 43200 -``` -```yaml {% srNumber=27 %} - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 -``` -```yaml {% srNumber=28 %} - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 -``` -```yaml {% srNumber=29 %} - # tableFilterPattern: - # includes: - # - table1 - # - table2 - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=30 %} -processor: - type: orm-profiler - config: {} # Remove braces if adding properties - # tableConfig: - # - fullyQualifiedName:
- # profileSample: # default - - # profileSample: # default will be 100 if omitted - # profileQuery: - # columnConfig: - # excludeColumns: - # - - # includeColumns: - # - columnName: - # - metrics: - # - MEAN - # - MEDIAN - # - ... - # partitionConfig: - # enablePartitioning: - # partitionColumnName: - # partitionIntervalType: - # Pick one of the variation shown below - # ----'TIME-UNIT' or 'INGESTION-TIME'------- - # partitionInterval: - # partitionIntervalUnit: - # ------------'INTEGER-RANGE'--------------- - # partitionIntegerRangeStart: - # partitionIntegerRangeEnd: - # -----------'COLUMN-VALUE'---------------- - # partitionValues: - # - - # - - -``` - -```yaml {% srNumber=31 %} -sink: - type: metadata-rest - config: {} -``` - -```yaml {% srNumber=32 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` - -{% /codeBlock %} - -{% /codePreview %} - -- You can learn more about how to configure and run the Profiler Workflow to extract Profiler data and execute the Data Quality from [here](/connectors/ingestion/workflows/profiler) - - - -### 2. Prepare the Profiler DAG - -Here, we follow a similar approach as with the metadata and usage pipelines, although we will use a different Workflow class: - - - - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=33 %} - -#### Import necessary modules - -The `ProfilerWorkflow` class that is being imported is a part of a metadata orm_profiler framework, which defines a process of extracting Profiler data. - -Here we are also importing all the basic requirements to parse YAMLs, handle dates and build our DAG. - -{% /codeInfo %} - -{% codeInfo srNumber=34 %} - -**Default arguments for all tasks in the Airflow DAG.** -- Default arguments dictionary contains default arguments for tasks in the DAG, including the owner's name, email address, number of retries, retry delay, and execution timeout. - -{% /codeInfo %} - - -{% codeInfo srNumber=35 %} - -- **config**: Specifies config for the profiler as we prepare above. - -{% /codeInfo %} - -{% codeInfo srNumber=36 %} - -- **metadata_ingestion_workflow()**: This code defines a function `metadata_ingestion_workflow()` that loads a YAML configuration, creates a `ProfilerWorkflow` object, executes the workflow, checks its status, prints the status to the console, and stops the workflow. - -{% /codeInfo %} - -{% codeInfo srNumber=37 %} - -- **DAG**: creates a DAG using the Airflow framework, and tune the DAG configurations to whatever fits with your requirements -- For more Airflow DAGs creation details visit [here](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html#declaring-a-dag). 
- -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.py" %} - -```python {% srNumber=33 %} -import yaml -from datetime import timedelta -from airflow import DAG -from metadata.workflow.profiler import ProfilerWorkflow -from metadata.workflow.workflow_output_handler import print_status - -try: - from airflow.operators.python import PythonOperator -except ModuleNotFoundError: - from airflow.operators.python_operator import PythonOperator - -from airflow.utils.dates import days_ago - - -``` -```python {% srNumber=34 %} -default_args = { - "owner": "user_name", - "email_on_failure": False, - "retries": 3, - "retry_delay": timedelta(seconds=10), - "execution_timeout": timedelta(minutes=60), -} - - -``` - -```python {% srNumber=35 %} -config = """ - -""" - - -``` - -```python {% srNumber=36 %} -def metadata_ingestion_workflow(): - workflow_config = yaml.safe_load(config) - workflow = ProfilerWorkflow.create(workflow_config) - workflow.execute() - workflow.raise_from_status() - print_status(workflow) - workflow.stop() - - -``` - -```python {% srNumber=37 %} -with DAG( - "profiler_example", - default_args=default_args, - description="An example DAG which runs a OpenMetadata ingestion workflow", - start_date=days_ago(1), - is_paused_upon_creation=False, - catchup=False, -) as dag: - ingest_task = PythonOperator( - task_id="profile_and_test_using_recipe", - python_callable=metadata_ingestion_workflow, - ) - - -``` - -{% /codeBlock %} - -{% /codePreview %} +{% partial file="/v1.2/connectors/yaml/data-profiler.md" variables={connector: "snowflake"} /%} ## Lineage diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/sqlite/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/sqlite/yaml.md index eb29df108962..1049ba658b00 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/sqlite/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/sqlite/yaml.md @@ -106,31 +106,11 @@ This is a sample config for SQLite: {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -{% codeInfo srNumber=6 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=7 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
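Alongside the sink just described, every recipe closes with a `workflowConfig`. A minimal sketch for a local Docker deployment, assuming the default port; the JWT value is a placeholder you would copy from your ingestion-bot in the OpenMetadata UI:

```yaml
workflowConfig:
  loggerLevel: INFO
  openMetadataServerConfig:
    hostPort: "http://localhost:8585/api"
    authProvider: openmetadata
    securityConfig:
      jwtToken: "{bot_jwt_token}"   # placeholder: ingestion-bot JWT token
```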
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration @@ -185,533 +165,19 @@ source: ``` -```yaml {% srNumber=6 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - users - # - type_test - # excludes: - # - table3 - # - table4 -``` +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -```yaml {% srNumber=7 %} -sink: - type: metadata-rest - config: {} -``` +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. - - -## Query Usage - -The Query Usage workflow will be using the `query-parser` processor. - -After running a Metadata Ingestion workflow, we can run Query Usage workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - - -### 1. Define the YAML Config - -This is a sample config for SQLite Usage: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=15 %} - -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceQueryUsagePipeline.json). - -**queryLogDuration**: Configuration to tune how far we want to look back in query logs to process usage data. - -{% /codeInfo %} - -{% codeInfo srNumber=16 %} - -**stageFileLocation**: Temporary file name to store the query logs before processing. Absolute file path required. - -{% /codeInfo %} - -{% codeInfo srNumber=17 %} - -**resultLimit**: Configuration to set the limit for query logs - -{% /codeInfo %} - -{% codeInfo srNumber=18 %} - -**queryLogFilePath**: Configuration to set the file path for query logs - -{% /codeInfo %} - - -{% codeInfo srNumber=19 %} - -#### Processor, Stage and Bulk Sink Configuration - -To specify where the staging files will be located. - -Note that the location is a directory that will be cleaned at the end of the ingestion. - -{% /codeInfo %} - -{% codeInfo srNumber=20 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. 
- -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - -```yaml -source: - type: sqlite-usage - serviceName: - sourceConfig: - config: - type: DatabaseUsage -``` -```yaml {% srNumber=15 %} - # Number of days to look back - queryLogDuration: 7 -``` - -```yaml {% srNumber=16 %} - # This is a directory that will be DELETED after the usage runs - stageFileLocation: -``` - -```yaml {% srNumber=17 %} - # resultLimit: 1000 -``` - -```yaml {% srNumber=18 %} - # If instead of getting the query logs from the database we want to pass a file with the queries - # queryLogFilePath: path-to-file -``` - -```yaml {% srNumber=19 %} -processor: - type: query-parser - config: {} -stage: - type: table-usage - config: - filename: /tmp/sqlite_usage -bulkSink: - type: metadata-usage - config: - filename: /tmp/sqlite_usage -``` - -```yaml {% srNumber=20 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` - -{% /codeBlock %} -{% /codePreview %} - -### 2. Run with the CLI - -There is an extra requirement to run the Usage pipelines. You will need to install: - -After saving the YAML config, we will run the command the same way we did for the metadata ingestion: - -```bash -metadata ingest -c -``` - -## Data Profiler - -The Data Profiler workflow will be using the `orm-profiler` processor. - -After running a Metadata Ingestion workflow, we can run Data Profiler workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - - -### 1. Define the YAML Config - -This is a sample config for the profiler: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=21 %} -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json). - -**generateSampleData**: Option to turn on/off generating sample data. - -{% /codeInfo %} - -{% codeInfo srNumber=22 %} - -**profileSample**: Percentage of data or no. of rows we want to execute the profiler and tests on. - -{% /codeInfo %} - -{% codeInfo srNumber=23 %} - -**threadCount**: Number of threads to use during metric computations. - -{% /codeInfo %} - -{% codeInfo srNumber=24 %} - -**processPiiSensitive**: Optional configuration to automatically tag columns that might contain sensitive information. - -{% /codeInfo %} - -{% codeInfo srNumber=25 %} - -**confidence**: Set the Confidence value for which you want the column to be marked - -{% /codeInfo %} - - -{% codeInfo srNumber=26 %} - -**timeoutSeconds**: Profiler Timeout in Seconds +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} -{% /codeInfo %} - -{% codeInfo srNumber=27 %} - -**databaseFilterPattern**: Regex to only fetch databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=28 %} - -**schemaFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=29 %} - -**tableFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=30 %} - -#### Processor Configuration - -Choose the `orm-profiler`. 
Its config can also be updated to define tests from the YAML itself instead of the UI: - -**tableConfig**: `tableConfig` allows you to set up some configuration at the table level. -{% /codeInfo %} - - -{% codeInfo srNumber=31 %} - -#### Sink Configuration - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. -{% /codeInfo %} - - -{% codeInfo srNumber=32 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. - -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - - -```yaml -source: - type: sqlite - serviceName: local_sqlite - sourceConfig: - config: - type: Profiler -``` - -```yaml {% srNumber=21 %} - generateSampleData: true -``` -```yaml {% srNumber=22 %} - # profileSample: 85 -``` -```yaml {% srNumber=23 %} - # threadCount: 5 -``` -```yaml {% srNumber=24 %} - processPiiSensitive: false -``` -```yaml {% srNumber=25 %} - # confidence: 80 -``` -```yaml {% srNumber=26 %} - # timeoutSeconds: 43200 -``` -```yaml {% srNumber=27 %} - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 -``` -```yaml {% srNumber=28 %} - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 -``` -```yaml {% srNumber=29 %} - # tableFilterPattern: - # includes: - # - table1 - # - table2 - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=30 %} -processor: - type: orm-profiler - config: {} # Remove braces if adding properties - # tableConfig: - # - fullyQualifiedName:
- # profileSample: # default - - # profileSample: # default will be 100 if omitted - # profileQuery: - # columnConfig: - # excludeColumns: - # - - # includeColumns: - # - columnName: - # - metrics: - # - MEAN - # - MEDIAN - # - ... - # partitionConfig: - # enablePartitioning: - # partitionColumnName: - # partitionIntervalType: - # Pick one of the variation shown below - # ----'TIME-UNIT' or 'INGESTION-TIME'------- - # partitionInterval: - # partitionIntervalUnit: - # ------------'INTEGER-RANGE'--------------- - # partitionIntegerRangeStart: - # partitionIntegerRangeEnd: - # -----------'COLUMN-VALUE'---------------- - # partitionValues: - # - - # - - -``` - -```yaml {% srNumber=31 %} -sink: - type: metadata-rest - config: {} -``` - -```yaml {% srNumber=32 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` - -{% /codeBlock %} - -{% /codePreview %} - -- You can learn more about how to configure and run the Profiler Workflow to extract Profiler data and execute the Data Quality from [here](/connectors/ingestion/workflows/profiler) - - - -### 2. Prepare the Profiler DAG - -Here, we follow a similar approach as with the metadata and usage pipelines, although we will use a different Workflow class: - - - - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=33 %} - -#### Import necessary modules - -The `ProfilerWorkflow` class that is being imported is a part of a metadata orm_profiler framework, which defines a process of extracting Profiler data. - -Here we are also importing all the basic requirements to parse YAMLs, handle dates and build our DAG. - -{% /codeInfo %} - -{% codeInfo srNumber=34 %} - -**Default arguments for all tasks in the Airflow DAG.** -- Default arguments dictionary contains default arguments for tasks in the DAG, including the owner's name, email address, number of retries, retry delay, and execution timeout. - -{% /codeInfo %} - - -{% codeInfo srNumber=35 %} - -- **config**: Specifies config for the profiler as we prepare above. - -{% /codeInfo %} - -{% codeInfo srNumber=36 %} - -- **metadata_ingestion_workflow()**: This code defines a function `metadata_ingestion_workflow()` that loads a YAML configuration, creates a `ProfilerWorkflow` object, executes the workflow, checks its status, prints the status to the console, and stops the workflow. - -{% /codeInfo %} - -{% codeInfo srNumber=37 %} - -- **DAG**: creates a DAG using the Airflow framework, and tune the DAG configurations to whatever fits with your requirements -- For more Airflow DAGs creation details visit [here](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html#declaring-a-dag). 
- -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.py" %} - -```python {% srNumber=33 %} -import yaml -from datetime import timedelta -from airflow import DAG -from metadata.workflow.profiler import ProfilerWorkflow -from metadata.workflow.workflow_output_handler import print_status - -try: - from airflow.operators.python import PythonOperator -except ModuleNotFoundError: - from airflow.operators.python_operator import PythonOperator - -from airflow.utils.dates import days_ago - - -``` -```python {% srNumber=34 %} -default_args = { - "owner": "user_name", - "email_on_failure": False, - "retries": 3, - "retry_delay": timedelta(seconds=10), - "execution_timeout": timedelta(minutes=60), -} - - -``` - -```python {% srNumber=35 %} -config = """ - -""" - - -``` - -```python {% srNumber=36 %} -def metadata_ingestion_workflow(): - workflow_config = yaml.safe_load(config) - workflow = ProfilerWorkflow.create(workflow_config) - workflow.execute() - workflow.raise_from_status() - print_status(workflow) - workflow.stop() - - -``` - -```python {% srNumber=37 %} -with DAG( - "profiler_example", - default_args=default_args, - description="An example DAG which runs a OpenMetadata ingestion workflow", - start_date=days_ago(1), - is_paused_upon_creation=False, - catchup=False, -) as dag: - ingest_task = PythonOperator( - task_id="profile_and_test_using_recipe", - python_callable=metadata_ingestion_workflow, - ) - - -``` - -{% /codeBlock %} - -{% /codePreview %} +{% partial file="/v1.2/connectors/yaml/data-profiler.md" variables={connector: "sqlite"} /%} ## Lineage diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/trino/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/trino/yaml.md index e83113d2144d..9aecbe735105 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/trino/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/trino/yaml.md @@ -129,31 +129,11 @@ This is a sample config for Trino: {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -{% codeInfo srNumber=10 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=11 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
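The sink is the last piece of the metadata recipe. Assembled end to end, a run for this connector might look like the sketch below; the connection values are illustrative placeholders, not a definitive Trino setup:

```yaml
source:
  type: trino
  serviceName: my_trino            # hypothetical service name
  serviceConnection:
    config:
      type: Trino
      hostPort: localhost:8080     # hypothetical host and port
      username: admin              # hypothetical user
      catalog: tpch                # hypothetical catalog
  sourceConfig:
    config:
      type: DatabaseMetadata
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  openMetadataServerConfig:
    hostPort: "http://localhost:8585/api"
    authProvider: openmetadata
```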
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration @@ -221,290 +201,19 @@ source: # key: value ``` +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -```yaml {% srNumber=10 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - users - # - type_test - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=11 %} -sink: - type: metadata-rest - config: {} -``` - -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} - -{% /codeBlock %} - -{% /codePreview %} - -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. - -## Data Profiler - -The Data Profiler workflow will be using the `orm-profiler` processor. - -After running a Metadata Ingestion workflow, we can run Data Profiler workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - - -### 1. Define the YAML Config - -This is a sample config for the profiler: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=13 %} -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json). - -**generateSampleData**: Option to turn on/off generating sample data. - -{% /codeInfo %} - -{% codeInfo srNumber=14 %} - -**profileSample**: Percentage of data or no. of rows we want to execute the profiler and tests on. - -{% /codeInfo %} - -{% codeInfo srNumber=15 %} - -**threadCount**: Number of threads to use during metric computations. - -{% /codeInfo %} - -{% codeInfo srNumber=16 %} - -**processPiiSensitive**: Optional configuration to automatically tag columns that might contain sensitive information. - -{% /codeInfo %} - -{% codeInfo srNumber=17 %} - -**confidence**: Set the Confidence value for which you want the column to be marked - -{% /codeInfo %} - - -{% codeInfo srNumber=18 %} - -**timeoutSeconds**: Profiler Timeout in Seconds - -{% /codeInfo %} - -{% codeInfo srNumber=19 %} - -**databaseFilterPattern**: Regex to only fetch databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=20 %} - -**schemaFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=21 %} - -**tableFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=22 %} - -#### Processor Configuration - -Choose the `orm-profiler`. 
Its config can also be updated to define tests from the YAML itself instead of the UI: - -**tableConfig**: `tableConfig` allows you to set up some configuration at the table level. -{% /codeInfo %} - - -{% codeInfo srNumber=23 %} - -#### Sink Configuration - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. -{% /codeInfo %} - - -{% codeInfo srNumber=24 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. - -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - - -```yaml -source: - type: trino - serviceName: - sourceConfig: - config: - type: Profiler -``` - -```yaml {% srNumber=13 %} - generateSampleData: true -``` -```yaml {% srNumber=14 %} - # profileSample: 85 -``` -```yaml {% srNumber=15 %} - # threadCount: 5 -``` -```yaml {% srNumber=16 %} - processPiiSensitive: false -``` -```yaml {% srNumber=17 %} - # confidence: 80 -``` -```yaml {% srNumber=18 %} - # timeoutSeconds: 43200 -``` -```yaml {% srNumber=19 %} - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 -``` -```yaml {% srNumber=20 %} - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 -``` -```yaml {% srNumber=21 %} - # tableFilterPattern: - # includes: - # - table1 - # - table2 - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=22 %} -processor: - type: orm-profiler - config: {} # Remove braces if adding properties - # tableConfig: - # - fullyQualifiedName:
- # profileSample: # default - - # profileSample: # default will be 100 if omitted - # profileQuery: - # columnConfig: - # excludeColumns: - # - - # includeColumns: - # - columnName: - # - metrics: - # - MEAN - # - MEDIAN - # - ... - # partitionConfig: - # enablePartitioning: - # partitionColumnName: - # partitionIntervalType: - # Pick one of the variation shown below - # ----'TIME-UNIT' or 'INGESTION-TIME'------- - # partitionInterval: - # partitionIntervalUnit: - # ------------'INTEGER-RANGE'--------------- - # partitionIntegerRangeStart: - # partitionIntegerRangeEnd: - # -----------'COLUMN-VALUE'---------------- - # partitionValues: - # - - # - - -``` - -```yaml {% srNumber=23 %} -sink: - type: metadata-rest - config: {} -``` +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -```yaml {% srNumber=24 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -- You can learn more about how to configure and run the Profiler Workflow to extract Profiler data and execute the Data Quality from [here](/connectors/ingestion/workflows/profiler) - -### 2. Run with the CLI - -After saving the YAML config, we will run the command the same way we did for the metadata ingestion: - -```bash -metadata profile -c -``` - -Note now instead of running `ingest`, we are using the `profile` command to select the Profiler workflow. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} +{% partial file="/v1.2/connectors/yaml/data-profiler.md" variables={connector: "trino"} /%} ## SSL Configuration @@ -547,9 +256,6 @@ source: {% /codeBlock %} {% /codePreview %} - - - ## dbt Integration {% tilesContainer %} @@ -561,15 +267,3 @@ source: link="/connectors/ingestion/workflows/dbt" /%} {% /tilesContainer %} - -## Related - -{% tilesContainer %} - -{% tile - title="Ingest with Airflow" - description="Configure the ingestion using Airflow SDK" - link="/connectors/database/trino/airflow" - / %} - -{% /tilesContainer %} \ No newline at end of file diff --git a/openmetadata-docs/content/v1.2.x/connectors/database/vertica/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/database/vertica/yaml.md index 457394c6e456..3ac48967732f 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/database/vertica/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/database/vertica/yaml.md @@ -146,31 +146,11 @@ This is a sample config for Vertica: {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/database/source-config-def.md" /%} -{% codeInfo srNumber=7 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json): - -**markDeletedTables**: To flag tables as soft-deleted if they are not present anymore in the source system. - -**includeTables**: true or false, to ingest table data. Default is true. - -**includeViews**: true or false, to ingest views definitions. - -**databaseFilterPattern**, **schemaFilterPattern**, **tableFilterPattern**: Note that the filter supports regex as include or exclude. 
You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database) - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=8 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. - -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration @@ -221,288 +201,19 @@ source: # key: value ``` -```yaml {% srNumber=7 %} - sourceConfig: - config: - type: DatabaseMetadata - markDeletedTables: true - includeTables: true - includeViews: true - # includeTags: true - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 - # tableFilterPattern: - # includes: - # - users - # - type_test - # excludes: - # - table3 - # - table4 -``` - -```yaml {% srNumber=8 %} -sink: - type: metadata-rest - config: {} -``` - -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} - -{% /codeBlock %} - -{% /codePreview %} - -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. - -## Data Profiler - -The Data Profiler workflow will be using the `orm-profiler` processor. - -After running a Metadata Ingestion workflow, we can run Data Profiler workflow. -While the `serviceName` will be the same to that was used in Metadata Ingestion, so the ingestion bot can get the `serviceConnection` details from the server. - - -### 1. Define the YAML Config - -This is a sample config for the profiler: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=10 %} -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json). - -**generateSampleData**: Option to turn on/off generating sample data. - -{% /codeInfo %} - -{% codeInfo srNumber=11 %} - -**profileSample**: Percentage of data or no. of rows we want to execute the profiler and tests on. - -{% /codeInfo %} - -{% codeInfo srNumber=12 %} - -**threadCount**: Number of threads to use during metric computations. - -{% /codeInfo %} - -{% codeInfo srNumber=13 %} - -**processPiiSensitive**: Optional configuration to automatically tag columns that might contain sensitive information. - -{% /codeInfo %} - -{% codeInfo srNumber=14 %} - -**confidence**: Set the Confidence value for which you want the column to be marked - -{% /codeInfo %} - - -{% codeInfo srNumber=15 %} - -**timeoutSeconds**: Profiler Timeout in Seconds - -{% /codeInfo %} - -{% codeInfo srNumber=16 %} - -**databaseFilterPattern**: Regex to only fetch databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=17 %} - -**schemaFilterPattern**: Regex to only fetch tables or databases that matches the pattern. - -{% /codeInfo %} - -{% codeInfo srNumber=18 %} - -**tableFilterPattern**: Regex to only fetch tables or databases that matches the pattern. 
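Putting those profiler options together, a minimal profiler `sourceConfig` might look like the sketch below. The values shown are illustrative rather than defaults; the linked `databaseServiceProfilerPipeline.json` schema remains the authoritative reference, and the `users` table name is a placeholder.

```yaml
source:
  type: vertica
  serviceName: local_vertica
  sourceConfig:
    config:
      type: Profiler
      generateSampleData: true
      # profileSample: 85        # run metrics and tests on ~85% of the data
      # threadCount: 5           # threads used for metric computation
      # processPiiSensitive: true
      # confidence: 80           # confidence required to tag a column as PII
      # timeoutSeconds: 43200    # stop the profiler after 12 hours
      tableFilterPattern:
        includes:
          - users
```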
- -{% /codeInfo %} - -{% codeInfo srNumber=19 %} - -#### Processor Configuration - -Choose the `orm-profiler`. Its config can also be updated to define tests from the YAML itself instead of the UI: - -**tableConfig**: `tableConfig` allows you to set up some configuration at the table level. -{% /codeInfo %} - - -{% codeInfo srNumber=20 %} - -#### Sink Configuration - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. -{% /codeInfo %} - - -{% codeInfo srNumber=21 %} - -#### Workflow Configuration - -The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. - -For a simple, local installation using our docker containers, this looks like: - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="filename.yaml" %} - - -```yaml -source: - type: vertica - serviceName: local_vertica - sourceConfig: - config: - type: Profiler -``` - -```yaml {% srNumber=10 %} - generateSampleData: true -``` -```yaml {% srNumber=11 %} - # profileSample: 85 -``` -```yaml {% srNumber=12 %} - # threadCount: 5 -``` -```yaml {% srNumber=13 %} - processPiiSensitive: false -``` -```yaml {% srNumber=14 %} - # confidence: 80 -``` -```yaml {% srNumber=15 %} - # timeoutSeconds: 43200 -``` -```yaml {% srNumber=16 %} - # databaseFilterPattern: - # includes: - # - database1 - # - database2 - # excludes: - # - database3 - # - database4 -``` -```yaml {% srNumber=17 %} - # schemaFilterPattern: - # includes: - # - schema1 - # - schema2 - # excludes: - # - schema3 - # - schema4 -``` -```yaml {% srNumber=18 %} - # tableFilterPattern: - # includes: - # - table1 - # - table2 - # excludes: - # - table3 - # - table4 -``` +{% partial file="/v1.2/connectors/yaml/database/source-config.md" /%} -```yaml {% srNumber=19 %} -processor: - type: orm-profiler - config: {} # Remove braces if adding properties - # tableConfig: - # - fullyQualifiedName:
- # profileSample: # default - - # profileSample: # default will be 100 if omitted - # profileQuery: - # columnConfig: - # excludeColumns: - # - - # includeColumns: - # - columnName: - # - metrics: - # - MEAN - # - MEDIAN - # - ... - # partitionConfig: - # enablePartitioning: - # partitionColumnName: - # partitionIntervalType: - # Pick one of the variation shown below - # ----'TIME-UNIT' or 'INGESTION-TIME'------- - # partitionInterval: - # partitionIntervalUnit: - # ------------'INTEGER-RANGE'--------------- - # partitionIntegerRangeStart: - # partitionIntegerRangeEnd: - # -----------'COLUMN-VALUE'---------------- - # partitionValues: - # - - # - +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -``` - -```yaml {% srNumber=20 %} -sink: - type: metadata-rest - config: {} -``` - -```yaml {% srNumber=21 %} -workflowConfig: - # loggerLevel: DEBUG # DEBUG, INFO, WARN or ERROR - openMetadataServerConfig: - hostPort: - authProvider: -``` +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -- You can learn more about how to configure and run the Profiler Workflow to extract Profiler data and execute the Data Quality from [here](/connectors/ingestion/workflows/profiler) - -### 2. Run with the CLI - -After saving the YAML config, we will run the command the same way we did for the metadata ingestion: - -```bash -metadata profile -c -``` +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} -Note now instead of running `ingest`, we are using the `profile` command to select the Profiler workflow. +{% partial file="/v1.2/connectors/yaml/data-profiler.md" variables={connector: "vertica"} /%} ## dbt Integration @@ -515,15 +226,3 @@ Note now instead of running `ingest`, we are using the `profile` command to sele link="/connectors/ingestion/workflows/dbt" /%} {% /tilesContainer %} - -## Related - -{% tilesContainer %} - -{% tile - title="Ingest with Airflow" - description="Configure the ingestion using Airflow SDK" - link="/connectors/database/vertica/airflow" - / %} - -{% /tilesContainer %} diff --git a/openmetadata-docs/content/v1.2.x/connectors/messaging/kafka/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/messaging/kafka/yaml.md index 221b92b253b0..6e1af1bdc0c8 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/messaging/kafka/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/messaging/kafka/yaml.md @@ -115,27 +115,11 @@ following [link](https://docs.confluent.io/5.5.1/clients/confluent-kafka-python/ {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/messaging/source-config-def.md" /%} -{% codeInfo srNumber=9 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The sourceConfig is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/messagingServiceMetadataPipeline.json): - -**generateSampleData:** Option to turn on/off generating sample data during metadata extraction. - -**topicFilterPattern:** Note that the `topicFilterPattern` supports regex as include or exclude. - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=10 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
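For example, the `topicFilterPattern` described above can skip Confluent's internal topics while still generating sample data. This is a sketch: the `orders.*` include is a hypothetical pattern, not a default.

```yaml
  sourceConfig:
    config:
      type: MessagingMetadata
      generateSampleData: true
      topicFilterPattern:
        excludes:
          - _confluent.*
        # includes:
        #   - orders.*
```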
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} {% /codeInfoContainer %} @@ -173,36 +157,15 @@ source: ```yaml {% srNumber=8 %} schemaRegistryConfig: {} ``` -```yaml {% srNumber=9 %} - sourceConfig: - config: - type: MessagingMetadata - topicFilterPattern: - excludes: - - _confluent.* - # includes: - # - topic1 - # generateSampleData: true -``` -```yaml {% srNumber=10 %} -sink: - type: metadata-rest - config: {} -``` -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} - -{% /codeBlock %} +{% partial file="/v1.2/connectors/yaml/messaging/source-config.md" /%} -{% /codePreview %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -### 2. Run with the CLI +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: +{% /codeBlock %} -```bash -metadata ingest -c -``` +{% /codePreview %} -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} diff --git a/openmetadata-docs/content/v1.2.x/connectors/messaging/kinesis/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/messaging/kinesis/yaml.md index da704ea9629d..ab309c9a3589 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/messaging/kinesis/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/messaging/kinesis/yaml.md @@ -166,27 +166,11 @@ Find more information about [Source Identity](https://docs.aws.amazon.com/STS/la {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/messaging/source-config-def.md" /%} -{% codeInfo srNumber=9 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The sourceConfig is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/messagingServiceMetadataPipeline.json): - -**generateSampleData:** Option to turn on/off generating sample data during metadata extraction. - -**topicFilterPattern:** Note that the `topicFilterPattern` supports regex as include or exclude. - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=10 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. - -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} {% /codeInfoContainer %} @@ -227,37 +211,15 @@ source: ```yaml {% srNumber=8 %} # assumeRoleSourceIdentity: identity ``` -```yaml {% srNumber=9 %} - sourceConfig: - config: - type: MessagingMetadata - topicFilterPattern: - excludes: - - _confluent.* - # includes: - # - topic1 - # generateSampleData: true -``` -```yaml {% srNumber=10 %} -sink: - type: metadata-rest - config: {} -``` +{% partial file="/v1.2/connectors/yaml/messaging/source-config.md" /%} -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} + +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -### 2. Run with the CLI - -First, we will need to save the YAML file. 
Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} diff --git a/openmetadata-docs/content/v1.2.x/connectors/messaging/redpanda/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/messaging/redpanda/yaml.md index c204389762ab..92cd66cb6db8 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/messaging/redpanda/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/messaging/redpanda/yaml.md @@ -115,27 +115,11 @@ following [link](https://docs.confluent.io/5.5.1/clients/confluent-kafka-python/ {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/messaging/source-config-def.md" /%} -{% codeInfo srNumber=9 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The sourceConfig is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/messagingServiceMetadataPipeline.json): - -**generateSampleData:** Option to turn on/off generating sample data during metadata extraction. - -**topicFilterPattern:** Note that the `topicFilterPattern` supports regex as include or exclude. - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=10 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. - -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} {% /codeInfoContainer %} @@ -173,36 +157,15 @@ source: ```yaml {% srNumber=8 %} schemaRegistryConfig: {} ``` -```yaml {% srNumber=9 %} - sourceConfig: - config: - type: MessagingMetadata - topicFilterPattern: - excludes: - - _confluent.* - # includes: - # - topic1 - # generateSampleData: true -``` -```yaml {% srNumber=10 %} -sink: - type: metadata-rest - config: {} -``` -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} - -{% /codeBlock %} +{% partial file="/v1.2/connectors/yaml/messaging/source-config.md" /%} -{% /codePreview %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -### 2. Run with the CLI +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: +{% /codeBlock %} -```bash -metadata ingest -c -``` +{% /codePreview %} -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. 
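Assembled end to end, a minimal messaging workflow file takes roughly this shape. This is a sketch only: `bootstrapServers`, `hostPort`, and the JWT token are placeholders for your own installation, and the exact connection fields (here assumed to mirror Kafka's `bootstrapServers`) are defined by the connector's JSON schema.

```yaml
source:
  type: redpanda
  serviceName: local_redpanda
  serviceConnection:
    config:
      type: Redpanda
      bootstrapServers: localhost:9092
  sourceConfig:
    config:
      type: MessagingMetadata
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: openmetadata
    securityConfig:
      jwtToken: "{bot_jwt_token}"
```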
+{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} diff --git a/openmetadata-docs/content/v1.2.x/connectors/metadata/amundsen/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/metadata/amundsen/yaml.md index 5ed6ee9c686c..8e7894cc589f 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/metadata/amundsen/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/metadata/amundsen/yaml.md @@ -97,7 +97,7 @@ To send the metadata to OpenMetadata, it needs to be specified as `type: metadat {% /codeInfo %} -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} {% /codeInfoContainer %} {% codeBlock fileName="filename.yaml" %} @@ -139,19 +139,10 @@ sink: config: {} ``` -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} diff --git a/openmetadata-docs/content/v1.2.x/connectors/metadata/atlas/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/metadata/atlas/yaml.md index c9b9c655a35d..373850d3b77c 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/metadata/atlas/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/metadata/atlas/yaml.md @@ -99,7 +99,7 @@ To send the metadata to OpenMetadata, it needs to be specified as `type: metadat {% /codeInfo %} -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} {% /codeInfoContainer %} {% codeBlock fileName="filename.yaml" %} @@ -139,19 +139,10 @@ sink: config: {} ``` -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} {% /codePreview %} -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. 
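For a simple, local installation, the workflow configuration the partial expands to typically reduces to the sketch below; the `hostPort` value assumes a default local server, and `authProvider` plus any `securityConfig` depend on how your OpenMetadata installation is secured.

```yaml
workflowConfig:
  # loggerLevel: DEBUG  # DEBUG, INFO, WARN or ERROR
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: openmetadata
```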
+{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} diff --git a/openmetadata-docs/content/v1.2.x/connectors/ml-model/mlflow/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/ml-model/mlflow/yaml.md index cd1aee3f6e97..86349c40b650 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/ml-model/mlflow/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/ml-model/mlflow/yaml.md @@ -66,25 +66,11 @@ This is a sample config for Mlflow: {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/ml-model/source-config-def.md" /%} -{% codeInfo srNumber=3 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The sourceConfig is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/messagingServiceMetadataPipeline.json): - -**markDeletedMlModels**: Set the Mark Deleted Ml Models toggle to flag ml models as soft-deleted if they are not present anymore in the source system. - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=4 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. - -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} {% /codeInfoContainer %} @@ -105,31 +91,15 @@ source: ```yaml {% srNumber=2 %} registryUri: mysql+pymysql://mlflow:password@localhost:3307/experiments ``` -```yaml {% srNumber=3 %} - sourceConfig: - config: - type: MlModelMetadata - # markDeletedMlModels: true -``` -```yaml {% srNumber=4 %} -sink: - type: metadata-rest - config: {} -``` -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/ml-model/source-config.md" /%} -{% /codeBlock %} - -{% /codePreview %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -### 2. Run with the CLI +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: +{% /codeBlock %} -```bash -metadata ingest -c -``` +{% /codePreview %} -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} diff --git a/openmetadata-docs/content/v1.2.x/connectors/ml-model/sagemaker/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/ml-model/sagemaker/yaml.md index 56de01487828..79fc854176d7 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/ml-model/sagemaker/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/ml-model/sagemaker/yaml.md @@ -164,25 +164,11 @@ Find more information about [Source Identity](https://docs.aws.amazon.com/STS/la {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/ml-model/source-config-def.md" /%} -{% codeInfo srNumber=9 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The sourceConfig is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/messagingServiceMetadataPipeline.json): - -**markDeletedMlModels**: Set the Mark Deleted Ml Models toggle to flag ml models as soft-deleted if they are not present anymore in the source system. 
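In YAML terms, that toggle sits under the ML Model `sourceConfig` like so:

```yaml
  sourceConfig:
    config:
      type: MlModelMetadata
      markDeletedMlModels: true
```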
- -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=10 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. - -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} {% /codeInfoContainer %} @@ -223,31 +209,15 @@ source: ```yaml {% srNumber=8 %} # assumeRoleSourceIdentity: identity ``` -```yaml {% srNumber=9 %} - sourceConfig: - config: - type: MlModelMetadata - # markDeletedMlModels: true -``` -```yaml {% srNumber=10 %} -sink: - type: metadata-rest - config: {} -``` -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/ml-model/source-config.md" /%} -{% /codeBlock %} - -{% /codePreview %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -### 2. Run with the CLI +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: +{% /codeBlock %} -```bash -metadata ingest -c -``` +{% /codePreview %} -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} diff --git a/openmetadata-docs/content/v1.2.x/connectors/pipeline/airbyte/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/pipeline/airbyte/yaml.md index 37d0d37b125c..7560caade272 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/pipeline/airbyte/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/pipeline/airbyte/yaml.md @@ -61,32 +61,12 @@ This is a sample config for Airbyte: {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/pipeline/source-config-def.md" /%} -{% codeInfo srNumber=2 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/pipelineServiceMetadataPipeline.json): +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} -**dbServiceNames**: Database Service Name for the creation of lineage, if the source supports it. - -**includeTags**: Set the 'Include Tags' toggle to control whether to include tags as part of metadata ingestion. - -**markDeletedPipelines**: Set the Mark Deleted Pipelines toggle to flag pipelines as soft-deleted if they are not present anymore in the source system. - -**pipelineFilterPattern** and **chartFilterPattern**: Note that the `pipelineFilterPattern` and `chartFilterPattern` both support regex as include or exclude. - -{% /codeInfo %} - - -#### Sink Configuration - -{% codeInfo srNumber=3 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
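Combined, these pipeline options can be expressed as the following sketch; the `daily_.*` and `test_.*` regexes are hypothetical examples of include and exclude patterns.

```yaml
  sourceConfig:
    config:
      type: PipelineMetadata
      markDeletedPipelines: true
      includeTags: true
      includeLineage: true
      pipelineFilterPattern:
        includes:
          - daily_.*
        excludes:
          - test_.*
```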
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} {% /codeInfoContainer %} @@ -104,40 +84,15 @@ source: ```yaml {% srNumber=1 %} hostPort: http://localhost:8000 ``` -```yaml {% srNumber=2 %} - sourceConfig: - config: - type: PipelineMetadata - # markDeletedPipelines: True - # includeTags: True - # includeLineage: true - # pipelineFilterPattern: - # includes: - # - pipeline1 - # - pipeline2 - # excludes: - # - pipeline3 - # - pipeline4 -``` -```yaml {% srNumber=3 %} -sink: - type: metadata-rest - config: {} -``` -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/pipeline/source-config.md" /%} -{% /codeBlock %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -{% /codePreview %} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: +{% /codeBlock %} -```bash -metadata ingest -c -``` +{% /codePreview %} -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} diff --git a/openmetadata-docs/content/v1.2.x/connectors/pipeline/airflow/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/pipeline/airflow/yaml.md index 2363614c7333..8b6cfb193ccf 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/pipeline/airflow/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/pipeline/airflow/yaml.md @@ -103,34 +103,11 @@ In terms of `connection` we support the following selections: {% /codeInfo %} +{% partial file="/v1.2/connectors/yaml/pipeline/source-config-def.md" /%} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -#### Source Configuration - Source Config - -{% codeInfo srNumber=5 %} - -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/pipelineServiceMetadataPipeline.json): - -**dbServiceNames**: Database Service Name for the creation of lineage, if the source supports it. - -**includeTags**: Set the 'Include Tags' toggle to control whether to include tags as part of metadata ingestion. - -**markDeletedPipelines**: Set the Mark Deleted Pipelines toggle to flag pipelines as soft-deleted if they are not present anymore in the source system. - -**pipelineFilterPattern** and **chartFilterPattern**: Note that the `pipelineFilterPattern` and `chartFilterPattern` both support regex as include or exclude. - -{% /codeInfo %} - - -#### Sink Configuration - -{% codeInfo srNumber=6 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
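To have lineage resolved against tables already ingested from a database service, point `dbServiceNames` at that service. A minimal sketch follows; `local_mysql` is a hypothetical name, so use the service name actually registered in your OpenMetadata instance.

```yaml
  sourceConfig:
    config:
      type: PipelineMetadata
      includeLineage: true
      dbServiceNames:
        - local_mysql  # hypothetical: a database service already ingested in OpenMetadata
```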
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} {% /codeInfoContainer %} @@ -179,41 +156,16 @@ source: # hostPort: localhost:3306 # databaseMode: ":memory:" (optional) ``` -```yaml {% srNumber=6 %} - sourceConfig: - config: - type: PipelineMetadata - # markDeletedPipelines: True - # includeTags: True - # includeLineage: true - # pipelineFilterPattern: - # includes: - # - pipeline1 - # - pipeline2 - # excludes: - # - pipeline3 - # - pipeline4 -``` -```yaml {% srNumber=6 %} -sink: - type: metadata-rest - config: {} -``` -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/pipeline/source-config.md" /%} -{% /codeBlock %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -{% /codePreview %} - -### 2. Run with the CLI +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: +{% /codeBlock %} -```bash -metadata ingest -c -``` +{% /codePreview %} -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} diff --git a/openmetadata-docs/content/v1.2.x/connectors/pipeline/dagster/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/pipeline/dagster/yaml.md index 0642946dce02..ba78245f2e84 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/pipeline/dagster/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/pipeline/dagster/yaml.md @@ -68,32 +68,11 @@ This is a sample config for Dagster: {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/pipeline/source-config-def.md" /%} -{% codeInfo srNumber=3 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/pipelineServiceMetadataPipeline.json): - -**dbServiceNames**: Database Service Name for the creation of lineage, if the source supports it. - -**includeTags**: Set the 'Include Tags' toggle to control whether to include tags as part of metadata ingestion. - -**markDeletedPipelines**: Set the Mark Deleted Pipelines toggle to flag pipelines as soft-deleted if they are not present anymore in the source system. - -**pipelineFilterPattern** and **chartFilterPattern**: Note that the `pipelineFilterPattern` and `chartFilterPattern` both support regex as include or exclude. - -{% /codeInfo %} - - -#### Sink Configuration - -{% codeInfo srNumber=4 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
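The sink block itself is identical for every connector shown here:

```yaml
sink:
  type: metadata-rest
  config: {}
```

If you later add properties under `config`, drop the empty braces, as the inline comments in the profiler processor block note.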
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} {% /codeInfoContainer %} @@ -114,40 +93,15 @@ source: ```yaml {% srNumber=2 %} token: token ``` -```yaml {% srNumber=3 %} - sourceConfig: - config: - type: PipelineMetadata - # markDeletedPipelines: True - # includeTags: True - # includeLineage: true - # pipelineFilterPattern: - # includes: - # - pipeline1 - # - pipeline2 - # excludes: - # - pipeline3 - # - pipeline4 -``` -```yaml {% srNumber=4 %} -sink: - type: metadata-rest - config: {} -``` -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/pipeline/source-config.md" /%} -{% /codeBlock %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -{% /codePreview %} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: +{% /codeBlock %} -```bash -metadata ingest -c -``` +{% /codePreview %} -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} diff --git a/openmetadata-docs/content/v1.2.x/connectors/pipeline/databricks-pipeline/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/pipeline/databricks-pipeline/yaml.md index 1afdf844fd1f..7cfe880ef7fb 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/pipeline/databricks-pipeline/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/pipeline/databricks-pipeline/yaml.md @@ -76,32 +76,11 @@ This is a sample config for Databricks Pipeline: {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/pipeline/source-config-def.md" /%} -{% codeInfo srNumber=4 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/pipelineServiceMetadataPipeline.json): - -**dbServiceNames**: Database Service Name for the creation of lineage, if the source supports it. - -**includeTags**: Set the Include tags toggle to control whether or not to include tags as part of metadata ingestion. - -**markDeletedPipelines**: Set the Mark Deleted Pipelines toggle to flag pipelines as soft-deleted if they are not present anymore in the source system. - -**pipelineFilterPattern** and **chartFilterPattern**: Note that the `pipelineFilterPattern` and `chartFilterPattern` both support regex as include or exclude. - -{% /codeInfo %} - - -#### Sink Configuration - -{% codeInfo srNumber=5 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} {% /codeInfoContainer %} @@ -130,40 +109,15 @@ source: connectionArguments: http_path: ``` -```yaml {% srNumber=4 %} - sourceConfig: - config: - type: PipelineMetadata - # markDeletedPipelines: True - # includeTags: True - # includeLineage: true - # pipelineFilterPattern: - # includes: - # - pipeline1 - # - pipeline2 - # excludes: - # - pipeline3 - # - pipeline4 -``` -```yaml {% srNumber=5 %} -sink: - type: metadata-rest - config: {} -``` -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/pipeline/source-config.md" /%} -{% /codeBlock %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -{% /codePreview %} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: +{% /codeBlock %} -```bash -metadata ingest -c -``` +{% /codePreview %} -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} diff --git a/openmetadata-docs/content/v1.2.x/connectors/pipeline/domo-pipeline/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/pipeline/domo-pipeline/yaml.md index ff043bb19f93..0e4fad5a6443 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/pipeline/domo-pipeline/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/pipeline/domo-pipeline/yaml.md @@ -91,32 +91,11 @@ This is a sample config for Domo-Pipeline: {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/pipeline/source-config-def.md" /%} -{% codeInfo srNumber=6 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/pipelineServiceMetadataPipeline.json): - -**dbServiceNames**: Database Service Name for the creation of lineage, if the source supports it. - -**includeTags**: Set the 'Include Tags' toggle to control whether to include tags as part of metadata ingestion. - -**markDeletedPipelines**: Set the Mark Deleted Pipelines toggle to flag pipelines as soft-deleted if they are not present anymore in the source system. - -**pipelineFilterPattern** and **chartFilterPattern**: Note that the `pipelineFilterPattern` and `chartFilterPattern` both support regex as include or exclude. - -{% /codeInfo %} - - -#### Sink Configuration - -{% codeInfo srNumber=7 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} {% /codeInfoContainer %} @@ -145,40 +124,15 @@ source: ```yaml {% srNumber=5 %} instanceDomain: https://.domo.com ``` -```yaml {% srNumber=6 %} - sourceConfig: - config: - type: PipelineMetadata - # markDeletedPipelines: True - # includeTags: True - # includeLineage: true - # pipelineFilterPattern: - # includes: - # - pipeline1 - # - pipeline2 - # excludes: - # - pipeline3 - # - pipeline4 -``` -```yaml {% srNumber=7 %} -sink: - type: metadata-rest - config: {} -``` -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/pipeline/source-config.md" /%} -{% /codeBlock %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -{% /codePreview %} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: +{% /codeBlock %} -```bash -metadata ingest -c -``` +{% /codePreview %} -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} diff --git a/openmetadata-docs/content/v1.2.x/connectors/pipeline/fivetran/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/pipeline/fivetran/yaml.md index 83f89c63402a..b364ec328bbb 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/pipeline/fivetran/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/pipeline/fivetran/yaml.md @@ -95,32 +95,11 @@ This refers to the maximum number of records that can be returned in a single pa {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/pipeline/source-config-def.md" /%} -{% codeInfo srNumber=5 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/pipelineServiceMetadataPipeline.json): - -**dbServiceNames**: Database Service Name for the creation of lineage, if the source supports it. - -**includeTags**: Set the 'Include Tags' toggle to control whether to include tags as part of metadata ingestion. - -**markDeletedPipelines**: Set the Mark Deleted Pipelines toggle to flag pipelines as soft-deleted if they are not present anymore in the source system. - -**pipelineFilterPattern** and **chartFilterPattern**: Note that the `pipelineFilterPattern` and `chartFilterPattern` both support regex as include or exclude. - -{% /codeInfo %} - - -#### Sink Configuration - -{% codeInfo srNumber=6 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} {% /codeInfoContainer %} @@ -146,40 +125,15 @@ source: ```yaml {% srNumber=4 %} # limit: 1000 (default) ``` -```yaml {% srNumber=5 %} - sourceConfig: - config: - type: PipelineMetadata - # markDeletedPipelines: True - # includeTags: True - # includeLineage: true - # pipelineFilterPattern: - # includes: - # - pipeline1 - # - pipeline2 - # excludes: - # - pipeline3 - # - pipeline4 -``` -```yaml {% srNumber=6 %} -sink: - type: metadata-rest - config: {} -``` -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/pipeline/source-config.md" /%} -{% /codeBlock %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -{% /codePreview %} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: +{% /codeBlock %} -```bash -metadata ingest -c -``` +{% /codePreview %} -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} diff --git a/openmetadata-docs/content/v1.2.x/connectors/pipeline/glue-pipeline/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/pipeline/glue-pipeline/yaml.md index 05dde1d10860..82372391a2fd 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/pipeline/glue-pipeline/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/pipeline/glue-pipeline/yaml.md @@ -96,32 +96,11 @@ This is a sample config for Glue: {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/pipeline/source-config-def.md" /%} -{% codeInfo srNumber=6 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/pipelineServiceMetadataPipeline.json): - -**dbServiceNames**: Database Service Name for the creation of lineage, if the source supports it. - -**includeTags**: Set the 'Include Tags' toggle to control whether to include tags as part of metadata ingestion. - -**markDeletedPipelines**: Set the Mark Deleted Pipelines toggle to flag pipelines as soft-deleted if they are not present anymore in the source system. - -**pipelineFilterPattern** and **chartFilterPattern**: Note that the `pipelineFilterPattern` and `chartFilterPattern` both support regex as include or exclude. - -{% /codeInfo %} - - -#### Sink Configuration - -{% codeInfo srNumber=7 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} {% /codeInfoContainer %} @@ -151,40 +130,15 @@ source: ```yaml {% srNumber=5 %} # endPointURL: https://glue.us-east-2.amazonaws.com/ ``` -```yaml {% srNumber=6 %} - sourceConfig: - config: - type: PipelineMetadata - # markDeletedPipelines: True - # includeTags: True - # includeLineage: true - # pipelineFilterPattern: - # includes: - # - pipeline1 - # - pipeline2 - # excludes: - # - pipeline3 - # - pipeline4 -``` -```yaml {% srNumber=7 %} -sink: - type: metadata-rest - config: {} -``` -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/pipeline/source-config.md" /%} -{% /codeBlock %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -{% /codePreview %} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: +{% /codeBlock %} -```bash -metadata ingest -c -``` +{% /codePreview %} -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} diff --git a/openmetadata-docs/content/v1.2.x/connectors/pipeline/nifi/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/pipeline/nifi/yaml.md index 5435a26bd9d6..fe2cb874f79d 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/pipeline/nifi/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/pipeline/nifi/yaml.md @@ -73,32 +73,11 @@ This is a sample config for Nifi: {% /codeInfo %} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/pipeline/source-config-def.md" /%} -{% codeInfo srNumber=2 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/pipelineServiceMetadataPipeline.json): - -**dbServiceNames**: Database Service Name for the creation of lineage, if the source supports it. - -**includeTags**: Set the 'Include Tags' toggle to control whether to include tags as part of metadata ingestion. - -**markDeletedPipelines**: Set the Mark Deleted Pipelines toggle to flag pipelines as soft-deleted if they are not present anymore in the source system. - -**pipelineFilterPattern** and **chartFilterPattern**: Note that the `pipelineFilterPattern` and `chartFilterPattern` both support regex as include or exclude. - -{% /codeInfo %} - - -#### Sink Configuration - -{% codeInfo srNumber=3 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} {% /codeInfoContainer %} @@ -125,40 +104,16 @@ source: ```yaml {% srNumber=1 %} hostPort: http://localhost:8000 ``` -```yaml {% srNumber=2 %} - sourceConfig: - config: - type: PipelineMetadata - # markDeletedPipelines: True - # includeTags: True - # includeLineage: true - # pipelineFilterPattern: - # includes: - # - pipeline1 - # - pipeline2 - # excludes: - # - pipeline3 - # - pipeline4 -``` -```yaml {% srNumber=3 %} -sink: - type: metadata-rest - config: {} -``` -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/pipeline/source-config.md" /%} -{% /codeBlock %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -{% /codePreview %} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} -### 2. Run with the CLI -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: +{% /codeBlock %} -```bash -metadata ingest -c -``` +{% /codePreview %} -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} diff --git a/openmetadata-docs/content/v1.2.x/connectors/pipeline/spline/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/pipeline/spline/yaml.md index c43aa5b2bde9..ec81579a5c61 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/pipeline/spline/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/pipeline/spline/yaml.md @@ -71,36 +71,14 @@ This is a sample config for Spline: **uiHostPort**: Spline UI Host & Port is an optional field which is used for generating redirection URL from OpenMetadata to Spline Portal. This should be specified as a URI string in the format `scheme://hostname:port`. E.g., `http://localhost:9090`, `http://host.docker.internal:9090`. - -{% /codeInfo %} - - -#### Source Configuration - Source Config - -{% codeInfo srNumber=2 %} - -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/pipelineServiceMetadataPipeline.json): - -**dbServiceNames**: Database Service Name for the creation of lineage, if the source supports it. - -**includeTags**: Set the 'Include Tags' toggle to control whether to include tags as part of metadata ingestion. - -**markDeletedPipelines**: Set the Mark Deleted Pipelines toggle to flag pipelines as soft-deleted if they are not present anymore in the source system. - -**pipelineFilterPattern** and **chartFilterPattern**: Note that the `pipelineFilterPattern` and `chartFilterPattern` both support regex as include or exclude. - {% /codeInfo %} -#### Sink Configuration +{% partial file="/v1.2/connectors/yaml/pipeline/source-config-def.md" /%} -{% codeInfo srNumber=3 %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
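A Spline source that also sets the optional `uiHostPort` for redirection might look like the sketch below; `spline_source` is a placeholder service name, while the two URIs mirror the examples given above.

```yaml
source:
  type: spline
  serviceName: spline_source
  serviceConnection:
    config:
      type: Spline
      hostPort: http://localhost:8080
      uiHostPort: http://localhost:9090
```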
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} {% /codeInfoContainer %} @@ -119,42 +97,16 @@ source: hostPort: http://localhost:8080 uiHostPort: http://localhost:9090 ``` -```yaml {% srNumber=2 %} - sourceConfig: - config: - type: PipelineMetadata - # markDeletedPipelines: True - # includeTags: True - # includeLineage: true - # dbServiceNames: - # - local_hive - # pipelineFilterPattern: - # includes: - # - pipeline1 - # - pipeline2 - # excludes: - # - pipeline3 - # - pipeline4 -``` -```yaml {% srNumber=3 %} -sink: - type: metadata-rest - config: {} -``` -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/pipeline/source-config.md" /%} -{% /codeBlock %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -{% /codePreview %} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} -### 2. Run with the CLI -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: +{% /codeBlock %} -```bash -metadata ingest -c -``` +{% /codePreview %} -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} diff --git a/openmetadata-docs/content/v1.2.x/connectors/search/elasticsearch/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/search/elasticsearch/yaml.md index 9dfd3e17da84..31befd6c6995 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/search/elasticsearch/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/search/elasticsearch/yaml.md @@ -86,33 +86,11 @@ This is a sample config for ElasticSearch: **connectionTimeoutSecs**: Connection timeout configuration for communicating with ElasticSearch APIs. {% /codeInfo %} +{% partial file="/v1.2/connectors/yaml/search/source-config-def.md" /%} +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -#### Source Configuration - Source Config - -{% codeInfo srNumber=6 %} - -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/searchServiceMetadataPipeline.json): - -**includeSampleData**: Set the Ingest Sample Data toggle to control whether to ingest sample data as part of metadata ingestion. - -**sampleSize**: If include sample data is enabled, 10 records will be ingested by default. Using this field you can customize the size of sample data. - -**markDeletedSearchIndexes**: Optional configuration to soft delete `search indexes` in OpenMetadata if the source `search indexes` are deleted. After deleting, all the associated entities like lineage, etc., with that `search index` will be deleted. - -**searchIndexFilterPattern**: Note that the `searchIndexFilterPattern` support regex to include or exclude search indexes during metadata ingestion process. - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=7%} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. 
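As a sketch, those search options map onto YAML as follows; the `.*_internal` exclude is a hypothetical pattern, and `sampleSize: 10` simply restates the documented default.

```yaml
  sourceConfig:
    config:
      type: SearchMetadata
      includeSampleData: true
      sampleSize: 10
      markDeletedSearchIndexes: true
      searchIndexFilterPattern:
        excludes:
          - .*_internal
```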
- -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} {% /codeInfoContainer %} @@ -144,40 +122,15 @@ source: ```yaml {% srNumber=5 %} connectionTimeoutSecs: 30 ``` -```yaml {% srNumber=6 %} - sourceConfig: - config: - type: SearchMetadata - # markDeletedSearchIndexes: True - # includeSampleData: True - # sampleSize: 10 - # searchIndexFilterPattern: - # includes: - # - index1 - # - index2 - # excludes: - # - index4 - # - index3 -``` -```yaml {% srNumber=7 %} -sink: - type: metadata-rest - config: {} -``` - -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} -{% /codeBlock %} +{% partial file="/v1.2/connectors/yaml/search/source-config.md" /%} -{% /codePreview %} +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -### 2. Run with the CLI +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: +{% /codeBlock %} -```bash -metadata ingest -c -``` +{% /codePreview %} -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} diff --git a/openmetadata-docs/content/v1.2.x/connectors/storage/s3/yaml.md b/openmetadata-docs/content/v1.2.x/connectors/storage/s3/yaml.md index 4293d5953f5b..9cced3f60ce0 100644 --- a/openmetadata-docs/content/v1.2.x/connectors/storage/s3/yaml.md +++ b/openmetadata-docs/content/v1.2.x/connectors/storage/s3/yaml.md @@ -213,32 +213,11 @@ Find more information about [Source Identity](https://docs.aws.amazon.com/STS/la {% /codeInfo %} +{% partial file="/v1.2/connectors/yaml/storage/source-config-def.md" /%} -#### Source Configuration - Source Config +{% partial file="/v1.2/connectors/yaml/ingestion-sink-def.md" /%} -{% codeInfo srNumber=13 %} - -The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/storageServiceMetadataPipeline.json): - -**containerFilterPattern**: Note that the filter supports regex as include or exclude. You can find examples [here](/connectors/ingestion/workflows/metadata/filter-patterns/database). - -{% /codeInfo %} - -{% codeInfo srNumber=16 %} - -**storageMetadataConfigSource**: Path to the `openmetadata_storage_manifest.json` global manifest file. It can be located in S3, a local path or as a URL to the file. - -{% /codeInfo %} - -#### Sink Configuration - -{% codeInfo srNumber=14 %} - -To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. - -{% /codeInfo %} - -{% partial file="/v1.2/connectors/workflow-config.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config-def.md" /%} #### Advanced Configuration @@ -301,41 +280,11 @@ source: # key: value ``` -```yaml {% srNumber=13 %} - sourceConfig: - config: - type: StorageMetadata - # containerFilterPattern: - # includes: - # - container1 - # - container2 - # excludes: - # - container3 - # - container4 -``` -```yaml {% srNumber=16 %} - # storageMetadataConfigSource: - ## For S3 - # securityConfig: - # awsAccessKeyId: ... - # awsSecretAccessKey: ... - # awsRegion: ... - # prefixConfig: - # containerName: om-glue-test - # objectPrefix: - ## For HTTP - # manifestHttpPath: http://... 
- ## For Local - # manifestFilePath: /path/to/openmetadata_storage_manifest.json -``` +{% partial file="/v1.2/connectors/yaml/storage/source-config.md" /%} -```yaml {% srNumber=14 %} -sink: - type: metadata-rest - config: {} -``` +{% partial file="/v1.2/connectors/yaml/ingestion-sink.md" /%} -{% partial file="/v1.2/connectors/workflow-config-yaml.md" /%} +{% partial file="/v1.2/connectors/yaml/workflow-config.md" /%} {% /codeBlock %} @@ -343,16 +292,7 @@ sink: -### 2. Run with the CLI - -First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: - -```bash -metadata ingest -c -``` - -Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, -you will be able to extract metadata from different sources. +{% partial file="/v1.2/connectors/yaml/ingestion-cli.md" /%} ## Related