diff --git a/openmetadata-docs/content/partials/v1.2/deployment/run-connectors-class.md b/openmetadata-docs/content/partials/v1.2/deployment/run-connectors-class.md
index ca91cb7f9b00..2182296b5769 100644
--- a/openmetadata-docs/content/partials/v1.2/deployment/run-connectors-class.md
+++ b/openmetadata-docs/content/partials/v1.2/deployment/run-connectors-class.md
@@ -8,22 +8,25 @@ For example, for the `Metadata` workflow we'll use:
 
 ```python
 import yaml
-from metadata.ingestion.api.workflow import Workflow
+from metadata.workflow.metadata import MetadataWorkflow
+from metadata.workflow.workflow_output_handler import print_status
 
 def run():
     workflow_config = yaml.safe_load(CONFIG)
-    workflow = Workflow.create(workflow_config)
+    workflow = MetadataWorkflow.create(workflow_config)
     workflow.execute()
     workflow.raise_from_status()
-    workflow.print_status()
+    print_status(workflow)
     workflow.stop()
 ```
 
 The classes for each workflow type are:
 
-- `Metadata`: `from metadata.ingestion.api.workflow import Workflow`
-- `Lineage`: `from metadata.ingestion.api.workflow import Workflow`
-- `Usage`: `from metadata.ingestion.api.workflow import Workflow`
-- `dbt`: `from metadata.ingestion.api.workflow import Workflow`
-- `Profiler`: `from metadata.profiler.api.workflow import ProfilerWorkflow`
-- `Data Quality`: `from metadata.data_quality.api.workflow import TestSuiteWorkflow`
+- `Metadata`: `from metadata.workflow.metadata import MetadataWorkflow`
+- `Lineage`: `from metadata.workflow.metadata import MetadataWorkflow` (same as metadata)
+- `Usage`: `from metadata.workflow.usage import UsageWorkflow`
+- `dbt`: `from metadata.workflow.metadata import MetadataWorkflow`
+- `Profiler`: `from metadata.workflow.profiler import ProfilerWorkflow`
+- `Data Quality`: `from metadata.workflow.data_quality import TestSuiteWorkflow`
+- `Data Insights`: `from metadata.workflow.data_insight import DataInsightWorkflow`
+- `Elasticsearch Reindex`: `from metadata.workflow.metadata import MetadataWorkflow` (same as metadata)
diff --git a/openmetadata-docs/content/partials/v1.2/deployment/upgrade/upgrade-prerequisites.md b/openmetadata-docs/content/partials/v1.2/deployment/upgrade/upgrade-prerequisites.md
index 3532aeeace29..8b1189089775 100644
--- a/openmetadata-docs/content/partials/v1.2/deployment/upgrade/upgrade-prerequisites.md
+++ b/openmetadata-docs/content/partials/v1.2/deployment/upgrade/upgrade-prerequisites.md
@@ -112,7 +112,7 @@ After the migration is finished, you can revert these changes.
 ### Version Upgrades
 
 - The OpenMetadata Server is now based on **JDK 17**
-- OpenMetadata now supports **Elasticsearch** up to version **8.10.2** and **Opensearch** up to version **2.7**
+- OpenMetadata now **requires** **Elasticsearch** version **8.10.2** and **Opensearch** version **2.7**
 
 There is no direct migration to bump the indexes to the new supported versions. You might see errors like:
 
diff --git a/openmetadata-docs/content/v1.2.x/connectors/pipeline/airflow/mwaa.md b/openmetadata-docs/content/v1.2.x/connectors/pipeline/airflow/mwaa.md
index 34df1029b90f..6f8bbbe70a26 100644
--- a/openmetadata-docs/content/v1.2.x/connectors/pipeline/airflow/mwaa.md
+++ b/openmetadata-docs/content/v1.2.x/connectors/pipeline/airflow/mwaa.md
@@ -5,92 +5,16 @@ slug: /connectors/pipeline/airflow/mwaa
 
 # Extract MWAA Metadata
 
-When running ingestion workflows from MWAA we have three approaches:
+To extract MWAA Metadata we need to run the ingestion **from** MWAA, since the underlying database lives in a private network.
+ +To learn how to run connectors from MWAA, you can take a look at this [doc](/deployment/ingestion/mwaa). In this guide, +we'll explain how to configure the MWAA ingestion in the 3 supported approaches: 1. Install the openmetadata-ingestion package as a requirement in the Airflow environment. We will then run the process using a `PythonOperator` -2. Configure an ECS cluster and run the ingestion as an `ECSOperator`. +2. Configure an ECS cluster and run the ingestion as an ECS Operator. 3. Install a plugin and run the ingestion with the `PythonVirtualenvOperator`. -We will now discuss pros and cons of each aspect and how to configure them. - -## Ingestion Workflows as a Python Operator - -### PROs - -- It is the simplest approach -- We don’t need to spin up any further infrastructure - -### CONs - -- We need to install the [openmetadata-ingestion](https://pypi.org/project/openmetadata-ingestion/) package in the MWAA environment -- The installation can clash with existing libraries -- Upgrading the OM version will require to repeat the installation process - -To install the package, we need to update the `requirements.txt` file from the MWAA environment to add the following line: - -``` -openmetadata-ingestion[]==x.y.z -``` - -Where `x.y.z` is the version of the OpenMetadata ingestion package. Note that the version needs to match the server version. If we are using the server at 1.1.0, then the ingestion package needs to also be 1.1.0. - -The plugin parameter is a list of the sources that we want to ingest. An example would look like this `openmetadata-ingestion[mysql,snowflake,s3]==1.1.0`. - -A DAG deployed using a Python Operator would then look like follows - -```python -import json -from datetime import timedelta - -from airflow import DAG - -try: - from airflow.operators.python import PythonOperator -except ModuleNotFoundError: - from airflow.operators.python_operator import PythonOperator - -from airflow.utils.dates import days_ago - -from metadata.workflow.metadata import MetadataWorkflow - -from metadata.workflow.workflow_output_handler import print_status - -default_args = { - "retries": 3, - "retry_delay": timedelta(seconds=10), - "execution_timeout": timedelta(minutes=60), -} - -config = """ -YAML config -""" - -def metadata_ingestion_workflow(): - workflow_config = json.loads(config) - workflow = MetadataWorkflow.create(workflow_config) - workflow.execute() - workflow.raise_from_status() - print_status(workflow) - workflow.stop() - -with DAG( - "redshift_ingestion", - default_args=default_args, - description="An example DAG which runs a OpenMetadata ingestion workflow", - start_date=days_ago(1), - is_paused_upon_creation=False, - catchup=False, -) as dag: - ingest_task = PythonOperator( - task_id="ingest_redshift", - python_callable=metadata_ingestion_workflow, - ) -``` - -Where you can update the YAML configuration and workflow classes accordingly. accordingly. Further examples on how to -run the ingestion can be found on the documentation (e.g., [Snowflake](https://docs.open-metadata.org/connectors/database/snowflake)). - -### Extracting MWAA Metadata +## 1. Extracting MWAA Metadata with the PythonOperator As the ingestion process will be happening locally in MWAA, we can prepare a DAG with the following YAML configuration: @@ -119,152 +43,9 @@ workflowConfig: authProvider: ``` -Using the connection type `Backend` will pick up the Airflow database session directly at runtime. 
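Since the `Backend` connection type picks up the Airflow database session of the MWAA environment itself, the only extra piece this first approach needs is a DAG that wraps the YAML above in a `PythonOperator`. Below is a minimal sketch following the same pattern as the Python Operator example in the deployment guide; the DAG id, task id, and retry defaults are illustrative, and `CONFIG` stands for the Backend YAML above:

```python
from datetime import timedelta

import yaml
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago

from metadata.workflow.metadata import MetadataWorkflow
from metadata.workflow.workflow_output_handler import print_status

# Paste the Backend connection YAML shown above here
CONFIG = """
YAML config
"""


def metadata_ingestion_workflow():
    # Parse the YAML and run the metadata workflow against the local Airflow backend
    workflow_config = yaml.safe_load(CONFIG)
    workflow = MetadataWorkflow.create(workflow_config)
    workflow.execute()
    workflow.raise_from_status()
    print_status(workflow)
    workflow.stop()


with DAG(
    "mwaa_metadata_ingestion",
    default_args={"retries": 3, "retry_delay": timedelta(seconds=10)},
    description="Ingests MWAA pipeline metadata into OpenMetadata",
    start_date=days_ago(1),
    is_paused_upon_creation=False,
    catchup=False,
) as dag:
    PythonOperator(
        task_id="ingest_mwaa_metadata",
        python_callable=metadata_ingestion_workflow,
    )
```

The wrapper is the same one used for any other connector running from Airflow; only the YAML changes.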
- -## Ingestion Workflows as an ECS Operator - -### PROs -- Completely isolated environment -- Easy to update each version - -### CONs -- We need to set up an ECS cluster and the required policies in MWAA to connect to ECS and handle Log Groups. - -We will now describe the steps, following the official AWS documentation. - -### 1. Create an ECS Cluster - -- The cluster just needs a task to run in `FARGATE` mode. -- The required image is `docker.getcollate.io/openmetadata/ingestion-base:x.y.z` - - The same logic as above applies. The `x.y.z` version needs to match the server version. For example, `docker.getcollate.io/openmetadata/ingestion-base:0.13.2` - -We have tested this process with a Task Memory of 512MB and Task CPU (unit) of 256. This can be tuned depending on the amount of metadata that needs to be ingested. - -When creating the ECS Cluster, take notes on the log groups assigned, as we will need them to prepare the MWAA Executor Role policies. - -### 2. Update MWAA Executor Role policies - -- Identify your MWAA executor role. This can be obtained from the details view of your MWAA environment. -- Add the following two policies to the role, the first with ECS permissions: - -```json -{ - "Version": "2012-10-17", - "Statement": [ - { - "Sid": "VisualEditor0", - "Effect": "Allow", - "Action": [ - "ecs:RunTask", - "ecs:DescribeTasks" - ], - "Resource": "*" - }, - { - "Action": "iam:PassRole", - "Effect": "Allow", - "Resource": [ - "*" - ], - "Condition": { - "StringLike": { - "iam:PassedToService": "ecs-tasks.amazonaws.com" - } - } - } - ] -} -``` - - -And for the Log Group permissions - -```json -{ - "Effect": "Allow", - "Action": [ - "logs:CreateLogStream", - "logs:CreateLogGroup", - "logs:PutLogEvents", - "logs:GetLogEvents", - "logs:GetLogRecord", - "logs:GetLogGroupFields", - "logs:GetQueryResults" - ], - "Resource": [ - "arn:aws:logs:::log-group:*", - "arn:aws:logs:*:*:log-group::*" - ] -} - -``` - -Note how you need to replace the `region`, `account-id` and the `log group` names for your Airflow Environment and ECS. - -A DAG created using the ECS Operator will then look like this: - -```python -from datetime import datetime - -import yaml -from airflow import DAG - -from http import client -from airflow import DAG -from airflow.providers.amazon.aws.operators.ecs import ECSOperator -from airflow.utils.dates import days_ago -import boto3 - - -CLUSTER_NAME="openmetadata-ingestion-cluster" # Replace value for CLUSTER_NAME with your information. -CONTAINER_NAME="openmetadata-ingestion" # Replace value for CONTAINER_NAME with your information. -LAUNCH_TYPE="FARGATE" +## 2. 
Extracting MWAA Metadata with the ECS Operator -config = """ -YAML config -""" - - -with DAG( - dag_id="ecs_fargate_dag", - schedule_interval=None, - catchup=False, - start_date=days_ago(1), - is_paused_upon_creation=True, -) as dag: - client=boto3.client('ecs') - services=client.list_services(cluster=CLUSTER_NAME,launchType=LAUNCH_TYPE) - service=client.describe_services(cluster=CLUSTER_NAME,services=services['serviceArns']) - ecs_operator_task = ECSOperator( - task_id = "ecs_ingestion_task", - dag=dag, - cluster=CLUSTER_NAME, - task_definition=service['services'][0]['taskDefinition'], - launch_type=LAUNCH_TYPE, - overrides={ - "containerOverrides":[ - { - "name":CONTAINER_NAME, - "command":["python", "main.py"], - "environment": [ - { - "name": "config", - "value": config - }, - { - "name": "pipelineType", - "value": "metadata" - }, - ], - }, - ], - }, - - network_configuration=service['services'][0]['networkConfiguration'], - awslogs_group="/ecs/ingest", - awslogs_stream_prefix=f"ecs/{CONTAINER_NAME}", - ) -``` +After setting up the ECS Cluster, you'll need first to check the MWAA database connection ### Getting MWAA database connection @@ -336,102 +117,7 @@ Note that trying to log the `conf.get("core", "sql_alchemy_conn", fallback=None) #### Preparing the metadata extraction -We will use ECS here as well to get the metadata out of MWAA. The only important detail is to ensure that we are -using the right networking. - -When creating the ECS cluster, add the VPC, subnets and security groups used when setting up MWAA. We need to -be in the same network environment as MWAA to reach the underlying database. In the task information, we'll configure the -same task we did for the previous metadata extraction. - -Now, we are ready to prepare the DAG to extract MWAA metadata: - -```python -from datetime import datetime - -import yaml -from airflow import DAG - -from http import client -from airflow import DAG -from airflow.providers.amazon.aws.operators.ecs import ECSOperator -from airflow.utils.dates import days_ago -import boto3 - - -CLUSTER_NAME="openmetadata-ingestion-vpc" #Replace value for CLUSTER_NAME with your information. -CONTAINER_NAME="openmetadata-ingestion" #Replace value for CONTAINER_NAME with your information. 
-LAUNCH_TYPE="FARGATE" - -config = """ -source: - type: airflow - serviceName: airflow_mwaa - serviceConnection: - config: - type: Airflow - hostPort: http://localhost:8080 - numberOfStatus: 10 - connection: - type: Postgres # Typically MWAA uses Postgres - username: - password: - hostPort: : - database: - sourceConfig: - config: - type: PipelineMetadata -sink: - type: metadata-rest - config: {} -workflowConfig: - openMetadataServerConfig: - hostPort: - authProvider: -""" - - -with DAG( - dag_id="ecs_fargate_dag_vpc", - schedule_interval=None, - catchup=False, - start_date=days_ago(1), - is_paused_upon_creation=True, -) as dag: - client=boto3.client('ecs') - services=client.list_services(cluster=CLUSTER_NAME,launchType=LAUNCH_TYPE) - service=client.describe_services(cluster=CLUSTER_NAME,services=services['serviceArns']) - ecs_operator_task = ECSOperator( - task_id = "ecs_ingestion_task", - dag=dag, - cluster=CLUSTER_NAME, - task_definition=service['services'][0]['taskDefinition'], - launch_type=LAUNCH_TYPE, - overrides={ - "containerOverrides":[ - { - "name":CONTAINER_NAME, - "command":["python", "main.py"], - "environment": [ - { - "name": "config", - "value": config - }, - { - "name": "pipelineType", - "value": "metadata" - }, - ], - }, - ], - }, - - network_configuration=service['services'][0]['networkConfiguration'], - awslogs_group="/ecs/ingest", - awslogs_stream_prefix=f"ecs/{CONTAINER_NAME}", - ) -``` - -A YAML connection example could be: +Then, prepare the YAML config with the information you retrieved above. For example: ```yaml source: @@ -463,137 +149,9 @@ workflowConfig: jwtToken: ... ``` -## Ingestion Workflows as a Python Virtualenv Operator - -### PROs - -- Installation does not clash with existing libraries -- Simpler than ECS - -### CONs - -- We need to install an additional plugin in MWAA -- DAGs take longer to run due to needing to set up the virtualenv from scratch for each run. - -We need to update the `requirements.txt` file from the MWAA environment to add the following line: - -``` -virtualenv -``` - -Then, we need to set up a custom plugin in MWAA. Create a file named virtual_python_plugin.py. Note that you may need to update the python version (eg, python3.7 -> python3.10) depending on what your MWAA environment is running. -```python -""" -Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. - -Permission is hereby granted, free of charge, to any person obtaining a copy of -this software and associated documentation files (the "Software"), to deal in -the Software without restriction, including without limitation the rights to -use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of -the Software, and to permit persons to whom the Software is furnished to do so. - -THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS -FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR -COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER -IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN -CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
-""" -from airflow.plugins_manager import AirflowPlugin -import airflow.utils.python_virtualenv -from typing import List -import os - - -def _generate_virtualenv_cmd(tmp_dir: str, python_bin: str, system_site_packages: bool) -> List[str]: - cmd = ['python3', '/usr/local/airflow/.local/lib/python3.7/site-packages/virtualenv', tmp_dir] - if system_site_packages: - cmd.append('--system-site-packages') - if python_bin is not None: - cmd.append(f'--python={python_bin}') - return cmd - - -airflow.utils.python_virtualenv._generate_virtualenv_cmd = _generate_virtualenv_cmd - -os.environ["PATH"] = f"/usr/local/airflow/.local/bin:{os.environ['PATH']}" - - -class VirtualPythonPlugin(AirflowPlugin): - name = 'virtual_python_plugin' -``` - -This is modified from the [AWS sample](https://docs.aws.amazon.com/mwaa/latest/userguide/samples-virtualenv.html). - -Next, create the plugins.zip file and upload it according to [AWS docs](https://docs.aws.amazon.com/mwaa/latest/userguide/configuring-dag-import-plugins.html). You will also need to [disable lazy plugin loading in MWAA](https://docs.aws.amazon.com/mwaa/latest/userguide/samples-virtualenv.html#samples-virtualenv-airflow-config). - -A DAG deployed using the PythonVirtualenvOperator would then look like: - -```python -from datetime import timedelta - -from airflow import DAG - -from airflow.operators.python import PythonVirtualenvOperator - -from airflow.utils.dates import days_ago - - -default_args = { - "retries": 3, - "retry_delay": timedelta(seconds=10), - "execution_timeout": timedelta(minutes=60), -} - -def metadata_ingestion_workflow(): - from metadata.workflow.metadata import MetadataWorkflow - from metadata.workflow.workflow_output_handler import print_status - - import yaml - - config = """ - YAML config - """ - - workflow_config = yaml.loads(config) - workflow = MetadataWorkflow.create(workflow_config) - workflow.execute() - workflow.raise_from_status() - print_status(workflow) - workflow.stop() - -with DAG( - "redshift_ingestion", - default_args=default_args, - description="An example DAG which runs a OpenMetadata ingestion workflow", - start_date=days_ago(1), - is_paused_upon_creation=False, - catchup=False, -) as dag: - ingest_task = PythonVirtualenvOperator( - task_id="ingest_redshift", - python_callable=metadata_ingestion_workflow, - requirements=['openmetadata-ingestion==1.0.5.0', - 'apache-airflow==2.4.3', # note, v2.4.3 is the first version that does not conflict with OpenMetadata's 'tabulate' requirements - 'apache-airflow-providers-amazon==6.0.0', # Amazon Airflow provider is necessary for MWAA - 'watchtower',], - system_site_packages=False, - dag=dag, - ) -``` - -Where you can update the YAML configuration and workflow classes accordingly. accordingly. Further examples on how to -run the ingestion can be found on the documentation (e.g., [Snowflake](https://docs.open-metadata.org/connectors/database/snowflake)). - -You will also need to determine the OpenMetadata ingestion extras and Airflow providers you need. Note that the Openmetadata version needs to match the server version. If we are using the server at 0.12.2, then the ingestion package needs to also be 0.12.2. An example of the extras would look like this `openmetadata-ingestion[mysql,snowflake,s3]==0.12.2.2`. -For Airflow providers, you will want to pull the provider versions from [the matching constraints file](https://raw.githubusercontent.com/apache/airflow/constraints-2.4.3/constraints-3.7.txt). 
Since this example installs Airflow Providers v2.4.3 on Python 3.7, we use that constraints file.
+## 3. Extracting MWAA Metadata with the Python Virtualenv Operator
 
-Also note that the ingestion workflow function must be entirely self contained as it will run by itself in the virtualenv. Any imports it needs, including the configuration, must exist within the function itself.
-
-### Extracting MWAA Metadata
-
-As the ingestion process will be happening locally in MWAA, we can prepare a DAG with the following YAML
-configuration:
+This will be similar to the first approach, where you just need the simple `Backend` connection YAML:
 
 ```yaml
 source:
@@ -618,5 +176,3 @@ workflowConfig:
     hostPort: 
     authProvider: 
 ```
-
-Using the connection type `Backend` will pick up the Airflow database session directly at runtime.
diff --git a/openmetadata-docs/content/v1.2.x/deployment/ingestion/mwaa.md b/openmetadata-docs/content/v1.2.x/deployment/ingestion/mwaa.md
index d379c8d9e9e9..f965101dcba7 100644
--- a/openmetadata-docs/content/v1.2.x/deployment/ingestion/mwaa.md
+++ b/openmetadata-docs/content/v1.2.x/deployment/ingestion/mwaa.md
@@ -32,9 +32,9 @@ To install the package, we need to update the `requirements.txt` file from the M
 openmetadata-ingestion[]==x.y.z
 ```
 
-Where `x.y.z` is the version of the OpenMetadata ingestion package. Note that the version needs to match the server version. If we are using the server at 1.1.0, then the ingestion package needs to also be 1.1.0.
+Where `x.y.z` is the version of the OpenMetadata ingestion package. Note that the version needs to match the server version. If we are using the server at 1.2.0, then the ingestion package needs to also be 1.2.0.
 
-The plugin parameter is a list of the sources that we want to ingest. An example would look like this `openmetadata-ingestion[mysql,snowflake,s3]==1.1.0`.
+The plugin parameter is a list of the sources that we want to ingest. An example would look like this `openmetadata-ingestion[mysql,snowflake,s3]==1.2.0`.
 
 A DAG deployed using a Python Operator would then look like follows
 
@@ -102,17 +102,47 @@ run the ingestion can be found on the documentation (e.g., [Snowflake](https://d
 
 We will now describe the steps, following the official AWS documentation.
 
-### 1. Create an ECS Cluster
+### 1. Create an ECS Cluster & Task Definition
 
-- The cluster just needs a task to run in `FARGATE` mode.
+- The cluster needs a task to run in `FARGATE` mode.
 - The required image is `docker.getcollate.io/openmetadata/ingestion-base:x.y.z`
-  - The same logic as above applies. The `x.y.z` version needs to match the server version. For example, `docker.getcollate.io/openmetadata/ingestion-base:0.13.2`
+  - The same logic as above applies. The `x.y.z` version needs to match the server version. For example, `docker.getcollate.io/openmetadata/ingestion-base:1.2.0`
 
 We have tested this process with a Task Memory of 512MB and Task CPU (unit) of 256. This can be tuned depending on the amount of metadata that needs to be ingested.
 
-When creating the ECS Cluster, take notes on the log groups assigned, as we will need them to prepare the MWAA Executor Role policies.
+When creating the Task Definition, take note of the **log groups** assigned, as we will need them to prepare the MWAA Executor Role policies.
 
-### 2. 
Update MWAA Executor Role policies +For example, if in the JSON from the Task Definition we see: + +```json +"logConfiguration": { + "logDriver": "awslogs", + "options": { + "awslogs-create-group": "true", + "awslogs-group": "/ecs/openmetadata", + "awslogs-region": "us-east-2", + "awslogs-stream-prefix": "ecs" + }, + "secretOptions": [] +} +``` + +We'll need to use the `/ecs/openmetadata` below when configuring the policies. + +### 2. Create a Service + +In the created cluster, we'll create a service to run the task. We will use: +- Compute option: `Launch Type` with `FARGATE` +- Application of type `Service`, where we'll select the Task Definition created above. + +{% note %} + +If you want to extract MWAA metadata, add the **VPC**, **subnets** and **security groups** used when setting up MWAA. We need to +be in the same network environment as MWAA to reach the underlying database. + +{% /note %} + +### 3. Update MWAA Executor Role policies - Identify your MWAA executor role. This can be obtained from the details view of your MWAA environment. - Add the following two policies to the role, the first with ECS permissions: @@ -126,7 +156,9 @@ When creating the ECS Cluster, take notes on the log groups assigned, as we will "Effect": "Allow", "Action": [ "ecs:RunTask", - "ecs:DescribeTasks" + "ecs:DescribeTasks", + "ecs:ListServices", + "ecs:DescribeServices" ], "Resource": "*" }, @@ -162,7 +194,7 @@ And for the Log Group permissions "logs:GetQueryResults" ], "Resource": [ - "arn:aws:logs:::log-group:*", + "arn:aws:logs:::log-group:*", "arn:aws:logs:*:*:log-group::*" ] } @@ -171,6 +203,8 @@ And for the Log Group permissions Note how you need to replace the `region`, `account-id` and the `log group` names for your Airflow Environment and ECS. +Moreover, we will need the name we gave to the ECS Cluster, as well as the container name of the Service. + A DAG created using the ECS Operator will then look like this: ```python @@ -181,7 +215,10 @@ from airflow import DAG from http import client from airflow import DAG -from airflow.providers.amazon.aws.operators.ecs import ECSOperator +# If using Airflow < 2.5 +# from airflow.providers.amazon.aws.operators.ecs import ECSOperator +# If using Airflow > 2.5 +from airflow.providers.amazon.aws.operators.ecs import EcsRunTaskOperator from airflow.utils.dates import days_ago import boto3 @@ -205,7 +242,7 @@ with DAG( client=boto3.client('ecs') services=client.list_services(cluster=CLUSTER_NAME,launchType=LAUNCH_TYPE) service=client.describe_services(cluster=CLUSTER_NAME,services=services['serviceArns']) - ecs_operator_task = ECSOperator( + ecs_operator_task = EcsRunTaskOperator( task_id = "ecs_ingestion_task", dag=dag, cluster=CLUSTER_NAME, @@ -247,6 +284,12 @@ the official OpenMetadata docs, and the value of the `pipelineType` configuratio Which are based on the `PipelineType` [JSON Schema definitions](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/services/ingestionPipelines/ingestionPipeline.json#L14) +Moreover, one of the imports will depend on the MWAA Airflow version you are using: +- If using Airflow < 2.5: `from airflow.providers.amazon.aws.operators.ecs import ECSOperator` +- If using Airflow > 2.5: `from airflow.providers.amazon.aws.operators.ecs import EcsRunTaskOperator` + +Make sure to update the `ecs_operator_task` task call accordingly. 
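If the same DAG file needs to run on MWAA environments on both sides of that version boundary, one option is to guard the import behind a `try`/`except ImportError` and give the operator a single local alias; a minimal sketch, where the alias `EcsIngestionOperator` is just an illustrative name:

```python
try:
    # Newer Amazon provider releases ship the renamed operator
    from airflow.providers.amazon.aws.operators.ecs import (
        EcsRunTaskOperator as EcsIngestionOperator,
    )
except ImportError:
    # Older releases only expose the original ECSOperator
    from airflow.providers.amazon.aws.operators.ecs import (
        ECSOperator as EcsIngestionOperator,
    )
```

The `ecs_operator_task` call can then instantiate `EcsIngestionOperator` and stay the same regardless of the Airflow version MWAA provides.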
+ ## Ingestion Workflows as a Python Virtualenv Operator ### PROs @@ -371,6 +414,6 @@ run the ingestion can be found on the documentation (e.g., [Snowflake](https://d You will also need to determine the OpenMetadata ingestion extras and Airflow providers you need. Note that the Openmetadata version needs to match the server version. If we are using the server at 0.12.2, then the ingestion package needs to also be 0.12.2. An example of the extras would look like this `openmetadata-ingestion[mysql,snowflake,s3]==0.12.2.2`. For Airflow providers, you will want to pull the provider versions from [the matching constraints file](https://raw.githubusercontent.com/apache/airflow/constraints-2.4.3/constraints-3.7.txt). Since this example installs Airflow Providers v2.4.3 on Python 3.7, we use that constraints file. -Also note that the ingestion workflow function must be entirely self contained as it will run by itself in the virtualenv. Any imports it needs, including the configuration, must exist within the function itself. +Also note that the ingestion workflow function must be entirely self-contained as it will run by itself in the virtualenv. Any imports it needs, including the configuration, must exist within the function itself. {% partial file="/v1.2/deployment/run-connectors-class.md" /%} diff --git a/openmetadata-docs/content/v1.2.x/quick-start/local-docker-deployment.md b/openmetadata-docs/content/v1.2.x/quick-start/local-docker-deployment.md index 215f0c06f706..80f0489ef174 100644 --- a/openmetadata-docs/content/v1.2.x/quick-start/local-docker-deployment.md +++ b/openmetadata-docs/content/v1.2.x/quick-start/local-docker-deployment.md @@ -257,4 +257,20 @@ installation. 1. Visit the [Features](/releases/features) overview page and explore the OpenMetadata UI. 2. Visit the [Connectors](/connectors) documentation to see what services you can integrate with OpenMetadata. -3. Visit the [API](/swagger.html) documentation and explore the rich set of OpenMetadata APIs. \ No newline at end of file +3. Visit the [API](/swagger.html) documentation and explore the rich set of OpenMetadata APIs. + + +### Volume Permissions: Operation not permitted + +If you are running on Windows (WSL2) and see permissions errors when starting the databases (either MySQL or Postgres), e.g., + +``` +openmetadata_postgresql | chmod: changing permissions of '/var/lib/postgresql/data': Operation not permitted +``` + +You can try to update the `/etc/wsl.conf` file from the WSL2 machine to add: + +``` +[automount] +options = "metadata,case=force" +```
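
Note that changes to `/etc/wsl.conf` typically take effect only after the WSL instance is restarted (for example, by running `wsl --shutdown` from PowerShell and reopening the terminal) before bringing the containers up again.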