cosmos-dbt column level metadata #1427

Open
bhawesh490 opened this issue Dec 25, 2024 · 1 comment
Labels
area:lineage (Related to open lineage or DBT dataset lineage)
dbt:docs (Primarily related to dbt docs command or functionality)
profile:snowflake (Related to Snowflake ProfileConfig)

Comments

@bhawesh490

Hi team,
I am using Cosmos with dbt on a client project.
The DAG looks like this:
```python
import os
from datetime import datetime

from cosmos import DbtDag, ProjectConfig, ProfileConfig, ExecutionConfig
from cosmos.profiles import SnowflakeUserPasswordProfileMapping

profile_config = ProfileConfig(
    profile_name="default",
    target_name="dev",
    profile_mapping=SnowflakeUserPasswordProfileMapping(
        conn_id="snowflake_conn",
        profile_args={"database": "dbt_db", "schema": "dbt_schema"},
    ),
)

dbt_snowflake_dag = DbtDag(
    project_config=ProjectConfig(f"{os.environ['AIRFLOW_HOME']}/dags/dbt/dbt_pipeline"),
    operator_args={"install_deps": True},
    profile_config=profile_config,
    execution_config=ExecutionConfig(dbt_executable_path=f"{os.environ['AIRFLOW_HOME']}/dbt_venv/bin/dbt"),
    schedule_interval="@daily",
    start_date=datetime(2023, 9, 10),
    catchup=False,
    dag_id="dbt_dag",
)
```
I have set up OpenLineage using apache-airflow-providers-openlineage.

I am sending lineage events to a local Marquez API, and the lineage looks like this:
Image

Notice that the schema (columns) is not shown in the lineage graph.

However, I need the column information as well, as shown below. Every dataset needs to have column schema information.
Image

What changes should I make to the DbtDag so that it passes column/schema information to Airflow and subsequently to Marquez?

I am aware that column metadata is produced in dbt when docs are generated with dbt-ol docs generate. As I understand it, additional table- and column-level metadata becomes available if catalog.json, the result of running dbt docs generate, is found in the target directory. How can I incorporate this in Cosmos to get that metadata?

bhawesh490 changed the title from "cosmos-dbt column level lineage" to "cosmos-dbt column level metadata" on Dec 25, 2024

dosubot bot commented Dec 25, 2024

To incorporate column-level metadata from catalog.json into your Cosmos setup for lineage purposes, you can follow these steps:

  1. Ensure catalog.json is Generated: Make sure that you have run dbt docs generate to create the catalog.json file in your dbt project's target directory. This file contains the necessary column-level metadata.

  2. Access catalog.json in Cosmos: Cosmos provides built-in functionality to access dbt-generated documentation files like catalog.json. You can use the DbtDocsView class in the Cosmos plugin, which includes an endpoint to expose the catalog.json file. This allows you to access the file's content as a JSON response [1].

  3. Integrate with OpenLineage: Cosmos supports enhanced metadata extraction and lineage tracking through integration with OpenLineage. This integration can parse artifacts generated by dbt commands and create lineage events, which can include column-level metadata if properly configured [2].

  4. Configure OpenLineage: Ensure that your OpenLineage setup is correctly configured to capture and emit column-level metadata. This might involve setting up the OpenLineage provider in Airflow to recognize and process the metadata from catalog.json.

  5. Modify DbtDag Configuration: While the specific changes to the DbtDag configuration to pass column information aren't detailed in the available documentation, you should ensure that your DAG is set up to access the catalog.json file and that the OpenLineage integration is correctly configured to utilize this metadata.

By following these steps, you should be able to incorporate column-level metadata into your lineage tracking with Cosmos and OpenLineage.
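As a rough illustration of what steps 1 and 4 involve, the sketch below reads a catalog.json file and reshapes each dataset's columns into a dict laid out like an OpenLineage schema facet (a `fields` list of name/type pairs). It uses only the standard library and is not Cosmos or OpenLineage API: `load_catalog_columns` and `to_schema_facet` are hypothetical helper names, and the `model.dbt_pipeline.orders` node in the usage note is made-up sample data. How Cosmos would pick such metadata up is not covered here.

```python
import json
from pathlib import Path


def load_catalog_columns(catalog_path):
    """Return {dbt unique_id: [(column_name, column_type), ...]} from catalog.json."""
    catalog = json.loads(Path(catalog_path).read_text())
    columns_by_node = {}
    # catalog.json lists models, seeds, and snapshots under "nodes"
    # and declared sources under "sources".
    for section in ("nodes", "sources"):
        for unique_id, node in catalog.get(section, {}).items():
            # Each column entry carries its name, warehouse type, and position.
            cols = sorted(
                node.get("columns", {}).values(),
                key=lambda col: col.get("index", 0),
            )
            columns_by_node[unique_id] = [(c["name"], c.get("type")) for c in cols]
    return columns_by_node


def to_schema_facet(columns):
    """Hypothetical helper: shape a column list like an OpenLineage schema facet."""
    return {"schema": {"fields": [{"name": n, "type": t} for n, t in columns]}}
```

For example, `to_schema_facet(load_catalog_columns("target/catalog.json")["model.dbt_pipeline.orders"])` would yield the field list for that one model, assuming the catalog was generated and contains that node. An empty result for a node usually means the catalog was generated before the model was built.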


dosubot bot added the area:lineage, dbt:docs, and profile:snowflake labels on Dec 25, 2024