astronomer-cosmos to send openlineage events to s3 bucket or cloudwatch logs #1423

Closed
bhawesh490 opened this issue Dec 23, 2024 · 4 comments
Labels
area:lineage Related to open lineage or DBT dataset lineage

Comments

@bhawesh490

Hi team,
I am using astronomer-cosmos to create DAGs for my client projects.
They run in an AWS managed Airflow environment.
We want to send the OpenLineage events to an S3 bucket, so that we can later consume them from S3 and push them to AWS DataZone.
Can you provide some sample snippets to accomplish this?


dosubot bot commented Dec 23, 2024

Currently, there is no direct functionality or documented example in the Cosmos codebase for sending OpenLineage events directly to AWS S3 or CloudWatch. However, you can achieve this by customizing the OpenLineage event handler to redirect events to your desired storage.

Here's a general approach you can take:

  1. Custom Event Handler: Implement a custom OpenLineage event handler in your Airflow setup. This handler can capture the OpenLineage events and then use AWS SDKs (like boto3 for Python) to send these events to an S3 bucket.

  2. Modify Airflow Configuration: Ensure your Airflow environment is configured to use this custom handler. You might need to adjust the Airflow configuration files or environment variables to point to your custom implementation.

  3. AWS SDK Integration: Use the AWS SDK to handle the authentication and data transfer to S3. Ensure that your Airflow environment has the necessary permissions to write to the S3 bucket.

  4. Testing and Validation: Test the setup in a development environment to ensure that events are correctly captured and sent to S3.

This approach requires some custom development, as there isn't a built-in feature in Cosmos for this specific use case.
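As a rough illustration of steps 1 and 3, the sketch below shows only the upload side: a small helper that serializes one event and writes it to S3 under a date-partitioned key. The names `S3EventWriter` and `build_event_key`, the default prefix, and the key layout are all hypothetical and not part of Cosmos or OpenLineage. The only AWS call used is `put_object`, and the client is passed in so that `boto3.client("s3")` can be supplied in a real environment. How the helper gets wired into the OpenLineage client (for example through a custom transport) is not covered here.

```python
import json
import uuid
from datetime import datetime, timezone


def build_event_key(prefix: str, event_time: datetime, event_id: str) -> str:
    """Build a date-partitioned object key, e.g. 'openlineage/2024/12/23/<id>.json'."""
    return f"{prefix.strip('/')}/{event_time:%Y/%m/%d}/{event_id}.json"


class S3EventWriter:
    """Hypothetical helper: writes one JSON object per lineage event."""

    def __init__(self, s3_client, bucket: str, prefix: str = "openlineage"):
        # s3_client is assumed to behave like boto3.client("s3").
        self.s3_client = s3_client
        self.bucket = bucket
        self.prefix = prefix

    def write(self, event: dict) -> str:
        """Serialize the event, upload it, and return the object key used."""
        key = build_event_key(
            self.prefix, datetime.now(timezone.utc), str(uuid.uuid4())
        )
        self.s3_client.put_object(
            Bucket=self.bucket,
            Key=key,
            Body=json.dumps(event).encode("utf-8"),
            ContentType="application/json",
        )
        return key
```

With boto3 installed, this could be used as `S3EventWriter(boto3.client("s3"), bucket="my-lineage-bucket").write(event_dict)`, where the bucket name is a placeholder and the Airflow execution role needs `s3:PutObject` on that bucket. One object per event keeps the writer stateless, at the cost of many small objects.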


@dosubot dosubot bot added the area:lineage Related to open lineage or DBT dataset lineage label Dec 23, 2024
@pankajastro
Contributor

pankajastro commented Dec 23, 2024

Cosmos supports OpenLineage. See the docs: https://astronomer.github.io/astronomer-cosmos/configuration/lineage.html

To send events to a remote location, you have to set up the transport configuration accordingly. See the docs: https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/guides/user.html
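For reference, the Airflow provider guide linked above describes the transport as a JSON value, set either in the `transport` option of the `[openlineage]` config section or through the `AIRFLOW__OPENLINEAGE__TRANSPORT` environment variable. The minimal sketch below builds that value for the HTTP transport, which the guide documents. The helper name `openlineage_transport_env` is hypothetical and the URL is a placeholder. It does not show an S3 transport, since this thread does not confirm one for the Python client.

```python
import json


def openlineage_transport_env(url: str, endpoint: str = "api/v1/lineage") -> dict:
    """Return the environment variable that configures an HTTP transport."""
    transport = {"type": "http", "url": url, "endpoint": endpoint}
    return {"AIRFLOW__OPENLINEAGE__TRANSPORT": json.dumps(transport)}


if __name__ == "__main__":
    # Placeholder URL; replace with the lineage backend you actually use.
    env = openlineage_transport_env("http://lineage.example.com:5000")
    for name, value in env.items():
        print(f"{name}='{value}'")
```

In a managed environment where environment variables cannot be set directly, the same JSON string would go into whatever mechanism the platform offers for overriding Airflow configuration options.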

I tried to find docs on setting up a transport for S3 and found https://openlineage.io/docs/client/java/configuration/#s3, but that is for the Java client.

Still, I feel there should be a way to upload lineage to the cloud.

@tatiana / @pankajkoti do you know whether uploading lineage to the cloud works natively in Cosmos?

tatiana added a commit that referenced this issue Dec 27, 2024
Support users who may have questions similar to:
* #1423
* #1430
@tatiana
Collaborator

tatiana commented Dec 27, 2024

Thanks for following up on this, @pankajastro ! Yes, all you wrote seems accurate. @mobuchowski could you confirm if there are any limitations with the S3 transport option in Python/Airflow?

I improved our OpenLineage docs to highlight limitations regarding which dbt adapters are supported: #1431

@bhawesh490 since you created #1430, is it correct to assume we can close this ticket?

@bhawesh490
Author

bhawesh490 commented Dec 27, 2024 via email

@tatiana tatiana closed this as completed Dec 27, 2024
tatiana added a commit that referenced this issue Dec 27, 2024
Support users who may have questions similar to:
* Related to: #1423
* Related to: #1430