Enhanced Glue ingestion with external table features #18511

Merged

@trina242 trina242 commented Nov 4, 2024

Describe your changes:

Added file format, location path and external table lineage to GlueSource.

The AWS Glue connector exposes far less metadata than what you can find, for example, in the AWS console. Some of the interesting features, such as lineage, are available in the Athena connector. However, Glue tables can also be queried by other engines, such as Trino, and Athena is not a popular solution for companies holding huge amounts of data because of its cost. Fetching storage metadata through Trino is difficult, so adding it to the Glue connector instead is a quick win.

Changes summary:

  • GlueSource now inherits from ExternalTableLineageMixin.
  • Table location is extracted directly from StorageDescriptor (if present).
  • File format is extracted from StorageDescriptor (if present) by parsing the SerDe library class name.
  • test_table_names is fixed: without patching, get_tables_name_and_type logged warnings and returned nothing, so the test iterated over an empty result and never asserted anything.
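
As a rough illustration of the location and file format extraction described above, here is a minimal sketch that works on a plain dict shaped like a Glue table entry. It is not the code from this PR: the helper names `get_location` and `get_file_format` and the keyword mapping are assumptions made for the example, and the real connector may map SerDe classes differently.

```python
from typing import Optional

# Hypothetical keyword-to-format mapping (an assumption for this sketch,
# not the mapping used in the PR). Checked in order against the SerDe
# class's simple name, lowercased.
_SERDE_KEYWORDS = (
    ("parquet", "parquet"),
    ("orc", "orc"),
    ("avro", "avro"),
    ("json", "json"),
    ("csv", "csv"),
)


def get_location(table: dict) -> Optional[str]:
    """Return the table's storage location, or None if it is not set."""
    storage = table.get("StorageDescriptor") or {}
    return storage.get("Location") or None


def get_file_format(table: dict) -> Optional[str]:
    """Guess the file format from the SerDe library class name."""
    storage = table.get("StorageDescriptor") or {}
    serde_info = storage.get("SerdeInfo") or {}
    serde_lib = serde_info.get("SerializationLibrary")
    if not serde_lib:
        return None
    # Keep only the class name, e.g. "ParquetHiveSerDe", so that package
    # segments such as "org" or "hive" cannot produce false matches.
    simple_name = serde_lib.rsplit(".", 1)[-1].lower()
    for keyword, file_format in _SERDE_KEYWORDS:
        if keyword in simple_name:
            return file_format
    return None  # unknown or ambiguous SerDe, e.g. LazySimpleSerDe


# Example input with made-up values, shaped like a Glue table entry.
example_table = {
    "Name": "events",
    "StorageDescriptor": {
        "Location": "s3://example-bucket/events/",
        "SerdeInfo": {
            "SerializationLibrary": (
                "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
            )
        },
    },
}

print(get_location(example_table))
print(get_file_format(example_table))
```

Both helpers return None when StorageDescriptor or its nested keys are missing, which mirrors the "if present" wording in the summary. Matching on the class name alone is a deliberate simplification: generic SerDes such as LazySimpleSerDe do not reveal the format, so the sketch leaves it unset rather than guessing.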

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.

  • My PR title is Fixes <issue-number>: <short explanation>

  • I have commented on my code, particularly in hard-to-understand areas.

  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

  • I have added tests around the new logic.

  • For connector/ingestion changes: I updated the documentation.

github-actions bot commented Nov 4, 2024

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

sonarqubecloud bot commented Nov 5, 2024

@chirag-madlani chirag-madlani merged commit 47c75fe into open-metadata:main Nov 5, 2024
21 of 22 checks passed
Labels: safe to test