Enhanced Glue ingestion with external table features #18511
Merged
Describe your changes:
Added file format, location path, and external table lineage to `GlueSource`.
The AWS Glue connector exposes far less metadata than, for example, the AWS console. Some of the more interesting features, such as lineage, are available in the Athena connector; however, Glue tables can also be queried by other engines, such as Trino. Athena is not a popular choice for companies holding huge amounts of data because of its cost. Fetching storage metadata in Trino is difficult, so adding it to Glue instead is a quick win.
Changes summary:
`GlueSource` now inherits from `ExternalTableLineageMixin`.
`test_table_names` is fixed. Without patching, `get_tables_name_and_type` was throwing warnings and returning nothing, so the test iterated over an empty result and asserted nothing.
Type of change:
Checklist:
I have read the CONTRIBUTING document.
My PR title is `Fixes <issue-number>: <short explanation>`
I have commented on my code, particularly in hard-to-understand areas.
For JSON Schema changes: I updated the migration scripts or explained why it is not needed.
I have added tests around the new logic.
For connector/ingestion changes: I updated the documentation.
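For reviewers, the `test_table_names` issue from the changes summary can be illustrated with a minimal sketch. It is not the actual test or connector code: `fetch_tables`, `vacuous_check`, and `strict_check` are hypothetical names, and `fetch_tables` only stands in for a generator like `get_tables_name_and_type`. The point is that asserting inside a loop over an empty generator never runs a single assertion, while materializing the result first makes an empty result fail loudly.

```python
from typing import Iterable, Iterator, List, Tuple


def fetch_tables(pages: Iterable[dict]) -> Iterator[Tuple[str, str]]:
    """Hypothetical stand-in for a generator such as get_tables_name_and_type."""
    for page in pages:
        for table in page.get("TableList", []):
            # Default table type is an assumption made for this sketch only.
            yield table["Name"], table.get("TableType", "EXTERNAL_TABLE")


def vacuous_check(pages: Iterable[dict], expected: List[str]) -> None:
    # Passes even when nothing is yielded: the loop body never executes.
    for name, _table_type in fetch_tables(pages):
        assert name in expected


def strict_check(pages: Iterable[dict], expected: List[str]) -> None:
    # Materialize the generator, then compare the whole result.
    names = [name for name, _table_type in fetch_tables(pages)]
    assert names, "no tables returned; is the client patched?"
    assert names == expected
```

With an empty input, `vacuous_check([], ["orders"])` returns silently, whereas `strict_check([], ["orders"])` raises `AssertionError`, which is the behavior a test should have when the patched client returns no tables.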