Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Destination Iceberg: integration test for glue #49467

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

edgao
Copy link
Contributor

@edgao edgao commented Dec 13, 2024

closes https://github.com/airbytehq/airbyte-internal-issues/issues/11140

key changes

  • Handle uppercase stream name/namespace (glue requires lowercase here). The TableIdGenerator thing is... kind of dumb, but it works I guess (we should eventually plug in the default namespace stuff there)
    • killed the DestinationStream.Descriptor.toIcebergTableIdentifier extension function
  • implement a destination cleaner. It just nukes any tables older than 30 days. (only needed for glue, since nessie is all ephemeral test container stuff)
  • update data dumper to use the correct catalog, instead of hardcoding nessie
  • enable the check test, targeting the glue config

Copy link

vercel bot commented Dec 13, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
airbyte-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Dec 17, 2024 8:04pm

@edgao
Copy link
Contributor Author

edgao commented Dec 17, 2024

asking dev-tooling why PR checks didn't start running https://airbytehq-team.slack.com/archives/C03VDJ4FMJB/p1734451826862259

but the previous commit had basically green CI, it was just failing b/c of required version bump

@edgao edgao force-pushed the edgao/iceberg_glue_integration_test branch from ee4a2c8 to 6204548 Compare December 17, 2024 16:31
import org.apache.iceberg.catalog.Namespace
import org.apache.iceberg.catalog.SupportsNamespaces

class IcebergDestinationCleaner(private val catalog: Catalog) : DestinationCleaner {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have IcebergTableCleaner which has the clearTable method, we should use that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you remember why that method has an explicit io.deletePrefix call? afaict, glueCatalog.dropTable(..., purge = true) is sufficient to delete the underlying files from S3

(maybe the nessie catalog behaves differently?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jdpgrailsdev would know it!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edgao Nessie doesn't delete the data when the drop table API is called. This method was added to also cleanup the stored data files associated the table when we drop it via the API call. This may or may not be necessary for all catalogs, so we should test what happens in Glue if we just call the API method on the catalog to delete the table.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nessie why :(

(done in ea13d27)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirmed that glue does the right thing if you set the purge flag to true, but... it's easy enough to call the table cleaner always

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues CDK Connector Development Kit connectors/destination/iceberg-v2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants