Destination Iceberg: integration test for glue #49467
base: master
Asking dev-tooling why PR checks didn't start running: https://airbytehq-team.slack.com/archives/C03VDJ4FMJB/p1734451826862259. The previous commit had basically green CI; it was only failing because of the required version bump.
import org.apache.iceberg.catalog.Namespace
import org.apache.iceberg.catalog.SupportsNamespaces

class IcebergDestinationCleaner(private val catalog: Catalog) : DestinationCleaner {
We have IcebergTableCleaner, which has the clearTable method; we should use that.
do you remember why that method has an explicit io.deletePrefix call? afaict, glueCatalog.dropTable(..., purge = true) is sufficient to delete the underlying files from S3 (maybe the nessie catalog behaves differently?)
@jdpgrailsdev would know it!
@edgao Nessie doesn't delete the data when the drop table API is called. This method was added to also clean up the stored data files associated with the table when we drop it via the API call. This may or may not be necessary for all catalogs, so we should test what happens in Glue if we just call the API method on the catalog to delete the table.
nessie why :(
(done in ea13d27)
I confirmed that glue does the right thing if you set the purge flag to true, but... it's easy enough to call the table cleaner always
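For anyone reading this thread later, here is a rough sketch of why calling the cleaner unconditionally is safe. It deliberately does not use the Iceberg or Airbyte classes: ObjectStore, SimpleCatalog, PurgingCatalog, MetadataOnlyCatalog, and this clearTable function are all hypothetical stand-ins, and the bucket paths are made up. The two catalog classes only model the behavior reported above (Glue purges files when purge = true, Nessie leaves them behind).

```kotlin
// Hypothetical stand-ins for illustration only; NOT the Iceberg or Airbyte APIs.
class ObjectStore {
    val files = mutableSetOf<String>()

    // Deleting a prefix that matches nothing is a no-op, so repeating it is harmless.
    fun deletePrefix(prefix: String) {
        files.removeAll { it.startsWith(prefix) }
    }
}

interface SimpleCatalog {
    fun dropTable(location: String, purge: Boolean)
}

// Models the Glue behavior reported in this thread: purge = true removes the data files.
class PurgingCatalog(private val store: ObjectStore) : SimpleCatalog {
    override fun dropTable(location: String, purge: Boolean) {
        if (purge) store.deletePrefix(location)
    }
}

// Models the Nessie behavior reported in this thread: dropping leaves data files behind.
class MetadataOnlyCatalog : SimpleCatalog {
    override fun dropTable(location: String, purge: Boolean) {
        // Intentionally no file cleanup.
    }
}

// Drop the table, then always delete the prefix, whatever the catalog did.
fun clearTable(catalog: SimpleCatalog, store: ObjectStore, location: String) {
    catalog.dropTable(location, purge = true)
    store.deletePrefix(location)
}

fun main() {
    val location = "s3://bucket/ns/tbl/" // made-up path

    val glueLike = ObjectStore().apply { files += location + "data/0.parquet" }
    clearTable(PurgingCatalog(glueLike), glueLike, location)

    val nessieLike = ObjectStore().apply { files += location + "data/0.parquet" }
    clearTable(MetadataOnlyCatalog(), nessieLike, location)

    println("glue-like leftovers: ${glueLike.files.size}, nessie-like leftovers: ${nessieLike.files.size}")
}
```

In this model the second delete is redundant for the purging catalog and essential for the metadata-only one, so one code path covers both without branching on catalog type.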
closes https://github.com/airbytehq/airbyte-internal-issues/issues/11140
key changes
- DestinationStream.Descriptor.toIcebergTableIdentifier extension function
- check test, targeting the glue config