
Add Glacier skipping logic to S3FileSystem #18237

Closed

Conversation

@ericlgoodman commented Jul 12, 2023

Description

The legacy TrinoS3FileSystem contains logic/configuration for skipping Glacier objects found in S3.

This PR ports that functionality/configuration over to the new S3FileSystem, and also adds the ability to configure the filesystem to read restored Glacier objects.

Additional context and related issues

Recently, S3 added the RestoreStatus field to the results of ListObjectsV2 - see https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-s3-restore-status-s3-glacier-objects-list-api/ for more details.
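For context, this is roughly how the new field surfaces in the AWS SDK for Java v2 (a minimal sketch; the bucket name is a placeholder and `client` is assumed to be an already-configured S3Client):

```java
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;
import software.amazon.awssdk.services.s3.model.OptionalObjectAttributes;
import software.amazon.awssdk.services.s3.model.RestoreStatus;

static void printRestoreStatus(S3Client client)
{
    // Ask S3 to include restore status in the listing itself,
    // avoiding a HeadObject call per object.
    ListObjectsV2Request request = ListObjectsV2Request.builder()
            .bucket("example-bucket") // placeholder
            .optionalObjectAttributes(OptionalObjectAttributes.RESTORE_STATUS)
            .build();

    client.listObjectsV2Paginator(request).contents().forEach(object -> {
        RestoreStatus status = object.restoreStatus(); // null unless a restore was requested
        System.out.println(object.key() + " -> "
                + ((status == null) ? "no restore requested"
                        : (status.restoreExpiryDate() != null) ? "restored until " + status.restoreExpiryDate()
                                : "restore in progress"));
    });
}
```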

Release notes

( ) This is not user-visible or docs only and no release notes are required.
(x) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot commented Jul 12, 2023:

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@ericlgoodman force-pushed the glacier-restored-object branch from 4383d43 to b20b5de on July 13, 2023 15:10
@cla-bot commented Jul 13, 2023, repeating the CLA reminder above.

@ericlgoodman marked this pull request as ready for review on July 13, 2023 19:13
@aczajkowski (Member) left a comment:

LGTM % a few suggestions

@@ -202,4 +211,34 @@ private static void validateS3Location(Location location)
{
new S3Location(location);
}

private boolean shouldReadObject(S3Object object)
@aczajkowski (Member) commented:

NIT: Maybe move the related methods into S3ObjectStorageClassFilter and return the function from the context:

```java
.filter(context.s3ObjectStorageClassFilter())
```

e.g.:

```java
// Sketch: assumes imports for Predicate, Optional, S3Object, RestoreStatus,
// and the GLACIER_STORAGE_CLASSES constant from the surrounding class.
public enum S3ObjectStorageClassFilter
{
    READ_ALL(o -> true),
    SKIP_ALL_GLACIER(S3ObjectStorageClassFilter::isNotGlacierObject),
    READ_RESTORED_GLACIER_OBJECTS(S3ObjectStorageClassFilter::isCompletedRestoredObject);

    // Predicate (rather than Function<S3Object, Boolean>) so the filter can be
    // passed straight to Stream.filter, matching the usage suggested above.
    private final Predicate<S3Object> filter;

    S3ObjectStorageClassFilter(Predicate<S3Object> filter)
    {
        this.filter = filter;
    }

    public Predicate<S3Object> toPredicate()
    {
        return filter;
    }

    private static boolean isCompletedRestoredObject(S3Object object)
    {
        /* There are 3 cases for the restore status:
         *
         * 1. The object is not restored and has not been requested to be restored:
         *    restoreStatus is null.
         * 2. The object is in the process of being restored: isRestoreInProgress is
         *    true, but restoreExpiryDate is null.
         * 3. The object has completed its restore: isRestoreInProgress is false and
         *    restoreExpiryDate is set.
         *
         * Since we only need to distinguish case 3, checking restoreExpiryDate for a
         * non-null value is sufficient. */
        return isNotGlacierObject(object) || Optional.ofNullable(object.restoreStatus())
                .map(RestoreStatus::restoreExpiryDate)
                .isPresent();
    }

    private static boolean isNotGlacierObject(S3Object object)
    {
        return !GLACIER_STORAGE_CLASSES.contains(object.storageClass());
    }
}
```
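With a Predicate-valued enum, the listing code can then apply the filter directly; a usage sketch, assuming the toPredicate() accessor above (which is not part of the original suggestion):

```java
Iterator<S3Object> iterator = client.listObjectsV2Paginator(request).contents().stream()
        .filter(context.s3ObjectStorageClassFilter().toPredicate())
        .iterator();
```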

@ericlgoodman (Author) replied:

I like this! Will update

```java
return switch (context.s3ObjectStorageClassFilter()) {
    case READ_ALL -> true;
    case SKIP_ALL_GLACIER -> !isGlacierObject(object);
    case READ_RESTORED_GLACIER_OBJECTS -> !isGlacierObject(object) || isCompletedRestoredObject(object);
};
```
@aczajkowski (Member) commented Jul 14, 2023:

That will read ALL non-Glacier objects plus RESTORED Glacier objects? Should we adjust the name? Maybe:

READ_ALL
READ_NON_GLACIER
READ_NON_GLACIER_AND_RESTORED

@ericlgoodman (Author) replied:

Agreed, this is clearer. Will update.

@ericlgoodman (Author) commented Jul 20, 2023:

Facing some dependency issues: it looks like the Netty version used by the SDK release that includes restoreStatus breaks Apache Arrow:

```xml
<!-- Netty 4.1.94.Final breaks Apache Arrow -->
<dep.netty.version>4.1.93.Final</dep.netty.version>
```

Per this comment, we need to wait for the next release of Apache Arrow.

@aczajkowski requested review from findepi and losipiuk on July 21, 2023 19:24
```java
    return s3ObjectStorageClassFilter;
}

@Config("s3.object-storage-class-filter")
```
@findepi (Member) commented:

If I understand correctly, this is applicable to the Hive connector only and should be a config there.
Also, I don't know why anyone would want to silently skip Glacier objects. That's just returning wrong data, isn't it?

Delta and Iceberg use listings for cleanups (e.g. DROP TABLE); in such cases I don't think we should be skipping Glacier objects.

A project member replied:
That's a good point, although this is actually a problem with the legacy S3 code. Thoughts on how to improve this?

@ericlgoodman (Author) replied:

> If I understand correctly, this is applicable to the Hive connector only and should be a config there. Delta and Iceberg use listings for cleanups (e.g. DROP TABLE); in such cases I don't think we should be skipping Glacier objects.

Good point. What about overloading listFiles() with a signature that accepts a filter (and defaults to the filesystem configuration for the regular listFiles(Location location) signature)? The S3AFileSystem does something similar.

> Also, I don't know why anyone would want to silently skip Glacier objects. That's just returning wrong data, isn't it?

This is the default behavior in Athena. Lots of people have lifecycle policies on S3 buckets that transition their objects in place from standard storage to Glacier storage. Skipping unrestored Glacier objects allows them to continue reading from those tables.
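For illustration, the hypothetical overload might look like this (the signature is an assumption, and this approach was later set aside in favor of tagging entries, see below):

```java
// Existing signature: uses the filesystem-wide configured filter.
FileIterator listFiles(Location location)
        throws IOException;

// Hypothetical overload: cleanup paths such as DROP TABLE could pass a
// permissive filter, while Hive split listing passes the Glacier filter.
FileIterator listFiles(Location location, Predicate<S3Object> filter)
        throws IOException;
```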

@electrum (Member) left a comment:

Can you explain the use case for this feature? Silently skipping data is a correctness issue.

```java
// Before:
ListObjectsV2Iterable iterable = client.listObjectsV2Paginator(request);
return new S3FileIterator(s3Location, iterable.contents().iterator());

// After:
Iterator<S3Object> iterator = client.listObjectsV2Paginator(request).contents()
        .stream()
```
@electrum (Member) commented:

Nit: put .stream() on the previous line.

@ericlgoodman (Author) replied:

Will do


```java
private static boolean isCompletedRestoredObject(S3Object object)
{
    /* There are 3 cases for the restore status:
```
@electrum (Member) commented:

How about simplifying this to:

```java
// Only restored objects will have the restoreExpiryDate set.
// Ignore not-restored objects and in-progress restores.
```
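Applied to the method above, the simplified version would read (combining the suggested comment with the existing return expression):

```java
private static boolean isCompletedRestoredObject(S3Object object)
{
    // Only restored objects will have the restoreExpiryDate set.
    // Ignore not-restored objects and in-progress restores.
    return isNotGlacierObject(object) || Optional.ofNullable(object.restoreStatus())
            .map(RestoreStatus::restoreExpiryDate)
            .isPresent();
}
```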

@ericlgoodman (Author) replied:

Will do


```java
private static boolean isGlacierObject(S3Object object)
{
    return GLACIER_STORAGE_CLASSES.contains(object.storageClass());
}
```
@electrum (Member) commented:

Let's simplify this to:

```java
return (object.storageClass() == GLACIER) || (object.storageClass() == DEEP_ARCHIVE);
```

Or we could do:

```java
return switch (object.storageClass()) {
    case GLACIER, DEEP_ARCHIVE -> true;
    default -> false;
};
```

@ericlgoodman (Author) replied:

Will do

github-actions bot commented Jan 12, 2024:

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions bot added the stale label Jan 12, 2024
@mosabua (Member) commented Jan 12, 2024:

👋 @ericlgoodman @findepi @electrum - this PR has become inactive. We hope you are still interested in working on it. Please let us know, and we can try to get reviewers to help with that.

We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks.

@github-actions bot removed the stale label Jan 15, 2024
github-actions bot commented Feb 5, 2024:

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions bot added the stale label Feb 5, 2024
@electrum (Member) commented:
Apologies for the delay on following up on this. As @findepi mentioned, putting the skipping logic directly in the file system is problematic for deletes, and we really only want this to apply to Hive file listing. Adding a new listing method just for this is ugly and breaks the abstractions.

Looking at this with fresh eyes, I'm thinking we can implement this by adding a Set<String> tags field to the FileEntry object. Objects could be tagged with s3:glacier and s3:glacierRestored as applicable. Then the Hive connector can have a hive.s3.skip-glacier-objects config in HiveConfig and handle the skipping inside TrinoFileStatusRemoteIterator.
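A rough sketch of that shape, with hypothetical names (FileEntry.tags() is the proposed addition, not an existing API, and isGlacierObject is the helper from this PR):

```java
import java.util.Set;

// Hypothetical tag constants the S3 file system would attach to listing entries.
static final String TAG_GLACIER = "s3:glacier";
static final String TAG_GLACIER_RESTORED = "s3:glacierRestored";

// S3 side: derive tags from the object's storage class and restore status.
static Set<String> tagsFor(S3Object object)
{
    if (!isGlacierObject(object)) {
        return Set.of();
    }
    boolean restored = (object.restoreStatus() != null)
            && (object.restoreStatus().restoreExpiryDate() != null);
    return restored ? Set.of(TAG_GLACIER, TAG_GLACIER_RESTORED) : Set.of(TAG_GLACIER);
}

// Hive side: with hive.s3.skip-glacier-objects enabled, keep only entries that
// are not Glacier-tagged, or that have completed a restore.
static boolean shouldRead(FileEntry entry, boolean skipGlacierObjects)
{
    return !skipGlacierObjects
            || !entry.tags().contains(TAG_GLACIER)
            || entry.tags().contains(TAG_GLACIER_RESTORED);
}
```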

@github-actions bot removed the stale label Feb 13, 2024
@ericlgoodman (Author) replied:

> Looking at this with fresh eyes, I'm thinking we can implement this by adding a Set<String> tags field to the FileEntry object. Objects could be tagged with s3:glacier and s3:glacierRestored as applicable. Then the Hive connector can have a hive.s3.skip-glacier-objects config in HiveConfig and handle the skipping inside TrinoFileStatusRemoteIterator.

Sounds good to me. Sorry for the delay on this - we've had a lot of reshuffling lately. Planning on having someone on my team pick this up in the next few weeks.

github-actions bot commented Mar 20, 2024:

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions bot added the stale label Mar 20, 2024
@bangtim (Contributor) commented Mar 21, 2024:

> I'm thinking we can implement this by adding a Set<String> tags field to the FileEntry object. Objects could be tagged with s3:glacier and s3:glacierRestored as applicable. Then the Hive connector can have a hive.s3.skip-glacier-objects config in HiveConfig and handle the skipping inside TrinoFileStatusRemoteIterator.

Hi @electrum, I'll be picking this back up. I was wondering how we should implement the skipping logic in TrinoFileStatusRemoteIterator. If we pass the filter in there, how would the hasNext() and next() logic work? hasNext() only tells us whether there is a next object, while in next() we could check whether the object matches our filter, but we couldn't simply call next() again because we don't know whether another element exists. Any help would be appreciated, thank you.

@github-actions bot removed the stale label Mar 21, 2024
@electrum (Member) replied:
Hi @bangtim, take a look at io.trino.plugin.hive.fs.DirectoryListingFilter or com.google.common.collect.AbstractIterator for ideas on how to implement the skipping logic. The basic idea is that the iterator needs to do the filtering and hold onto the next element, which can then be referenced in the hasNext() or next() methods.
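For reference, a minimal generic sketch of that look-ahead pattern (illustrative only, not Trino's actual DirectoryListingFilter):

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Look-ahead iterator: hasNext() advances the delegate until it finds a matching
// element and caches it, so the cached element can be handed out by next().
final class FilteringIterator<T>
        implements Iterator<T>
{
    private final Iterator<T> delegate;
    private final Predicate<T> filter;
    private T next; // cached matching element, or null if none fetched yet

    FilteringIterator(Iterator<T> delegate, Predicate<T> filter)
    {
        this.delegate = delegate;
        this.filter = filter;
    }

    @Override
    public boolean hasNext()
    {
        while ((next == null) && delegate.hasNext()) {
            T candidate = delegate.next();
            if (filter.test(candidate)) {
                next = candidate;
            }
        }
        return next != null;
    }

    @Override
    public T next()
    {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T result = next;
        next = null;
        return result;
    }
}
```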

@ericlgoodman (Author) commented:
To be completed via #21164
