
Make S3 SelectObjectContentInput ScanRange members pointers #2147

Closed
wants to merge 3 commits into from


@cad commented Jun 7, 2023

Problem

TL;DR: The int64 members Start and End of the ScanRange in SelectObjectContentInput are omitted from the serialized request when they are 0.

When calling SelectObjectContent, you can specify a ScanRange to only request
records between start and end positions.

For example, to process only the records from byte 0 to byte 100, we can write Go code like this:

package main

import (
    "context"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/service/s3"
    "github.com/aws/aws-sdk-go-v2/service/s3/types"
)

func main() {
    ctx := context.TODO()

    // Log API requests (including bodies) and responses
    cfg, err := config.LoadDefaultConfig(ctx,
        config.WithClientLogMode(aws.LogRequestWithBody|aws.LogResponse),
    )
    if err != nil {
        panic(err)
    }

    // Create S3 client
    client := s3.NewFromConfig(cfg)

    // Send a SelectObjectContent request
    if _, err := client.SelectObjectContent(ctx, &s3.SelectObjectContentInput{
        Bucket:         aws.String("bucket"),
        Key:            aws.String("key"),
        Expression:     aws.String("SELECT * FROM S3Object"),
        ExpressionType: types.ExpressionTypeSql,
        InputSerialization: &types.InputSerialization{
            JSON: &types.JSONInput{
                // Scan ranges are supported for JSON only in LINES mode
                Type: types.JSONTypeLines,
            },
        },
        OutputSerialization: &types.OutputSerialization{
            JSON: &types.JSONOutput{},
        },
        ScanRange: &types.ScanRange{ // Expected: records from byte 0 to byte 100
            Start: 0,
            End:   100,
        },
    }); err != nil {
        panic(err)
    }
}

The expected API request should contain a ScanRange element like this:

...
<ScanRange>
    <End>100</End>
    <Start>0</Start>
</ScanRange>
...

However, the logged request looks like this:

...
<ScanRange>
    <End>100</End>
</ScanRange>
...

According to the docs, this means "process only the records within the last 100 bytes of the file".

So, in the SDK's current state, there is no way to specify a ScanRange with a Start of 0 and an arbitrary End.

Hypothesis

The client omits the Start or End member of the ScanRange shape when it equals 0. Because the member is an unboxed int64 whose zero value is 0, the serializer cannot tell an explicit 0 from an unset value, so it leaves the member out of the request.
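The hypothesis can be reproduced outside the SDK with Go's standard `encoding/xml` package. This is only an analogy: the SDK's generated serializers do not use `encoding/xml`, and the `omitempty` tag here stands in for their "skip the zero value" behavior. The struct names are hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// unboxedScanRange mirrors the current shape: plain int64 members,
// so 0 is indistinguishable from "unset".
type unboxedScanRange struct {
	XMLName xml.Name `xml:"ScanRange"`
	End     int64    `xml:"End,omitempty"`
	Start   int64    `xml:"Start,omitempty"`
}

// boxedScanRange mirrors the proposed shape: pointer members, where nil
// means "unset" and a non-nil pointer to 0 is an explicit zero.
type boxedScanRange struct {
	XMLName xml.Name `xml:"ScanRange"`
	End     *int64   `xml:"End,omitempty"`
	Start   *int64   `xml:"Start,omitempty"`
}

func int64Ptr(v int64) *int64 { return &v }

func marshal(v any) string {
	b, err := xml.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(marshal(unboxedScanRange{Start: 0, End: 100}))
	fmt.Println(marshal(boxedScanRange{Start: int64Ptr(0), End: int64Ptr(100)}))
}
```

With plain int64 members the output is `<ScanRange><End>100</End></ScanRange>`, dropping Start exactly as in the logged request. With pointer members, only a nil pointer is omitted, so the explicit 0 survives.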

Solution

Added a customization that applies the clientOptional trait to the Start and End
members of the S3 ScanRange shape, and regenerated the client code so that these
members become optional (pointer) fields.

Since this breaks any existing application that sets the Start or End fields of ScanRange, the changelog includes an announcement of the breaking change.

@cad requested a review from a team as a code owner June 7, 2023 15:28
@cad force-pushed the make-s3-scanrange-members-client-optional branch 2 times, most recently from e089216 to 4ae9266 June 12, 2023 08:33
@cad force-pushed the make-s3-scanrange-members-client-optional branch from 4ae9266 to dbcf0cc June 15, 2023 08:41
@lucix-aws (Contributor) commented:

This is actually part of a larger-scale issue with default value serialization. See #2162.

We can't take this PR in isolation due to the breaking change, but we're investigating the issue at large to try to determine a more comprehensive way forward.

@lucix-aws closed this Jun 27, 2023