Default data location for AWS Glue tables #8472

Closed
DerkSchooltink opened this issue Jun 10, 2020 · 3 comments · Fixed by #8999
Labels
@aws-cdk/aws-glue Related to AWS Glue effort/small Small work item – less than a day of effort feature-request A feature should be added or improved. in-progress This issue is being actively worked on.

Comments

@DerkSchooltink
Contributor

The Question

I started rolling out a Glue table with the following CDK construct:

const trailTable = new Table(this, 'TrailTable', {
    bucket: <ref to bucket>,
    database: <ref to db>,
    tableName: 'table',
    columns: [{
        name: 'user',
        type: Schema.STRING,
    }],
    dataFormat: DataFormat.JSON,
});

To my surprise, the Glue table pointed to my bucket at this URL: s3://<bucket>/data.

According to the CDK documentation, /data is indeed the default location for the table's data (set by the s3Prefix property), but the documentation does not explain why this is the default. Does it follow a particular guideline, or was the path chosen arbitrarily?

I would have expected the default to be empty, so that the table points at the root of the bucket rather than at a nested folder. Having to pass a blank s3Prefix to get this behavior feels out of place:

const trailTable = new Table(this, 'TrailTable', {
    ...,
    s3Prefix: '',
});
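
To make the difference concrete, here is a minimal sketch of how a prefix default changes the resulting location. The tableLocation helper and the bucket name 'my-bucket' are hypothetical and only for illustration; this is not the CDK implementation, just the behavior described above under the assumption that the location is the bucket URL followed by the prefix.

```typescript
// Hypothetical helper, NOT the CDK implementation: it only illustrates how the
// default value of a prefix decides where the table location points.
function tableLocation(bucketName: string, s3Prefix: string = 'data'): string {
  // Drop leading slashes so a prefix like '/data' does not yield 's3://bucket//data'.
  const prefix = s3Prefix.replace(/^\/+/, '');
  return `s3://${bucketName}/${prefix}`;
}

// Prefix omitted: the (assumed) 'data' default applies.
const withDefault = tableLocation('my-bucket');
// Explicit empty prefix: the location is the bucket root.
const withRoot = tableLocation('my-bucket', '');

console.log(withDefault); // s3://my-bucket/data
console.log(withRoot);    // s3://my-bucket/
```

With an empty string as the default instead of 'data', the first call would also resolve to the bucket root, which is the behavior I expected.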

Proposed solution

Remove the /data default for s3Prefix and use an empty string instead.

OR

Provide documentation that explains why /data was chosen as the default.

@DerkSchooltink DerkSchooltink added the needs-triage This issue or PR still needs to be triaged. label Jun 10, 2020
@SomayaB SomayaB added @aws-cdk/aws-glue Related to AWS Glue feature-request A feature should be added or improved. guidance Question that needs advice or information. labels Jun 11, 2020
@iliapolo
Contributor

@sam-goodwin - As the original author of this, mind if I pick your brain?

@SomayaB SomayaB removed the needs-triage This issue or PR still needs to be triaged. label Jun 29, 2020
@iliapolo
Contributor

iliapolo commented Jul 6, 2020

Marking this as a feature request to use an empty string as the default data location, which seems like a more reasonable default.

@iliapolo iliapolo added effort/small Small work item – less than a day of effort and removed guidance Question that needs advice or information. labels Jul 6, 2020
@sam-goodwin
Contributor

There was no intelligent reasoning. I agree it should be changed since it’s too opinionated.

DerkSchooltink pushed a commit to DerkSchooltink/aws-cdk that referenced this issue Jul 10, 2020
fixes aws#8472

BREAKING CHANGE: the default location of glue data will be the root of an s3 bucket, instead of /data
@SomayaB SomayaB added the in-progress This issue is being actively worked on. label Jul 10, 2020
DerkSchooltink pushed a commit to DerkSchooltink/aws-cdk that referenced this issue Jul 13, 2020
fixes aws#8472

BREAKING CHANGE: the default location of glue data will be the root of an s3 bucket, instead of /data
@mergify mergify bot closed this as completed in #8999 Jul 13, 2020
mergify bot pushed a commit that referenced this issue Jul 13, 2020
#8999)

Fixes #8472

BREAKING CHANGE: The default location of glue data will be the root of an s3 bucket, instead of `/data`

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
curtiseppel pushed a commit to curtiseppel/aws-cdk that referenced this issue Aug 11, 2020
aws#8999)

Fixes aws#8472

BREAKING CHANGE: The default location of glue data will be the root of an s3 bucket, instead of `/data`

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*