
Check that S3 export with buckets that contain dots #190

Closed
morazow opened this issue Feb 16, 2022 · 5 comments · Fixed by #199
Labels: bug (Unwanted / harmful behavior)

Comments

morazow (Contributor) commented Feb 16, 2022

Situation

EXPORT test.t1
INTO SCRIPT CLOUD_STORAGE_EXTENSION.EXPORT_PATH WITH
  BUCKET_PATH     = 's3a://exa.test.aws.s3.bucket.etl.01/'
  DATA_FORMAT     = 'PARQUET'
  S3_ENDPOINT     = 's3.eu-west-1.amazonaws.com'
  CONNECTION_NAME = 'S3_CONNECTION'
  PARALLELISM     = 'iproc()'
  OVERWRITE = 'TRUE'
;

Exception:

EXA: EXPORT test.t1...
Error: [22002] VM error: F-UDF-CL-LIB-1126: F-UDF-CL-SL-JAVA-1006: F-UDF-CL-SL-JAVA-1026: 
com.exasol.ExaUDFException: F-UDF-CL-SL-JAVA-1068: Exception during singleCall generateSqlForExportSpec 
java.lang.IllegalArgumentException: bucket
org.apache.hadoop.thirdparty.com.google.common.base.Preconditions.checkArgument(Preconditions.java:144)
org.apache.hadoop.fs.s3a.S3AUtils.propagateBucketOptions(S3AUtils.java:1152)
org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:374)
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469)
org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
com.exasol.cloudetl.bucket.Bucket.fileSystem$lzycompute(Bucket.scala:70)
com.exasol.cloudetl.bucket.Bucket.fileSystem(Bucket.scala:69)
com.exasol.cloudetl.scriptclasses.TableExportQueryGenerator$.deleteBucketPathIfRequired(TableExportQueryGenerator.scala:50)
com.exasol.cloudetl.scriptclasses.TableExportQueryGenerator$.generateSqlForExportSpec(TableExportQueryGenerator.scala:28)
com.exasol.cloudetl.scriptclasses.DockerTableExportQueryGenerator$.generateSqlForExportSpec(DockerTableExportQueryGenerator.scala:17)
com.exasol.cloudetl.scriptclasses.DockerTableExportQueryGenerator.generateSqlForExportSpec(DockerTableExportQueryGenerator.scala)
com.exasol.ExaWrapper.runSingleCall(ExaWrapper.java:100)
morazow added the bug (Unwanted / harmful behavior) label on Feb 16, 2022
redcatbear (Contributor) commented:

@morazow, I remember that @jakobbraun recently solved the dot-in-filenames issue in the VS thanks to the new Connection definitions with split bucket path components. Is the same fix applicable here?

morazow (Contributor, Author) commented Feb 21, 2022

It is also applied here, see #120. The issue above was reported when exporting, and the only unusual thing was the bucket name. At the moment I am not sure what causes this issue, but the dots in the name may be the reason.

The line from the exception, S3AUtils.java#L1152, just checks that the bucket name is not an empty string.

So splitting and reassembling might still help; I am going to check it.

morazow (Contributor, Author) commented Mar 30, 2022

Hey all,
I have looked into this issue. The main reason for the failure is that the java.net.URI getHost method does not work for bucket names that contain dots and end in a number.

    @ParameterizedTest
    @CsvSource({ //
            "s3a://exa.test.aws.s3.bucket.01.etl/", //
            "s3a://bucket.name.dots.007.s3.amazonaws.com/", //
            "s3a://007.s3.amazonaws.com/", //
            "s3a://007/", //
            "s3a://007L/", //
    })
    void testS3BucketURIValid(final String bucketPath) throws URISyntaxException {
        final URI uri = new URI(bucketPath);
        assertThat(uri.getHost(), is(notNullValue()));
    }

    @ParameterizedTest
    @CsvSource({ //
            "s3a://exa.test.aws.s3.bucket.etl.01/", //
            "s3a://exa.test.aws.s3.bucket.etl.01/key", //
            "s3a://bucket.name.dots.007/", //
            "s3a://pre.007/", //
    })
    void testS3BucketURIInvalid(final String bucketPath) throws URISyntaxException {
        final URI uri = new URI(bucketPath);
        assertThat(uri.getHost(), equalTo(null));
    }
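The JUnit fragments above depend on the project's test setup. As a self-contained sketch of the same behavior (the class name and output comments are mine, not from the project), the following also shows that getAuthority still carries the full dotted bucket name even when getHost returns null:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class DotBucketHostDemo {
    public static void main(String[] args) throws URISyntaxException {
        // Bucket name from the failing EXPORT: contains dots, numeric final label.
        URI broken = new URI("s3a://exa.test.aws.s3.bucket.etl.01/");
        // getHost() cannot parse the dotted, digit-terminated name as a hostname,
        // so it returns null; getAuthority() still carries the full bucket name.
        System.out.println("host      = " + broken.getHost());      // null
        System.out.println("authority = " + broken.getAuthority()); // exa.test.aws.s3.bucket.etl.01

        // The same name with a non-numeric final label parses fine.
        URI working = new URI("s3a://exa.test.aws.s3.bucket.01.etl/");
        System.out.println("host      = " + working.getHost());     // exa.test.aws.s3.bucket.01.etl
    }
}
```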

The AWS SDK should also fail for all schemes other than s3; for the s3 scheme, it uses the URI authority instead of the host.

From S3 bucket naming rules:

  • Bucket names must begin and end with a letter or number.

So AWS itself allows a bucket name to end with a number; the limitation comes from the Java URI host parsing, not from S3.

I am going to add an early check for this to the project, with a user-friendly exception.
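A minimal sketch of such an early check (the class name, method name, and message wording here are hypothetical, not the project's actual implementation):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class BucketNameCheck {
    /**
     * Fails fast with a readable message when java.net.URI cannot extract
     * a host from the bucket path, instead of letting the Hadoop S3A layer
     * throw "IllegalArgumentException: bucket" much later.
     */
    public static void validate(final String bucketPath) throws URISyntaxException {
        final URI uri = new URI(bucketPath);
        if (uri.getHost() == null && uri.getAuthority() != null) {
            throw new IllegalArgumentException("Bucket name '" + uri.getAuthority()
                    + "' in path '" + bucketPath
                    + "' cannot be parsed as a hostname. Bucket names that contain"
                    + " dots and end with a number are not supported.");
        }
    }
}
```

Calling validate("s3a://exa.test.aws.s3.bucket.etl.01/") would then raise the friendly message early, before any Hadoop filesystem is created.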

jakobbraun (Contributor) commented:

You could also take the chance and switch to the unified API...

morazow (Contributor, Author) commented Mar 30, 2022

That would be really good, but I am not aware of any JVM library that unifies them.

The only one is hadoop-tools, but even there GCS is provided separately.
