GlueCatalog name validation #12185

@devinrsmith

Apache Iceberg version

1.7.1 (latest release)

Query engine

None

Please describe the bug 🐞

I was confused for a while about why a GlueCatalog I had constructed returned successfully from GlueCatalog#listNamespaces(), but failed when I passed one of those namespaces to Catalog#listTables:

org.apache.iceberg.exceptions.ValidationException: Cannot convert namespace my-glue-database to Glue database name, because it must be 1-252 chars of lowercase letters, numbers, underscore
        at org.apache.iceberg.exceptions.ValidationException.check(ValidationException.java:49)
        at org.apache.iceberg.aws.glue.IcebergToGlueConverter.validateNamespace(IcebergToGlueConverter.java:109)
        at org.apache.iceberg.aws.glue.IcebergToGlueConverter.toDatabaseName(IcebergToGlueConverter.java:125)
        at org.apache.iceberg.aws.glue.GlueCatalog.loadNamespaceMetadata(GlueCatalog.java:503)
        at org.apache.iceberg.catalog.SupportsNamespaces.namespaceExists(SupportsNamespaces.java:159)
        at org.apache.iceberg.aws.glue.GlueCatalog.listTables(GlueCatalog.java:300)

This is not a technical limitation of Glue; the underlying API documentation says:

Name – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

Tracing the history, this looks like a limitation that Athena may have placed on Glue database names, but the link in the code documentation (below) no longer works, so I can't verify whether that is still true.

  /**
   * A Glue database name cannot be longer than 252 characters. The only acceptable characters are
   * lowercase letters, numbers, and the underscore character. More details:
   * https://docs.aws.amazon.com/athena/latest/ug/glue-best-practices.html
   *
   * @param namespace namespace
   * @return if namespace can be accepted by Glue
   */
  static boolean isValidNamespace(Namespace namespace) {

Regardless, it may be better to be lenient and allow any name that the actual Glue spec permits?
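
To make the gap concrete, here is a minimal, self-contained sketch comparing the strict rule quoted in the error message with a lenient rule based on the Glue API text above. `GlueNameRules` and both method names are hypothetical and not part of Iceberg, and the "single-line" check is my approximation (no line breaks), not the exact Glue pattern.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not Iceberg's IcebergToGlueConverter.
final class GlueNameRules {
  // Strict rule as worded in the exception: 1-252 chars of lowercase letters, numbers, underscore.
  private static final Pattern STRICT = Pattern.compile("[a-z0-9_]{1,252}");

  static boolean isStrictlyValid(String name) {
    return name != null && STRICT.matcher(name).matches();
  }

  // Lenient rule following the quoted Glue API text: 1-255 UTF-8 bytes on a single line.
  static boolean isLenientlyValid(String name) {
    if (name == null) {
      return false;
    }
    int byteLength = name.getBytes(StandardCharsets.UTF_8).length;
    if (byteLength < 1 || byteLength > 255) {
      return false;
    }
    // Assumption: "single-line" is approximated as "contains no CR or LF".
    return name.indexOf('\n') < 0 && name.indexOf('\r') < 0;
  }

  public static void main(String[] args) {
    String name = "my-glue-database";
    System.out.println("strict:  " + isStrictlyValid(name));
    System.out.println("lenient: " + isLenientlyValid(name));
  }
}
```

With the database name from the stack trace, the strict check rejects the hyphens while the lenient check accepts the name, which matches the behavior described above: Glue stores the database, but Iceberg refuses to address it.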

I suspect that there may be similar issues with table name validation, where the Glue API is also more lenient.

I eventually found the workaround, #5041, but it seems suboptimal to force read-only users to apply non-default settings to get this working. Maybe there is a middle ground where Iceberg applies the stricter validation only when the user is creating new resources?
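
A rough sketch of what that middle ground could look like, assuming the catalog knows whether a call reads an existing resource or creates a new one. `OperationAwareValidation`, `Operation`, and `validate` are hypothetical names for illustration and do not exist in Iceberg; the lenient bound comes from the Glue API text quoted earlier.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical sketch: strict naming rules only on the create path.
final class OperationAwareValidation {
  enum Operation { READ, CREATE }

  // Strict rule as worded in the current exception message.
  private static final Pattern STRICT = Pattern.compile("[a-z0-9_]{1,252}");

  static void validate(String name, Operation operation) {
    // Every operation must at least satisfy the Glue API length bound (1-255 UTF-8 bytes).
    int byteLength = name.getBytes(StandardCharsets.UTF_8).length;
    if (byteLength < 1 || byteLength > 255) {
      throw new IllegalArgumentException("Name must be 1-255 bytes: " + name);
    }
    // Only newly created resources must also satisfy the stricter convention.
    if (operation == Operation.CREATE && !STRICT.matcher(name).matches()) {
      throw new IllegalArgumentException(
          "New names must be 1-252 chars of lowercase letters, numbers, underscore: " + name);
    }
  }

  public static void main(String[] args) {
    // Reading an existing database with hyphens in its name passes.
    validate("my-glue-database", Operation.READ);
    System.out.println("READ accepted my-glue-database");
  }
}
```

Under this scheme, listing tables in an existing `my-glue-database` would work with default settings, while creating a database with that name would still be rejected.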

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time

Labels: bug, stale
