Description
Apache Iceberg version
1.7.1 (latest release)
Query engine
None
Please describe the bug 🐞
I was confused for a while about why a GlueCatalog I had constructed returned successfully from GlueCatalog#listNamespaces(), but failed when I passed one of those namespaces to Catalog#listTables:
org.apache.iceberg.exceptions.ValidationException: Cannot convert namespace my-glue-database to Glue database name, because it must be 1-252 chars of lowercase letters, numbers, underscore
at org.apache.iceberg.exceptions.ValidationException.check(ValidationException.java:49)
at org.apache.iceberg.aws.glue.IcebergToGlueConverter.validateNamespace(IcebergToGlueConverter.java:109)
at org.apache.iceberg.aws.glue.IcebergToGlueConverter.toDatabaseName(IcebergToGlueConverter.java:125)
at org.apache.iceberg.aws.glue.GlueCatalog.loadNamespaceMetadata(GlueCatalog.java:503)
at org.apache.iceberg.catalog.SupportsNamespaces.namespaceExists(SupportsNamespaces.java:159)
at org.apache.iceberg.aws.glue.GlueCatalog.listTables(GlueCatalog.java:300)
This is not a technical limitation of Glue; the underlying API documentation says:
Name – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.
Tracing the history of this, it looks like this may have been a limitation that Athena placed on Glue database names; but since the link in the documentation does not work anymore, I can't verify if that's actually still true.
/**
* A Glue database name cannot be longer than 252 characters. The only acceptable characters are
* lowercase letters, numbers, and the underscore character. More details:
* https://docs.aws.amazon.com/athena/latest/ug/glue-best-practices.html
*
* @param namespace namespace
* @return if namespace can be accepted by Glue
*/
static boolean isValidNamespace(Namespace namespace) {
Regardless, it may be better to be lenient and accept any name that the actual Glue spec allows?
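To make the gap concrete, here is a minimal sketch comparing the two rules. The class and method names are hypothetical and not part of Iceberg. The strict pattern is reconstructed from the error message above, and the lenient check is only my approximation of the Glue API wording quoted earlier (1 to 255 UTF-8 bytes, single-line), so treat it as an assumption rather than the service's exact rule.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not an Iceberg class.
class GlueNameRules {
  // Strict rule as described in the exception message:
  // 1-252 chars of lowercase letters, numbers, underscore.
  private static final Pattern STRICT = Pattern.compile("^[a-z0-9_]{1,252}$");

  static boolean isStrictlyValid(String name) {
    return name != null && STRICT.matcher(name).matches();
  }

  // Assumed approximation of the Glue API rule: 1-255 bytes of UTF-8,
  // with no control characters other than tab (i.e. a single-line string).
  static boolean isLenientlyValid(String name) {
    if (name == null) {
      return false;
    }
    int byteLength = name.getBytes(StandardCharsets.UTF_8).length;
    boolean singleLine = name.chars().noneMatch(c -> c < 0x20 && c != '\t');
    return byteLength >= 1 && byteLength <= 255 && singleLine;
  }

  public static void main(String[] args) {
    String name = "my-glue-database";
    System.out.println("strict:  " + isStrictlyValid(name));
    System.out.println("lenient: " + isLenientlyValid(name));
  }
}
```

Under these assumptions, a name such as my-glue-database passes the lenient check but fails the strict one, which matches the behavior described above.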
I suspect that there may be similar issues with table name validation, where the Glue API is also more lenient.
I eventually found the workaround, #5041, but it seems sub-optimal to force read-only users to apply non-default settings to get this working. Maybe there is a middle ground, where Iceberg only applies the stricter validation when the user is creating new resources?
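A rough sketch of what that middle ground could look like, assuming the caller knows whether it is reading or creating. Everything here (the class, the Operation enum, the method name) is hypothetical and does not reflect how GlueCatalog is actually structured. The strict pattern is taken from the error message above.

```java
// Hypothetical illustration of operation-dependent validation; not Iceberg code.
class OperationAwareValidation {
  enum Operation { READ, CREATE }

  static void validateNamespace(String name, Operation operation) {
    boolean valid;
    if (operation == Operation.CREATE) {
      // Keep the stricter rule for newly created databases.
      valid = name != null && name.matches("[a-z0-9_]{1,252}");
    } else {
      // On read paths, only reject obviously unusable names and let the
      // Glue service decide whether the database exists.
      valid = name != null && !name.isEmpty();
    }
    if (!valid) {
      throw new IllegalArgumentException(
          "Invalid namespace '" + name + "' for operation " + operation);
    }
  }

  public static void main(String[] args) {
    // Reading an existing database with a hyphenated name is accepted.
    validateNamespace("my-glue-database", Operation.READ);
    try {
      // Creating a database with the same name is still rejected.
      validateNamespace("my-glue-database", Operation.CREATE);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With this shape, listing tables in an existing database that someone else created would work with default settings, while databases created through Iceberg would still follow the stricter naming rule.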
Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time