Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add catalog-level override for ALLOW_TABLE_LOCATION_OVERLAP #264

Merged
merged 10 commits into from
Sep 10, 2024

Conversation

eric-maynard
Copy link
Contributor

Description

ALLOW_TABLE_LOCATION_OVERLAP can be used to allow two tables to share a location. Combined with ALLOW_UNSTRUCTURED_TABLE_LOCATION, this can be used to allow essentially arbitrary table locations. This also unlocks support of write.object-storage.enabled.

These configs should have a catalog-level override so certain catalogs can be created with lax location requirements while other catalogs keep stricter requirements.

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • Documentation update
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Added a new suite PolarisOverlappingTableTest

Checklist:

Please delete options that are not relevant.

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings

@@ -1144,7 +1148,7 @@ private void validateNoLocationOverlap(
URI existing = URI.create(siblingLocation);
if (isUnderParentLocation(target, existing)
|| isUnderParentLocation(existing, target)) {
throw new org.apache.iceberg.exceptions.BadRequestException(
throw new org.apache.iceberg.exceptions.ForbiddenException(
"Unable to create table at location '%s' because it conflicts with existing table or namespace at location '%s'",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this isn't in scope here, and we probably talked about it already but doesn't this check essentially let any user with table create permissions find out the table location of every other table in the namespace regardless of permissions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it does. I agree this is potentially an issue; maybe we should open up a discussion on it?

My $0.02 is that this is essentially the lesser of two evils; not doing this check would allow tables to share a location, undermining the role of credential vending and the value of scoping credentials to a prefix.

Perhaps in the future we can skip this check if credentials can somehow be scoped to just the files in a table e.g. by tagging or with file-level encryption. For now, we scope them to a prefix.

Copy link
Contributor

@dennishuo dennishuo Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incidentally, this highlights one of the reasons why the instantiation of the PolarisResolutionManifest in this method is the odd one out and may feel out of place; normally the PolarisResolutionManifest is only instantiated in the auth-enforcement-aware layer (PolarisCatalogHandlerWrapper) to tie together an authz check with an explicit enumeration of the entities the request should be able to touch. So any time we instantiate one to perform lookups outside of that, we're intentionally forced to consider its consequences on the authz model.

I agree in this scenario the information-disclosure vector is probably the lesser of the concerns, and in both cases the catalog admin needs to make a judgement call on trust level of anyone allowed to create or update tables.

Perhaps segregating out privileges for different table-update types could help as well -- some workflows may want to restrict some principals/roles to only being able to update snapshots of a table for example without being allowed to perform "DDL" on the table to change schema or file location.

Also, it's worth noting that the analogous "information leak" from createTable throwing "Table already exists" is actually addressed in the privilege model by virtue of TABLE_CREATE being a super-privilege of TABLE_LIST -- because creating a "conflicting" table name allows one to "fish" for whether an existing table of the same name already exists, and the TABLE_EXISTS operation is conditioned on the TABLE_LIST privilege with both having the parent namespace as the authorization "target".

The analogous implicit privilege, I suppose, would be to say that both TABLE_CREATE and TABLE_WRITE_PROPERTIES effectively convey (a subset of) TABLE_READ_PROPERTIES at the parent-namespace level of the table in question, if SetLocation is being used in the update.

Catalog.TypeEnum.INTERNAL,
c.catalog,
propertiesBuilder.build(),
System.currentTimeMillis(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: hardcode timestamp for deterministic testing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lots of UUIDs here too ... would be nice to lock down random to a specific seed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed on the UUIDs... to be honest, we need a better way of making sure tests & suites don't step on each others' toes. We already use UUIDs in a few places so I feel okay keeping them around for now, but something deterministic / idiomatic would be better.

If you want, I can replace this with a (seeded) random number, just let me know.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should wait till we centralize some of this test code

RussellSpitzer
RussellSpitzer previously approved these changes Sep 4, 2024
Copy link
Member

@RussellSpitzer RussellSpitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks good to me, I have a few style nits but everything makes sense to me.

Copy link
Contributor

@dennishuo dennishuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good, optional comment suggestion

Copy link
Contributor

@dennishuo dennishuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

RussellSpitzer
RussellSpitzer previously approved these changes Sep 9, 2024
@RussellSpitzer RussellSpitzer enabled auto-merge (squash) September 9, 2024 17:07
@RussellSpitzer
Copy link
Member

Thanks @eric-maynard and @dennishuo for reviewing!

auto-merge was automatically disabled September 9, 2024 17:14

Head branch was pushed to by a user without write access

auto-merge was automatically disabled September 9, 2024 19:11

Head branch was pushed to by a user without write access

@RussellSpitzer RussellSpitzer enabled auto-merge (squash) September 10, 2024 16:17
@RussellSpitzer RussellSpitzer merged commit 50da177 into apache:main Sep 10, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants