Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add NotificationType.VALIDATE which can serve as a dry-run of a CREATE without a metadata file #321

Merged

Conversation

dennishuo
Copy link
Contributor

@dennishuo dennishuo commented Sep 26, 2024

Description

If a remote catalog or manual caller wants to ensure that permissions, paths, etc., are configured correctly to receive CREATE/UPDATE notifications before deciding to actually create a table in the remote catalog, sending a VALIDATE notification with the prospective table metadata path can pre-validate basic setup. In a VALIDATE call, no actual entities will be mutated or created.

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • Documentation update
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • Test A
  • Test B

Test Configuration:

  • Hardware:
  • Toolchain:
  • SDK:

Checklist:

Please delete options that are not relevant.

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • If adding new functionality, I have discussed my implementation with the community using the linked GitHub issue

…E without a metadata file

If a remote catalog or manual caller wants to ensure that permissions, paths, etc., are configured
correctly to receive CREATE/UPDATE notifications before deciding to actually create a table in the
remote catalog, sending a VALIDATE notification with the prospective table metadata path can
pre-validate basic setup. In a VALIDATE call, no actual entities will be mutated or created.
}

// Validate location against the resolvedStorageEntity
String metadataLocation =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To confirm, is the expectation is to supply a fake metadata.json location in the payload?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I described some of the options in the rest spec description, but in a nutshell it can be either a parent directory or a prospective full filename, and the behavior is simply for the validation to ensure that whatever path is specified conforms to the ALLOWED_LOCATIONS of the storage config.

// to findStorageInfoFromHierarchy.
PolarisResolvedPathWrapper resolvedStorageEntity = null;
Optional<PolarisEntity> storageInfoEntity = Optional.empty();
for (int i = tableIdentifier.namespace().length(); i >= 0; i--) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still validate that this principle has privileges to create namespaces and table (if they don't exist)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. This is validated in the updated PolarisCatalogHandlerWrapperAuthzTest.java

}

@Test
public void testValidateNotificationFailToCreateFileIO() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another test case to add would be if the role doesn't have privileges to create the namespace or table

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Authz tests are covered in PolarisCatalogHandlerWrapperAuthzTest.java

Copy link
Contributor

@tzuan16 tzuan16 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks Dennis!


// The location of the metadata JSON file specified in the create will be forbidden.
// For a VALIDATE call we can pass in the metadata/ prefix itself instead of a metadata JSON
// filename.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: update comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@collado-mike collado-mike left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have a general checkPrivileges API so that a user can validate any privilege they require. it doesn't cover everything that's needed here, but maybe it's possible that we could model a privilege check to contain the information needed to actually validate most of the requirements. e.g., a TableGrant currently contains the privilege and the namespace and table name. For default table locations, that's all you'd need to validate create/read/write privileges. For custom table locations, we'd need a different entity to validate the location.

@dennishuo
Copy link
Contributor Author

It would be nice to have a general checkPrivileges API so that a user can validate any privilege they require. it doesn't cover everything that's needed here, but maybe it's possible that we could model a privilege check to contain the information needed to actually validate most of the requirements. e.g., a TableGrant currently contains the privilege and the namespace and table name. For default table locations, that's all you'd need to validate create/read/write privileges. For custom table locations, we'd need a different entity to validate the location.

Yeah good point - I definitely also considered something along the lines of a "dry-run" parameter for this instead that could be generalized to other API requests. In this case when I started implementing it that way it became clear that there are enough deviations for the use case that it would be a bit misleading and messy to treat it as a dry run.

In particular, this use case seems to pop up more prominently due to the inherently "cooperative" nature of the intended federation use case of the Notifications API, where a remote catalog wants to pre-validate before creating any metadata files on their side. This also means the control flow isn't quite the same as e.g. injecting a no-op metastore for writes while exercising all reads the same as "real" calls.

I agree it'll be nice to have some generalized prevalidation longer-term though. checkPrivilege with the ability to expand the validation to things like location-validation and config-validation sounds like a good alternative vs dry-run to capture the fact that the validation set might need to be a different workflow than what a simple dry-run might imply.

Another thing that makes me want to lean towards a more targeted initial scope here is that the Notification API is more actively experimental while we better align in the broader community on the concept of external tables (https://lists.apache.org/thread/zcv6qm9ysknrhfpg093qgnrkrolptcht) so as long as the intended use case is something we see long-term (the use case of prevalidation/checkPrivilege/etc) then it makes sense to iterate on a narrower implementation that could change easily and then we can apply learnings from the use case towards the generalization of the implementation.

RussellSpitzer
RussellSpitzer previously approved these changes Sep 27, 2024
@collado-mike
Copy link
Contributor

Yeah good point - I definitely also considered something along the lines of a "dry-run" parameter for this instead that could be generalized to other API requests. In this case when I started implementing it that way it became clear that there are enough deviations for the use case that it would be a bit misleading and messy to treat it as a dry run.

A HEAD option would be cool - except that I don't think you're allowed to submit a request body. Maybe support a query parameter?

The prospect of VALIDATE as a notification type doesn't really make sense to me. Ultimately, I think a collection of specific checks - privileges, table location, catalog type - will all be super useful.

Though I suppose this notify API is apt to change anyway, so it probably doesn't hurt to add a new notification type now, even if it is a bit wonky

@RussellSpitzer
Copy link
Member

Thanks @dennishuo , thanks for reviews @collado-mike @tzuan16 and @sfc-gh-mparmar!

@RussellSpitzer RussellSpitzer merged commit 1c8ca73 into apache:main Sep 27, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants