-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improvements to the Polaris CLI #30
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
<!-- Please describe your change here and remove this comment --> This PR covers various improvements to the [recently merged](https://github.com/snowflakedb/managed-pull/7) Polaris CLI. Changes here include: * Added the `namespaces` command for creating, listing, and dropping namespaces with the CLI * Refactored error handling to reduce the proliferation of string literals * Added support for the `FILE` storage type * Added end-to-end regression tests for the CLI * Various usability and bug fixes * Support for the `CLIENT_ID` and `CLIENT_SECRET` environment variables * CLI documentation ## Pre-review checklist - [ ] I attest that this change meets the bar for low risk without security requirements as defined in the [Accelerated Risk Assessment Criteria](https://developer-handbook.m1.us-west-2.aws.app.snowflake.com/docs/reference/security-review/accelerated-risk-assessment/#eligibility) and I have taken the [Risk Assessment Training in Workday](https://wd5.myworkday.com/snowflake/learning/course/6c613806284a1001f111fedf3e4e0000). - Checking this checkbox is mandatory if using the [Accelerated Risk Assessment](https://developer-handbook.m1.us-west-2.aws.app.snowflake.com/docs/reference/security-review/accelerated-risk-assessment/) to risk assess the changes in this Pull Request. - If this change does not meet the bar for low risk without security requirements (as confirmed by the peer reviewers of this pull request) then a [formal Risk Assessment](https://developer-handbook.m1.us-west-2.aws.app.snowflake.com/docs/reference/security-review/risk-assessment/) must be completed. Please note that a formal Risk Assessment will require you to spend extra time performing a security review for this change. Please account for this extra time earlier rather than later to avoid unnecessary delays in the release process. - [ ] This change has code coverage for the new code added
eric-maynard
changed the title
SNOW-1527866: [3/x] Improvements to the Polaris CLI (#21)
Improvements to the Polaris CLI
Jul 31, 2024
sfc-gh-aixu
reviewed
Jul 31, 2024
kevinjqliu
pushed a commit
to kevinjqliu/polaris-catalog
that referenced
this pull request
Jul 31, 2024
kevinjqliu
pushed a commit
to kevinjqliu/polaris-catalog
that referenced
this pull request
Jul 31, 2024
…pache#46) * Squashed commit of the following: Co-authored-by: Evgeny Zubatov <evgeny.zubatov@snowflake.com> Co-authored-by: Michael Collado <collado.mike@gmail.com> Co-authored-by: Shannon Chen <shannon.chen@snowflake.com> Co-authored-by: Eric Maynard <eric.maynard@snowflake.com> Co-authored-by: Alvin Chen <alvin.chen@snowflake.com> commit de0b4ee768a62221a480dce7da935a27a206d076 Merge: 1c19fc8 85e69a3 Author: Michael Collado <michael.collado@snowflake.com> Date: Mon Jul 29 16:36:25 2024 -0700 Merge commit '3e6e01aae203356ed972502dfa596d04ec5a8ca5' into mcollado-merge-oss commit 1c19fc877231e34d5e8baa4a05902d13f6120050 Author: Michael Collado <michael.collado@snowflake.com> Date: Mon Jul 29 16:25:05 2024 -0700 Merge polaris-dev OSS contributions commit a3fbf4ce4bc6c629bef308349b7c7a64c8335ac9 Author: Michael Collado <michael.collado@snowflake.com> Date: Mon Jul 29 15:43:23 2024 -0700 Fix token refresh in oauth service to work with client credentials (apache#37) The Iceberg REST client _does_ retry refreshing the auth token with client credentials, but it submits them in Basic Auth form rather than as form parameters. We need to base64/url-decode them in order to validate the credentials are correct. We also need to return an accepted tokenType during refresh. Tested with ```java RESTSessionCatalog sessionCatalog = new RESTSessionCatalog(config -> HTTPClient.builder(config).uri(config.get(CatalogProperties.URI)).build(), null); sessionCatalog.initialize("demo", Map.of( "uri", "http://localhost:8181/api/catalog", "prefix", "catalog", "credential", "$URLENCODED_CLIENTID:$URLENCODED_CLIENTSECRET", "scope", "PRINCIPAL_ROLE:ALL", "token-refresh-enabled", "true", "warehouse", "martins_demo1" )); Field catalogAuth = RESTSessionCatalog.class.getDeclaredField("catalogAuth"); catalogAuth.setAccessible(true); OAuth2Util.AuthSession authSession = (OAuth2Util.AuthSession) catalogAuth.get(sessionCatalog); Field client = RESTSessionCatalog.class.getDeclaredField("client");; client.setAccessible(true); RESTClient restClient = (RESTClient) client.get(sessionCatalog); for (int i = 0; i < 10; i++) { System.out.println(authSession.refresh(restClient)); Thread.sleep(10000); } ``` commit 517cb6231d424fac59ceecb1845bdb0a3e065265 Author: Michael Collado <michael.collado@snowflake.com> Date: Mon Jul 29 10:47:32 2024 -0700 Changed reg test docker image to stop exposing aws credentials as env variables (apache#36) In the reg tests, when S3 credentials aren't present in the FileIO, the S3 client is falling back to the credentials in the environment variables, which have access to everything. This caused a previous bug to go uncaught. I verified that if I don't update the FileIO for a table, these tests fail now. commit e418aefe8964c7c67b509f8eec43055f1c17a742 Author: Michael Collado <michael.collado@snowflake.com> Date: Mon Jul 29 08:40:58 2024 -0700 Fix error with catalog FileIO credentials and path construction (apache#35) Namespace directories were being constructed backwards. Unsure why the tests didn't catch this FileIO for table creation and update was also not updating credentials correctly due to several `fileIO` variables in scope and updating the wrong one. I've renamed the variables to be more clear what each fileIO is scoped to. Future change upcoming to improve reg tests to catch issues - fileIO is falling back to environment variables in the docker image commit 7f37bb252cec7df07db3994bf981efe76a52639c Author: Michael Collado <michael.collado@snowflake.com> Date: Mon Jul 29 02:05:14 2024 -0700 Fix validation to read table metadata file after fileio initialization with credentials (apache#34) TableMetadataParser was reading metadata file before FileIO was initialized with credentials. This was uncaught in the tests because the FileIO reads the test image's environment variables. commit 0866f3ad54295a6d7822b9645f59996986d23acd Author: Michael Collado <michael.collado@snowflake.com> Date: Sun Jul 28 22:10:22 2024 -0700 Fixed issue when creating table under namespace with custom location (apache#33) Tables were still being created with the default directory structure when their parent namespace had a custom location. This fixes the issue and adds a test proving the table is successfully created and that its location is under the custom namespace directory commit ee701ff99120b948b5ed3120461a9aaf0842f944 Author: Michael Collado <michael.collado@snowflake.com> Date: Sun Jul 28 20:53:52 2024 -0700 Disallow overlapping base locations for table and namespaces and prevent table location from escaping via metadata file (apache#32) Two major changes included here: * Disables table locations and namespace locations from overlapping. Since namespaces can't overlap and tables can't overlap sibling namespaces or sibling tables, this prevents all table locations from overlapping within a catalog * Prevents metadata files from pointing at table locations outside of the table's base location Both of these features are controllable by feature flags. Because of existing unit and integration tests (notably the ones we import from Iceberg), I also made the TableMetadata location check and the namespace directory checking configurable at the catalog level. However, the overlap checking within a namespace is not configurable at the catalog level (this means there's a gap so that if a catalog allows metadata files to point to locations outside of the namespace directory, a table's location could overlap with a table in another directory). It is possible for a table or a namespace to _set_ its base-directory to something other than the default. However, tables and namespaces with overridden base locations still cannot overlap with sibling tables or namespaces. commit 51ac320f60c103c2b10cd8d2910217010f38afdd Author: Shannon Chen <shannon.chen@snowflake.com> Date: Sun Jul 28 23:13:59 2024 +0000 The loadTable is current scoped to table location, this PR makes 1. loadTable only scoped to table location + `metadata/` and `data/`. 2. when refreshTable keep it only scoped to table location + `metadata` 3. Throw user error when the user specify `write.metadata.path` or `write.data.path` commit 838ba65849e7f4f0dd5dfcac0093262a165e52e4 Author: Eric Maynard <eric.maynard@snowflake.com> Date: Fri Jul 26 22:04:41 2024 -0700 CREATE_TABLE shouldn't return credentials to read the table if the user doesn't have that privilege (apache#29) Before CREATE_TABLE returns credentials that can be used to read a table, Polaris should validate that the user has TABLE_READ_DATA on that table. commit 96257f4fa54372fb565954024cbfe256de5d6f20 Author: Alvin Chen <alvin.chen@snowflake.com> Date: Fri Jul 26 15:43:35 2024 -0700 Call metric init on IcebergRestConfigurationApi and IcebergRestOAuth2Api class (apache#30) commit d6e057778811f20a462ad2746722e1a1427197cf Author: Alvin Chen <alvin.chen@snowflake.com> Date: Fri Jul 26 10:22:34 2024 -0700 Switch Metrics Emission from Ad-Hoc to Initialization-Based (apache#28) <!-- Please describe your change here and remove this comment --> Metrics are currently emitted in an ad-hoc fashion, meaning they will only be queriable in Prometheus if the corresponding API endpoint is invoked. This makes plotting difficult in Grafana, especially in the case where we need to aggregate across multiple metrics to reflect, for instance, the overall error rate across all endpoints in the application. Say we have metrics a, b, c. If b is null since the corresponding API has not yet been invoked, a+b+c would result in the value null, instead of the sum of a and c. This can be fixed in Grafana by "filling in" the metrics with a metric that is guaranteed to be present, say `up`. The promql query will then become: `(a or (up * 0)) + (b or (up * 0)) + (c or (up * 0))` However, the query becomes extremely large and slow. Thus to avoid this, we'll make sure the metrics are always emitted regardless of the corresponding API being called. We also add a counter metric to each endpoint to track the total number of invokes. Previously we had timers who have an attribute `count`. However, they are only incremented on successes (since we only record timer on success), therefore, they did not incorporate failures in the counts. commit 20bf59b9b5e8dba9a35b932f036114b85375829b Author: Eric Maynard <eric.maynard@snowflake.com> Date: Fri Jul 26 10:09:50 2024 -0700 When a catalog is created or updated, we should check that the catalog does not have a location which overlaps with the location of an existing catalog. * Squashed commit of the following: commit b65dbc68c43e7c3ff0e1901e516c9749fda58ced Author: Michael Collado <michael.collado@snowflake.com> Date: Mon Jul 29 17:24:10 2024 -0700 Fix Gradle formatting error in merge * Update aws config to check for blank instead of null to address reg tests when aws keys are not available
sfc-gh-aixu
approved these changes
Jul 31, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
sfc-gh-aixu
approved these changes
Jul 31, 2024
dennishuo
approved these changes
Jul 31, 2024
This was referenced Aug 1, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
This PR covers various improvements to the Polaris CLI. Changes here include:
namespaces
command for creating, listing, and dropping namespaces with the CLIFILE
storage typeCLIENT_ID
andCLIENT_SECRET
environment variablesType of change
Please delete options that are not relevant.
How Has This Been Tested?
Added new test
test_cli.py
and updated existing CLI testChecklist:
Please delete options that are not relevant.